Ground truth validation
Validation now runs primarily on a locked hold-out of historical events, scored by a frozen oracle. Live market predictions are a secondary signal — hash-locked before publication and labeled by reflexivity class. Everything is scored by Brier (lower is better).
26 hold-out events scored · 0.291 ensemble Brier (hold-out) · 1 market prediction resolved · 1/1 beat market
Primary validation · frozen-scored hold-out
Scored by the frozen oracle on a locked hold-out of 26 historical events including 10 negative controls (high-stress societies that did NOT collapse). The set is deliberately crisis-skewed, so the real test is resolution (discrimination), not a low average. With ten variants now tried, PBO is 0.70 and the binding number is the EVT-deflated best Brier, which sits exactly at the chance line (0.25) — i.e. no deflated evidence of skill yet. New variants are auto-experimental and excluded from the official ensemble until they pass a pre-registered severe test.
| Model | Family | Brier | Resolution | Neg-ctrl | Tier |
|---|---|---|---|---|---|
| ensemble | equal-weight | 0.291 | 0.096 | 0.068 | T0 |
| pitf_logit (excl.) | regime_logit | 0.175 | 0.141 | 0.256 | T2 |
| hierarchical_bayes (excl.) | empirical_bayes | 0.219 | 0.084 | 0.278 | T1 |
| hazard_spline (excl.) | hazard_spline | 0.220 | 0.095 | 0.405 | T1 |
| conformal_wrapper (excl.) | calibration_meta | 0.221 | 0.121 | 0.158 | T1 |
| sdt_turchin (excl.) | structural_demographic | 0.230 | 0.173 | 0.222 | T1 |
| train_freq | empirical_frequency | 0.234 | 0.064 | 0.161 | T1 |
| firth_logit (excl.) | penalised_logit | 0.269 | 0.192 | 0.297 | T0 |
| gbm_honest (excl.) | gradient_boosting | 0.281 | 0.115 | 0.144 | T0 |
| reign_logit (excl.) | duration_logit | 0.330 | 0.060 | 0.209 | T0 |
| null_baseline | null | 0.370 | 0.095 | 0.038 | T0 |
The headline is a falsification and a lead, not a victory. The fixed-prior pitf_logit posts the lowest raw Brier (0.175) but its pre-registered F1 ablation FAILED a third time as feature coverage widened — so the fixed-beta PITF hypothesis is FALSIFIED, and that low Brier is a calibration artifact (its discrimination dropped and its negative-control error rose). The genuinely interesting result is firth_logit: the SAME PITF channels, but with coefficients fitted on the country-year panel, reach the board's highest discrimination ever (resolution 0.192) — the Philosopher-sanctioned re-entry route, now awaiting its own pre-registered severe test. The admitted ensemble (just the two baselines) stays conservative on purpose. The binding honest number: EVT-deflated best Brier = 0.25, exactly chance. No validated skill yet — admission flows only through the six pre-registered gates.
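To make the distinction between a low average Brier and real discrimination concrete, here is a minimal Python sketch of the Murphy decomposition, Brier = reliability - resolution + uncertainty. It assumes the Resolution column refers to this resolution term, which the page does not confirm. The function names and the data are illustrative only and are not the project's scoring code or its hold-out set.

```python
from collections import defaultdict


def mean_brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty).

    Events are grouped by distinct forecast value, so the identity
    Brier = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        hit_rate = sum(obs) / len(obs)  # observed frequency in this group
        reliability += len(obs) * (f - hit_rate) ** 2
        resolution += len(obs) * (hit_rate - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Made-up forecasts and outcomes (assumption: NOT the hold-out set).
forecasts = [0.9, 0.9, 0.9, 0.1, 0.1, 0.5, 0.5, 0.5]
outcomes = [1, 1, 0, 0, 0, 1, 0, 1]
rel, res, unc = murphy_decomposition(forecasts, outcomes)
print(f"Brier={mean_brier(forecasts, outcomes):.3f} "
      f"reliability={rel:.3f} resolution={res:.3f} uncertainty={unc:.3f}")
```

A forecaster that always says 0.5 scores exactly 0.25 on any event set and has zero resolution under this decomposition, which is why 0.25 serves as the chance line.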
Secondary signal
Published pre-resolution and labeled by reflexivity class. Live markets are reflexive — predicting them can change them — so they are a secondary check, not the primary validation. The single resolved bet so far (N=1) is not statistically meaningful on its own.
01 · Structural events only
The formula prices collective and structural phenomena: regime transitions, economic phase shifts, war dynamics, institutional collapse. It cannot predict individual behavior or personal decisions.
02 · Hash-locked & labeled
The probability is hash-locked and pre-registered before the event resolves, then published labeled by reflexivity class. No retroactive changes.
03 · Brier score calculated
When the market resolves, both predictions are scored. Brier = (prediction - outcome)². Lower is better. Market Brier > our Brier = win.
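The lock-then-score flow in the three steps above can be sketched in Python as follows. This is a hedged illustration, not the project's pipeline: the page does not say which hash function or record format is used, so SHA-256 over canonical JSON with a nonce is an assumption, and `commit`, `verify`, `brier`, `beat_market`, the record fields, and the market price of 0.70 are all hypothetical.

```python
import hashlib
import json


def commit(prediction: dict, nonce: str) -> str:
    """Digest of a prediction record (assumed scheme: SHA-256 over
    the nonce followed by key-sorted, whitespace-free JSON)."""
    payload = json.dumps(prediction, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((nonce + payload).encode("utf-8")).hexdigest()


def verify(prediction: dict, nonce: str, digest: str) -> bool:
    """True if the revealed record matches the digest published earlier."""
    return commit(prediction, nonce) == digest


def brier(p: float, outcome: int) -> float:
    """Single-event Brier score: (prediction - outcome) squared."""
    return (p - outcome) ** 2


def beat_market(our_p: float, market_p: float, outcome: int) -> bool:
    """A win means our Brier is strictly lower than the market's."""
    return brier(our_p, outcome) < brier(market_p, outcome)


# Hypothetical record and nonce, for illustration only.
record = {"market": "example-market", "p": 0.80, "made": "2026-04-07"}
nonce = "example-nonce"
locked = commit(record, nonce)  # publish this digest before resolution

# After resolution: reveal the record and nonce, check the lock, then score.
assert verify(record, nonce, locked)
print(round(brier(record["p"], 1), 4), beat_market(record["p"], 0.70, 1))
```

Sorting keys and fixing separators keeps the JSON encoding stable, so the same record always produces the same digest and any retroactive change to the probability is detectable.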
| Market | Our P | Market P | Confidence | Date Made | Status |
|---|---|---|---|---|---|
| Will the UK exit NATO by June 30, 2026? Formula: 2.5% [0.5%, 8%]. Market started at 10%, drifted to 4.75% (S17), then through 4.65% / 4.60% / 4.55% / 4.65% / 4.60% / 3.20% / 3.40% / 3.15% over Sessions 18-26. Best convergence reading 79.06% at Session 26 (second-best in series history). API_BLOCKED at Session 27 and Session 28; live signal carries forward from Session 26. NDAA Section 1250A and 180-day notification requirement make structural exit physically implausible at this horizon. Direction CORRECT across the full S15-S26 live-signal arc. | 2.5% | 4.8% | HIGH: formula predicts structural stability of institutional anchors in Zone 3 OECD states | April 7, 2026 | Pending |
| Will Keir Starmer be out as UK Prime Minister by December 31, 2026? Formula: 58% [40%, 73%]. UK May 7-8 local elections occurred during the API-blocked window. Pre-registered 24-72h post-vote re-score window expired 2026-05-11 with direct Polymarket API still blocked at the sandbox boundary (sessions 27-34 all blocked on `*.polymarket.com` DNS). Session 33 indirect WebSearch fallback recovered a Starmer-OUT trading band ≈ [0.49, 0.69] on $18M volume, implying Starmer-IN ≈ [0.31, 0.51]. Project estimate (0.58) is now between +7pp above the upper bound and +27pp above the lower bound of that implied Starmer-IN band, but does NOT cross the formula CI lower bound (0.40), so the soft warning flag did not escalate. Live time-series resolution permanently lost for the catalyst window; conversion from live test to historical-price backfill conditional on operator-level API allowlist edit. | 58.0% | 65.0% | LOW: qualitative institutional-constraint assessment. Zone 3 OECD, PSI validity 0.25. | April 16, 2026 | Pending |
| Will a Russia-Ukraine ceasefire be officially announced before January 1, 2027? Formula: 42% [22%, 63%]; wide CI reflects low PSI validity in active conflict. War-termination literature (Richardson power law + Weibull duration model): ~38% base rate at this conflict duration. Economic constraint vector for Russia elevated. Uncertainty flag: endogenous US pressure variable not formally modeled. | 42.0% | 38.0% | LOW: PSI validity in active-conflict zones is 0.15. Prediction driven primarily by cliodynamic war-termination base rates and economic-constraint analysis, not full formula composite. | April 19, 2026 | Pending |
| Will the IMF declare a global recession for calendar year 2026? Formula: 28% [14%, 44%], above market consensus at 22%. Key drivers: (1) global trade network fragility elevated post-tariff shock (percolation model: 0.34 fragility index), (2) debt-cycle phase for 7/10 largest economies at late-expansion, (3) Minsky moment probability elevated at 0.19 for 2026. Market may be underweighting tail risk from trade-network cascade effects. | 28.0% | 22.0% | MEDIUM: Econophysics + cliodynamics modules. Global macro-state vector S_t elevated on trade-stress and debt-cycle components. Mean-field approximation applied across G20 economies. | April 19, 2026 | Pending |
| Will China launch a military invasion of Taiwan before January 1, 2030? Formula: 12% [4%, 28%]. Drivers: elite overproduction index in PRC elevated (0.019), economic growth deceleration increasing internal pressure, but deterrence calculus (nuclear shadow + US commitment) constrains probability. Structural geopolitical dynamics: formula and market in rough agreement; small divergence driven by formula weighting internal-pressure vector more heavily. | 12.0% | 10.0% | LOW: Long-horizon prediction (3.7 years). Full formula composite active but confidence degrades with horizon: CI [4%, 28%]. Cliodynamics + political-scientist + network-scientist modules primary. | April 19, 2026 | Pending |
| Will any EU member state begin formal exit proceedings before January 1, 2030? Formula: 18% [8%, 32%]. Driven by EU institutional fragility analysis: Hungary and Poland have elevated elite overproduction indices (0.017, 0.014) and persistent rule-of-law disputes creating structural exit pressure. Historical base rate for bloc fragmentation at this cohesion level: ~15%. Formula adds 3% above market from network percolation model of EU solidarity bonds; threshold not yet reached but trending upward. | 18.0% | 14.0% | LOW: Long-horizon (3.7 years). Political-scientist + network-scientist modules. PSI validity for Hungary: 0.40; Poland: 0.35. Institutional fragility index elevated in both. | April 19, 2026 | Pending |
| Will Brazil experience a significant democratic backsliding event (suspension of legislature or judiciary by executive) before January 1, 2028? Formula: 14% [5%, 27%]. Drivers: (1) post-2022 polarization index elevated at 0.71, (2) military institutional friction persists above pre-2016 baseline, (3) Turchin secular cycle places Brazil in 'integrative phase fragility' zone. Market at 11%; formula adds 3% from structural polarization dynamics. Historical analogues: Turkey 2016, Bolivia 2019; similar PSI trajectories resolved without backsliding in ~75% of cases. | 14.0% | 11.0% | LOW: Zone 1 (emerging). PSI validity 0.30. Cliodynamics + political-scientist modules. Secular cycle analysis active: Brazil in post-crisis consolidation phase with elevated polarization index. | April 19, 2026 | Pending |
| Will China's average annual GDP growth rate fall below 3% over the 2027–2030 period? Formula: 32% [14%, 52%], above market consensus at 25%. Key structural drivers: (1) real estate sector debt overhang equivalent to ~28% of GDP, consistent with Japan 1991 and Korea 1997 pre-correction trajectories, (2) demographic compression (working-age population decline accelerating from 2025), (3) Minsky moment probability for property sector at 0.31 over the 2026–2028 window. Formula flags: these are macro-structural forces acting on aggregate economic behavior, not a prediction about policy response. | 32.0% | 25.0% | LOW: Long-horizon (4+ years). Econophysics + cliodynamics modules. Debt cycle dynamics and demographic transition vector both active. Wide CI [14%, 52%] reflects horizon uncertainty. | April 19, 2026 | Pending |
| Will any G20 member experience a sovereign debt default or restructuring before January 1, 2028? Formula: 22% [10%, 36%]. Structural drivers: Argentina (debt restructuring cycle: 0.62 recurrence probability over 5-year horizon), Turkey (external debt vulnerability index: 0.58), and South Africa (fiscal fragility index: 0.44) are primary contributors to the aggregate probability. Network contagion analysis: sovereign debt percolation threshold not breached but single-node default would elevate systemic risk. Market at 17%; formula adds 5% from debt-cycle phase analysis. | 22.0% | 17.0% | MEDIUM: Econophysics + political-scientist modules. Debt cycle phase analysis across G20. PSI composite stress index elevated for 3 members. Mean-field approximation across sovereign debt network. | April 19, 2026 | Pending |
| Will global trade as a share of world GDP fall below its 2010 level by January 1, 2029? Formula: 35% [18%, 52%]. Structural deglobalization pressure: (1) trade network percolation analysis shows fragility index at 0.41 (above 2008 crisis level of 0.38), (2) tariff shock propagation model predicts 8–14% bilateral trade reduction in affected dyads, (3) supply chain reshoring dynamics now self-reinforcing above critical threshold. This is a collective structural phenomenon driven by aggregate firm behavior and state policy interactions, not reducible to any single actor's decision. | 35.0% | 28.0% | LOW: Long-horizon (2.7 years). Econophysics + network-scientist modules. Trade network topology fragility index elevated post-2018. Mean-field analysis across bilateral trade relationships. | April 19, 2026 | Pending |
| Will any G7 country experience a mass civil-unrest event (>1M participants in a single protest or protest wave) before January 1, 2028? Formula: 38% [22%, 56%]. Structural drivers: (1) PSI composite elevated in 4/7 G7 members above 0.55 threshold, (2) elite overproduction index averaged across G7 at 0.016 (post-1970 high), (3) Granovetter-threshold simulation with current opinion-dynamics parameters places cascade probability at 0.31 per G7 country per 24-month window, aggregated to 0.38 at least-one-occurrence. Pure collective-action prediction: no individual actor or specific party framing. Replaces withdrawn Trump-impeachment prediction (individual-event, out of scope under v1.0 protocol). | 38.0% | 30.0% | MEDIUM: Long-horizon (~20 months). Comp-sociologist + cliodynamicist + political-scientist modules. Collective-action threshold model + PSI composite across G7 states. | April 19, 2026 | Pending |
| Market | Our P | Market P | Confidence | Date Made | Result |
|---|---|---|---|---|---|
| Hungary: Will Tisza win the 2026 parliamentary election? FIRST FORMULA WIN. Market Brier: 0.0729. Our Brier: 0.0400. Elite overproduction + PSI elevation correctly identified structural fragility of Orbán's regime. | 80.0% | 74.0% | HIGH: Zone 2 (industrializing) PSI validity 0.45 | April 7, 2026 | Beat market (Brier: 0.0400) |