A Second Foundation

Ground truth validation

Validation & Predictions

Validation now runs primarily on a locked hold-out of historical events, scored by a frozen oracle. Live market predictions are a secondary signal — hash-locked before publication and labeled by reflexivity class. Everything is scored by Brier (lower is better).

Hold-out events scored: 26 · Ensemble Brier (hold-out): 0.291 · Market predictions resolved: 1 · Beat market: 1/1

Primary validation · frozen-scored hold-out

Historical retrodiction leaderboard

Scored by the frozen oracle on a locked hold-out of 26 historical events, including 10 negative controls (high-stress societies that did NOT collapse). The set is deliberately crisis-skewed, so the real test is resolution (discrimination), not a low average Brier. With ten variants now tried, PBO (probability of backtest overfitting) is 0.70, and the binding number is the EVT-deflated best Brier, which sits exactly at the chance line (0.25): no deflated evidence of skill yet. New variants are auto-experimental and excluded from the official ensemble until they pass a pre-registered severe test.

| Model | Family | Brier | Resolution | Neg-ctrl | Tier |
|---|---|---|---|---|---|
| ensemble | equal-weight | 0.291 | 0.096 | 0.068 | T0 |
| pitf_logit (excl.) | regime_logit | 0.175 | 0.141 | 0.256 | T2 |
| hierarchical_bayes (excl.) | empirical_bayes | 0.219 | 0.084 | 0.278 | T1 |
| hazard_spline (excl.) | hazard_spline | 0.220 | 0.095 | 0.405 | T1 |
| conformal_wrapper (excl.) | calibration_meta | 0.221 | 0.121 | 0.158 | T1 |
| sdt_turchin (excl.) | structural_demographic | 0.230 | 0.173 | 0.222 | T1 |
| train_freq | empirical_frequency | 0.234 | 0.064 | 0.161 | T1 |
| firth_logit (excl.) | penalised_logit | 0.269 | 0.192 | 0.297 | T0 |
| gbm_honest (excl.) | gradient_boosting | 0.281 | 0.115 | 0.144 | T0 |
| reign_logit (excl.) | duration_logit | 0.330 | 0.060 | 0.209 | T0 |
| null_baseline | null | 0.370 | 0.095 | 0.038 | T0 |
Hold-out events: 26 · Negative controls: 10 · Legacy formula: Tier 0 · Chance line: 0.25 · Market line: 0.18 · PBO: 0.70
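To make the Brier and Resolution columns concrete, here is a minimal Python sketch of the standard Murphy decomposition (Brier = reliability - resolution + uncertainty). The frozen oracle's exact binning and resolution formula are not given on this page, so the function names, the rounding-based binning, and the toy forecasts below are all illustrative assumptions, not leaderboard data. The sketch also shows where a 0.25 chance line comes from: a constant forecast of 0.5 scores exactly 0.25 whatever the outcomes, with zero resolution.

```python
from collections import defaultdict


def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n


def murphy_decomposition(forecasts, outcomes, decimals=1):
    """Return (reliability, resolution, uncertainty).

    Forecasts are grouped by rounding to `decimals` places (an assumed
    binning rule). The identity Brier = REL - RES + UNC is exact only
    when every forecast inside a bin has the same value.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[round(f, decimals)].append((f, o))

    reliability = 0.0
    resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_f = sum(f for f, _ in members) / n_k
        mean_o = sum(o for _, o in members) / n_k
        reliability += n_k * (mean_f - mean_o) ** 2  # calibration error
        resolution += n_k * (mean_o - base_rate) ** 2  # discrimination
    return reliability / n, resolution / n, base_rate * (1 - base_rate)


# Toy data (hypothetical): 8 events, 5 of which occurred.
outcomes = [1, 1, 1, 0, 1, 0, 0, 1]
forecasts = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]

rel, res, unc = murphy_decomposition(forecasts, outcomes)
print("Brier:", brier(forecasts, outcomes))
print("REL - RES + UNC:", rel - res + unc)

# A constant 0.5 forecast: every squared error is 0.25, and with a
# single bin the resolution term is zero.
coin_flip = [0.5] * len(outcomes)
print("Coin-flip Brier:", brier(coin_flip, outcomes))  # 0.25
print("Coin-flip resolution:", murphy_decomposition(coin_flip, outcomes)[1])  # 0.0
```

Because resolution enters with a negative sign, a model can post a low Brier through good calibration alone while discriminating poorly, which is why the two columns are read separately.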

The headline is a falsification and a lead, not a victory. The fixed-prior pitf_logit posts the lowest raw Brier (0.175) but its pre-registered F1 ablation FAILED a third time as feature coverage widened — so the fixed-beta PITF hypothesis is FALSIFIED, and that low Brier is a calibration artifact (its discrimination dropped and its negative-control error rose). The genuinely interesting result is firth_logit: the SAME PITF channels, but with coefficients fitted on the country-year panel, reach the board's highest discrimination ever (resolution 0.192) — the Philosopher-sanctioned re-entry route, now awaiting its own pre-registered severe test. The admitted ensemble (just the two baselines) stays conservative on purpose. The binding honest number: EVT-deflated best Brier = 0.25, exactly chance. No validated skill yet — admission flows only through the six pre-registered gates.

Secondary signal

Live market predictions

Published pre-resolution and labeled by reflexivity class. Live markets are reflexive — predicting them can change them — so they are a secondary check, not the primary validation. The single resolved bet so far (N=1) is not statistically meaningful on its own.
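A rough way to see why N=1 carries so little weight is a one-sided binomial tail. The sketch below assumes, purely for illustration, that under a no-skill null each resolved market is an independent 50/50 chance of beating the market; that null and the function name are assumptions, not the project's pre-registered test.

```python
from math import comb


def p_value_at_least(wins, n, p_null=0.5):
    """One-sided binomial tail: P(X >= wins) over n independent markets."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(wins, n + 1)
    )


# The current record: 1 win out of 1 resolved market.
print(p_value_at_least(1, 1))  # 0.5

# Unbroken winning streaks of increasing length under the same null.
for n in range(1, 7):
    print(n, p_value_at_least(n, n))
```

Under this simplified null, a single win is as likely as a coin flip, and an unbroken streak would need five resolved markets before the tail probability drops below 0.05.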

How the live-market signal is scored

01 · Structural events only

The formula prices collective and structural phenomena — regime transitions, economic phase shifts, war dynamics, institutional collapse. It cannot predict individual behavior or personal decisions.

02 · Hash-locked & labeled

The probability is hash-locked and pre-registered before the event resolves, then published labeled by reflexivity class. No retroactive changes.

03 · Brier score calculated

When the market resolves, both our prediction and the market's are scored: Brier = (prediction - outcome)², where outcome is 1 if the event happened and 0 otherwise. Lower is better; we count a win when the market's Brier exceeds ours.
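The hash-lock and scoring steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual pipeline: the record fields, the random salt, the canonical-JSON encoding, the choice of SHA-256, and all function names are hypothetical. The example values are the listed figures for the resolved Hungary market (our P 80%, market P 74%, event occurred).

```python
import hashlib
import json
import secrets


def _canonical(record):
    """Serialize deterministically so the same record always hashes the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def lock_prediction(market, probability, made_on, salt=None):
    """Build a prediction record and its SHA-256 digest.

    The digest can be published before resolution; the record (including
    the salt) is revealed later. The salt keeps the probability from being
    guessed by brute force from the digest alone.
    """
    record = {
        "market": market,
        "probability": probability,
        "made_on": made_on,
        "salt": salt or secrets.token_hex(16),
    }
    return record, hashlib.sha256(_canonical(record)).hexdigest()


def verify_lock(record, digest):
    """True if the revealed record matches the digest published earlier."""
    return hashlib.sha256(_canonical(record)).hexdigest() == digest


def brier(prediction, outcome):
    """Single-event Brier score: squared error against a 0/1 outcome."""
    return (prediction - outcome) ** 2


def beat_market(our_p, market_p, outcome):
    """Win rule from the scoring step: the market's Brier exceeds ours."""
    return brier(market_p, outcome) > brier(our_p, outcome)


record, digest = lock_prediction(
    "Hungary: Will Tisza win the 2026 parliamentary election?", 0.80, "2026-04-07"
)
print(verify_lock(record, digest))  # True

# Any retroactive change to the probability breaks the lock.
tampered = dict(record, probability=0.90)
print(verify_lock(tampered, digest))  # False

print(round(brier(0.80, 1), 4))  # 0.04
print(beat_market(0.80, 0.74, 1))  # True
```

Publishing only the digest first and revealing the full record afterwards is what makes "no retroactive changes" checkable by a third party.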

Active Predictions (11)

Market · Our P · Market P · Confidence · Date Made · Status
Will the UK exit NATO by June 30, 2026?
Formula: 2.5% [0.5%, 8%]. Market started at 10%, drifted to 4.75% (S17), then through 4.65% / 4.60% / 4.55% / 4.65% / 4.60% / 3.20% / 3.40% / 3.15% over Sessions 18-26. Best convergence reading 79.06% at Session 26 (second-best in series history). API_BLOCKED at Session 27 and Session 28; live signal carries forward from Session 26. NDAA Section 1250A and 180-day notification requirement make structural exit physically implausible at this horizon. Direction CORRECT across the full S15-S26 live-signal arc.
Our P: 2.5%
Market P: 4.8%
Confidence: HIGH. Formula predicts structural stability of institutional anchors in Zone 3 OECD states.
Date made: April 7, 2026
Status: Pending
Will Keir Starmer be out as UK Prime Minister by December 31, 2026?
Formula: 58% [40%, 73%]. UK May 7-8 local elections occurred during the API-blocked window. Pre-registered 24-72h post-vote re-score window expired 2026-05-11 with direct Polymarket API still blocked at the sandbox boundary (sessions 27-34 all blocked on `*.polymarket.com` DNS). Session 33 indirect WebSearch fallback recovered a Starmer-OUT trading band ≈ [0.49, 0.69] on a $18M volume — implying Starmer-IN ≈ [0.31, 0.51]. Project estimate (0.58) is now +7pp above the upper bound to +27pp above the lower bound, but does NOT cross the formula CI lower bound (0.40), so the soft warning flag did not escalate. Live time-series resolution permanently lost for the catalyst window; conversion from live test to historical-price backfill conditional on operator-level API allowlist edit.
Our P: 58.0%
Market P: 65.0%
Confidence: LOW. Qualitative institutional-constraint assessment. Zone 3 OECD, PSI validity 0.25.
Date made: April 16, 2026
Status: Pending
Will a Russia-Ukraine ceasefire be officially announced before January 1, 2027?
Formula: 42% [22%, 63%] — wide CI reflects low PSI validity in active conflict. War-termination literature (Richardson power law + Weibull duration model): ~38% base rate at this conflict duration. Economic constraint vector for Russia elevated. Uncertainty flag: endogenous US pressure variable not formally modeled.
Our P: 42.0%
Market P: 38.0%
Confidence: LOW. PSI validity in active-conflict zones: 0.15. Prediction driven primarily by cliodynamic war-termination base rates and economic-constraint analysis, not full formula composite.
Date made: April 19, 2026
Status: Pending
Will the IMF declare a global recession for calendar year 2026?
Formula: 28% [14%, 44%] — above market consensus at 22%. Key drivers: (1) global trade network fragility elevated post-tariff shock (percolation model: 0.34 fragility index), (2) debt-cycle phase for 7/10 largest economies at late-expansion, (3) Minsky moment probability elevated at 0.19 for 2026. Market may be underweighting tail risk from trade-network cascade effects.
Our P: 28.0%
Market P: 22.0%
Confidence: MEDIUM. Econophysics + cliodynamics modules. Global macro-state vector S_t elevated on trade-stress and debt-cycle components. Mean-field approximation applied across G20 economies.
Date made: April 19, 2026
Status: Pending
Will China launch a military invasion of Taiwan before January 1, 2030?
Formula: 12% [4%, 28%]. Drivers: elite overproduction index in PRC elevated (0.019), economic growth deceleration increasing internal pressure, but deterrence calculus (nuclear shadow + US commitment) constrains probability. Structural geopolitical dynamics — formula and market in rough agreement; small divergence driven by formula weighting internal-pressure vector more heavily.
Our P: 12.0%
Market P: 10.0%
Confidence: LOW. Long-horizon prediction (3.7 years). Full formula composite active but confidence degrades with horizon: CI [4%, 28%]. Cliodynamics + political-scientist + network-scientist modules primary.
Date made: April 19, 2026
Status: Pending
Will any EU member state begin formal exit proceedings before January 1, 2030?
Formula: 18% [8%, 32%]. Driven by EU institutional fragility analysis: Hungary and Poland have elevated elite overproduction indices (0.017, 0.014) and persistent rule-of-law disputes creating structural exit pressure. Historical base rate for bloc fragmentation at this cohesion level: ~15%. The network percolation model of EU solidarity bonds adds 3pp above that base rate, putting the formula 4pp above the market (14%); the percolation threshold is not yet reached but is trending upward.
Our P: 18.0%
Market P: 14.0%
Confidence: LOW. Long-horizon (3.7 years). Political-scientist + network-scientist modules. PSI validity for Hungary: 0.40; Poland: 0.35. Institutional fragility index elevated in both.
Date made: April 19, 2026
Status: Pending
Will Brazil experience a significant democratic backsliding event (suspension of legislature or judiciary by executive) before January 1, 2028?
Formula: 14% [5%, 27%]. Drivers: (1) post-2022 polarization index elevated at 0.71, (2) military institutional friction persists above pre-2016 baseline, (3) Turchin secular cycle places Brazil in 'integrative phase fragility' zone. Market at 11% — formula adds 3% from structural polarization dynamics. Historical analogues: Turkey 2016, Bolivia 2019 — similar PSI trajectories resolved without backsliding in ~75% of cases.
Our P: 14.0%
Market P: 11.0%
Confidence: LOW. Zone 1 (emerging). PSI validity 0.30. Cliodynamics + political-scientist modules. Secular cycle analysis active: Brazil in post-crisis consolidation phase with elevated polarization index.
Date made: April 19, 2026
Status: Pending
Will China's average annual GDP growth rate fall below 3% over the 2027–2030 period?
Formula: 32% [14%, 52%] — above market consensus at 25%. Key structural drivers: (1) real estate sector debt overhang equivalent to ~28% of GDP, consistent with Japan 1991 and Korea 1997 pre-correction trajectories, (2) demographic compression (working-age population decline accelerating from 2025), (3) Minsky moment probability for property sector at 0.31 over the 2026–2028 window. Formula flags: these are macro-structural forces acting on aggregate economic behavior — not a prediction about policy response.
Our P: 32.0%
Market P: 25.0%
Confidence: LOW. Long-horizon (4+ years). Econophysics + cliodynamics modules. Debt cycle dynamics and demographic transition vector both active. Wide CI [14%, 52%] reflects horizon uncertainty.
Date made: April 19, 2026
Status: Pending
Will any G20 member experience a sovereign debt default or restructuring before January 1, 2028?
Formula: 22% [10%, 36%]. Structural drivers: Argentina (debt restructuring cycle: 0.62 recurrence probability over 5-year horizon), Turkey (external debt vulnerability index: 0.58), and South Africa (fiscal fragility index: 0.44) are primary contributors to the aggregate probability. Network contagion analysis: sovereign debt percolation threshold not breached but single-node default would elevate systemic risk. Market at 17% — formula adds 5% from debt-cycle phase analysis.
Our P: 22.0%
Market P: 17.0%
Confidence: MEDIUM. Econophysics + political-scientist modules. Debt cycle phase analysis across G20. PSI composite stress index elevated for 3 members. Mean-field approximation across sovereign debt network.
Date made: April 19, 2026
Status: Pending
Will global trade as a share of world GDP fall below its 2010 level by January 1, 2029?
Formula: 35% [18%, 52%]. Structural deglobalization pressure: (1) trade network percolation analysis shows fragility index at 0.41 (above 2008 crisis level of 0.38), (2) tariff shock propagation model predicts 8–14% bilateral trade reduction in affected dyads, (3) supply chain reshoring dynamics now self-reinforcing above critical threshold. This is a collective structural phenomenon driven by aggregate firm behavior and state policy interactions — not reducible to any single actor's decision.
Our P: 35.0%
Market P: 28.0%
Confidence: LOW. Long-horizon (2.7 years). Econophysics + network-scientist modules. Trade network topology fragility index elevated post-2018. Mean-field analysis across bilateral trade relationships.
Date made: April 19, 2026
Status: Pending
Will any G7 country experience a mass civil-unrest event (>1M participants in a single protest or protest wave) before January 1, 2028?
Formula: 38% [22%, 56%]. Structural drivers: (1) PSI composite elevated in 4/7 G7 members above 0.55 threshold, (2) elite overproduction index averaged across G7 at 0.016 (post-1970 high), (3) Granovetter-threshold simulation with current opinion-dynamics parameters places cascade probability at 0.31 per G7 country per 24-month window, aggregated to 0.38 at least-one-occurrence. Pure collective-action prediction — no individual actor or specific party framing. Replaces withdrawn Trump-impeachment prediction (individual-event, out of scope under v1.0 protocol).
Our P: 38.0%
Market P: 30.0%
Confidence: MEDIUM. Long-horizon (~20 months). Comp-sociologist + cliodynamicist + political-scientist modules. Collective-action threshold model + PSI composite across G7 states.
Date made: April 19, 2026
Status: Pending

Resolved Predictions (1)

Market · Our P · Market P · Confidence · Date Made · Result
Hungary: Will Tisza win the 2026 parliamentary election?
FIRST FORMULA WIN. Market Brier: 0.0729. Our Brier: 0.0400. Elite overproduction + PSI elevation correctly identified structural fragility of Orbán's regime.
Our P: 80.0%
Market P: 74.0%
Confidence: HIGH. Zone 2 (industrializing), PSI validity 0.45.
Date made: April 7, 2026
Result: Beat market (Brier: 0.0400)