Statistical Evidence

Brieuc edited this page Mar 12, 2026 · 1 revision

GIFT Statistical Evidence

Version: 3.3.24 Validation Date: March 2026 Scripts: bulletproof_validation_v33.py (7-component), exhaustive_validation_v33.py (3M+ configs)

Note (v3.3.24): The detailed statistics below were computed with NuFIT 5.3 experimental values (0.21% mean deviation). With the v3.3.24 update to NuFIT 6.0 and improved neutrino formulas (θ12 = arctan(2/3), θ23 = arctan(√(14/11))), the current mean deviation is 0.24% (32 well-measured observables) / 0.57% (all 33 incl. δ_CP). The qualitative conclusions (significance > 4.2σ, unique optimum among 3M+ configs, decisive Bayes factors) remain unchanged.

Executive Summary

Key Metrics

Metric	Value
Mean deviation (custom)	0.21%
Mean deviation (relative)	0.41%
Null model p-value	< ×ばつ10−5 (σ > 4.2)
Westfall-Young global p	×ばつ10−3
Best Bayes factor	4,738 (decisive)
Pre-registered test p	×ばつ10−5 (σ = 4.0)
Configs tested (exhaustive)	3,070,396
Better than GIFT	0

Results by Precision Tier (Relative Deviation)

Tier	Observables	Threshold	Interpretation
Excellent	14/33 (42%)	< 0.1%	Precision match
Good	29/33 (88%)	< 1%	Strong agreement
Acceptable	33/33 (100%)	< 5%	Within tolerance
Needs work	0/33 (0%)	> 5%	None

Interpretation

100% of predictions agree with experiment within 5%
88% of predictions agree within 1%
GIFT is uniquely optimal among all 3,070,396 tested configurations
All three null model families reject at p < ×ばつ10−5
Westfall-Young maxT permutation FWER confirms 11/33 individually significant after correlation-aware correction
Bayes factors range from 304 to 4,738 across four prior specifications (all decisive)

1. Methodology

1.1 Primary Metric: Custom Deviation

The GIFT validation uses a custom deviation metric that captures goodness-of-fit across heterogeneous observables (angles, ratios, coupling constants):

$$\text{Dev} = \frac{|\text{pred} - \text{exp}|}{|\text{exp}|} \times 100%$$

averaged uniformly over all 33 observables. This avoids the σ-pull pathology where extraordinarily precise measurements (α−1 with σ = ×ばつ10−5) dominate the aggregate.

1.2 Why Not Chi-Squared?

Observable	Rel. Dev.	Pull (σ)	Issue
m_μ/m_e	0.12%	52,951σ	σ_exp = ×ばつ10−6
α−1	0.002%	128σ	σ_exp = ×ばつ10−5

The relative deviation correctly identifies these as excellent predictions (~0.1%), while pulls are misleadingly large due to extraordinary experimental precision and the absence of theoretical uncertainty estimates.

1.3 Seven-Component Validation

The bullet-proof validation covers seven independent components:

Pre-registration manifest: SHA-256 hash locking observables/formulas before testing
Three null model families: Permutation, structure-preserved, adversarial
Per-observable p-values: With Bonferroni, Holm, Benjamini-Hochberg, and Westfall-Young maxT corrections
Held-out cross-prediction: Leave-one-sector-out + pre-registered dev/test split
Robustness analysis: Weight perturbations, noise MC, jackknife, leave-k-out, noise sensitivity curve
Multi-seed replication: 10 independent seeds + alternative metric (χ2)
Bayesian analysis: Multi-prior Bayes factors, 4-statistic PPC, WAIC comparison

2. Null Model Families

Three independent null model families each reject at the resolution limit of 50,000 permutations:

Null Family	p-value	σ	Description
Permutation	×ばつ10−5	4.27	Random (b2, b3) assignment; null mean 82.6% vs GIFT 0.21%
Structure-preserved	×ばつ10−5	4.27	0/50,000 configs match or beat GIFT
Adversarial	×ばつ10−5	4.27	Best adversary achieves 65.8% vs GIFT 0.21%

All three null families produce mean deviations ×ばつ worse than GIFT.

3. Multiple Testing Corrections

3.1 Per-Observable Significance (α = 0.05)

Correction	Significant	Method
Raw	21/33	Uncorrected empirical p-values
Bonferroni	0/33	Conservative (divides α by 33)
Holm	0/33	Step-down, still very conservative
Benjamini-Hochberg	20/33	FDR control (less conservative)
Westfall-Young maxT	11/33	Permutation FWER respecting correlations

3.2 Westfall-Young maxT

The Westfall-Young step-down maxT procedure is the gold standard for family-wise error rate (FWER) control because it:

Respects the correlation structure between test statistics (unlike Bonferroni)
Uses the joint distribution of max statistics under permutation
Provides exact FWER control

Result: Global p = ×ばつ10−3, with 11/33 observables individually significant. This is the definitive answer to "how many observables survive rigorous multiple-testing correction while accounting for inter-observable correlations?"

3.3 Look-Elsewhere Effect

Explicit LEE trial count: 23,167,200 (all (b2, b3, gauge, holonomy) combinations). Even after LEE correction, the framework's performance remains significant.

4. Cross-Prediction (Held-Out Tests)

4.1 Leave-One-Sector-Out

Each physics sector is held out in turn; GIFT's (b2, b3) is tested on the held-out sector without retuning:

Sector	Held-out obs.	Test dev.	p-value	σ
Gauge couplings	3	0.17%	×ばつ10−3	3.3
Leptons	4	0.06%	×ばつ10−4	3.9
Quarks	9	0.24%	×ばつ10−2	2.6
PMNS mixing	4	0.23%	×ばつ10−4	3.9
CKM matrix	6	0.59%	×ばつ10−4	3.8
Bosons	3	0.13%	×ばつ10−4	3.7
Cosmology	3	0.19%	×ばつ10−5	4.1

All non-trivial sectors achieve p < 0.05, confirming cross-sector prediction holds.

4.2 Pre-Registered Dev/Test Split

Set	N	Deviation
Development (16 obs.)	16	0.10%
Test (17 obs.)	17	0.32%
Test p-value	,	×ばつ10−5 (σ = 4.0)

The held-out test set achieves σ = 4.0, confirming that GIFT's accuracy is not an artifact of fitting to a particular subset.

5. Robustness and Sensitivity

5.1 Weight Perturbation

Weighting	Mean Dev.	Conclusion
Uniform	0.21%	Reference
Uncertainty-weighted	0.00%	Precision-dominated
Inverse-range	0.62%	Worst case
Random (100 trials)	0.21% ± 0.02%	Stable

All weighting schemes yield < 1%.

5.2 Jackknife & Leave-k-Out

Jackknife: Maximum influence of any single observable is 0.029% (sin2θ23 CKM). No single observable dominates the result.
Leave-k-out stability:

k removed	Mean Dev.	Range
1	0.212% ± 0.008%	[0.18, 0.22]
3	0.212% ± 0.015%	[0.14, 0.23]
5	0.212% ± 0.020%	[0.13, 0.25]

The result is remarkably stable under systematic removal.

5.3 Noise Sensitivity Curve

Sweeping Gaussian noise of amplitude σ_factor ×ばつ σ_exp over 200 trials per point:

Noise factor	Mean Dev.	Std. Dev.
×ばつ	0.21%	0.00%
×ばつ	0.46%	0.09%
×ばつ	0.82%	0.18%
×ばつ	1.17%	0.23%
×ばつ	1.57%	0.36%
×ばつ	2.34%	0.55%
×ばつ	3.09%	0.73%
×ばつ	4.61%	1.18%

Interpretation: At ×ばつ published experimental uncertainties, the mean deviation rises from 0.21% to 1.57%. This is the physical precision floor: the framework's 0.21% agreement is already within a factor of ~7 of what measurement noise alone would produce. Improving the framework's predictions further would require experimental measurements to become more precise.

5.4 Noise Monte Carlo

Over 1,000 trials with ×ばつ published uncertainties:

Mean: 1.50% ± 0.35%
Only 5% of trials remain below 1%

This confirms the noise sensitivity curve: the 0.21% GIFT result sits well below the noise floor.

6. Multi-Seed Replication

Metric	Value
Seeds tested	10
p-value range	×ばつ10−5, ×ばつ10−4]
σ range	[3.8, 4.1]
All significant at α=0.05	Yes
Alternative metric (χ2)	p = ×ばつ10−5 (σ = 4.1)
Cross-metric consistent	Yes

Results are invariant to PRNG seed and hold under alternative metric (relative χ2).

7. Bayesian Analysis

7.1 Bayes Factors (4 Prior Specifications)

Prior	BF	Interpretation
Skeptical (uniform)	304	Decisive for H1
Reference (half-normal)	397	Decisive for H1
Jeffreys	2,423	Decisive for H1
Enthusiastic (uniform ≤1%)	4,738	Decisive for H1

All four priors yield decisive evidence (BF > 100) for GIFT over the null. The skeptical prior, which grants the null maximum latitude, still yields BF = 304.

7.2 Posterior Predictive Checks (4 Statistics)

Statistic	Observed	Replicated mean	PPC p	Status
T1: Mean deviation	0.21%	1.53%	1.000	↑ Superior
T2: Max deviation	1.13%	12.04%	1.000	↑ Superior
T3: Count > 1%	1	12.1	1.000	↑ Superior
T4: Worst sector	0.59%	4.28%	1.000	↑ Superior

Status: superior_to_noise: The framework fits significantly better than measurement noise predicts across all four test statistics. Replicated datasets (adding noise at published uncertainty levels) consistently show ×ばつ larger deviations than GIFT achieves. This is consistent with genuine physical content rather than numerical coincidence.

Note: PPC p ≈ 1.0 does not indicate model misfit. In the PPC framework, p near 0 indicates systematic underfitting, p near 0.5 indicates perfect calibration to the noise model, and p near 1 indicates the model surpasses noise expectations. The result confirms that GIFT's precision exceeds what measurement uncertainties alone would predict.

7.3 WAIC Model Comparison

Model	WAIC	Interpretation
GIFT	29.9	Preferred
Null	580.2	,
ΔWAIC	550.3	Strongly favors GIFT

8. Exhaustive Configuration Search

8.1 Betti Number Variations (3,070,396 configs)

Metric	Value
b2 range	[5, 100]
b3 range	[40, 200]
Configs tested	3,070,396
Better than GIFT	0
95% CI (Clopper-Pearson)	[0, ×ばつ10−5]

8.2 Gauge Group Comparison

Rank	Gauge Group	Mean Dev.
1	×ばつE8	0.41%
2	×ばつE8	8.8%
3	×ばつE8	15.5%

×ばつE8 achieves ×ばつ better agreement than the next best alternative.

8.3 Holonomy Group Comparison

Rank	Holonomy	dim	Mean Dev.
1	G2	14	0.41%
2	SU(4)	15	1.5%
3	SU(3)	8	4.4%
4	Spin(7)	21	5.4%

G2 achieves ×ばつ better agreement than Calabi-Yau (SU(3)).

9. Results by Physics Category

Category	N	Mean Dev.	Max Dev.	<0.1%	<1%	<5%
Structural	1	0.00%	0.00%	1/1	1/1	1/1
Electroweak	4	0.36%	0.90%	1/4	4/4	4/4
Lepton Mass Ratios	4	0.06%	0.12%	2/4	4/4	4/4
Quark Mass Ratios	4	0.34%	1.21%	2/4	3/4	4/4
PMNS Mixing	7	0.94%	4.81%	3/7	5/7	7/7
CKM Mixing	3	0.74%	1.50%	0/3	2/3	3/3
Boson Mass Ratios	3	0.12%	0.29%	2/3	3/3	3/3
Cosmological	7	0.19%	0.48%	3/7	7/7	7/7
TOTAL	33	0.41%	,	14/33	29/33	33/33

10. Honest Caveats

10.1 What This Validation Establishes

Statistical significance: p < ×ばつ10−5 against three independent null families (σ > 4.2)
Multiple-testing robustness: 11/33 survive Westfall-Young maxT FWER (global p = 0.008)
Cross-prediction: All non-trivial sectors and the pre-registered test split are significant
Bayesian confirmation: BF 304–4,738 across four prior specifications, all decisive
Stability: Invariant to weighting, seed, metric choice, and observable removal

10.2 What This Validation Does NOT Establish

Formula justification: Statistical optimality does not explain why these formulas were chosen. The derivations in S2 provide the theoretical motivation, but statistical agreement alone is not proof of physical correctness.
Physical truth: Excellent agreement ≠ correct underlying physics. The framework could be a very effective parameterization that captures patterns without the proposed geometric mechanism being the correct explanation.
Completeness: Only TCS G2-manifolds with specific gauge/holonomy groups were tested.

10.3 PPC Superior-to-Noise Status

The posterior predictive checks show PPC p = 1.0 across all four test statistics. This means the framework's predictions are more precise than measurement noise alone would predict. Possible explanations:

The framework captures genuine physical structure (the GIFT claim)
Published experimental uncertainties are conservative
There are correlations between observables not captured by the noise model

This is a strength of the framework, not a weakness, but it means the PPC cannot distinguish between these explanations.

10.4 Noise Sensitivity as Physical Limit

At ×ばつ published uncertainties, mean deviation rises from 0.21% to 1.57%. This defines the measurement precision floor: the framework's predictions are already within ×ばつ of what the best current measurements can distinguish from perfect agreement. Further validation requires more precise experiments.

10.5 Bonferroni/Holm Yielding Zero

Bonferroni and Holm corrections yield 0/33 significant observables because they divide α by 33, which is extremely conservative for correlated tests. This is why the Westfall-Young maxT procedure is the correct correction: it respects the correlation structure and yields a meaningful result (11/33 significant, global p = 0.008).

11. Falsification Predictions

Prediction	GIFT Value	Current Exp.	Target	Experiment	Timeline
δ_CP	197°	197° ± 24°	±5°	DUNE	2034-2039
sin2θ_W	3/13	0.2312 ± ×ばつ10−5	±10−5	FCC-ee	2040s
Ω_DM/Ω_b	43/8	5.375 ± 0.1	±0.01	CMB-S4	2030s
m_s/m_d	20	20 ± 1	±0.3	Lattice QCD	2030

12. How to Reproduce

Bullet-proof validation (7 components)

cd publications/validation
python3 bulletproof_validation_v33.py

Requirements: Python 3.8+, no external dependencies Output: bulletproof_validation_v33_results.json Runtime: ~15 seconds

Exhaustive search (3M+ configurations)

cd publications/validation
python3 exhaustive_validation_v33.py

Runtime: ~2–5 minutes

13. Conclusions

Primary Finding

GIFT achieves 0.21% mean deviation (0.41% relative) across 33 observables. Among 3,070,396 tested configurations, zero perform better. This result survives:

Three independent null model families (p < ×ばつ10−5)
Westfall-Young maxT FWER correction (global p = 0.008, 11/33 individually significant)
Pre-registered dev/test split (test p = ×ばつ10−5)
Four Bayesian prior specifications (BF 304–4,738, all decisive)
Weight perturbation, jackknife, and leave-k-out stability analysis
Multi-seed and cross-metric replication

Statistical Summary

Metric	Value
Within 0.1%	42% (14/33)
Within 1%	88% (29/33)
Within 5%	100% (33/33)
Mean deviation	0.21% (custom), 0.41% (relative)
Null model p	< ×ばつ10−5 (σ > 4.2)
Westfall-Young global p	0.008
Best Bayes factor	4,738
Configs tested	3,070,396
Better than GIFT	0

References

Joyce, D.D. Compact Manifolds with Special Holonomy (2000)
Westfall, P.H. & Young, S.S. Resampling-Based Multiple Testing (1993)
Particle Data Group (2024), Review of Particle Physics
Planck Collaboration (2020), Cosmological parameters
NuFIT 5.3 (2024), Neutrino oscillation parameters
CODATA 2022, Fundamental physical constants

GIFT Framework v3.3.24: Bullet-Proof Statistical Evidence Validation: March 2026 | 7-component analysis | Mean deviation: 0.24% (32 well-measured) / 0.57% (all 33)

Home

Start Here

By Audience

Papers

Reference

Blog Posts

Project

Statistical Evidence

GIFT Statistical Evidence

Executive Summary

Key Metrics

Results by Precision Tier (Relative Deviation)

Interpretation

1. Methodology

1.1 Primary Metric: Custom Deviation

1.2 Why Not Chi-Squared?

1.3 Seven-Component Validation

2. Null Model Families

3. Multiple Testing Corrections

3.1 Per-Observable Significance (α = 0.05)

3.2 Westfall-Young maxT

3.3 Look-Elsewhere Effect

4. Cross-Prediction (Held-Out Tests)

4.1 Leave-One-Sector-Out

4.2 Pre-Registered Dev/Test Split

5. Robustness and Sensitivity

5.1 Weight Perturbation

5.2 Jackknife & Leave-k-Out

5.3 Noise Sensitivity Curve

5.4 Noise Monte Carlo

6. Multi-Seed Replication

7. Bayesian Analysis

7.1 Bayes Factors (4 Prior Specifications)

7.2 Posterior Predictive Checks (4 Statistics)

7.3 WAIC Model Comparison

8. Exhaustive Configuration Search

8.1 Betti Number Variations (3,070,396 configs)

8.2 Gauge Group Comparison

8.3 Holonomy Group Comparison

9. Results by Physics Category

10. Honest Caveats

10.1 What This Validation Establishes

10.2 What This Validation Does NOT Establish

10.3 PPC Superior-to-Noise Status

10.4 Noise Sensitivity as Physical Limit

10.5 Bonferroni/Holm Yielding Zero

11. Falsification Predictions

12. How to Reproduce

Bullet-proof validation (7 components)

Exhaustive search (3M+ configurations)

13. Conclusions

Primary Finding

Statistical Summary

References

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally