Chapter 7: Statistical Testing & Error Control


I. Scope & Objectives


II. Terms & Symbols


III. Postulates & Minimal Equations


IV. Data & Manifest Conventions


V. Algorithms & Implementation Bindings

  1. Mapping to I50-*
    Multiple testing: I50-6 sequential_test (when type = alpha-spending), I50-9 gate_release (consuming FDR/FWER reports & evidence bundle).
  2. Statistical computation extensions
    • I50-11 adjust_pvalues(p:list, method:str, q_or_alpha:float) -> {p_adj:list, reject:list}
    • I50-12 plan_sample_size(spec:dict) -> {n_per_group:int, power:float}
    • I50-13 tost_equivalence(x:any, y:any, delta_equiv:float, alpha_sig:float) -> Verdict
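The I50-13 binding admits a compact sketch as two one-sided z-tests (TOST). The helper below is illustrative, not normative: it assumes known unit variances in both groups and uses the standard normal CDF via `math.erf`.

```python
import math
import statistics

def tost_equivalence(x, y, delta_equiv, alpha_sig=0.05):
    """Illustrative TOST: two one-sided z-tests for |mean(x) - mean(y)| < delta_equiv.
    Assumes known unit variance in both groups to keep the sketch short."""
    se = math.sqrt(1.0 / len(x) + 1.0 / len(y))             # std. error of the mean difference
    d = statistics.fmean(x) - statistics.fmean(y)
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    p_lower = 1 - Phi((d + delta_equiv) / se)               # H0: d <= -delta_equiv
    p_upper = Phi((d - delta_equiv) / se)                   # H0: d >= +delta_equiv
    p = max(p_lower, p_upper)                               # both one-sided tests must reject
    return {"verdict": "equivalent" if p <= alpha_sig else "inconclusive", "p": p}
```

Equivalence is declared only when both one-sided nulls are rejected, i.e. when the larger of the two one-sided p-values falls below alpha_sig.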
  3. Reference flow (BH step-up)
    • Input p[1..m], q_star; sort to p_(i).
    • Compute thresholds tau_i = ( i / m ) * q_star.
    • k = max{ i : p_(i) ≤ tau_i }; set reject[1..k] = true, others false.
    • Produce adjusted p-values:
      p_adj_(i) = min_{j ≥ i} ( m / j ) * p_(j), capped at 1; then map back to original indices.
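The step-up flow above can be sketched in plain Python; the name `bh_adjust` and the list-based interface are illustrative stand-ins for I50-11 with method = "BH".

```python
def bh_adjust(p, q_star):
    """Benjamini-Hochberg step-up: returns (p_adj, reject) in the input order."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])          # ranks 1..m by ascending p
    # Step-up cutoff: k = max{ i : p_(i) <= (i/m) * q_star }
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p[idx] <= (rank / m) * q_star:
            k = rank
    # Adjusted p-values: p_adj_(i) = min_{j >= i} (m/j) * p_(j), capped at 1
    raw = [(m / rank) * p[idx] for rank, idx in enumerate(order, start=1)]
    adj_sorted = [0.0] * m
    running = 1.0                                         # cap at 1
    for i in range(m - 1, -1, -1):                        # suffix minimum
        running = min(running, raw[i])
        adj_sorted[i] = running
    p_adj, reject = [0.0] * m, [False] * m
    for rank, idx in enumerate(order, start=1):           # map back to original indices
        p_adj[idx] = adj_sorted[rank - 1]
        reject[idx] = rank <= k
    return p_adj, reject
```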
  4. Reference flow (Holm step-down)
    • Sort p_(i); for i = 1..m, test
      p_(i) ≤ alpha_sig / ( m - i + 1 ).
    • If the first failure occurs at i*, reject {1..i*-1} and accept {i*..m}; if none fail, reject {1..m}.
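A minimal sketch of the step-down loop; `holm_reject` is an illustrative name, not a normative binding.

```python
def holm_reject(p, alpha_sig):
    """Holm step-down: walk the sorted p-values, stop at the first failure."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])   # ranks 1..m by ascending p
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p[idx] <= alpha_sig / (m - rank + 1):   # threshold alpha_sig / (m - i + 1)
            reject[idx] = True
        else:
            break                                  # first failure i*: accept i*..m
    return reject
```

If no test fails, the loop runs to completion and all m hypotheses are rejected, matching the rule above.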
  5. Reference flow (SPRT)
    • Initialize boundaries A ≈ (1 - beta_err) / alpha_sig and B ≈ beta_err / (1 - alpha_sig) (Wald approximations); update the likelihood ratio Lambda_n per observation.
    • If Lambda_n ≥ A → reject H0; if Lambda_n ≤ B → accept H0; if n ≥ N_cap → stop = hold.
    • Output {decision, n_used, alpha_spent ≈ P_H0( reject )}.
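The stopping rule can be sketched on the log scale with Wald's boundary approximations. The `llr` callback (per-observation log-likelihood ratio) is an assumption of this sketch, and the alpha_spent estimate is left to the caller's simulation harness.

```python
import math

def sprt(observations, llr, alpha_sig, beta_err, n_cap):
    """Wald SPRT on the log scale; llr(x) = log f1(x) - log f0(x)."""
    log_A = math.log((1 - beta_err) / alpha_sig)   # Lambda_n >= A -> reject H0
    log_B = math.log(beta_err / (1 - alpha_sig))   # Lambda_n <= B -> accept H0
    log_lambda, n = 0.0, 0
    for x in observations:
        n += 1
        log_lambda += llr(x)
        if log_lambda >= log_A:
            return {"decision": "reject", "n_used": n}
        if log_lambda <= log_B:
            return {"decision": "accept", "n_used": n}
        if n >= n_cap:
            break
    return {"decision": "hold", "n_used": n}       # N_cap reached or data exhausted
```

For example, testing Bernoulli H0: p = 0.5 against H1: p = 0.8 uses llr(x) = log(0.8/0.5) for a success and log(0.2/0.5) for a failure.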

VI. Metrology Flows & Run Diagram


VII. Verification & Test Matrix

  1. Type-I calibration (null simulations)
    • Under H0, repeat B times (B ≥ 10^4) to estimate P( reject ); require | P( reject ) - alpha_sig | ≤ tau_calib.
    • In multiple-testing settings, estimate FDR/FWER; verify they do not exceed budgets.
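One way to realize the null-simulation check, sketched for a two-sided one-sample z-test with known variance; B, the seed, and the hardcoded critical value (valid only for alpha_sig = 0.05) are illustrative choices.

```python
import math
import random
import statistics

def type1_calibration(B=10_000, n=30, seed=1):
    """Null simulation: draw N(0,1) samples under H0 and estimate P(reject)
    for a two-sided one-sample z-test with known sigma = 1."""
    rng = random.Random(seed)
    z_crit = 1.959964                             # two-sided critical value, alpha_sig = 0.05
    rejections = 0
    for _ in range(B):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = statistics.fmean(xs) * math.sqrt(n)   # (x_bar - 0) / (sigma / sqrt(n)), sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / B
```

The calibration gate then checks |P(reject) - alpha_sig| ≤ tau_calib on the returned estimate.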
  2. Power & sample-size backchecks
    • Under H1, estimate power_hat; require power_hat ≥ power_target - tau_power.
    • CI coverage: the empirical coverage of two-sided (1 - alpha_sig) intervals must lie within 1 - alpha_sig ± tau_cov.
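A sketch of the backcheck for the known-variance z-test case; `normal_power` and `plan_n` are illustrative stand-ins for the I50-12 binding, using the closed-form power of a two-sided one-sample z-test.

```python
import math
from statistics import NormalDist

def normal_power(delta, sigma, n, alpha_sig=0.05):
    """Analytic power of a two-sided one-sample z-test under H1: mu = delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha_sig / 2)        # two-sided critical value
    ncp = delta / (sigma / math.sqrt(n))          # noncentrality
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)

def plan_n(delta, sigma, power_target=0.8, alpha_sig=0.05):
    """Smallest n whose analytic power reaches power_target (sketch of plan_sample_size)."""
    n = 2
    while normal_power(delta, sigma, n, alpha_sig) < power_target:
        n += 1
    return n
```

The backcheck then simulates under H1 at the planned n and verifies power_hat ≥ power_target - tau_power.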
  3. Sequential robustness
    Optional stopping / data-peeking simulations: under alpha_spend constraints, verify no Type-I inflation; compare expected sample size of SPRT against N_cap.
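The peeking hazard can be demonstrated directly: testing at a fixed nominal threshold after every observation inflates the Type-I rate far above alpha_sig, which is exactly what alpha_spend budgeting must prevent. A sketch under Gaussian assumptions (the look count, B, and seed are illustrative):

```python
import math
import random

def peeking_type1(n_max=100, B=2_000, seed=3):
    """Simulate naive data peeking under H0: after each new N(0,1) observation,
    run a fixed-threshold z-test (known sigma = 1) and stop at the first 'rejection'.
    Returns the empirical Type-I rate, well above the nominal 0.05."""
    rng = random.Random(seed)
    z_crit = 1.959964                                # nominal two-sided 0.05 threshold
    false_rejections = 0
    for _ in range(B):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            if abs(total / math.sqrt(n)) > z_crit:   # z-statistic at look n
                false_rejections += 1
                break
    return false_rejections / B
```

An alpha-spending or SPRT rule, by contrast, must keep this simulated rate at or below alpha_sig.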
  4. Assumption checks & robustness
    When normality or homoscedasticity assumptions fail, use permutation or bootstrap methods for p_value and CIs; record the deviations.
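A minimal permutation fallback for the two-sample difference of means, assuming exchangeability under H0; the function name, B, and seed are illustrative.

```python
import random
import statistics

def permutation_pvalue(x, y, B=2_000, seed=7):
    """Two-sample permutation test on |mean(x) - mean(y)|; distribution-free,
    assuming only exchangeability of the pooled observations under H0."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    at_least_as_extreme = 1                    # count the observed labeling itself
    for _ in range(B):
        rng.shuffle(pooled)                    # random relabeling of the pooled data
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / (B + 1)       # add-one estimate keeps p > 0
```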

VIII. Cross-References & Dependencies


IX. Risks, Limitations & Open Questions


X. Deliverables & Versioning

  1. Deliverables
    HypothesisRegistry.json, TestPlan.card, alpha_budget.yaml, p_table.csv, adj_p.csv, decision.log, power_check.json, ci_table.csv, SeqTest.rule, SeqTest.log, Evidence.bundle (with hash(•) and fingerprint).
  2. Versioning policy
    • Adjusting alpha_sig / beta_err / power_target or the family-control method → minor bump; changing the significance-budgeting or sequential rules → major bump.
    • All changes require updated signatures and Appendix C history entries.