The EFT Average-Gravity Framework vs a minimal cold dark matter (DM) NFW baseline
0 Executive Summary
This document is the Zenodo-archived full report (archive edition), covering the complete auditable chain: data, parameter ledger, fairness constraints, closure tests, robustness audits, and reproducibility bundles. Appendix B (P1A) provides standardized DM-baseline stress tests (more standard halo modeling + a key lensing-systematics nuisance) as a robustness supplement to the main P1 results.
Four cite-ready takeaways (see §2.4 for the canonical wording):
(1) In RC-only fits, the EFT family consistently outperforms the minimal DM_RAZOR baseline; a typical gain is ΔlogL_RC ~ 10^3 (Table S1a).
(2) In the RC→GGL closure test, EFT shows stronger cross-probe transferability: its closure strength ΔlogL_closure (True−Perm) is substantially higher than that of DM_RAZOR, and it remains robust to covariance shrinkage, R_min, and σ_int scans (Fig. S3; Table S1b).
(3) In the joint RC+GGL fit, EFT retains a stable advantage, while the negative control (breaking the shared mapping) collapses the advantage—supporting the interpretation that the effect comes from the shared mapping rather than accidental overfitting (Fig. S4).
(4) Appendix B (P1A) strengthens the DM side with low-dimensional, auditable modules (hierarchical c–M scatter + prior, a 1-parameter core proxy, and a lensing shear-calibration nuisance m, plus a combined DM_STD) to address common critiques such as a “strawman baseline” or “systematics mistaken as physics”. These upgrades do not remove EFT’s closure advantage (Table B1; Fig. B1).
Data & code availability: Report Concept DOI https://doi.org/10.5281/zenodo.18526334; Full reproduction bundles Concept DOI https://doi.org/10.5281/zenodo.18526286. For Appendix B (P1A), tags are run_tag=20260213_151233, closure_tag=20260213_161731, joint_tag=20260213_195428.
1 Abstract
We perform a reproducible, quantitative comparison of two theoretical frameworks under the same data and the same statistical protocol: the “average-gravity modification” models proposed by Energy Filament Theory (EFT; not to be confused with the common abbreviation for Effective Field Theory), and a cold dark matter (DM) baseline model using an NFW halo (DM_RAZOR). DM_RAZOR is intentionally minimal (an NFW halo with a fixed mean c–M relation and no halo-to-halo scatter), serving as an auditable baseline rather than an exhaustive ΛCDM model family. We treat the EFT family here as a phenomenological, MOND-like effective-response parameterization to be tested under a unified statistical protocol, rather than deriving a microscopic first-principles theory in this paper.
The data comprise: (i) SPARC rotation curves (RC), uniformly preprocessed and binned into 2,295 velocity points (104 galaxies; 20 RC bins), and (ii) KiDS-1000 galaxy–galaxy weak lensing (GGL) excess surface density ΔΣ(R) from Brouwer et al. (2021) (4 stellar-mass bins × 15 R-points per bin, 60 points total, with the full covariance).
We sequentially perform RC-only inference, an RC→GGL closure test, GGL-only inference, and joint RC+GGL inference, and apply consistency audits so that every quoted number is traceable. Under a strict parameter ledger and shared-mapping constraints (DM: 20 log M200_bin; EFT: 20 log V0_bin + one global log ℓ), the EFT family outperforms DM_RAZOR in the joint fit: ΔlogL_total = 1155–1337 (relative to DM_RAZOR). More importantly, the closure test shows that the RC posterior has non-trivial predictive power for GGL: EFT achieves a closure strength ΔlogL_closure = 172–281, higher than DM_RAZOR’s 127; after randomly shuffling the RC-bin→GGL-bin grouping, the closure signal collapses to 6–23, which argues against a statistical coincidence or an implementation artifact. Across systematic sweeps of σ_int, R_min, and covariance shrinkage, EFT’s relative advantage remains positive and stable in order of magnitude.
To address common critiques such as a “strawman DM baseline” or “systematics-as-physics”, Appendix B (P1A) provides a standardized-yet-auditable DM-baseline stress-test suite: it retains three legacy one-parameter upgrades (SCAT/AC/FB) and adds (i) hierarchical c–M scatter + a mass–concentration prior, (ii) a 1-parameter core proxy inspired by coreNFW/DC14-style thinking, and (iii) a key lensing-systematics nuisance parameter (shear-calibration m), plus a combined model DM_STD. Under the same closure protocol, these upgrades do not remove EFT’s closure advantage (Table B1 / Fig. B1).
Keywords: rotation curves; galaxy–galaxy weak lensing; closure test; EFT; cold dark matter; Bayesian inference
2 Introduction and overview
Rotation curves (RC) and galaxy–galaxy weak lensing (GGL) probe gravity in complementary regimes: RC constrains the in-plane dynamical potential (and the radial-acceleration relation), while GGL measures the projected mass distribution on halo scales. A viable framework should not merely fit each dataset in isolation, but provide a consistent cross-dataset explanation under shared mappings and constraints.
We therefore adopt a closure-test–centered protocol: we first infer the RC-only posterior and forward-predict GGL, and then compare against negative controls in which the RC-bin→GGL-bin grouping is permuted or shuffled. This directly tests cross-dataset predictive transferability and helps rule out statistical coincidences or implementation artifacts.
Scope and positioning: we do not attempt to derive EFT from microscopic first principles in this manuscript. Rather, we treat EFT as a low-dimensional, MOND-like effective-response parameterization (specified by a kernel family f(x) and a global scale ℓ), and we test its cross-dataset consistency via the RC→GGL closure protocol under a strict parameter ledger.
Program context and scope statement: this paper is part of a continuing P-series observational search. We search existing galaxy-scale data for two possible effective backgrounds: (i) a deterministic, coarse-grained mean-response “gravity floor”, and (ii) a stochastic “noise floor” associated with microscopic fluctuations. In P1 we focus exclusively on the former: we use the strict RC→GGL closure protocol to look for observational signatures of a mean-response component, while keeping microphysical origins out of scope and comparing against an auditable DM baseline. As a heuristic micro-picture, if short-lived degrees of freedom existed in a quasi-steady manner, decay/annihilation would convert rest mass into other degrees of freedom carrying energy–momentum, which at the effective level motivates a mean-plus-fluctuations decomposition; we do not model this micro-picture quantitatively here.
To avoid over-interpretation, we set the following boundaries:
• What we do: under a strict parameter ledger and shared mapping, we compare EFT average-response models against DM baselines and use closure strength to quantify cross-dataset predictive transferability.
• What we do not do: we do not build a particle model, do not discuss production mechanisms, abundances/lifetimes, or cosmological constraints, and we do not model the stochastic “noise floor”.
• What we do not claim: we do not aim to falsify dark matter; P1 does not deliver a final verdict on whether such a floor exists, but reports stage-wise evidence—within the observationally stable regime considered, the data prefer models that include an average-response component.
On the DM side, we explicitly position DM_RAZOR as a minimal, auditable NFW baseline (fixed c–M without scatter; no adiabatic contraction, feedback-driven cores, halo non-sphericity, or environmental terms). Accordingly, the main-text claims are conditional: EFT outperforms this minimal baseline under a strict parameter ledger and shared mapping. To address a common question—whether a more standard ΛCDM baseline and a key lensing-systematics nuisance could change the conclusion—we collect such low-dimensional, auditable upgrades in Appendix B (P1A: standardized DM-baseline stress tests), under the same shared mapping and closure protocol (Table B1 / Fig. B1).
2.1 Tables S1a–S1b: Summary of key metrics (Strict)
Table S1a reports the primary comparison metrics for the joint fit (RC+GGL) (logL, ΔlogL, AICc, BIC). Table S1b reports the closure-test and robustness-scan metrics (closure, shuffle negative control, and the ranges under σ_int / R_min / cov-shrink sweeps). All values are taken from the strict master summary Tab_Z1_master_summary and can be traced item-by-item in the release archive.
Table S1a | Primary metrics for the joint fit (RC+GGL, Strict).
Model (workspace) | W kernel | k | Joint logL_total (best) | ΔlogL_total vs DM | AICc | BIC |
DM_RAZOR | none | 20 | -16927.763 | 0.0 | 33895.885 | 34010.811 |
EFT_BIN | none | 21 | -15590.552 | 1337.21 | 31223.501 | 31344.155 |
EFT_WEXP | exponential | 21 | -15668.83 | 1258.932 | 31380.057 | 31500.711 |
EFT_WYUK | yukawa | 21 | -15772.936 | 1154.827 | 31588.268 | 31708.922 |
EFT_WPOW | powerlaw_tail | 21 | -15633.321 | 1294.442 | 31309.038 | 31429.692 |
Table S1b | Closure and robustness metrics (Strict).
Model (workspace) | Closure ΔlogL (true−perm) | ΔlogL after shuffle (neg. control) | ΔlogL range under σ_int sweep | ΔlogL range under R_min sweep | ΔlogL range under cov-shrink sweep |
DM_RAZOR | 126.678 | 22.725 | — | — | — |
EFT_BIN | 231.611 | 14.984 | 459–1548 | 1243–1289 | 1337–1351 |
EFT_WEXP | 171.977 | 6.04 | 408–1471 | 1169–1207 | 1259–1277 |
EFT_WYUK | 179.808 | 14.688 | 380–1341 | 1065–1099 | 1155–1166 |
EFT_WPOW | 280.513 | 6.672 | 457–1500 | 1203–1247 | 1294–1308 |
2.2 Fig. S3: Closure strength (RC-only → predicting GGL)
Closure strength is defined as ΔlogL_closure ≡ ⟨logL_true⟩ − ⟨logL_perm⟩: we forward-predict GGL on RC-only posterior samples and compare against a negative control in which the RC-bin→GGL-bin mapping is permuted.

Fig. S3 | Closure strength (higher is better): mean log-likelihood advantage of RC-only → GGL prediction.
2.3 Fig. S4: Main comparison for the joint fit (RC+GGL)
The joint-fit advantage is defined as ΔlogL_total ≡ logL_total(model) − logL_total(DM_RAZOR). Under the same data, the same mapping, and nearly the same number of parameters (k = 20 for DM_RAZOR versus k = 21 for the EFT variants), EFT variants achieve substantially higher best logL_total than DM_RAZOR.

Fig. S4 | Joint-fit advantage (higher is better): best logL_total of RC+GGL relative to DM_RAZOR.
2.4 Four take-home statements (ready to cite)
(1) Under a unified joint analysis of SPARC rotation curves and KiDS-1000 weak lensing, EFT average-gravity framework models systematically outperform the DM_RAZOR baseline under a strict like-for-like protocol: ΔlogL_total = 1155–1337 (relative to DM_RAZOR).
(2) The RC→GGL closure test shows that EFT has stronger predictive consistency: ΔlogL_closure = 172–281 versus 127 for DM_RAZOR; after shuffling the RC-bin→GGL-bin grouping, the closure signal collapses to 6–23, supporting a physically correlated (non-artifactual) closure signal.
(3) Systematic sweeps over σ_int, R_min, and covariance shrinkage preserve both the sign and the order of magnitude of “EFT > DM_RAZOR”, indicating robustness to common sources of systematic uncertainty.
(4) Appendix B (P1A) strengthens the DM side in a standardized yet auditable way: it retains three legacy one-parameter branches (SCAT/AC/FB) and adds hierarchical c–M scatter + prior, a 1-parameter core proxy, and a lensing shear-calibration nuisance m (plus a combined DM_STD). Only the feedback/core branch yields a modest net increase in closure strength (122.21→129.45; ΔΔlogL_closure≈+7.25), while the other upgrades do not improve closure. Therefore, the main conclusions do not hinge on DM_RAZOR being an overly weak baseline.
3 Data and preprocessing
We use two public datasets and perform downloading, sha256 verification, and preprocessing with fully traceable scripts in the project repository. To ensure fair comparison across models, all workspaces (EFT_BIN / EFT_WEXP / EFT_WYUK / EFT_WPOW / DM_RAZOR) share the same preprocessed inputs and the same binning protocol.
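The sha256 verification step can be illustrated with a minimal sketch. It assumes a hypothetical manifest that maps relative file paths to expected hex digests; the actual manifest layout is defined by the in-package scripts and is not reproduced here, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest):
    """Return the relative paths whose file is missing or whose digest differs.

    `manifest` is a hypothetical {relative_path: expected_hex_digest} dict.
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty return value means every listed input matches its recorded digest, so all workspaces start from the same bytes.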
3.1 Rotation curves (RC, SPARC)
RC data are taken from the SPARC Rotmod_LTG release (175 rotmod files). After preprocessing, 104 galaxies are included in the modeling set, yielding 2,295 RC velocity points aggregated into 20 RC bins.
3.2 Weak lensing (GGL, KiDS-1000 / Brouwer et al. 2021)
GGL data follow Brouwer et al. (2021) (KiDS-1000), using Fig. 3 excess surface density ΔΣ(R): 4 stellar-mass bins with 15 radial points per bin (60 points total), with the full covariance matrices.
3.3 RC-bin → GGL-bin mapping and total sample size
The 4 GGL stellar-mass bins are linked to the 20 RC bins by a fixed mapping: each GGL bin corresponds to 5 RC bins, and RC bins are aggregated with galaxy-count weights. This shared mapping is held fixed across all models, and is the core constraint that makes the closure test meaningful.
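A minimal sketch of such a fixed mapping and of the galaxy-count-weighted aggregation is given below. The contiguous block assignment and the function names are illustrative assumptions; the actual bin assignment is the one recorded in the release tables.

```python
def contiguous_mapping(n_rc_bins=20, n_ggl_bins=4):
    """Assign RC bins to GGL bins in contiguous blocks (illustrative choice)."""
    per_bin = n_rc_bins // n_ggl_bins
    return [list(range(g * per_bin, (g + 1) * per_bin)) for g in range(n_ggl_bins)]


def aggregate_to_ggl_bins(rc_values, galaxy_counts, mapping):
    """Galaxy-count-weighted mean of a per-RC-bin quantity in each GGL bin."""
    aggregated = []
    for rc_indices in mapping:
        total_weight = sum(galaxy_counts[i] for i in rc_indices)
        weighted_sum = sum(galaxy_counts[i] * rc_values[i] for i in rc_indices)
        aggregated.append(weighted_sum / total_weight)
    return aggregated
```

Here `rc_values` stands for any per-RC-bin quantity (for example a predicted ΔΣ at one radius). Because the same `mapping` object is passed to every model, the comparison stays like-for-like.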
4 Models and statistical methodology
4.1 Minimal mathematical specification of EFT and DM (implementation-linked; auditable)
This section provides a minimal mathematical specification that maps directly to the implementation.
(a) Rotation-curve (RC) model
For each RC data point (r, V_obs, σ_obs), we model the total speed via component addition: V_mod²(r) = V_bar²(r) + V_extra²(r), where V_bar is the baryonic contribution provided by SPARC mass modeling, and V_extra is the additional contribution from either EFT or DM.
(b) EFT average-gravity framework
The EFT extra term is parameterized in an “average squared speed” form: V_extra²(r) = V0_bin² · f(r/ℓ). Here V0_bin is a per-RC-bin amplitude parameter, and ℓ is a global scale parameter.
- none: f(x)=x/(1+x)
- exponential: f(x)=1−exp(−x)
- yukawa: f(x)=1−exp(−x)·(1+0.5x)
- powerlaw_tail: f(x)=1−(1+x)^(−1/2)
- (optional control) gaussian: f(x)=erf(x/√2) (not included in the main conclusion set)
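The kernel family and the component addition of (a) can be written compactly as follows. This is a minimal sketch: the dictionary keys follow the kernel names above, while the function names are illustrative and are not taken from the pipeline.

```python
import math

# Kernel family f(x) as listed above; keys follow the workspace kernel names.
KERNELS = {
    "none": lambda x: x / (1.0 + x),
    "exponential": lambda x: 1.0 - math.exp(-x),
    "yukawa": lambda x: 1.0 - math.exp(-x) * (1.0 + 0.5 * x),
    "powerlaw_tail": lambda x: 1.0 - (1.0 + x) ** -0.5,
    "gaussian": lambda x: math.erf(x / math.sqrt(2.0)),  # optional control
}


def v_extra_sq(r, v0_bin, ell, kernel="none"):
    """EFT extra term: V_extra^2(r) = V0_bin^2 * f(r / ell)."""
    return v0_bin ** 2 * KERNELS[kernel](r / ell)


def v_model(r, v_bar, v0_bin, ell, kernel="none"):
    """Total model speed from V_mod^2 = V_bar^2 + V_extra^2."""
    return math.sqrt(v_bar ** 2 + v_extra_sq(r, v0_bin, ell, kernel))
```

For example, `v_model(r, v_bar, v0_bin, ell, "yukawa")` evaluates one rotation-curve point; r and ℓ must share the same length unit, and the speeds the same velocity unit.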
Physical motivation (expanded): EFT interprets the additional gravitational response on galactic scales as an effective, scale-dependent response that can arise after coarse-graining more microscopic interactions. In this paper we do not assume a specific microscopic mechanism; instead, we adopt a minimal, auditable parameterization to enable controlled statistical comparison under a shared protocol.
For intuition, the implied extra acceleration can be written as a_extra(r)=V_extra²(r)/r=(V0_bin²/r)·f(r/ℓ). For r≫ℓ, f→1 so V_extra approaches a constant amplitude V0_bin, yielding an approximately flat contribution to the outer rotation curve. For r≪ℓ, kernels with f(x)≈x introduce a characteristic acceleration scale a0,bin≈V0_bin²/ℓ (up to an O(1) kernel-dependent factor), providing a MOND-like scaling intuition for the inner-to-outer transition.
The discrete kernel choices used here (none/exponential/yukawa/powerlaw_tail) act as low-dimensional proxies for different turn-on rates and long-range tails (e.g., Yukawa-like screening versus longer-tailed responses) and are intended as stress tests rather than an exhaustive model space. For weak lensing we construct an equivalent enclosed-mass and density profile from V_avg(r) and project it to ΔΣ(R); this equivalent density should be interpreted as an effective description of the lensing potential within the assumed spherical, weak-field mapping (full details are moved to Appendix A).
All kernels satisfy f(x)→1 as x→∞ (i.e., saturation V_extra²→V0_bin²). For x≪1 all of them grow linearly, with a kernel-dependent slope of order unity; e.g., none and exponential: f≈x, while yukawa and powerlaw_tail: f≈x/2.
The EFT prediction for weak-lensing ΔΣ(R) is obtained by inverting the effective speed V_avg(r) to an enclosed mass and density profile, followed by projection: M_enc(r)=r·V_avg²(r)/G, ρ(r)=[1/(4πr²)]·dM_enc/dr, and ΔΣ(R) computed via standard line-of-sight integration.
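The inversion and projection chain can be sketched numerically as below. Several choices here are assumptions made for illustration only: V_avg² is taken to be V0²·f(r/ℓ), the profile is spherical, the derivative is a finite difference, the integrals use a simple midpoint rule, and lengths are in kpc with speeds in km/s (so ΔΣ comes out in Msun/kpc²). The pipeline's own quadrature and unit conventions may differ.

```python
import math

G_KPC = 4.30091e-6  # G in kpc (km/s)^2 / Msun


def enclosed_mass(r, v0, ell, f):
    """M_enc(r) = r * V_avg^2(r) / G, assuming V_avg^2 = V0^2 * f(r/ell)."""
    return r * v0 ** 2 * f(r / ell) / G_KPC


def density(r, v0, ell, f, rel_step=1e-5):
    """rho(r) = (dM_enc/dr) / (4 pi r^2), via a central finite difference."""
    h = rel_step * r
    dm_dr = (enclosed_mass(r + h, v0, ell, f) - enclosed_mass(r - h, v0, ell, f)) / (2.0 * h)
    return dm_dr / (4.0 * math.pi * r ** 2)


def sigma(R, v0, ell, f, n=400):
    """Sigma(R) = 2 * int_0^inf rho(sqrt(R^2 + z^2)) dz, with z = R tan(theta)."""
    d_theta = 0.5 * math.pi / n
    total = 0.0
    for i in range(n):
        cos_t = math.cos((i + 0.5) * d_theta)  # midpoints never reach pi/2
        total += density(R / cos_t, v0, ell, f) * R / cos_t ** 2
    return 2.0 * total * d_theta


def delta_sigma(R, v0, ell, f, n=200):
    """Delta Sigma(R) = mean Sigma inside R minus Sigma(R)."""
    d_r = R / n
    inner = sum(sigma((j + 0.5) * d_r, v0, ell, f) * (j + 0.5) * d_r for j in range(n))
    mean_sigma = 2.0 * inner * d_r / R ** 2
    return mean_sigma - sigma(R, v0, ell, f)
```

In the limit ℓ ≪ R the kernel saturates and the equivalent profile tends to a singular isothermal sphere, for which ΔΣ(R) = V0²/(4GR); this gives a convenient analytic check of the quadrature.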
(c) DM_RAZOR: an NFW cold dark matter halo baseline
DM_RAZOR uses an NFW density profile as the minimal DM baseline. Each RC bin is parameterized by M200_bin (20 parameters), and the concentration c(M) is set by a fixed c–M relation. We emphasize that DM_RAZOR is used as a controlled, auditable reference point, not as the most general ΛCDM implementation. Appendix B (P1A) implements standardized-yet-low-dimensional DM upgrades—(i) hierarchical c–M scatter via a prior, (ii) a one-parameter adiabatic-contraction strength, and (iii) a low-dimensional feedback/core proxy—plus a key lensing-systematics nuisance parameter, and quantifies their impact on both closure strength and joint-fit performance.
4.2 Parameter ledger and fair comparison (shared mapping = closure definition)
The main comparison set uses: DM_RAZOR with k=20 parameters; EFT variants with k=21 (an extra global log ℓ). All models share: the same RC data, the same GGL data with the same covariance, the same RC-bin and GGL-bin definitions, and the same fixed RC-bin→GGL-bin mapping. This shared structure defines a like-for-like comparison and underpins the closure test.
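Given this ledger, the information criteria in Table S1a can be recomputed from the best logL_total and k alone. The sketch below assumes the total sample size is n = 2295 RC points + 60 GGL points = 2355 and uses the textbook AICc and BIC definitions; both are assumptions about the pipeline, not statements taken from its code.

```python
import math

N_DATA = 2295 + 60  # RC velocity points + GGL points (assumed total sample size)


def delta_logl(logl_model, logl_baseline):
    """Joint-fit advantage relative to the baseline model."""
    return logl_model - logl_baseline


def aicc(logl, k, n=N_DATA):
    """AIC with the small-sample correction 2k(k+1)/(n-k-1)."""
    return 2.0 * k - 2.0 * logl + 2.0 * k * (k + 1) / (n - k - 1)


def bic(logl, k, n=N_DATA):
    """Bayesian information criterion: k ln(n) - 2 logL."""
    return k * math.log(n) - 2.0 * logl
```

With these definitions, `aicc(-16927.763, 20)` and `bic(-16927.763, 20)` agree with the DM_RAZOR row of Table S1a to within rounding, and the same holds for EFT_BIN with k = 21, which supports the assumed n.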
4.3 Likelihood, priors, and sampler
For RC we use a diagonal Gaussian likelihood with σ_eff² = σ_obs² + σ_int²; the main results fix σ_int = 5 km/s, while Run-5 scans σ_int (Section 6.1). For GGL we use a multivariate Gaussian likelihood with the published covariance. Priors are broad and weakly informative on log-amplitudes and log-scales.
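The two likelihood terms can be sketched as follows. Including the Gaussian normalization terms is an assumption here (it matters once σ_int is varied), and the pure-Python Cholesky routine is only an illustrative stand-in for whatever linear algebra the pipeline uses.

```python
import math


def rc_loglike(v_obs, v_mod, sigma_obs, sigma_int=5.0):
    """Diagonal Gaussian log-likelihood with sigma_eff^2 = sigma_obs^2 + sigma_int^2."""
    total = 0.0
    for vo, vm, so in zip(v_obs, v_mod, sigma_obs):
        var = so ** 2 + sigma_int ** 2
        total += -0.5 * ((vo - vm) ** 2 / var + math.log(2.0 * math.pi * var))
    return total


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def ggl_loglike(data, model, cov):
    """Multivariate Gaussian log-likelihood for one mass bin with full covariance."""
    low = cholesky(cov)
    resid = [d - m for d, m in zip(data, model)]
    # Forward substitution: solve low * y = resid, so that chi2 = y . y
    y = []
    for i, r_i in enumerate(resid):
        y.append((r_i - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    chi2 = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(cov)))
    return -0.5 * (chi2 + log_det + len(cov) * math.log(2.0 * math.pi))
```

A joint log-likelihood under these assumptions is the RC term plus the sum of `ggl_loglike` over the 4 mass bins.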
We employ an adaptive block Metropolis random-walk sampler: each step updates a random sub-block of parameters to improve acceptance in high dimensions, with mild windowed adaptation of step sizes targeting an acceptance rate of ~0.2.
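To make the sampling scheme concrete, the following is a generic sketch of a block random-walk Metropolis sampler with windowed step-size adaptation. It is not the pipeline's sampler: the block size, window length, adaptation rule, and the choice to freeze adaptation after burn-in are all assumptions made for illustration.

```python
import math
import random


def block_metropolis(log_post, theta0, n_steps, burn_in, block_size=2,
                     window=100, target=0.2, seed=0):
    """Random-walk Metropolis with random sub-block updates.

    The step size is adapted in windows during burn-in only, then frozen.
    Returns (post-burn-in samples, post-burn-in acceptance rate).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_post(theta)
    step, window_accepts, kept_accepts = 1.0, 0, 0
    samples = []
    for t in range(n_steps):
        proposal = list(theta)
        # Perturb only a random sub-block of the parameters.
        for i in rng.sample(range(len(theta)), block_size):
            proposal[i] += step * rng.gauss(0.0, 1.0)
        candidate = log_post(proposal)
        # Symmetric proposal, so the plain Metropolis ratio applies.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
            window_accepts += 1
            if t >= burn_in:
                kept_accepts += 1
        if t < burn_in and (t + 1) % window == 0:
            # Grow the step when accepting too often, shrink it otherwise.
            step *= math.exp(window_accepts / window - target)
            window_accepts = 0
        if t >= burn_in:
            samples.append(list(theta))
    return samples, kept_accepts / max(1, n_steps - burn_in)
```

Freezing the step size after burn-in keeps the retained chain a standard Metropolis chain, which is a conservative choice when adaptation is used.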
4.4 Closure test and negative control (definitions)
The closure test (Run-2) evaluates whether the RC-only posterior can predict GGL without refitting to GGL. Specifically, we forward-generate GGL predictions for each of the 4 mass bins from RC-only posterior samples and compute logL_true. As a negative control, we permute the RC-bin→GGL-bin grouping while keeping the RC-only posterior fixed, and compute logL_perm. The closure strength is ΔlogL_closure = ⟨logL_true⟩ − ⟨logL_perm⟩.
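The definition above can be expressed as a short sketch. The callable `ggl_loglike(sample, mapping)` is hypothetical: it stands for the forward prediction of GGL from one RC-only posterior sample under a given RC-bin→GGL-bin mapping, followed by the GGL likelihood evaluation.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def closure_strength(ggl_loglike, posterior_samples, true_mapping, permuted_mappings):
    """Return <logL_true> - <logL_perm> over RC-only posterior samples.

    ggl_loglike(sample, mapping) is a user-supplied (hypothetical) callable
    that forward-predicts GGL from one RC-only sample and scores it.
    """
    logl_true = mean(ggl_loglike(s, true_mapping) for s in posterior_samples)
    logl_perm = mean(
        ggl_loglike(s, m) for m in permuted_mappings for s in posterior_samples
    )
    return logl_true - logl_perm
```

The RC-only posterior samples are reused unchanged for both terms, so any difference is attributable to the mapping alone.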
5 Main results and interpretation
5.1 Main joint-fit results (RC+GGL)
The best joint logL_total and the relative advantage ΔlogL_total (vs DM_RAZOR) are summarized in Table S1a and Fig. S4. Within the main comparison set, EFT_BIN achieves the largest ΔlogL_total, while EFT_WYUK (the roadmap-required Yukawa kernel) remains strongly favored over DM_RAZOR.
Note that the joint improvement is dominated by the RC term: in the joint decomposition, ΔlogL_RC ≈ 1065 (≈80% of ΔlogL_total), consistent with a modest per-point gain (Δχ² ≈ 0.90 per datum) accumulated over N = 2295 RC points under the same diagonal Gaussian likelihood. Meanwhile, GGL and the closure protocol provide independent cross-dataset constraints, and the ranking remains stable under σ_int / R_min / covariance-shrink stress tests (Section 6; Table S1b).
5.2 Closure-test results (RC-only → GGL)
Closure strengths ΔlogL_closure are summarized in Table S1b and Fig. S3. EFT variants yield ΔlogL_closure = 171.977–280.513, higher than DM_RAZOR’s 126.678, indicating stronger predictive consistency from RC to GGL.
The negative control further supports the physical relevance of the closure signal: after shuffling the RC-bin→GGL-bin grouping, EFT closure strengths drop to 6–15 (depending on kernel), and the mean logL_true drops markedly, consistent with a mapping-dependent signal.

Fig. R1 | Negative control: the closure signal drops strongly after shuffling the grouping (derived from Tab_Z1 metrics).
5.3 Interpretation and limitations
Our conclusion is conditional: “on this dataset and under this protocol, EFT average-gravity modification models outperform the tested DM_RAZOR baseline.” Importantly, the DM side is represented only by a minimal NFW baseline with a fixed c(M) relation; we do not claim to exhaust the space of DM halo models. Likewise, EFT is tested through a minimal, auditable parameterization of an effective response; more microscopic EFT mechanisms are not assumed here.
To address this concern, we performed an independent extension experiment P1A (Appendix B). Without changing the data products, the RC-bin→GGL-bin shared mapping, or the audit framework, P1A strengthens the DM baseline in a low-dimensional, defensible way: it retains the legacy one-parameter branches SCAT/AC/FB, and adds hierarchical c–M scatter + a mass–concentration prior (DM_HIER_CMSCAT), a 1-parameter core proxy (DM_CORE1P), and a key lensing-systematics nuisance parameter (shear-calibration m; DM_RAZOR_M), plus a combined model DM_STD; EFT_BIN is kept as a reference control.
• DM_RAZOR_SCAT (c–M scatter): introduce a log-normal halo-to-halo concentration scatter parameter σ_logc around the mean c(M);
• DM_RAZOR_AC (adiabatic contraction): add a single strength parameter (e.g., α_AC) to interpolate between no-contraction and standard contraction;
• DM_RAZOR_FB (feedback/core): adopt a one-parameter core prescription (e.g., log r_core) to capture inner-halo core formation, while keeping the lensing-scale mapping NFW-like.
The quantitative scoreboard is summarized in Appendix B (Table B1 / Fig. B1; auto-generated from Tab_S1_P1A_scoreboard). In terms of closure, DM_RAZOR_FB yields a modest net gain (122.21→129.45; +7.25), while the other upgrades do not improve closure. On the joint-fit side, hierarchical c–M scatter (DM_HIER_CMSCAT) and the combined model DM_STD can substantially improve the joint best-fit logL, yet without improving closure—suggesting that their main benefit is added joint-fit flexibility rather than cross-probe transferability. Therefore, the core claims of this report are robust to these standardized DM-baseline upgrades and to the inclusion of a key lensing nuisance. The P1A reproducibility bundles (supplement_figs_tables and full_fit_runpack) will be deposited as additional files under the same Zenodo Concept DOI as the main full_fit_runpack (https://doi.org/10.5281/zenodo.18526286), maintaining a single archival entry point.
6 Robustness and control experiments
6.1 σ_int sweep (Run-5)
We systematically scan the intrinsic RC scatter σ_int. For each σ_int we repeat the joint inference and compute ΔlogL_total relative to DM_RAZOR. Across the scan range, EFT models remain favored with stable order-of-magnitude advantages.

Fig. R2 | Range of ΔlogL_total under the σ_int sweep (higher is better).
6.2 R_min sweep (Run-6)
To assess the impact of central systematics (e.g., non-circular motions, resolution limits, or imperfect baryonic modeling), we apply an RC cutoff R_min and repeat the joint inference. EFT’s advantage remains positive and stable across tested R_min values.

Fig. R3 | Range of ΔlogL_total under the R_min sweep (higher is better).
6.3 cov-shrink sweep (Run-7)
To probe uncertainties in the GGL covariance, we apply shrinkage to each mass-bin covariance matrix: C_α = (1−α)C + α·diag(C), scanning α. Results show EFT remains favored and the relative advantage is stable over reasonable shrinkage levels.

Fig. R4 | Range of ΔlogL_total under the cov-shrink sweep (higher is better).
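The shrinkage operator C_α = (1−α)C + α·diag(C) used in Run-7 leaves the diagonal unchanged and scales every off-diagonal element by (1−α). A minimal sketch, with an illustrative function name:

```python
def shrink_covariance(cov, alpha):
    """Return C_alpha = (1 - alpha) * C + alpha * diag(C) for a square matrix."""
    n = len(cov)
    return [
        [cov[i][j] if i == j else (1.0 - alpha) * cov[i][j] for j in range(n)]
        for i in range(n)
    ]
```

Setting α = 0 recovers the published covariance, while α = 1 keeps only the variances and drops all correlations between radial points.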
6.4 Ablation ladder (Run-8)
Within EFT_BIN we perform a nested ablation ladder: from a minimal model (no free parameters), to reduced degrees of freedom, up to the full model with 20-bin amplitudes plus a global scale. AICc/BIC favor the full EFT_BIN, consistent with the joint-fit gains being supported by the data.

Fig. R5 | Ablation ladder within EFT_BIN (AICc; lower is better).
6.5 Leave-one-bin-out prediction (Run-9)
We further perform a leave-one-bin-out (LOO) test on the 4 GGL mass bins: each time we hold out one bin, fit to the remaining bins (and all RC data), and evaluate the held-out bin likelihood. LOO behavior provides an additional check against overfitting.

Fig. R6 | LOO: log-likelihood distribution for the held-out bin (from Run-9 artifacts).
6.6 Negative control: RC-bin shuffle (Run-10)
Run-10 randomly regroups the 20 RC bins into 4×5 and recomputes closure while keeping the RC-only posterior fixed. Compared with the original mapping, shuffling leads to a substantial drop in mean logL_true, providing a negative-control confirmation that the closure signal depends on the physically motivated bin mapping.

Fig. R7 | Negative control: shuffling the mapping causes a clear drop in mean logL_true (from Run-10 artifacts).
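The random regrouping used in this negative control can be sketched as follows. The function name and the seeding convention are illustrative assumptions; the sketch only shows that each shuffled mapping is a partition of the 20 RC bins into 4 groups of 5.

```python
import random


def shuffled_mapping(n_rc_bins=20, n_ggl_bins=4, seed=None):
    """Randomly regroup RC bin indices into n_ggl_bins equal-size groups."""
    rng = random.Random(seed)
    indices = list(range(n_rc_bins))
    rng.shuffle(indices)
    size = n_rc_bins // n_ggl_bins
    return [sorted(indices[g * size:(g + 1) * size]) for g in range(n_ggl_bins)]
```

Fixing `seed` makes a given shuffle reproducible, which is useful when the recomputed closure values need to be traced back to a specific regrouping.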
7 Provenance and consistency audits
All quoted numerical results in this manuscript are traceable to the archived release tables and audit logs. To keep the main text readable, the full provenance trail (tags, audit tables, checksum manifests, and verification recipes) is provided in Appendix A.
8 Reproducibility and Zenodo archiving
Data and code availability: the SPARC rotation-curve data and the KiDS-1000 GGL measurements used here are publicly available. The report is archived on Zenodo (Concept DOI: https://doi.org/10.5281/zenodo.18526334) and the full reproducibility runpack on Zenodo (Concept DOI: https://doi.org/10.5281/zenodo.18526286). Step-by-step commands, environment specifications, package inventory, and integrity hashes are provided in Appendix A.
Under the same full reproducibility Concept DOI (https://doi.org/10.5281/zenodo.18526286), we provide two complementary reproducibility entries:
• P1 (main text) full_fit_runpack: reproduces the EFT vs DM_RAZOR RC-only / closure / joint results and the robustness scans, and regenerates the main tables/figures.
• P1A (Appendix B) full_fit_runpack: reproduces the standardized DM-baseline stress tests (DM 7+1 + DM_STD, with an EFT_BIN control) and regenerates Table B1 and Fig. B1.
The P1A packages (supplement_figs_tables and full_fit_runpack) will be uploaded as additional files under the same Concept DOI once verified, keeping a single archival entry point.
9 Acknowledgements and statements
9.1 Acknowledgements
We thank the SPARC and KiDS-1000 teams for making their data and documentation publicly available, and we thank all participants involved in the reconstruction and audit workflow of this project.
9.2 Author contributions (CRediT)
Guanglin Tu conceived the study, designed the methodology, implemented the engineering pipeline, organized the data, performed the formal analysis, implemented and audited the reproducibility workflow, and wrote the manuscript.
9.3 Funding
This work was self-funded by Guanglin Tu (no external funding; no grant number).
9.4 Competing interests
Guanglin Tu is affiliated with the EFT Working Group, Shenzhen Energy Filament Science Research Co., Ltd., China; no other competing interests are declared.
9.5 AI assistance
OpenAI GPT-5.2 Pro and Gemini 3 Pro were used for language polishing, structured editing, and organizing the reproducibility workflow; they were not used to generate or modify data, results, figures, or code; they were not used to generate citations. The author assumes full responsibility for the content and citation accuracy.
10 References
- Lelli, F., McGaugh, S. S., & Schombert, J. M. (2016). SPARC: Mass Models for 175 Disk Galaxies with Spitzer Photometry and Accurate Rotation Curves. The Astronomical Journal, 152, 157. DOI: https://doi.org/10.3847/0004-6256/152/6/157
- Brouwer, M. M., Oman, K. A., Valentijn, E. A., et al. (2021). The weak lensing radial acceleration relation: Constraining modified gravity and cold dark matter theories with KiDS-1000. Astronomy & Astrophysics, 650, A113. DOI: https://doi.org/10.1051/0004-6361/202040108
- Wright, C. O., & Brainerd, T. G. (2000). Gravitational Lensing by Navarro–Frenk–White Halos. The Astrophysical Journal, 534, 34–40.
- Navarro, J. F., Frenk, C. S., & White, S. D. M. (1997). A Universal Density Profile from Hierarchical Clustering. The Astrophysical Journal, 490, 493. DOI: https://doi.org/10.1086/304888
- Dutton, A. A., & Macciò, A. V. (2014). Cold dark matter haloes in the Planck era: evolution of structural parameters for NFW haloes. Monthly Notices of the Royal Astronomical Society, 441, 3359–3374. DOI: https://doi.org/10.1093/mnras/stu742
- Blumenthal, G. R., Faber, S. M., Flores, R., & Primack, J. R. (1986). Contraction of dark matter galactic halos due to baryonic infall. The Astrophysical Journal, 301, 27. DOI: https://doi.org/10.1086/163867
- Di Cintio, A., Brook, C. B., Dutton, A. A., et al. (2014). A mass-dependent density profile for dark matter haloes including the influence of galaxy formation. Monthly Notices of the Royal Astronomical Society, 441, 2986–2995. DOI: https://doi.org/10.1093/mnras/stu729
- Read, J. I., Agertz, O., & Collins, M. L. M. (2016). Dark matter cores all the way down. Monthly Notices of the Royal Astronomical Society, 459, 2573–2590. DOI: https://doi.org/10.1093/mnras/stw713
- Energy Filament Theory. Zenodo DOI: https://doi.org/10.5281/zenodo.18517411
Appendix A. Provenance and reproducibility details
This appendix summarizes the long-term archival provenance and reproducibility information (run tags, audit results, package inventory, and cross-check points) so that readers can verify and reproduce the results on demand.
A.1 Full provenance and audit details
For long-term traceability, each run and output directory is stamped with a timestamped tag, and past artifacts are kept without overwriting. All core numbers quoted in this manuscript are taken from the strict compilation (compile_tag=20260205_035929) and are traceable through release manifests and audits.
• All stage tables are stamped with run_tag and stage-specific tags; the strict compilation selects canonical table sources that are complete and internally consistent.
• Values in Tab_Z1_master_summary and Tab_Z2_conclusion_highlights are cross-checked item-by-item against the selected canonical stage tables.
• During PDF compilation, a tag-audit is performed on referenced tables/figures to ensure no legacy artifacts are mixed in.
Key tags (to locate all intermediate artifacts): run_tag=20260204_122515; closure_tag=20260204_124721; joint_tag=20260204_152714; sigma_sweep_tag=20260204_161852; rmin_sweep_tag=20260204_195247; covshrink_tag=20260204_203219; ablation_tag=20260204_214642; LOO_tag=20260204_224827; negctrl_tag=20260204_234528; strict_compile_tag=20260205_035929; release_tag=20260205_112442.
Audit result: Tab_AUDIT_checks_strict shows pass=9, fail=0, skip=0 (see the release packet for details).
A.2 Full reproducibility instructions and package inventory
We adopt a reproducibility system consisting of a publication-ready report, a Tables & Figures Supplement, and a fully rerunnable pipeline package (full_fit_runpack). Readers can verify every cited table/figure via the Supplement, and can reproduce all numbers end-to-end using full_fit_runpack (which includes reference tables and an automated post-run comparison script for numerical consistency checks).
A.2.1 Reproducibility quickstart (RUN_FULL, Windows PowerShell)
This section provides a short reproducibility path (Windows PowerShell). For quick checking, use the Tables & Figures Supplement to verify every cited table/figure. For end-to-end reproduction, use full_fit_runpack: follow the in-package README/ONE_PAGE_REPRO_CHECKLIST and run verify_checksums.ps1 and RUN_FULL.ps1 (recommended Mode=full) to regenerate all tables/figures cited in the main text and Appendix B.
Zenodo entry (Concept DOI): https://doi.org/10.5281/zenodo.18526286.
Main tags: run_tag=20260204_122515, strict compile_tag=20260205_035929, release_tag=20260205_112442.
A.2.2 Archive packages and key checks
The Zenodo archive provides three complementary package types: (1) the publication-ready report (this manuscript, v1.1; including Appendix B: P1A standardized DM-baseline stress tests); (2) Tables & Figures Supplement (all table/figure assets referenced in this manuscript, including Appendix B; provided for both P1 and P1A); and (3) full_fit_runpack (end-to-end reproduction from scratch: download data and rerun the full pipeline; provided for both P1 and P1A). Packages (1)–(2) support rapid reading and independent checking, while (3) provides full reproducibility.
Artifact / package | File (entry point) | Purpose (reader’s view) |
Publication-ready report (English and Chinese) | P1_RC_GGL_report_EN_PUBLICATION_V1_1.pdf | Zenodo-archived full report; includes Appendix B (P1A standardized DM-baseline stress tests). |
Tables & Figures Supplement (P1) | P1_RC_GGL_supplement_figs_tables_V1_1.zip | All CSV/PNG assets referenced in the main text, with generator scripts and tags. |
Tables & Figures Supplement (P1A) | P1A_supplement_figs_tables_v1.zip | All CSV/PNG assets referenced in Appendix B, including Tab_S1_P1A_scoreboard and Fig_S1_P1A_scoreboard. |
full_fit_runpack (P1) | P1_RC_GGL_full_fit_runpack_v1_1.zip | End-to-end reproduction from scratch: download data and rerun RC-only / closure / joint + robustness sweeps. |
full_fit_runpack (P1A) | P1A_RC_GGL_full_fit_runpack_v1.zip | End-to-end reproduction (Appendix B): rerun DM 7+1 + DM_STD (with an EFT_BIN control) and regenerate the appendix assets; includes reference tables and a post-run comparison script for numerical consistency checks. |
Citation guidance: when citing this manuscript or the accompanying reproducibility materials, please include the Zenodo Concept DOI (https://doi.org/10.5281/zenodo.18526334).
After reproduction, the following key artifacts should appear and can be cross-checked:
- report/tables/Tab_D_closure_summary__20260204_122515__*.csv (closure summary)
- report/tables/Tab_F_joint_summary__20260204_122515__*.csv (joint-fit summary)
- report/tables/Tab_G_joint_sigma_sweep__20260204_122515__*.csv (σ_int sweep)
- report/tables/Tab_H_joint_rmin_sweep__20260204_122515__*.csv (R_min sweep)
- report/tables/Tab_I_joint_covshrink_sweep__20260204_122515__*.csv (cov-shrink sweep)
- report/tables/Tab_R2_ablation_ladder__20260204_122515__*.csv (ablation)
- report/tables/Tab_R3_leave_one_bin_out__20260204_122515__*.csv (LOO)
- report/tables/Tab_R4_negctrl_rcbin_shuffle__20260204_122515__*.csv (negative control)
- report/final/Tab_Z1_master_summary__20260204_122515__20260205_035929.csv (Strict master table; matches Tables S1a/S1b and in-text numbers)
- report/final/P1_RC_GGL_final_bundle__20260204_122515__20260205_035929.pdf (final PDF bundle for rapid reading and citation)
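The cross-check of the artifact list can be automated. The sketch below is one possible way to do it in Python: the file patterns are copied from the list above, while the function name and the idea of passing the runpack output directory as `root` are assumptions for illustration.

```python
from pathlib import Path

RUN_TAG = "20260204_122515"
COMPILE_TAG = "20260205_035929"

# Patterns copied from the artifact list; '*' stands for the second tag in the file name.
EXPECTED_PATTERNS = [
    f"report/tables/Tab_D_closure_summary__{RUN_TAG}__*.csv",
    f"report/tables/Tab_F_joint_summary__{RUN_TAG}__*.csv",
    f"report/tables/Tab_G_joint_sigma_sweep__{RUN_TAG}__*.csv",
    f"report/tables/Tab_H_joint_rmin_sweep__{RUN_TAG}__*.csv",
    f"report/tables/Tab_I_joint_covshrink_sweep__{RUN_TAG}__*.csv",
    f"report/tables/Tab_R2_ablation_ladder__{RUN_TAG}__*.csv",
    f"report/tables/Tab_R3_leave_one_bin_out__{RUN_TAG}__*.csv",
    f"report/tables/Tab_R4_negctrl_rcbin_shuffle__{RUN_TAG}__*.csv",
    f"report/final/Tab_Z1_master_summary__{RUN_TAG}__{COMPILE_TAG}.csv",
    f"report/final/P1_RC_GGL_final_bundle__{RUN_TAG}__{COMPILE_TAG}.pdf",
]


def missing_artifacts(root):
    """Return the patterns that match no file under `root` (hypothetical output directory)."""
    root = Path(root)
    return [pattern for pattern in EXPECTED_PATTERNS if not any(root.glob(pattern))]
```

Calling `missing_artifacts(".")` from the directory that contains `report/` returns an empty list when all ten artifacts are present.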
Appendix B: P1A—Standardized DM-baseline stress tests (DM 7+1 + DM_STD; with an EFT control)
This appendix reports P1A, an extension experiment that keeps the main-text closure protocol unchanged while strengthening the DM baseline in a standardized, low-dimensional, auditable way. Its purpose is to reduce common critique points (e.g., “strawman baseline” and “systematics mistaken as physics”) without turning the DM side into an unconstrained high-dimensional fitter. P1A retains the legacy one-parameter branches SCAT/AC/FB and adds hierarchical c–M scatter + a mass–concentration prior, a 1-parameter core proxy, and a key lensing-systematics nuisance parameter (shear calibration m), plus a combined model DM_STD; EFT_BIN is included as a reference control.
Note: the closure-strength metrics in Appendix B (P1A) use a larger Monte Carlo budget (e.g., ndraw=400, nperm=24) than the quick-screening budget used in some main-text closure summaries that cover the full EFT kernel family (e.g., ndraw=60, nperm=12). Absolute values can therefore shift at the O(10) level between tables. Within each table, however, all models share the same budget, and the sign and order of magnitude of the relative advantage remain stable.
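To make the roles of ndraw and nperm concrete, the following toy sketch computes a True−Perm difference with two budgets. It is not the pipeline's estimator: the Gaussian toy likelihood, the use of plain means over draws and permutations, and all names are assumptions chosen only to show how the two budgets enter.

```python
import random
import statistics


def toy_loglike(theta, mapping, target, sigma=0.1):
    """Toy Gaussian log-likelihood (up to a constant): GGL bin i is predicted by RC bin mapping[i]."""
    return -0.5 * sum(((theta[rc_bin] - target[ggl_bin]) / sigma) ** 2
                      for ggl_bin, rc_bin in enumerate(mapping))


def closure_strength(draws, true_map, target, nperm, rng):
    """Mean logL under the true mapping minus the mean over nperm shuffled mappings."""
    true_mean = statistics.fmean(toy_loglike(t, true_map, target) for t in draws)
    perm_means = []
    for _ in range(nperm):
        perm = list(true_map)
        rng.shuffle(perm)  # break the shared RC-bin -> GGL-bin mapping
        perm_means.append(statistics.fmean(toy_loglike(t, perm, target) for t in draws))
    return true_mean - statistics.fmean(perm_means)


rng = random.Random(0)
target = [1.0, 2.0, 3.0, 4.0]   # toy per-bin signal
true_map = [0, 1, 2, 3]          # toy shared mapping


def make_draws(ndraw):
    """Toy 'posterior' draws scattered around the target values."""
    return [[value + rng.gauss(0.0, 0.05) for value in target] for _ in range(ndraw)]


quick = closure_strength(make_draws(60), true_map, target, nperm=12, rng=rng)
full = closure_strength(make_draws(400), true_map, target, nperm=24, rng=rng)
print(f"quick budget: {quick:.1f}   full budget: {full:.1f}")
```

In this toy setup both budgets give a positive difference, but the numerical value depends on which permutations happen to be drawn, which is the kind of budget-dependent shift the note describes.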
B.1 Purpose and positioning (Why P1A, and why as an Appendix)
P1A does not attempt to exhaust all possible ΛCDM halo modeling choices (e.g., non-sphericity, environment dependence, high-dimensional galaxy–halo connection, or detailed baryonic physics). Instead, P1A follows a low-dimensional, auditable design: each upgrade module adds ≤1 key effective parameter, and all modules remain bound by the same three hard constraints as the main text: (i) an explicit parameter ledger (reported alongside AICc/BIC), (ii) the same RC-bin→GGL-bin shared mapping, and (iii) validation through the RC→GGL closure test (not merely RC-only fit quality).
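For reference, the textbook definitions of the information criteria named in constraint (i) can be written as a short Python helper. Whether the pipeline uses exactly these conventions is an assumption, and the numbers in the usage line are hypothetical rather than taken from any table in this report.

```python
import math


def information_criteria(log_like, k, n):
    """Standard AIC, AICc, and BIC for a fit with k free parameters and n data points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2.0 * k - 2.0 * log_like
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2.0 * log_like
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical example values (not from the report).
example = information_criteria(log_like=-100.0, k=3, n=50)
```

Because every criterion adds a penalty that grows with k, a module that adds one parameter must improve log-likelihood by more than its penalty to be preferred, which is why the ledger reports Δk alongside the fit quality.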
B.2 The three enhancement branches: definitions and intent
P1A provides multiple DM workspaces (DM 7+1) that capture common DM-halo and lensing-systematics effects while remaining low-dimensional and auditable. In addition to the legacy one-parameter branches (SCAT/AC/FB), P1A adds hierarchical c–M scatter + prior, a 1-parameter core proxy, and a shear-calibration nuisance m, plus a combined DM_STD model; EFT_BIN is included as a reference control. The three legacy branches are defined as follows:
• DM_RAZOR_SCAT: add one hyper-parameter σ_logc for halo-to-halo (log-normal) concentration scatter around the mean c(M).
• DM_RAZOR_AC: add one strength parameter α_AC to interpolate between no contraction and standard adiabatic contraction.
• DM_RAZOR_FB: add one core scale parameter (e.g., log r_core) to capture feedback-driven core formation (with the lensing-scale mapping kept NFW-like over the R-range used here).
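As an illustration of the SCAT branch, a log-normal scatter term around the mean c(M) can be written as a Gaussian log-density in log c. The sketch below assumes base-10 logarithms and takes the mean concentration as an input; both choices, and the function name, are assumptions for this example rather than the package's implementation.

```python
import math


def log_prior_concentration(c, c_mean, sigma_logc):
    """Gaussian log-density of log10(c) centred on log10(c_mean) with width sigma_logc."""
    if c <= 0 or c_mean <= 0 or sigma_logc <= 0:
        raise ValueError("c, c_mean, and sigma_logc must be positive")
    z = (math.log10(c) - math.log10(c_mean)) / sigma_logc
    return -0.5 * z * z - math.log(sigma_logc * math.sqrt(2.0 * math.pi))
```

The density is symmetric in log c, so a halo with twice the mean concentration is penalized exactly as much as one with half the mean concentration.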
B.3 Same statistical protocol as the main text
P1A reuses the same data products, shared mapping, and audit framework as the main text, and follows the same output conventions: (1) RC-only inference (posterior_samples + metrics), (2) RC→GGL closure test (closure_summary + permuted baseline), and (3) joint RC+GGL fit (joint_summary). All quoted numbers in this appendix come from an auto-generated scoreboard table (Tab_S1_P1A_scoreboard) and can be validated after a full rerun using the reference-table comparison script bundled in the P1A full_fit_runpack.
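The post-rerun validation amounts to a cell-by-cell comparison of a regenerated table against a reference table. A minimal Python sketch of such a comparison follows; it is not the bundled script, and the key column name and tolerances are assumptions to be adapted to the actual CSV layout.

```python
import csv
import math


def compare_tables(reference_csv, rerun_csv, key_column, rel_tol=1e-6, abs_tol=1e-6):
    """Return (key, column, reference_value, rerun_value) for every cell that disagrees."""

    def load(path):
        with open(path, newline="", encoding="utf-8") as handle:
            return {row[key_column]: row for row in csv.DictReader(handle)}

    reference, rerun = load(reference_csv), load(rerun_csv)
    mismatches = []
    for key, ref_row in reference.items():
        new_row = rerun.get(key)
        if new_row is None:
            mismatches.append((key, None, "present", "missing row"))
            continue
        for column, ref_value in ref_row.items():
            if column == key_column:
                continue
            new_value = new_row.get(column)
            try:
                same = math.isclose(float(ref_value), float(new_value),
                                    rel_tol=rel_tol, abs_tol=abs_tol)
            except (TypeError, ValueError):
                same = ref_value == new_value  # non-numeric cells compare as text
            if not same:
                mismatches.append((key, column, ref_value, new_value))
    return mismatches
```

A tolerance-based comparison is used instead of exact equality because reruns on different hardware or library versions can differ in the last floating-point digits.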
B.4 Main results (scoreboard)
For convenience, the main-text EFT–vs–DM comparison (Tables S1a–S1b) reports ΔlogL_total ≈ 1155–1337 and ΔlogL_closure ≈ 172–281 relative to DM_RAZOR under the same strict protocol. Appendix B (P1A) makes the DM baseline harder to beat (standardized upgrades + a key lensing nuisance). Among DM upgrades, the largest closure gain remains modest (+7.25 for the FB/core branch), which is insufficient to change the main ranking or conclusions.
Table B1 | P1A scoreboard (higher is better; parentheses show differences relative to the DM_RAZOR baseline).
Model branch (workspace) | Δk | RC-only best logL_RC (Δ) | Closure strength ΔlogL_closure (Δ) | Joint best logL_total (Δ) |
DM_RAZOR | 0 | -15702.654 (+0.000) | 122.205 (+0.000) | -27347.068 (+0.000) |
DM_RAZOR_SCAT | 1 | -15702.294 (+0.361) | 121.236 (-0.969) | -23153.311 (+4193.758) |
DM_RAZOR_AC | 1 | -15703.689 (-1.035) | 121.531 (-0.674) | -23982.557 (+3364.511) |
DM_RAZOR_FB | 1 | -15496.046 (+206.609) | 129.454 (+7.249) | -27478.531 (-131.463) |
DM_HIER_CMSCAT | 1 | -15702.644 (+0.010) | 121.978 (-0.227) | -23153.160 (+4193.908) |
DM_CORE1P | 1 | -15723.158 (-20.504) | 122.056 (-0.149) | -27336.258 (+10.810) |
DM_RAZOR_M | 0 (+m) | -15702.654 (+0.000) | 122.205 (+0.000) | -27340.451 (+6.617) |
DM_STD | 2 (+m) | -15832.203 (-129.549) | 105.690 (-16.515) | -22984.445 (+4362.623) |
EFT_BIN | 1 | -14631.537 (+1071.117) | 204.620 (+82.415) | -19001.142 (+8345.926) |
Fig. B1 | P1A scoreboard: closure and joint ΔlogL relative to baseline (larger is better).
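The parenthesized differences in Table B1 can be recomputed directly from the tabulated values. The Python sketch below copies the numbers from the table; the variable and function names are illustrative only.

```python
BASELINE = "DM_RAZOR"

# (logL_RC, closure strength, logL_total) copied from Table B1.
SCOREBOARD = {
    "DM_RAZOR":       (-15702.654, 122.205, -27347.068),
    "DM_RAZOR_SCAT":  (-15702.294, 121.236, -23153.311),
    "DM_RAZOR_AC":    (-15703.689, 121.531, -23982.557),
    "DM_RAZOR_FB":    (-15496.046, 129.454, -27478.531),
    "DM_HIER_CMSCAT": (-15702.644, 121.978, -23153.160),
    "DM_CORE1P":      (-15723.158, 122.056, -27336.258),
    "DM_RAZOR_M":     (-15702.654, 122.205, -27340.451),
    "DM_STD":         (-15832.203, 105.690, -22984.445),
    "EFT_BIN":        (-14631.537, 204.620, -19001.142),
}

# Differences as printed in parentheses in Table B1.
REPORTED_DELTAS = {
    "DM_RAZOR":       (0.000, 0.000, 0.000),
    "DM_RAZOR_SCAT":  (0.361, -0.969, 4193.758),
    "DM_RAZOR_AC":    (-1.035, -0.674, 3364.511),
    "DM_RAZOR_FB":    (206.609, 7.249, -131.463),
    "DM_HIER_CMSCAT": (0.010, -0.227, 4193.908),
    "DM_CORE1P":      (-20.504, -0.149, 10.810),
    "DM_RAZOR_M":     (0.000, 0.000, 6.617),
    "DM_STD":         (-129.549, -16.515, 4362.623),
    "EFT_BIN":        (1071.117, 82.415, 8345.926),
}


def deltas_vs_baseline(scoreboard, baseline=BASELINE):
    """Subtract the baseline row from every row, column by column."""
    base = scoreboard[baseline]
    return {model: tuple(value - ref for value, ref in zip(values, base))
            for model, values in scoreboard.items()}


recomputed = deltas_vs_baseline(SCOREBOARD)
worst = max(abs(new - old)
            for model in SCOREBOARD
            for new, old in zip(recomputed[model], REPORTED_DELTAS[model]))

# DM-side branch with the largest closure gain (column index 1).
best_dm = max((m for m in SCOREBOARD if m.startswith("DM_") and m != BASELINE),
              key=lambda m: recomputed[m][1])
```

The recomputed differences agree with the printed ones to within 0.001, which is consistent with the printed differences having been computed before rounding to three decimals, and the DM branch with the largest closure gain is DM_RAZOR_FB.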

B.5 Run tags, asset entry points, and archiving plan (same DOI)
The example P1A tags used to generate the scoreboard are: run_tag = 20260213_151233; closure_tag = 20260213_161731; joint_tag = 20260213_195428. The canonical scoreboard assets are:
• Tab_S1_P1A_scoreboard__20260213_151233__20260213_161731__20260213_195428.csv
• Fig_S1_P1A_scoreboard__20260213_151233__20260213_161731__20260213_195428.png
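Because the three tags are embedded in the asset file names, they can be extracted programmatically for item-by-item verification. The following Python sketch assumes the naming pattern visible in the two file names above (asset name, then three `YYYYMMDD_HHMMSS` tags separated by double underscores); the function name is illustrative.

```python
import re

# Pattern inferred from the two scoreboard file names listed above.
ASSET_PATTERN = re.compile(
    r"^(?P<asset>.+?)"
    r"__(?P<run_tag>\d{8}_\d{6})"
    r"__(?P<closure_tag>\d{8}_\d{6})"
    r"__(?P<joint_tag>\d{8}_\d{6})"
    r"\.(?P<ext>csv|png)$"
)


def parse_p1a_asset(filename):
    """Split a P1A asset file name into its asset name, three tags, and extension."""
    match = ASSET_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized asset name: {filename!r}")
    return match.groupdict()


info = parse_p1a_asset(
    "Tab_S1_P1A_scoreboard__20260213_151233__20260213_161731__20260213_195428.csv"
)
```

Here `info["run_tag"]`, `info["closure_tag"]`, and `info["joint_tag"]` hold the three tags quoted in the text, so a citation can be checked against the file it refers to.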
After verification, the P1A reproducibility bundles (supplement_figs_tables / full_fit_runpack) will be uploaded as additional files under the same Zenodo Concept DOI as the main full_fit_runpack: https://doi.org/10.5281/zenodo.18526286.
B.6 Citation note
When citing the DM-baseline stress tests beyond the main-text conclusions, please cite this manuscript and add: “See Appendix B (P1A) for standardized DM-baseline stress tests under the same closure protocol.” If citing a specific table/figure asset, include the run_tag / closure_tag / joint_tag embedded in the filename to support item-by-item verification.