P1 Report Explainer — From Rotation Curves to Weak Lensing: Testing the Mean Gravitational Response of Energy Filament Theory (EFT)

A public-facing explainer based on P1_RC_GGL: A Strict Closure Test of Galaxy Dynamics and Weak Lensing (v1.1)

Original report by Guanglin Tu | Version basis: P1 v1.1 | Positioning: public explainer / not a peer-reviewed paper
Related archives: Report DOI 10.5281/zenodo.18526334 | Reproducibility package DOI 10.5281/zenodo.18526286

Reading Notes

This is an explainer, not another academic report. It is based on the original P1 report, preserves the key figures and tables, and adds public-facing explanations of “what this means” at each key step.

This article explains only the conclusions P1 reaches under its specified data sets, parameter ledger, and statistical protocol: in the joint test of galaxy rotation curves (RC) and galaxy-galaxy weak lensing (GGL), EFT’s mean gravitational response model clearly leads the minimal DM_RAZOR baseline tested here.

This article does not read P1 as concluding that “dark matter has been overturned.” P1 is only the first step in the P-series experiments. It tests one observable layer within EFT, the “mean gravity floor,” rather than the entire EFT theory.

0 | Understand P1 in Five Minutes: What Is This Study Actually Testing?

P1 can be read as a cross-probe validation experiment. It does not merely ask whether a model can fit one data set; it puts two very different gravitational readouts on the same audit table: rotation curves (RC) read the dynamics inside galactic disks, while galaxy-galaxy weak lensing (GGL) reads the projected gravitational response on larger scales.

RC is like a speedometer: it tells us how fast gas and stars orbit at different radii within a galactic disk.
GGL is like a scale: by measuring how foreground galaxies slightly bend background light, it infers the larger-scale average gravitational/mass distribution around galaxies.
P1’s core question is this: can the same model learn a pattern from RC and still make sense when that pattern is transferred to GGL?

P1’s Core Takeaway

P1 raises the comparison threshold from “does it fit one probe well?” to “does it close across probes?” Good performance under the correct mapping, followed by signal collapse when the mapping is shuffled, suggests that the model may have captured a gravitational structure shared by RC and GGL.

Table 0 | P1’s Core Numbers and How to Read Them

Metric	How P1 / P1A Reads It	Plain-Language Reading
Joint fit ΔlogL_total	Main-text comparison: EFT is 1155–1337 above DM_RAZOR	Total score gap across the two data sets; larger means a better overall explanation.
Closure strength ΔlogL_closure	Main-text comparison: EFT is 172–281, while DM_RAZOR is 127	Ability to predict GGL after inference from RC only; larger means more cross-probe self-consistency.
Negative-control shuffle	After shuffling RC-bin→GGL-bin, the closure signal drops to 6–15 for the EFT variants (about 23 for DM_RAZOR)	If the correct correspondence is broken, the advantage should disappear; the more completely it disappears, the more confidently false signals are ruled out.
P1A multi-DM stress test	DM 7+1 + DM_STD, with EFT_BIN retained as a comparator	P1A does not look only at the minimal DM_RAZOR; it puts multiple low-dimensional, auditable DM enhancement branches into the same closure protocol.

1 | Why P1 Was Needed: Where Galaxy-Scale Cosmology Gets Stuck

The galaxy-scale problem has remained difficult because the need for “extra gravity/mass” is not just a rotation-curve phenomenon. A large body of observations shows a tight link between visible baryonic matter in galaxies and actual dynamical/lensing readouts. For the dark-matter route, this means dark halos, baryonic feedback, galaxy-formation histories, and observational systematics must be coordinated with great precision. For non-DM gravity routes, it means a model cannot merely look good on RC; it also has to hold up under weak lensing, population scaling relations, and negative controls.

That is P1’s motivation. It does not start from “dark matter is wrong” or “EFT must be right.” It puts one testable claim under scrutiny: can EFT’s mean gravitational response leave a reproducible and transferable signal in RC→GGL cross-probe closure?

External Literature Context: Why the RC+GGL Window Matters

McGaugh, Lelli, and Schombert (2016) proposed the radial acceleration relation (RAR), showing a tight relation with small scatter between the observed acceleration traced by rotation curves and the acceleration predicted from baryonic matter. This makes baryon–gravity response coupling an unavoidable issue for galaxy-scale theory.

Brouwer et al. (2021) used KiDS-1000 weak lensing to extend the RAR to lower accelerations and larger radii, comparing MOND, Verlinde emergent gravity, and LambdaCDM models. They also noted that early-/late-type galaxy differences, gas halos, and galaxy-halo connections remain key explanatory issues.

Mistele et al. (2024) further used weak lensing to infer circular-velocity curves for isolated galaxies and reported no clear decline out to hundreds of kpc, and even to about 1 Mpc, consistent with the baryonic Tully-Fisher relation (BTFR). This shows that weak lensing is becoming an important external readout for galaxy-scale gravitational response.

P1’s value, therefore, is not that it is the first to discuss RC and GGL together. Its value lies in placing them inside an auditable protocol built from a fixed mapping, a parameter ledger, RC-only→GGL closure, shuffle negative controls, and P1A multi-DM stress tests.

2 | What Does EFT Mean in P1? It Does Not Mean Effective Field Theory

Here, EFT means Energy Filament Theory, not the effective field theory familiar in physics. In the P1 technical report, EFT is used very conservatively: it does not enter as a complete final theory, but is first compressed into an observable, ready-to-fit, falsifiable parameterization of a “mean gravitational response.”

In plain terms, P1 does not try to discuss all microscopic sources of extra gravity, nor does it try to prove the whole EFT framework in one step. It asks a narrower and harder question: if some kind of mean extra gravitational response exists at galaxy scales, can it first explain RC and then transfer to predict GGL?

What Part of EFT Does P1 Test?

P1 tests the “mean gravity floor”: a statistically stable, transferable mean contribution.

P1 does not yet treat the “stochastic / noise floor”: the random terms, object-to-object differences, or extra scatter that may arise from more microscopic fluctuation processes.

P1 also does not discuss the complete microscopic mechanism, abundances, lifetimes, or global cosmological constraints. It is the first step in the P-series experiments, not the final verdict.

3 | The P-Series Plan: Why Start with the “Mean Floor”?

The P series can be understood as EFT’s observational retrieval program. It does not lay every claim on the table at once. Instead, it isolates the easiest piece to test with public data. P1 starts with the mean term: if the mean gravitational response cannot close from RC to GGL, then there is no solid entry point for discussing more complex noise terms or microscopic mechanisms.

Table 1 | Layered Positioning of the P Series

Layer	Question	Location in P1
P1	Can the mean gravitational response close from RC to GGL?	The current report’s main question
P1A	If the DM side is strengthened, does the conclusion remain stable?	Appendix B: DM 7+1 + DM_STD stress test
Future P-Series Work	Can this extend to more data, more probes, and more complex systematics?	Future direction
Deeper Issues	How do the mean term, noise term, and microscopic mechanisms connect?	Outside P1’s conclusion range

4 | What Are the Data? What Do RC and GGL Each Tell Us?

4.1 Rotation Curves (RC): A Speedometer Inside Galactic Disks

Rotation curves record how fast gas and stars orbit at different radii from a galaxy’s center. Higher orbital speeds imply a stronger required centripetal force, and therefore a stronger effective gravitational response. P1 uses the SPARC database; after preprocessing, it includes 104 galaxies, 2,295 velocity data points, and 20 RC bins.
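
As a small worked example of the link between orbital speed and required gravity: for a circular orbit, the centripetal acceleration is a = v²/r. The sketch below converts a speed in km/s and a radius in kpc to SI units. The function name and the sample values (200 km/s at 10 kpc) are illustrative assumptions, not SPARC data.

```python
KPC_IN_M = 3.0857e19  # metres per kiloparsec (rounded)

def centripetal_acceleration(v_kms: float, r_kpc: float) -> float:
    """Return a = v^2 / r in m/s^2 for a circular orbit."""
    v_ms = v_kms * 1.0e3      # km/s -> m/s
    r_m = r_kpc * KPC_IN_M    # kpc  -> m
    return v_ms ** 2 / r_m

# Illustrative values only (not taken from the P1 data).
a_example = centripetal_acceleration(200.0, 10.0)
print(f"{a_example:.3e} m/s^2")
```

The result is of order 10⁻¹⁰ m/s², which gives a sense of how weak the accelerations probed by rotation curves are.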

4.2 Weak Lensing (GGL): A Larger-Scale “Gravity Scale”

Galaxy-galaxy weak lensing measures how foreground galaxies slightly bend the light of background galaxies. It corresponds to a larger, halo-scale projected gravitational response and does not depend on the gas-dynamical details of galaxy disks. P1 uses public GGL data from KiDS-1000 / Brouwer et al. (2021): four stellar-mass bins, 15 radial points per bin, for a total of 60 data points, with the full covariance.

4.3 Fixed Mapping: Why 20 RC Bins → 4 GGL Bins Matters

P1 connects 20 RC bins to 4 GGL bins through a fixed rule: each GGL bin corresponds to five RC bins, averaged with galaxy-count weights. This mapping is held fixed for every model. It is a hard constraint for closure testing and fair comparison.
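
A minimal sketch of such a fixed 20→4 mapping is shown below. It assumes that each GGL bin takes five consecutive RC bins, which the text above does not state explicitly; the function name and the input numbers are hypothetical and are not values from the report.

```python
import numpy as np

N_RC_BINS, N_GGL_BINS = 20, 4
BINS_PER_GROUP = N_RC_BINS // N_GGL_BINS  # five RC bins per GGL bin

def map_rc_to_ggl(rc_values, galaxy_counts):
    """Galaxy-count-weighted mean over each block of five RC bins.

    Assumption: RC bins are ordered so that consecutive blocks of five
    belong to the same GGL bin.
    """
    values = np.asarray(rc_values, dtype=float).reshape(N_GGL_BINS, BINS_PER_GROUP)
    weights = np.asarray(galaxy_counts, dtype=float).reshape(N_GGL_BINS, BINS_PER_GROUP)
    return (values * weights).sum(axis=1) / weights.sum(axis=1)

# Illustrative inputs only.
amplitudes = np.arange(1.0, 21.0)  # one amplitude per RC bin
counts = np.full(20, 5.0)          # equal counts reduce to plain block means
ggl_amplitudes = map_rc_to_ggl(amplitudes, counts)
print(ggl_amplitudes)
```

Because the function takes no model-specific argument, the same rule is applied to every model, which is the point of holding the mapping fixed.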

Why Not Tune the Mapping After the Fact?

If one were allowed to choose after the fact which RC bins correspond to which GGL bins, a model could manufacture closure by rearranging the correspondence. P1 locks the 20→4 mapping in advance and deliberately breaks it with a shuffle negative control precisely to test whether the closure signal truly depends on a physically reasonable correspondence.

5 | Models and Methods: What Is P1 Actually Comparing?

5.1 EFT Side: A Low-Dimensional Mean Gravitational Response

On the EFT side, a low-dimensional extra-velocity term describes the mean gravitational response. The shape of the extra term is controlled by a dimensionless kernel function f(r/ℓ), where ℓ is a global scale, while amplitudes are assigned by RC bin. Different kernels encode different initial slopes, transition behavior, and long-range tails, serving as robustness stress tests.

5.2 DM Side: The Main-Text Comparison and Appendix P1A Must Be Read Separately

In the main-text comparison, DM_RAZOR is a minimized, auditable NFW baseline: it fixes the c–M relation and includes no halo-to-halo scatter, adiabatic contraction, feedback core, non-sphericity, or environmental term. The advantage of this design is controlled degrees of freedom and easy reproducibility; its limitation is that it does not represent every LambdaCDM or dark-matter halo model.

Therefore, Appendix B (P1A) turns the DM side into a standardized stress test. Without changing the shared mapping or closure protocol, it progressively adds low-dimensional enhancement branches such as SCAT, AC, FB, HIER_CMSCAT, CORE1P, lensing m, and the combined baseline DM_STD, while retaining EFT_BIN as a comparator. A good way to read P1A is this: it is not comparing EFT only against one minimal DM baseline; it puts a set of common, auditable DM mechanisms under the same closure criterion.

The Precise Conclusion Used Here

Main text: the EFT family clearly outperforms the minimal DM_RAZOR in the main comparison.

Appendix B / P1A: across multiple low-dimensional, auditable DM enhancement branches and the DM_STD stress test, some DM joint fits improve, but the closure strength does not erase EFT_BIN’s advantage.

The safest statement is therefore: within P1/P1A’s data, mapping, parameter ledger, and closure protocol, EFT’s mean gravitational response shows stronger cross-data consistency. This does not amount to excluding all dark-matter models.

5.3 Closure Test: P1’s Most Important Experimental Logic

1. Fit RC only and obtain a set of RC-only posterior samples.

2. Do not retune on GGL; use the RC posterior directly to predict GGL.

3. Use the full covariance to compute the GGL prediction score logL_true under the correct mapping.

4. Randomly permute the RC-bin→GGL-bin correspondence and compute the negative-control score logL_perm.

5. Subtract the two to obtain closure strength: ΔlogL_closure = <logL_true> − <logL_perm>.
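
The five steps above can be sketched in code as follows. This is a toy illustration under stated assumptions, not the report's implementation: it assumes a Gaussian likelihood, reads the angle brackets as averages over RC-only posterior samples (and over random permutations for the negative control), and uses a hypothetical `predict` function plus synthetic data in place of the real RC posterior and KiDS-1000 measurements.

```python
import numpy as np

def gaussian_logl(data, model, cov):
    """Gaussian log-likelihood with a full covariance matrix."""
    resid = data - model
    _, logdet = np.linalg.slogdet(cov)
    chi2 = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (chi2 + logdet + resid.size * np.log(2.0 * np.pi))

def closure_strength(data, cov, predict, posterior_samples, mapping,
                     n_perm=200, seed=0):
    """Return <logL_true> - <logL_perm> for a set of RC-only samples.

    predict(sample, mapping) is a hypothetical user-supplied function that
    turns one RC-only posterior sample into a GGL prediction vector.
    """
    rng = np.random.default_rng(seed)
    logl_true = np.mean([gaussian_logl(data, predict(s, mapping), cov)
                         for s in posterior_samples])
    logl_perm = []
    for _ in range(n_perm):
        shuffled = rng.permutation(mapping)  # break the RC-bin -> GGL-bin link
        logl_perm.append(np.mean([gaussian_logl(data, predict(s, shuffled), cov)
                                  for s in posterior_samples]))
    return logl_true - np.mean(logl_perm)

# --- Toy setup (synthetic; all numbers are illustrative) ---
true_map = np.repeat(np.arange(4), 5)     # RC bin index -> GGL bin index
radial = 1.0 / np.linspace(1.0, 3.0, 15)  # toy radial profile, 15 points

def toy_predict(sample, mapping):
    per_bin = np.array([sample[mapping == g].mean() for g in range(4)])
    return np.outer(per_bin, radial).ravel()  # 4 bins x 15 radii = 60 points

rng = np.random.default_rng(1)
truth = np.repeat([1.0, 2.0, 3.0, 4.0], 5)
samples = truth + 0.05 * rng.standard_normal((50, 20))  # stand-in posterior
cov = 0.01 * np.eye(60)
data = toy_predict(truth, true_map)

delta = closure_strength(data, cov, toy_predict, samples, true_map)
print(delta > 0)
```

In this toy setup the GGL data are never used to tune the samples, so any positive closure strength comes only from the correct bin correspondence.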

Plain-Language Analogy

The closure test is like a cross-exam retake: the model first learns a rule in the RC exam room, then answers in the GGL exam room. If it has learned a shared rule rather than a local trick, it should still do well in the second room; if the exam-room correspondence is deliberately shuffled, the advantage should vanish.

5.4 Before Reading the Technical Tables: Four Entry Points

Table 5.4 | How to Read the Next Set of Landscape Technical Tables

Entry Point	What to Look At	Why It Matters
Table S1a	RC+GGL total joint-fit score	Answers: “Across both data sets, whose overall explanation is stronger?”
Table S1b	Closure strength, shuffle, robustness scans	Answers: “Can what was learned from RC transfer to GGL?”
Table B0	Definitions of multiple DM enhancement branches in P1A	Prevents P1 from being reduced to “only compared with the minimal DM_RAZOR.”
Table B1	P1A closure and joint scoreboard	Checks whether enhanced DM erases the closure advantage.

Layout Note

The next page switches to landscape orientation so the wide tables from the original report can be preserved without deleting columns or compressing them into illegibility. The main text has already given a plain-language reading; the landscape technical tables are for readers who need to verify numbers and model branches.

Figure 0.1 | P1’s Closure-Test Workflow at a Glance

Note: the upper chain is the “closure test” (fit RC only → use the RC posterior to predict GGL); the lower chain is the “joint fit” (score RC+GGL together). The right side compares the true mapping with shuffled mappings to obtain the closure strength ΔlogL.

6 | Key Technical Tables: Main Report Tables and P1A Tables

Table S1a | Main joint-fit comparison metrics (RC+GGL, Strict; retained from the original report)

Model (workspace)	W kernel	k	Joint logL_total (best)	ΔlogL_total vs DM	AICc	BIC
DM_RAZOR	none	20	-16927.763	0.0	33895.885	34010.811
EFT_BIN	none	21	-15590.552	1337.21	31223.501	31344.155
EFT_WEXP	exponential	21	-15668.83	1258.932	31380.057	31500.711
EFT_WYUK	yukawa	21	-15772.936	1154.827	31588.268	31708.922
EFT_WPOW	powerlaw_tail	21	-15633.321	1294.442	31309.038	31429.692
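
The AICc and BIC columns in Table S1a can be cross-checked from the logL_total and k columns using the standard textbook definitions. The sketch below assumes the sample size is n = 2,295 + 60 = 2,355 (the RC and GGL data-point counts quoted in Section 4); the report's exact convention is not restated here, so treat this as a consistency check rather than the report's own procedure.

```python
import math

N_DATA = 2295 + 60  # assumed sample size: RC velocity points + GGL points

def aicc(logl: float, k: int, n: int = N_DATA) -> float:
    """Small-sample corrected Akaike information criterion."""
    aic = 2 * k - 2 * logl
    return aic + 2 * k * (k + 1) / (n - k - 1)

def bic(logl: float, k: int, n: int = N_DATA) -> float:
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * logl

# (joint logL_total, k) copied from Table S1a.
table_s1a = {
    "DM_RAZOR": (-16927.763, 20),
    "EFT_BIN": (-15590.552, 21),
    "EFT_WEXP": (-15668.83, 21),
    "EFT_WYUK": (-15772.936, 21),
    "EFT_WPOW": (-15633.321, 21),
}

for name, (logl, k) in table_s1a.items():
    delta = logl - table_s1a["DM_RAZOR"][0]  # ΔlogL_total vs DM_RAZOR
    print(f"{name}: dlogL={delta:.3f} AICc={aicc(logl, k):.3f} BIC={bic(logl, k):.3f}")
```

Under this assumption, the recomputed values agree with the tabulated AICc and BIC to within rounding of the quoted logL values.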

Table S1b | Closure and robustness metrics (Strict; retained from the original report)

Model (workspace)	Closure ΔlogL (true-perm)	ΔlogL after negative-control shuffle	σ_int scan ΔlogL range	R_min scan ΔlogL range	cov-shrink scan ΔlogL range
DM_RAZOR	126.678	22.725	—	—	—
EFT_BIN	231.611	14.984	459–1548	1243–1289	1337–1351
EFT_WEXP	171.977	6.04	408–1471	1169–1207	1259–1277
EFT_WYUK	179.808	14.688	380–1341	1065–1099	1155–1166
EFT_WPOW	280.513	6.672	457–1500	1203–1247	1294–1308
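
One simple way to summarize the first two columns of Table S1b is the fraction of the closure signal that survives the shuffle. This ratio is an illustrative summary introduced here, not a metric defined in the report; the numbers are copied from the table.

```python
# (closure ΔlogL, ΔlogL after negative-control shuffle) from Table S1b.
table_s1b = {
    "DM_RAZOR": (126.678, 22.725),
    "EFT_BIN": (231.611, 14.984),
    "EFT_WEXP": (171.977, 6.04),
    "EFT_WYUK": (179.808, 14.688),
    "EFT_WPOW": (280.513, 6.672),
}

# Fraction of the closure signal that remains once the mapping is shuffled.
retained = {name: shuffled / closure
            for name, (closure, shuffled) in table_s1b.items()}

for name, frac in retained.items():
    print(f"{name}: {100 * frac:.1f}% of the closure signal remains after shuffling")
```

A small retained fraction means the closure signal depends strongly on the correct RC-bin→GGL-bin correspondence.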

Table B0 | DM enhancement-branch definitions in P1A (retained from Appendix B of the original report)

Workspace	dm_model	New parameters (≤1)	Physical motivation (core)	Implementation rule (audit-friendly)
DM_RAZOR	NFW (fixed c–M, no scatter)	—	Minimal, auditable LambdaCDM halo baseline; used as a strict comparator for EFT	Shared mapping fixed; strict parameter ledger; used as a baseline only for relative comparison
DM_RAZOR_SCAT	NFW + c–M scatter (legacy)	σ_logc	The c–M relation has scatter; approximated with a one-parameter log-normal scatter	≤1 new parameter; still uses the shared mapping; closure gain is the acceptance criterion
DM_RAZOR_AC	NFW + Adiabatic Contraction (legacy)	α_AC	Baryonic infall may induce halo adiabatic contraction; approximated with one strength parameter	≤1 new parameter; mapping unchanged; reports AICc/BIC changes and closure gain
DM_RAZOR_FB	NFW + feedback core (legacy)	log r_core	Feedback can form a core in the inner region; approximated with one core-scale parameter	≤1 new parameter; same closure/negative-control protocol; RC-only improvement is not the sole target
DM_HIER_CMSCAT	Hierarchical c–M scatter + prior	σ_logc (hier)	A more standard hierarchical c_i∼logN(c(M_i), σ_logc); affects the RC and GGL joint posterior simultaneously	Explicit prior; latent c_i marginalized; remains low-dimensional and auditable
DM_CORE1P	1‑parameter core proxy (coreNFW/DC14‑inspired)	log r_core	Uses a one-parameter core proxy for the main baryonic-feedback effect, avoiding high-dimensional star-formation details	References standard literature; ≤1 new parameter; tied to the closure test
DM_RAZOR_M	NFW + lensing shear‑calibration nuisance	m_shear (GGL)	Absorbs a key weak-lensing-side systematic as an effective parameter, reducing the risk of treating systematics as physics	Nuisance is explicitly accounted for; not allowed to feed back into RC; results judged mainly by closure robustness
DM_STD	Standardized DM baseline (HIER_CMSCAT + CORE1P + m)	σ_logc + log r_core (+ m_shear)	Puts three common classes of objections into a still low-dimensional standardized baseline	Reports parameter ledger and information criteria together; closure is the main metric; used as the strongest DM defense comparator

Table B1 | P1A Scoreboard (higher is better; retained from Appendix B of the original report)

Model branch (workspace)	Δk	RC-only best logL_RC (Δ)	Closure strength ΔlogL_closure (Δ)	Joint best logL_total (Δ)
DM_RAZOR	0	-15702.654 (+0.000)	122.205 (+0.000)	-27347.068 (+0.000)
DM_RAZOR_SCAT	1	-15702.294 (+0.361)	121.236 (-0.969)	-23153.311 (+4193.758)
DM_RAZOR_AC	1	-15703.689 (-1.035)	121.531 (-0.674)	-23982.557 (+3364.511)
DM_RAZOR_FB	1	-15496.046 (+206.609)	129.454 (+7.249)	-27478.531 (-131.463)
DM_HIER_CMSCAT	1	-15702.644 (+0.010)	121.978 (-0.227)	-23153.160 (+4193.908)
DM_CORE1P	1	-15723.158 (-20.504)	122.056 (-0.149)	-27336.258 (+10.810)
DM_RAZOR_M	0 (+m)	-15702.654 (+0.000)	122.205 (+0.000)	-27340.451 (+6.617)
DM_STD	2 (+m)	-15832.203 (-129.549)	105.690 (-16.515)	-22984.445 (+4362.623)
EFT_BIN	1	-14631.537 (+1071.117)	204.620 (+82.415)	-19001.142 (+8345.926)

How to Read Table B1 (P1A Scoreboard)

• Δk: added degrees of freedom (larger means a more complex model; more complex does not mean better).

• Focus on two columns: closure strength ΔlogL_closure(Δ) (larger means more transfer self-consistency) and Joint best logL_total(Δ) (total joint-fit score).

• The (Δ) in parentheses is the difference relative to DM_RAZOR, making direct comparison easy.

• The main question this table asks is: if the DM baseline is “reasonably enhanced,” does the closure advantage disappear?

• Reading note: DM_STD improves the joint score substantially, but closure strength actually declines; EFT_BIN still maintains a higher closure strength.

One-sentence summary: within this low-dimensional, auditable range of DM enhancements, improving the joint fit does not automatically produce stronger closure; closure (transferability) remains the key criterion.
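
The (Δ) values and the reading note above can be reproduced from the table itself. The sketch below copies the closure and joint columns from Table B1, recomputes the differences relative to DM_RAZOR, and lists the branches whose joint score improves while closure strength does not; the variable names are illustrative.

```python
# name: (closure strength ΔlogL_closure, joint best logL_total) from Table B1.
table_b1 = {
    "DM_RAZOR": (122.205, -27347.068),
    "DM_RAZOR_SCAT": (121.236, -23153.311),
    "DM_RAZOR_AC": (121.531, -23982.557),
    "DM_RAZOR_FB": (129.454, -27478.531),
    "DM_HIER_CMSCAT": (121.978, -23153.160),
    "DM_CORE1P": (122.056, -27336.258),
    "DM_RAZOR_M": (122.205, -27340.451),
    "DM_STD": (105.690, -22984.445),
    "EFT_BIN": (204.620, -19001.142),
}

base_closure, base_joint = table_b1["DM_RAZOR"]

# Differences relative to the DM_RAZOR baseline: (Δ closure, Δ joint).
deltas = {name: (closure - base_closure, joint - base_joint)
          for name, (closure, joint) in table_b1.items()}

# Branches where the joint fit improves but closure strength does not.
joint_up_closure_not = [name for name, (d_closure, d_joint) in deltas.items()
                        if d_joint > 0 and d_closure <= 0]

for name, (d_closure, d_joint) in deltas.items():
    print(f"{name}: closure {d_closure:+.3f}, joint {d_joint:+.3f}")
print("Joint up, closure not:", joint_up_closure_not)
```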

7 | How Should the Main Results Be Read?

7.1 Joint Fit: Across Both Data Sets, the EFT Main Comparison Scores Higher

Table S1a and Figure S4 show that, under the same data, the same shared mapping, and nearly the same parameter count (k = 21 for each EFT variant versus k = 20 for DM_RAZOR), the EFT family has a joint ΔlogL_total of 1155–1337 relative to DM_RAZOR. For general readers, this means that under one scoring rule combining RC and GGL, the EFT main-comparison models receive a higher total score.

7.2 Closure Test: P1’s Main Emphasis Is Transferability

High closure strength means that a model can infer parameters from RC alone and, without looking at GGL again, predict GGL better. In the P1 report, EFT has ΔlogL_closure = 172–281, while DM_RAZOR has 127. This matters more than saying that “each fit looks fine,” because it restricts the model’s freedom on the second data set.

7.3 Negative Control: Why Is “Signal Collapse” a Good Thing?

After P1 randomly shuffles the RC-bin→GGL-bin grouping correspondence, the closure signal of the EFT variants drops to about 6–15 (Table S1b; DM_RAZOR retains about 23). For general readers, this is an anti-cheating step: if the closure advantage came merely from code, units, covariance choices, or a fitting accident, then shuffled correspondences might still show an advantage. Instead, the advantage collapses, indicating that it depends on the correct mapping.

Figure S3 | Closure strength (higher is better): average log-likelihood advantage for RC-only → GGL prediction.

How to Read This Figure

This figure is the core of P1. The taller the bar, the better the information a model learned from RC transfers to GGL.

The EFT family as a whole stands above DM_RAZOR, indicating stronger cross-probe closure in the “learn RC first, then predict GGL” experiment.

Figure S4 | Joint-fit advantage (higher is better): RC+GGL best logL_total relative to DM_RAZOR.

How to Read This Figure

This figure shows the total score after RC and GGL are combined.

All EFT variants sit well above 0, showing that the EFT advantage in the main comparison is not a local one-point effect but the overall behavior of the joint analysis.

Figure R1 | Negative control: closure signal drops sharply after shuffled grouping.

How to Read This Figure

This figure shows that once the correct RC↔GGL binning relationship is shuffled, the closure signal drops sharply.

This makes the P1 result look more like real consistency in a cross-data mapping, rather than a numerical coincidence obtainable under arbitrary mappings.

8 | Robustness and Controls: How Does P1 Avoid “It Just Fits Better”?

A technical report is most vulnerable to the concern that its advantage may come from a particular noise setting, a central-region data choice, covariance handling, or overfitting. P1 answers that concern with multiple stress tests.

Table 2 | How to Read P1’s Robustness Tests and Negative Controls

Test	Concern It Tries to Rule Out	How to Read It
σ_int scan	If RC contains extra unknown scatter, does the conclusion remain stable?	After loosening RC errors, the EFT ranking and advantage scale remain stable.
R_min scan	If the central galaxy region is not fully trusted, does the conclusion remain stable?	After trimming the central region, EFT still retains a positive advantage.
cov-shrink scan	If the GGL covariance estimate is uncertain, does the conclusion remain stable?	After shrinking the covariance toward a diagonal matrix, the advantage is not sensitive.
Ablation ladder	Is EFT forcing a fit through unnecessary complexity?	The full EFT_BIN is necessary under the information criteria.
Leave-one-out (LOO) held-out prediction	Does the model only explain data it has already seen?	After a GGL bin is held out, the model still shows strong generalization.
RC-bin shuffle	Does closure come from the real mapping?	Closure falls after grouping is shuffled, supporting mapping dependence.
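
The three scans in Table 2 each turn one "knob" on the data side. The sketch below shows one common way to implement such knobs; the specific forms (quadrature sum for σ_int, a hard radius cut for R_min, linear shrinkage toward the diagonal for cov-shrink) are assumptions made for illustration, and the report's exact definitions may differ.

```python
import numpy as np

def inflate_rc_errors(sigma_obs, sigma_int):
    """Add an intrinsic scatter to RC velocity errors in quadrature."""
    return np.sqrt(np.asarray(sigma_obs, dtype=float) ** 2 + sigma_int ** 2)

def apply_r_min(radii, r_min):
    """Boolean mask keeping only points at or beyond r_min."""
    return np.asarray(radii, dtype=float) >= r_min

def shrink_covariance(cov, lam):
    """Linear shrinkage toward the diagonal: (1 - lam) * C + lam * diag(C)."""
    cov = np.asarray(cov, dtype=float)
    return (1.0 - lam) * cov + lam * np.diag(np.diag(cov))

# Illustrative values only.
errors = inflate_rc_errors([3.0, 6.0], sigma_int=4.0)
mask = apply_r_min([0.5, 1.0, 2.0], r_min=1.0)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
half_shrunk = shrink_covariance(cov, 0.5)
print(errors, mask, half_shrunk, sep="\n")
```

With lam = 0 the covariance is unchanged, and with lam = 1 only the diagonal survives, so scanning lam between those limits tests how much the result leans on the off-diagonal terms.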

Figure R2 | Range of ΔlogL_total under the σ_int scan (higher is better).

How to Read This Figure

Tests whether EFT’s lead remains after changes to the RC intrinsic-scatter setting.

Figure R3 | Range of ΔlogL_total under the R_min scan (higher is better).

How to Read This Figure

Tests whether EFT’s advantage remains stable after trimming the complex central region.

Figure R4 | Range of ΔlogL_total under the cov-shrink scan (higher is better).

How to Read This Figure

Tests whether the ranking is sensitive to changes in weak-lensing covariance handling.

Figure R5 | Ablation ladder for EFT_BIN (AICc; lower is better).

How to Read This Figure

Tests whether the full EFT_BIN is necessary for explaining the data, rather than merely adding parameters.

Figure R6 | LOO: distribution of log likelihood for held-out bins.

How to Read This Figure

Tests whether the model still predicts well on an unseen GGL bin.

Figure R7 | Negative control: shuffled mapping causes a clear drop in mean logL_true.

How to Read This Figure

Further shows, from the perspective of mean logL_true, that closure depends on the correct cross-data mapping.

9 | P1A: Why the Multiple DM Models in the Appendix Matter

This section does not ask, “Did EFT only beat one minimal DM_RAZOR?” It asks whether the conclusions from closure testing and joint fitting change when the DM baseline is strengthened (P1A) within a low-dimensional, reproducible setup whose parameter ledger accounts for every added parameter. In other words, P1A is meant to reduce the objection that the comparison used an overly weak DM baseline and to move the discussion toward whether closure performance still differs under a set of auditable DM enhancements.

P1A does not try to exhaust every possible LambdaCDM halo-modeling option, nor does it turn the DM side into a high-dimensional, unauditable fitting machine. It selects low-dimensional, reproducible enhancements with a clear parameter ledger: concentration scatter, adiabatic contraction, feedback core, hierarchical c–M scatter prior, one-parameter core proxy, weak-lensing shear-calibration nuisance, and the combined DM_STD baseline.

Main Reading of P1A

Among the three legacy branches, only feedback/core brings a small net gain in closure strength; SCAT and AC do not bring a net closure gain.

DM_HIER_CMSCAT, DM_RAZOR_M, and DM_CORE1P have little effect on closure strength or show no significant net gain.

DM_STD can substantially improve joint logL, but closure strength falls, suggesting that it mainly increases joint-fit flexibility rather than RC→GGL transfer-prediction power.

In P1A Table B1, EFT_BIN still maintains higher closure strength and a joint-fit advantage. P1’s core claim therefore should not be simplified to “it only beat the minimal DM_RAZOR.”

Figure B1 | P1A scoreboard: closure and joint ΔlogL relative to baseline (higher is better).

How to Read This Figure

This figure shows how multiple DM enhancement branches perform relative to the baseline.

Its meaning is not “all DM is ruled out.” It shows that, within the low-dimensional, auditable DM enhancement range selected in P1A, enhanced DM does not erase EFT_BIN’s closure advantage.

10 | Why the P1 Experiment Matters

10.1 Methodological Significance: Place Cross-Probe Closure Above Single-Probe Fitting

Galaxy-scale theory often gets stuck in arguments over whether a model can fit one set of rotation curves. P1 raises the bar: can parameters learned from RC predict weak lensing without retuning on GGL? That turns P1 from a fitting contest into a transfer-prediction test.

10.2 Transparency Significance: Make the Reproducible Chain Part of the Result

One important contribution of P1 is that it releases the data, tables and figures, run labels, negative controls, reproducibility package, and audit trail together. That matters to supporters and critics alike: the discussion can return to the same public data, the same mapping, the same scripts, and the same metrics, rather than comparing slogans.

10.3 Physical Significance: A Strong Stress Test for Non-DM Gravity

In non-DM gravity directions, many models can explain part of the rotation-curve or RAR phenomenology. The harder task is to pass weak-lensing readouts at the same time and to show under negative controls that the signal depends on the correct mapping. P1’s significance is that it places EFT’s mean gravitational response into a protocol resembling an external exam: RC is the training field, GGL is the transfer field, and shuffle is the anti-cheating field.

10.4 Is This an Important Experiment for the Non-DM Gravity Field?

Cautiously stated: if P1’s data processing, reproducibility package, and closure protocol hold up under external replication, then it can be considered an RC+GGL closure experiment worth taking seriously within non-DM gravity / modified-gravity research. Its importance does not lie in the phrase “dark matter has been overturned,” but in providing a cross-probe criterion that can be copied, challenged, and extended.

Is There Already a Comparably Strong RC+GGL Prediction-Closure Framework?

Relevant frameworks and observational traditions already exist: MOND/RAR organizes a large body of rotation-curve phenomena very well; KiDS-1000 weak-lensing RAR work has also compared MOND, Verlinde emergent gravity, and LambdaCDM models; LambdaCDM can also explain some weak-lensing/dynamical phenomena through galaxy-halo connections, gas halos, and feedback modeling.

But P1’s precise claim is not “no other framework in the world can explain RC+GGL.” Its claim is that, under P1’s own public protocol—fixed mapping, RC-only→GGL closure, shuffle negative control, parameter ledger, and P1A multi-DM stress tests—EFT reports stronger closure performance.

In other words, the part of P1 most worth external testing is the concrete and reproducible comparison protocol it proposes. Whether MOND/RAR, LambdaCDM/HOD, hydrodynamical simulations, or other modified-gravity frameworks can reach the same or higher closure score under the same protocol is a very valuable next step.

11 | What Can P1 Conclude? What Can It Not Conclude?

Table 3 | Boundaries of P1’s Conclusions

Status	Statement
Can Conclude	Under P1’s RC+GGL data, fixed mapping, and main comparison protocol, the EFT family scores higher than the minimal DM_RAZOR on both joint fit and closure strength.
Can Conclude	Within P1A’s low-dimensional, auditable DM enhancement range, multiple DM enhancements do not erase EFT_BIN’s closure advantage.
Can Conclude	The shuffle negative control shows that the closure signal depends on the correct cross-data mapping, not on arbitrary mappings.
Cannot Conclude	It cannot conclude that P1 has overturned all dark-matter models. P1A still does not exhaust non-sphericity, environmental dependence, complex galaxy-halo connections, high-dimensional feedback, or full cosmological simulations.
Cannot Conclude	It cannot conclude that the full EFT theory has been proven from first principles. P1 tests only the phenomenological layer of mean gravitational response.
Cannot Conclude	It cannot conclude that all systematics have been ruled out. P1 provides robustness evidence only within the listed stress tests and audit range.

12 | Frequently Asked Questions

Q1: Is this saying that “dark matter does not exist”?

No. P1’s conclusion must be limited to the data, protocol, and comparison models used in P1 and P1A. P1A goes beyond the minimal DM_RAZOR baseline, but it still does not represent every possible dark-matter model.

Q2: Is this saying that “EFT has already been proven”?

Also no. P1 tests EFT as a parameterized mean gravitational response and shows stronger performance in RC→GGL closure; microscopic mechanisms and the full theory are not conclusions of P1.

Q3: Why not state a significance in σ directly?

P1 uses a unified likelihood score, information criteria, and closure differences. ΔlogL is a relative advantage under the same scoring rule; it is not the same as a single σ value.

Q4: Why shuffle RC-bin→GGL-bin?

This is the negative control. A real cross-probe signal should depend on the correct mapping. If the shuffled case remained equally strong, that would instead suggest an implementation bias or a statistical false signal.

Q5: What should P1 do next?

Extend the same protocol to more data, more DM controls, more complex systematics, and more modified-gravity frameworks, especially in ways that allow external teams to replicate the same closure metric.

13 | Mini Glossary

Table 4 | Mini Glossary

Term	One-Sentence Explanation
Rotation curve (RC)	The radius–velocity relation in a galactic disk, used to infer effective gravity within the disk.
Weak lensing (GGL)	Measures the average gravitational/mass distribution around foreground galaxies through the statistical distortion of background-galaxy shapes.
Closure test	Uses the RC posterior to predict GGL and compares it against a shuffled-mapping negative control.
Negative control	Deliberately breaks a key structure to see whether the signal disappears; used to rule out false signals.
NFW halo	A dark-matter halo density profile commonly used in cold-dark-matter models.
c–M relation	The relation between a dark-matter halo’s concentration c and mass M; allowing scatter changes model flexibility.
DM_STD	In P1A, a standardized DM stress-test branch combining multiple low-dimensional DM enhancements and a lensing nuisance.
ΔlogL	Difference in log likelihood between two models under the same scoring rule; positive values indicate the former performs better.
Covariance	A matrix description of correlations among data points; weak-lensing data generally require the full covariance.

14 | Suggested Reading Route and Citation Entry Points

1. First read Sections 0–2 to establish P1’s question and the restrained role assigned to EFT in P1.

2. Then look at Figures S3 and S4 and Tables S1a/S1b to understand closure strength, joint fit, and negative controls.

3. If you are concerned that the DM baseline is too weak, go directly to Section 9 and Table B1 / Figure B1.

4. For technical replication, return to the P1 technical report v1.1, the Tables & Figures Supplement, and full_fit_runpack.

Main Archive Entry Points

P1 technical report (publication-grade, Concept DOI): 10.5281/zenodo.18526334

P1 full reproducibility package (Concept DOI): 10.5281/zenodo.18526286

EFT structured knowledge base (optional, Concept DOI): 10.5281/zenodo.18853200

License note: the technical report uses CC BY-NC-ND 4.0; the full reproducibility package uses CC BY 4.0. The technical report and the Zenodo archive records are authoritative.

15 | References and External Background

McGaugh, S. S., Lelli, F., & Schombert, J. M. (2016). The Radial Acceleration Relation in Rotationally Supported Galaxies. Physical Review Letters, 117, 201101. DOI: 10.1103/PhysRevLett.117.201101.

Famaey, B., & McGaugh, S. S. (2012). Modified Newtonian Dynamics (MOND): Observational Phenomenology and Relativistic Extensions. Living Reviews in Relativity, 15, 10. DOI: 10.12942/lrr-2012-10.

Brouwer, M. M., Oman, K. A., Valentijn, E. A., et al. (2021). The weak lensing radial acceleration relation: Constraining modified gravity and cold dark matter theories with KiDS-1000. Astronomy & Astrophysics, 650, A113. DOI: 10.1051/0004-6361/202040108.

Mistele, T., McGaugh, S., Lelli, F., Schombert, J., & Li, P. (2024). Indefinitely Flat Circular Velocities and the Baryonic Tully-Fisher Relation from Weak Lensing. The Astrophysical Journal Letters, 969, L3. arXiv: 2406.09685.

Bullock, J. S., & Boylan-Kolchin, M. (2017). Small-Scale Challenges to the LambdaCDM Paradigm. Annual Review of Astronomy and Astrophysics, 55, 343–387. DOI: 10.1146/annurev-astro-091916-055313.

Lelli, F., McGaugh, S. S., & Schombert, J. M. (2016). SPARC: Mass Models for 175 Disk Galaxies with Spitzer Photometry and Accurate Rotation Curves. The Astronomical Journal, 152, 157. DOI: 10.3847/0004-6256/152/6/157.

Navarro, J. F., Frenk, C. S., & White, S. D. M. (1997). A Universal Density Profile from Hierarchical Clustering. The Astrophysical Journal, 490, 493.

Dutton, A. A., & Macciò, A. V. (2014). Cold dark matter haloes in the Planck era: evolution of structural parameters for NFW haloes. Monthly Notices of the Royal Astronomical Society, 441, 3359–3374.
