Chapter 6 Data Validation & Quality Gates


I. Chapter Purpose & Scope

Fix the specifications for DQ gates and data validation in pipelines: rule types, sampling and significance, blocking vs. warning levels, exception handling, and auditing and exports. Ensure alignment with Σ_in/Σ_out contracts, splits/coverage, metrology, and citation anchors.

II. Terminology & Dependencies


III. Fields & Structure (Normative)

stage:
  name: "schema.check|dq.scan|leakage.audit"
  type: "validate.schema|validate.dq|validate.leakage"
  impl: "I16-2.schema_check|I16-7.dq_scan|I16-8.leakage_audit"
  inputs: ["<upstream_artifact>"]
  outputs: ["<clean_rows>|<dq_report>|<leakage_report>"]
  schema_ref: "contracts/<name>@vX.Y"
  dq:
    sample: {rows: 50000, strategy: "head|random|stratified"}
    significance: {alpha: 0.05}
    gates:
      - {id: "DQ_001", kind: "not_null", cols: ["id", "ts"], level: "block"}
      - {id: "DQ_002", kind: "unique", cols: [["id", "ts"]], level: "block"}
      - {id: "DQ_003", kind: "range", col: "value", rule: "[0,1e6]", unit: "<SI>", level: "block"}
      - {id: "DQ_004", kind: "enum", col: "status", values: ["ok", "warn", "err"], level: "block"}
      - {id: "DQ_005", kind: "distribution", col: "latency_ms", rule: "p99<=200", level: "warn"}
      - {id: "DQ_006", kind: "freshness", col: "updated_at", max_lag: "PT30M", level: "warn"}
      - {id: "DQ_007", kind: "drift", col: "feature_*", metric: "psi<=0.2", level: "warn"}
      - {id: "DQ_008", kind: "leakage", policy: ["per-object", "per-timewindow"], level: "block"}
  on_fail: "quarantine|skip|block"
  retries: {max: 2, backoff: "expo"}
  timeout_s: 1800
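
The row-level gate kinds above (not_null, unique, range, enum) can be illustrated with a minimal Python sketch. It is not the I16-7.dq_scan implementation. The function names (parse_closed_range, count_violations, evaluate_gates, is_blocked) are hypothetical, the rule string "[0,1e6]" is assumed to denote a closed interval, and nulls are assumed to be left to the not_null gate rather than counted by the range gate.

```python
from typing import Any

Row = dict[str, Any]
Gate = dict[str, Any]


def parse_closed_range(rule: str) -> tuple[float, float]:
    """Parse a rule such as "[0,1e6]" as a closed interval (assumed semantics)."""
    lo, hi = rule.strip()[1:-1].split(",")
    return float(lo), float(hi)


def count_violations(rows: list[Row], gate: Gate) -> int:
    """Count rows that violate one gate; only four gate kinds are sketched here."""
    kind = gate["kind"]
    if kind == "not_null":
        return sum(1 for r in rows if any(r.get(c) is None for c in gate["cols"]))
    if kind == "unique":
        duplicates = 0
        for key_cols in gate["cols"]:  # each entry is one composite key
            seen = set()
            for r in rows:
                key = tuple(r.get(c) for c in key_cols)
                if key in seen:
                    duplicates += 1
                seen.add(key)
        return duplicates
    if kind == "range":
        lo, hi = parse_closed_range(gate["rule"])
        col = gate["col"]
        # Nulls are skipped here and left to a not_null gate (assumption).
        return sum(1 for r in rows if r.get(col) is not None and not lo <= r[col] <= hi)
    if kind == "enum":
        allowed = set(gate["values"])
        # A missing value is not in the allowed set, so it counts as a violation.
        return sum(1 for r in rows if r.get(gate["col"]) not in allowed)
    raise ValueError(f"gate kind not covered by this sketch: {kind}")


def evaluate_gates(rows: list[Row], gates: list[Gate]) -> list[dict[str, Any]]:
    """Return one result record per gate."""
    results = []
    for g in gates:
        n = count_violations(rows, g)
        results.append({"id": g["id"], "level": g["level"], "violations": n, "passed": n == 0})
    return results


def is_blocked(results: list[dict[str, Any]]) -> bool:
    """A stage is blocked when any gate with level "block" fails."""
    return any(not r["passed"] and r["level"] == "block" for r in results)
```

A failed gate with level "warn" leaves is_blocked false, which matches the block/warn split used in the gate list.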


IV. Rule Types & Decision Posture


V. Sampling, Significance & Severity


VI. Exception Handling & Audit Exports


VII. Metrology & Units (SI)

  1. Perf/time metrics: QPS (1/s), T_inf (ms with {p50,p95,p99}), ρ (unitless); bandwidth net_mbps, volume size_bytes.
  2. metrology:{units:"SI", check_dim:true} is mandatory; range/unit/distribution rules must pass SI checks.
  3. For path quantities (e.g., T_arr), register in the rule or stage config: delta_form, path="gamma(ell)", measure="d ell", and validate via one of:
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
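
The two forms above agree when c_ref is constant along the path, since a constant factor can be moved outside the integral. The following sketch checks that numerically with a trapezoid rule. The sample values for ell, n_eff, and c_ref are illustrative placeholders, and the helper names (trapezoid, t_arr_factored, t_arr_inline) are hypothetical.

```python
import math


def trapezoid(ys: list[float], xs: list[float]) -> float:
    """Trapezoid-rule integral of samples ys over sample points xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))


def t_arr_factored(n_eff: list[float], ell: list[float], c_ref: float) -> float:
    """T_arr = (1 / c_ref) * (∫ n_eff d ell), with c_ref outside the integral."""
    return (1.0 / c_ref) * trapezoid(n_eff, ell)


def t_arr_inline(n_eff: list[float], ell: list[float], c_ref: float) -> float:
    """T_arr = ∫ (n_eff / c_ref) d ell, with c_ref inside the integrand."""
    return trapezoid([n / c_ref for n in n_eff], ell)


# Illustrative path samples: ell in metres, n_eff unitless, c_ref in m/s.
ell = [0.0, 1000.0, 2000.0, 3000.0]
n_eff = [1.0, 1.5, 1.5, 1.0]
c_ref = 3.0e8  # placeholder value, assumed constant along the path

a = t_arr_factored(n_eff, ell, c_ref)
b = t_arr_inline(n_eff, ell, c_ref)
forms_agree = math.isclose(a, b, rel_tol=1e-12)
```

Both results carry seconds (m divided by m/s), which is the dimension a check_dim pass would be expected to confirm for T_arr.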

VIII. Machine-Readable Fragment (Drop-in)

layers:
  - name: "validate"
    stages:
      - name: "dq.scan"
        type: "validate.dq"
        impl: "I16-7.dq_scan"
        inputs: ["clean_rows"]
        outputs: ["dq_report"]
        schema_ref: "contracts/clean_rows@v1.3"
        dq:
          sample: {rows: 100000, strategy: "stratified"}
          significance: {alpha: 0.05}
          gates:
            - {id: "DQ_001", kind: "not_null", cols: ["id", "ts"], level: "block"}
            - {id: "DQ_003", kind: "range", col: "power_w", rule: "[0,2e3]", unit: "W", level: "block"}
            - {id: "DQ_005", kind: "distribution", col: "latency_ms", rule: "p99<=150", level: "warn"}
            - {id: "DQ_007", kind: "drift", col: "feature_*", metric: "psi<=0.2", level: "warn"}
            - {id: "DQ_008", kind: "leakage", policy: ["per-object", "per-timewindow"], level: "block"}
        on_fail: "quarantine"
        retries: {max: 2, backoff: "expo"}
        timeout_s: 1800
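
The drift gate DQ_007 uses the metric string "psi<=0.2". As a hedged illustration, the sketch below computes the population stability index in its common binned form, PSI = Σ (a_i - e_i) * ln(a_i / e_i) over bin proportions, and compares it with the threshold parsed from the metric string. The binning, the eps floor for empty bins, and the function names (psi, drift_gate_passes) are assumptions, not part of the spec.

```python
import math


def psi(expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6) -> float:
    """Population stability index over pre-binned counts.

    Proportions are floored at eps so that empty bins do not produce log(0).
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / e_total, eps)
        pa = max(a / a_total, eps)
        total += (pa - pe) * math.log(pa / pe)
    return total


def drift_gate_passes(metric: str, value: float) -> bool:
    """Evaluate a metric string of the assumed form "psi<=<threshold>"."""
    name, threshold = metric.split("<=")
    if name.strip() != "psi":
        raise ValueError(f"metric not covered by this sketch: {name}")
    return value <= float(threshold)
```

For example, reference bins [50, 50] against current bins [90, 10] give a PSI of roughly 0.88, which exceeds 0.2. Because the gate level is "warn", that outcome would be reported without blocking the stage.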


IX. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: DQ.SCHEMA_REF_REQUIRED
    when: "$.layers[*].stages[?(@.type=='validate.dq')]"
    assert: "has_key('schema_ref')"
    level: error
  - id: DQ.SAMPLE_DEFINED
    when: "$.layers[*].stages[?(@.type=='validate.dq')].dq.sample"
    assert: "value.rows > 0 and value.strategy in ['head','random','stratified']"
    level: error
  - id: DQ.LEVEL_ALLOWED
    when: "$.layers[*].stages[*].dq.gates[*].level"
    assert: "value in ['block','warn']"
    level: error
  - id: DQ.RANGE_UNIT_SI
    when: "$.layers[*].stages[*].dq.gates[?(@.kind=='range')]"
    assert: "is_SI_unit($.unit)"
    level: error
  - id: DQ.DRIFT_THRESHOLDS
    when: "$.layers[*].stages[*].dq.gates[?(@.kind=='drift')]"
    assert: "psi_threshold_ok($.metric)"
    level: warn
  - id: DQ.LEAKAGE_POLICY
    when: "$.layers[*].stages[*].dq.gates[?(@.kind=='leakage')]"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error
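
Four of the lint rules above can be approximated without a JSONPath engine by walking an already-parsed config. This is a rough sketch: the lint function and its finding tuples are hypothetical, it reads DQ.LEAKAGE_POLICY as applying to the gate's policy list, and it fires DQ.SAMPLE_DEFINED only when dq.sample exists, which mirrors the rule's when path.

```python
from typing import Any

ALLOWED_STRATEGIES = {"head", "random", "stratified"}
ALLOWED_LEVELS = {"block", "warn"}
ALLOWED_LEAKAGE_POLICIES = {"per-object", "per-timewindow", "per-scene"}

Finding = tuple[str, str, str]  # (rule id, lint level, stage name)


def lint(config: dict[str, Any]) -> list[Finding]:
    """Check a parsed pipeline config against four of the lint rules."""
    findings: list[Finding] = []
    for layer in config.get("layers", []):
        for stage in layer.get("stages", []):
            name = str(stage.get("name", "<unnamed>"))
            dq = stage.get("dq", {})
            if stage.get("type") == "validate.dq":
                if "schema_ref" not in stage:
                    findings.append(("DQ.SCHEMA_REF_REQUIRED", "error", name))
                sample = dq.get("sample")
                # The rule's `when` path selects dq.sample, so it fires only if present.
                if sample is not None and not (
                    sample.get("rows", 0) > 0 and sample.get("strategy") in ALLOWED_STRATEGIES
                ):
                    findings.append(("DQ.SAMPLE_DEFINED", "error", name))
            for gate in dq.get("gates", []):
                if gate.get("level") not in ALLOWED_LEVELS:
                    findings.append(("DQ.LEVEL_ALLOWED", "error", name))
                if gate.get("kind") == "leakage" and not (
                    set(gate.get("policy", [])) & ALLOWED_LEAKAGE_POLICIES
                ):
                    findings.append(("DQ.LEAKAGE_POLICY", "error", name))
    return findings
```

An empty result means none of the four sketched rules fired. DQ.RANGE_UNIT_SI and DQ.DRIFT_THRESHOLDS are left out because is_SI_unit and psi_threshold_ok are not defined in this chapter.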


X. Export Manifest & Reports

export_manifest:
  version: "v1.0"
  artifacts:
    - {path: "dq/report.jsonl", sha256: "..."}
    - {path: "dq/summary.csv", sha256: "..."}
    - {path: "dq/leakage_report.csv", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.DatasetCards v1.0:Ch.12"
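
The sha256 fields in the manifest could be filled by hashing each artifact file. The sketch below uses Python's standard hashlib and streams each file in chunks. The function names (sha256_of, build_manifest) and the root directory layout are assumptions for illustration, and the references list is omitted.

```python
import hashlib
from pathlib import Path
from typing import Any


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, rel_paths: list[str], version: str = "v1.0") -> dict[str, Any]:
    """Build an export_manifest mapping with one artifact entry per relative path."""
    return {
        "export_manifest": {
            "version": version,
            "artifacts": [{"path": p, "sha256": sha256_of(root / p)} for p in rel_paths],
        }
    }
```

Storing paths relative to the export root keeps the manifest portable when the export directory is moved.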


XI. Chapter Compliance Checklist