Chapter 16 Machine-readable Schema & Lint


I. Chapter Purpose & Scope

Provide the normative JSON Schema and Lint ruleset for pipelines, covering structure/type/regex/dependencies/citation anchors/dimensional checks/idempotency & retries/frozen splits & leakage guardrails/minimal security & compliance checks; artifacts are used for pre-release blocking checks and portal auto-validation. Keys use snake_case; cross-volume citations use “Volume vX.Y:Anchor”; math is written in backticks with parentheses and contains no Chinese characters.

II. Normative Artifacts (Release-Critical)

artifacts:
  - path: "schema/pipeline.schema.json"
  - path: "schema/lint_rules.yaml"
  - path: "schema/examples/minimal.yaml"
  - path: "schema/examples/full.yaml"

These artifacts must be listed in export_manifest.artifacts[] with sha256; citation anchors follow this volume’s posture.
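The listing requirement can be checked mechanically. The following is a minimal sketch, assuming the manifest has already been parsed into a Python dict; the helper name `unlisted_artifacts` is hypothetical and not part of this volume's interfaces.

```python
# Paths taken from the normative artifact list in this section.
REQUIRED_ARTIFACTS = (
    "schema/pipeline.schema.json",
    "schema/lint_rules.yaml",
    "schema/examples/minimal.yaml",
    "schema/examples/full.yaml",
)


def unlisted_artifacts(export_manifest: dict) -> list[str]:
    """Return required paths that lack a manifest entry carrying a sha256 value.

    Hypothetical helper: it only checks presence, not that the hash is correct.
    """
    listed = {
        entry.get("path")
        for entry in export_manifest.get("artifacts", [])
        if isinstance(entry, dict) and entry.get("sha256")
    }
    return [path for path in REQUIRED_ARTIFACTS if path not in listed]
```

An empty return value means every release-critical artifact is listed with a hash; any returned path could be reported as a blocking error.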

III. Normative JSON Schema (Core Excerpt)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://eift.org/schema/pipeline.schema.json",
  "title": "EFT Data Pipeline",
  "type": "object",
  "required": [ "pipeline", "metrology", "export_manifest" ],
  "properties": {
    "pipeline": {
      "type": "object",
      "required": [ "id", "version", "layers", "edges" ],
      "properties": {
        "id": { "type": "string", "pattern": "^[a-z0-9_\\-\\.]+$" },
        "version": { "type": "string", "pattern": "^v\\d+\\.\\d+(\\.\\d+)?$" },
        "layers": { "type": "array", "items": { "type": "object" } },
        "edges": { "type": "array", "items": { "type": "object" } },
        "orchestration": { "type": "object" },
        "scheduling": { "type": "object" },
        "resources": { "type": "object" },
        "monitoring": { "type": "object" }
      }
    },
    "metrology": {
      "type": "object",
      "required": [ "units", "check_dim" ],
      "properties": {
        "units": { "type": "string", "const": "SI" },
        "check_dim": { "type": "boolean", "const": true }
      }
    },
    "export_manifest": {
      "type": "object",
      "required": [ "version", "artifacts", "references" ],
      "properties": {
        "version": { "type": "string" },
        "artifacts": { "type": "array", "items": { "type": "object" } },
        "references": {
          "type": "array",
          "minItems": 1,
          "items": { "type": "string", "pattern": "^[^:]+ v\\d+\\.\\d+:[A-Z].+$" }
        }
      }
    }
  },
  "additionalProperties": false
}
The references[] regex enforces “Volume vX.Y:Anchor”; metrology.units="SI" and check_dim=true are mandatory.
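To see how these two constraints behave on concrete values, here is a small sketch using only the Python standard library. The pattern is copied from the schema excerpt; the function names `is_valid_reference` and `metrology_ok` are illustrative assumptions, not normative interfaces.

```python
import re

# Pattern copied from export_manifest.references[].items.pattern in the schema.
REFERENCE_PATTERN = re.compile(r"^[^:]+ v\d+\.\d+:[A-Z].+$")


def is_valid_reference(ref: str) -> bool:
    """Return True when ref has the shape 'Volume vX.Y:Anchor'."""
    return REFERENCE_PATTERN.match(ref) is not None


def metrology_ok(metrology: dict) -> bool:
    """Mirror the schema consts: units must be 'SI' and check_dim must be boolean true."""
    return metrology.get("units") == "SI" and metrology.get("check_dim") is True


print(is_valid_reference("EFT.WP.Core.DataSpec v1.0:EXPORT"))  # True
print(is_valid_reference("Core.DataSpec:EXPORT"))              # False: no " vX.Y" part
```

Note that the pattern as written requires the anchor to start with an uppercase letter and to be at least two characters long, so a lowercase anchor such as `check_dim` does not match it.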

IV. Lint Rules (Normative)

version: "v1.0"
rules:
  # Structure & versioning
  - id: STRUCT.REQUIRED
    when: "$"
    assert: "has_keys(pipeline, metrology, export_manifest)"
    level: error
  - id: VERSION.SEMVER
    when: "$.pipeline.version"
    assert: "matches('^v\\d+\\.\\d+(\\.\\d+)?$')"
    level: error

  # Topology & contracts
  - id: LAYERS.NOT_EMPTY
    when: "$.pipeline.layers"
    assert: "len(value) > 0"
    level: error
  - id: EDGES.COMPAT_SCHEMA
    when: "$.pipeline.edges[*]"
    assert: "schema_compat(edge.from.Σ_out, edge.to.Σ_in)"
    level: error

  # Sampling & splits
  - id: SPLIT.RATIO_SUM
    when: "$..stages[?(@.type=='export.splits')].splits"
    assert: "abs(train.ratio + validation.ratio + test.ratio - 1) <= 1e-6"
    level: error
  - id: SPLIT.FREEZE_REQUIRED
    when: "$..stages[?(@.type=='export.splits')].policy.freeze_indices"
    assert: "value == true"
    level: error
  - id: LEAKAGE.GUARDS_PRESENT
    when: "$..stages[?(@.type=='export.splits')].policy.leakage_guard"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error

  # Validation & DQ
  - id: DQ.SCHEMA_REF_REQUIRED
    when: "$..stages[?(@.type=='validate.dq')]"
    assert: "has_key('schema_ref')"
    level: error
  - id: DQ.SAMPLE_DEFINED
    when: "$..stages[?(@.type=='validate.dq')].dq.sample"
    assert: "value.rows > 0 and value.strategy in ['head','random','stratified']"
    level: error

  # Transform & feature
  - id: TF.IDEMPOTENT_REQUIRED
    when: "$..stages[?(@.type^='transform.')]"
    assert: "idempotent == true"
    level: error
  - id: FEAT.FS_REQUIRED
    when: "$..stages[?(@.type^='feature.')]"
    assert: "has_key('feature_space')"
    level: error

  # Security & compliance minimal checks
  - id: SEC.CREDENTIALS_REF
    when: "$..stages[?(@.type^='source.')].params"
    assert: "has_key('credentials_ref') and not has_key('plain_secret')"
    level: error
  - id: PRIV.MINIMIZATION_ON
    when: "$.privacy.data_minimization"
    assert: "value == true"
    level: error

  # Metrology
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error

  # Citation anchors
  - id: REFERENCES.FORMAT
    when: "$.export_manifest.references[*]"
    assert: "matches('^[^:]+ v\\d+\\.\\d+:[A-Z].+$')"
    level: error

Blocking rules include STRUCT.REQUIRED, VERSION.SEMVER, EDGES.COMPAT_SCHEMA, SPLIT.*, TF.IDEMPOTENT_REQUIRED, FEAT.FS_REQUIRED, SEC.CREDENTIALS_REF, METROLOGY.SI_AND_CHECKDIM, REFERENCES.FORMAT.
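As an illustration of how two of these blocking rules might be evaluated, the sketch below hand-codes SPLIT.RATIO_SUM and SEC.CREDENTIALS_REF in plain Python. It is not the normative rule engine: it assumes stages live under `pipeline.layers[*].stages[*]` (the rules themselves use the recursive selector `$..stages`), and the helper names are hypothetical.

```python
from typing import Iterator


def iter_stages(spec: dict) -> Iterator[dict]:
    """Yield each stage under pipeline.layers[*].stages[*] (assumed layout)."""
    for layer in spec.get("pipeline", {}).get("layers", []):
        yield from layer.get("stages", [])


def check_split_ratio_sum(spec: dict, tol: float = 1e-6) -> list[str]:
    """SPLIT.RATIO_SUM: train + validation + test ratios must sum to 1 within tol."""
    failures = []
    for stage in iter_stages(spec):
        if stage.get("type") != "export.splits":
            continue
        splits = stage.get("splits", {})
        total = sum(
            splits.get(name, {}).get("ratio", 0.0)
            for name in ("train", "validation", "test")
        )
        if abs(total - 1.0) > tol:
            failures.append(f"{stage.get('name', '?')}: ratios sum to {total:.6f}")
    return failures


def check_credentials_ref(spec: dict) -> list[str]:
    """SEC.CREDENTIALS_REF: source.* params need credentials_ref and no plain_secret."""
    failures = []
    for stage in iter_stages(spec):
        if not str(stage.get("type", "")).startswith("source."):
            continue
        params = stage.get("params", {})
        if "credentials_ref" not in params or "plain_secret" in params:
            failures.append(f"{stage.get('name', '?')}: credentials_ref missing or plaintext secret present")
    return failures
```

Each function returns a list of human-readable failures; an empty list means the rule passed for every matching stage.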

V. Failure Examples & Diagnostics (Excerpt)

fail_examples:
  - case: "bad reference format"
    input: {export_manifest: {references: ["Core.DataSpec:EXPORT"]}}
    expect: {rule: "REFERENCES.FORMAT", level: "error",
             fix: "Use 'EFT.WP.Core.DataSpec v1.0:EXPORT'"}
  - case: "split ratios sum != 1"
    input: {stages: [{type: "export.splits", splits: {train: {ratio: 0.7}, validation: {ratio: 0.2}, test: {ratio: 0.2}}}]}
    expect: {rule: "SPLIT.RATIO_SUM", level: "error",
             fix: "Normalize ratios so they sum to 1±1e-6"}
  - case: "no credentials_ref"
    input: {stages: [{type: "source.s3", params: {endpoint: "...", plain_secret: "abc"}}]}
    expect: {rule: "SEC.CREDENTIALS_REF", level: "error",
             fix: "Remove plaintext secret; reference a secrets manager via credentials_ref"}

Lint outputs must include rule/path/message/fix.
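One way to keep every diagnostic complete is to model it as a small record type. The sketch below is an assumption about how a linter could represent its output; the class name `LintDiagnostic`, the default `level`, and the example path are hypothetical.

```python
from dataclasses import asdict, dataclass

# The four fields every lint output must carry.
REQUIRED_FIELDS = ("rule", "path", "message", "fix")


@dataclass(frozen=True)
class LintDiagnostic:
    rule: str      # rule id, e.g. "SPLIT.RATIO_SUM"
    path: str      # location of the offending node in the spec
    message: str   # what went wrong
    fix: str       # suggested remediation
    level: str = "error"  # assumed default; the listed rules all use "error"


def is_complete(diagnostic: dict) -> bool:
    """Return True when all required fields are present as non-empty strings."""
    return all(
        isinstance(diagnostic.get(name), str) and bool(diagnostic[name])
        for name in REQUIRED_FIELDS
    )


example = LintDiagnostic(
    rule="SPLIT.RATIO_SUM",
    path="$.pipeline.layers[0].stages[0].splits",  # hypothetical location
    message="split ratios sum to 1.1",
    fix="Normalize ratios so they sum to 1±1e-6",
)
print(is_complete(asdict(example)))  # True
```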

VI. Minimal Working Example (Validates under Schema & Lint)

pipeline:
  id: "eift.ingest-validate-transform-export"
  version: "v1.0"
  layers:
    - name: "ingest"
      stages:
        - name: "src.s3.pull"
          type: "source.s3"
          impl: "I16-1.s3_pull"
          params: {endpoint: "https://s3.amazonaws.com", bucket_or_db: "eift-data",
                   prefix_or_table: "raw/2025/09/", query_or_pattern: "*.jsonl",
                   credentials_ref: "secrets://aws/ingest_ro", format: "json"}
          outputs: ["raw_blob"]
          idempotent: true
          retries: {max: 3, backoff: "expo", jitter_ms: 200}
          timeout_s: 1800
    - name: "validate"
      stages:
        - name: "dq.scan"
          type: "validate.dq"
          impl: "I16-7.dq_scan"
          inputs: ["raw_blob"]
          outputs: ["dq_report"]
          schema_ref: "contracts/raw_json@v1.2"
          dq: {sample: {rows: 100000, strategy: "stratified"}, significance: {alpha: 0.05},
               gates: [{id: "DQ_001", kind: "not_null", cols: ["id", "ts"], level: "block"}]}
  edges:
    - {from: "src.s3.pull:raw_blob", to: "dq.scan:raw_blob"}
metrology: {units: "SI", check_dim: true}
export_manifest:
  version: "v1.0"
  artifacts: [{path: "pipeline.yaml", sha256: "..."}]
  references: ["EFT.WP.Core.DataSpec v1.0:EXPORT", "EFT.WP.Core.Metrology v1.0:check_dim"]


VII. Coupling with Export Manifest (Normative)

export_manifest:
  artifacts:
    - {path: "schema/pipeline.schema.json", sha256: "..."}
    - {path: "schema/lint_rules.yaml", sha256: "..."}
    - {path: "schema/examples/minimal.yaml", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"

Schema and Lint are blocking and must be listed and verifiable; references carry “Volume vX.Y:Anchor”.
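"Verifiable" can be read as: each listed file exists and its content hashes to the recorded sha256. The following sketch shows one possible check using the standard `hashlib` and `pathlib` modules. The function names and the `root` parameter (the directory the manifest paths are relative to) are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(export_manifest: dict, root: Path) -> list[str]:
    """Return one problem string per artifact that is missing or has a wrong hash."""
    problems = []
    for entry in export_manifest.get("artifacts", []):
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif sha256_of(target) != str(entry.get("sha256", "")).lower():
            problems.append(f"sha256 mismatch: {entry['path']}")
    return problems
```

A release gate could treat any non-empty result as blocking. Placeholder hashes such as "..." would be reported as mismatches, which is the intended behavior before real digests are filled in.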

VIII. Validation Interfaces (Implementation Binding Ixx-?; Unified Return)

def validate_pipeline(spec: dict) -> dict: ...
def lint_pipeline(spec: dict, rules: dict) -> dict: ...
def check_units(spec: dict) -> dict: ...  # uses Core.Metrology v1.0:check_dim
def verify_references(spec: dict) -> dict: ...  # regex + anchor reachability

Return shape: {"ok": bool, "errors":[...], "warnings":[...], "metrics":{...}} for portal/CI.
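Because all four interfaces share this return shape, their results can be combined into a single report for the portal or CI. The sketch below is one possible approach, not a prescribed implementation: `make_result` and `merge_results` are hypothetical helpers, and it assumes `ok` is true exactly when `errors` is empty, which the text above does not state explicitly.

```python
def make_result(errors=None, warnings=None, metrics=None) -> dict:
    """Build the unified return shape; ok is derived from errors (assumption)."""
    errors = list(errors or [])
    return {
        "ok": not errors,
        "errors": errors,
        "warnings": list(warnings or []),
        "metrics": dict(metrics or {}),
    }


def merge_results(*results: dict) -> dict:
    """Combine several validator results into one unified result."""
    errors, warnings, metrics = [], [], {}
    for result in results:
        errors.extend(result["errors"])
        warnings.extend(result["warnings"])
        metrics.update(result["metrics"])  # later results win on key clashes
    return make_result(errors, warnings, metrics)
```

A CI job could then call each validator, merge the results, and fail the build when the merged `ok` is false.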

IX. Chapter Compliance Checklist