Chapter 11 Versioning, Provenance & Lineage


I. Chapter Purpose & Scope

Fix pipeline versioning, provenance, and lineage specifications: version locking for objects and artifacts, hashing and traceability, lineage graphs and replay, change notices and compatibility policy, audit trail and export manifest; ensure consistency with data contracts, Dataset/Model Cards, the Metrology chapter, and citation anchors.

II. Terminology & Dependencies


III. Fields & Structure (Normative)

versioning:
  scheme: "semver"   # vMAJOR.MINOR.PATCH
  stability_line: "v1.*"
  compat_mode: "forward|backward|both|break"
  notice:
    type: "release|correction|withdrawal"
    summary: "<text>"
    date: "<YYYY-MM-DD>"
provenance:
  sources: ["<uri-or-ref>", "..."]   # upstream references (reference-only)
  transforms: ["<stage-name>@vX.Y", "..."]
  environment:
    containers: ["<image@digest>", "..."]
    deps_lock: "locks/deps.lock.yaml"
    seeds: {global: 1701}
lineage:
  graph:
    nodes:
      - {id: "src.s3.pull", kind: "stage", version: "v1.0"}
      - {id: "schema.check", kind: "stage", version: "v1.2"}
      - {id: "feat.map", kind: "stage", version: "v1.1"}
      - {id: "train_pkg", kind: "artifact", digest: "sha256:..."}
    edges:
      - {from: "src.s3.pull", to: "schema.check"}
      - {from: "schema.check", to: "feat.map"}
      - {from: "feat.map", to: "train_pkg"}
  replay:
    enabled: true
    inputs_lock: "locks/inputs.manifest.json"   # source list + offsets/watermarks
    policy: "strict|lenient"
artifacts:
  - {path: "pipeline.yaml", sha256: "<hex>"}
  - {path: "locks/inputs.manifest.json", sha256: "<hex>"}
  - {path: "locks/deps.lock.yaml", sha256: "<hex>"}
  - {path: "outputs/train_pkg.tgz", sha256: "<hex>"}
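
The artifacts entries above pair a path with a SHA-256 digest. As a minimal sketch of how such entries might be checked against files on disk, the following assumes the entries have already been loaded into Python dictionaries; the helper names sha256_hex and verify_artifacts are hypothetical and are not defined by this chapter.

```python
import hashlib
from pathlib import Path


def sha256_hex(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large artifacts do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(artifacts, root="."):
    """Return the paths that are missing or whose digest differs (hypothetical helper)."""
    failures = []
    for entry in artifacts:
        target = Path(root) / entry["path"]
        recorded = entry["sha256"].lower()
        if not target.is_file() or sha256_hex(target) != recorded:
            failures.append(entry["path"])
    return failures
```

An empty return value means every listed artifact exists and matches its recorded digest; any other result names the entries to investigate before a release or replay.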


IV. Versioning Strategy & Stability Line


V. Provenance & Reproducibility


VI. Lineage Graph & Replay


VII. Artifact Hashing & Integrity


VIII. Metrology & Units (SI)

  1. Performance & resources: QPS (1/s), T_inf (ms {p50,p95,p99}), ρ (—), net_mbps, size_bytes.
  2. Mandatory: metrology: {units: "SI", check_dim: true}; normalize units before any composition or conversion.
  3. Path quantities: if lineage covers arrival-time/correction chains, register delta_form, path="gamma(ell)", measure="d ell"; use one of the following forms, which must pass check_dim:
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
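
To illustrate the two T_arr forms numerically, here is a minimal sketch that samples n_eff along the path and integrates with the trapezoid rule. It assumes n_eff is dimensionless, ell is in metres, and c_ref is a constant in m/s, so T_arr comes out in seconds; the value chosen for C_REF and the function names are assumptions for illustration only. With a constant c_ref the two forms agree up to floating-point rounding.

```python
C_REF = 299_792_458.0  # m/s; assumed reference speed, not fixed by this chapter


def trapezoid(ys, xs):
    """Trapezoid-rule integral of samples ys over abscissae xs."""
    if len(ys) != len(xs) or len(xs) < 2:
        raise ValueError("need at least two samples and equal-length inputs")
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def t_arr_factored(n_eff, ell, c_ref=C_REF):
    """T_arr = (1 / c_ref) * integral of n_eff d ell, in seconds."""
    return (1.0 / c_ref) * trapezoid(n_eff, ell)


def t_arr_integrand(n_eff, ell, c_ref=C_REF):
    """T_arr = integral of (n_eff / c_ref) d ell, in seconds."""
    return trapezoid([n / c_ref for n in n_eff], ell)
```

Dimensionally, both forms reduce to metres divided by metres per second, which is the property a check_dim pass would be expected to confirm.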

IX. Machine-Readable Fragment (Drop-in)

versioning:
  scheme: "semver"
  stability_line: "v1.*"
  compat_mode: "both"
  notice: {type: "release", summary: "initial stable", date: "2025-09-21"}
provenance:
  sources: ["s3://eift-data/raw/2025/09/", "contracts/raw_rows@v1.2"]
  transforms: ["schema.check@v1.2", "feat.map@v1.1"]
  environment:
    containers: ["ghcr.io/eift/pipeline@sha256:abcdef..."]
    deps_lock: "locks/deps.lock.yaml"
    seeds: {global: 1701}
lineage:
  graph:
    nodes:
      - {id: "src.s3.pull", kind: "stage", version: "v1.0"}
      - {id: "schema.check", kind: "stage", version: "v1.2"}
      - {id: "feat.map", kind: "stage", version: "v1.1"}
      - {id: "train_pkg", kind: "artifact", digest: "sha256:1234..."}
    edges:
      - {from: "src.s3.pull", to: "schema.check"}
      - {from: "schema.check", to: "feat.map"}
      - {from: "feat.map", to: "train_pkg"}
  replay: {enabled: true, inputs_lock: "locks/inputs.manifest.json", policy: "strict"}
artifacts:
  - {path: "pipeline.yaml", sha256: "..."}
  - {path: "locks/inputs.manifest.json", sha256: "..."}
  - {path: "locks/deps.lock.yaml", sha256: "..."}
  - {path: "outputs/train_pkg.tgz", sha256: "..."}
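
A replay has to visit the lineage graph in dependency order. The sketch below derives one such order from the nodes and edges lists of the fragment above using the standard-library graphlib module (Python 3.9+). The function name replay_order is hypothetical, and treating replay as a plain topological walk is an assumption rather than something this chapter prescribes.

```python
from graphlib import TopologicalSorter


def replay_order(nodes, edges):
    """Return node ids in an order where every stage follows its upstream stages."""
    ids = {node["id"] for node in nodes}
    predecessors = {node_id: set() for node_id in ids}
    for edge in edges:
        if edge["from"] not in ids or edge["to"] not in ids:
            raise ValueError(f"edge references an undeclared node: {edge}")
        predecessors[edge["to"]].add(edge["from"])
    # static_order() raises graphlib.CycleError (a ValueError) if the graph has a cycle.
    return list(TopologicalSorter(predecessors).static_order())


nodes = [
    {"id": "src.s3.pull", "kind": "stage", "version": "v1.0"},
    {"id": "schema.check", "kind": "stage", "version": "v1.2"},
    {"id": "feat.map", "kind": "stage", "version": "v1.1"},
    {"id": "train_pkg", "kind": "artifact", "digest": "sha256:1234..."},
]
edges = [
    {"from": "src.s3.pull", "to": "schema.check"},
    {"from": "schema.check", "to": "feat.map"},
    {"from": "feat.map", "to": "train_pkg"},
]
order = replay_order(nodes, edges)
```

For the linear chain in the fragment, the only valid order is src.s3.pull, schema.check, feat.map, train_pkg; a cyclic graph is rejected with an exception instead of producing an order.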


X. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: VER.SEMVER
    when: "$.versioning.scheme"
    assert: "value == 'semver' and matches($.pipeline.version, '^v\\d+\\.\\d+(\\.\\d+)?$')"
    level: error
  - id: VER.COMPAT_ALLOWED
    when: "$.versioning.compat_mode"
    assert: "value in ['forward','backward','both','break']"
    level: error
  - id: LIN.GRAPH_CONNECTED
    when: "$.lineage.graph"
    assert: "graph_is_connected(value) and no_dangling_nodes(value)"
    level: error
  - id: LIN.REPLAY_INPUTS_LOCK
    when: "$.lineage.replay.enabled"
    assert: "value == false or has_key($.lineage.replay.inputs_lock)"
    level: error
  - id: ART.SHA256_REQUIRED
    when: "$.artifacts[*]"
    assert: "has_key('sha256') and len(value.sha256) > 0"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
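
The rules above are declarative; the lint engine that evaluates them is not specified here. As a rough sketch of what each assertion checks, the following hand-codes the six rules against an already-parsed document. The lint function is hypothetical, and reading graph_is_connected as weak connectivity (edge direction ignored, with undeclared edge endpoints treated as failures) is an assumption.

```python
import re

SEMVER_RE = re.compile(r"^v\d+\.\d+(\.\d+)?$")
COMPAT_MODES = {"forward", "backward", "both", "break"}


def graph_is_connected(graph):
    """Assumed meaning: all declared nodes form one weakly connected component."""
    ids = [node["id"] for node in graph.get("nodes", [])]
    if not ids:
        return False
    neighbours = {node_id: set() for node_id in ids}
    for edge in graph.get("edges", []):
        if edge["from"] not in neighbours or edge["to"] not in neighbours:
            return False  # edge points at an undeclared node
        neighbours[edge["from"]].add(edge["to"])
        neighbours[edge["to"]].add(edge["from"])
    seen, stack = set(), [ids[0]]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(neighbours[current] - seen)
    return len(seen) == len(ids)


def lint(doc):
    """Return the ids of violated rules; a rule is skipped if its `when` path is absent."""
    errors = []
    versioning = doc.get("versioning", {})
    if "scheme" in versioning:
        version = doc.get("pipeline", {}).get("version", "")
        if versioning["scheme"] != "semver" or not SEMVER_RE.match(version):
            errors.append("VER.SEMVER")
    if "compat_mode" in versioning and versioning["compat_mode"] not in COMPAT_MODES:
        errors.append("VER.COMPAT_ALLOWED")
    lineage = doc.get("lineage", {})
    if "graph" in lineage and not graph_is_connected(lineage["graph"]):
        errors.append("LIN.GRAPH_CONNECTED")
    replay = lineage.get("replay", {})
    if replay.get("enabled") and "inputs_lock" not in replay:
        errors.append("LIN.REPLAY_INPUTS_LOCK")
    if any(not entry.get("sha256") for entry in doc.get("artifacts", [])):
        errors.append("ART.SHA256_REQUIRED")
    if "metrology" in doc:
        metrology = doc["metrology"]
        if metrology.get("units") != "SI" or metrology.get("check_dim") is not True:
            errors.append("METROLOGY.SI_AND_CHECKDIM")
    return errors
```

Calling lint on a parsed fragment returns an empty list when every applicable rule holds; since all rules are declared at level error, any non-empty result would block the release in this reading.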


XI. Export Manifest & Audit Trail

export_manifest:
  version: "v1.0"
  artifacts:
    - {path: "pipeline.yaml", sha256: "..."}
    - {path: "locks/inputs.manifest.json", sha256: "..."}
    - {path: "locks/deps.lock.yaml", sha256: "..."}
    - {path: "lineage/graph.json", sha256: "..."}
    - {path: "reports/replay.result.json", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.DatasetCards v1.0:Ch.11"
    - "EFT.WP.Data.ModelCards v1.0:Ch.11"
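
One way to produce a manifest of this shape is sketched below. It assumes the files to export already exist under a common root directory, and it serializes to JSON only for demonstration; the function names build_export_manifest and dump_manifest are hypothetical. Sorting the paths and the JSON keys is a design choice so that repeated exports of unchanged inputs are byte-identical, which keeps the manifest itself hashable for the audit trail.

```python
import hashlib
import json
from pathlib import Path


def build_export_manifest(root, rel_paths, version="v1.0", references=()):
    """Assemble an export_manifest mapping with one sha256 entry per listed file."""
    root = Path(root)
    artifacts = []
    for rel in sorted(rel_paths):  # sorted so entry order does not depend on the caller
        data = (root / rel).read_bytes()  # whole-file read; acceptable for small artifacts
        artifacts.append({
            "path": Path(rel).as_posix(),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return {
        "export_manifest": {
            "version": version,
            "artifacts": artifacts,
            "references": list(references),
        }
    }


def dump_manifest(manifest):
    """Serialize with sorted keys so identical content yields identical bytes."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```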


XII. Chapter Compliance Checklist