Chapter 8 Feature Pipelines & Reuse


I. Chapter Purpose & Scope

Fix the feature pipeline specifications: feature extraction/aggregation/alignment, dictionary & embedding management, materialization & caching, cross-task/multi-modal reuse, and versioning & dependency mapping; ensure consistency with data contracts, the Model Card feature space & task I/O, the Metrology chapter, and citation anchors.

II. Terminology & Dependencies


III. Fields & Structure (Normative)

stage:
  name: "<feat.map|feat.aggregate|feat.join|feat.encode|feat.embed|feat.materialize>"
  type: "feature.<op>"
  impl: "I16-4.<impl_id>"
  inputs: ["<Σ_in>"]
  outputs: ["<Σ_out>"]
  params:
    key: ["<entity_id>", "<ts?>"]
    point_in_time:
      enabled: true
      lookback: "P7D|P30D|N/A"
      tolerance: "PT5M"
    dict_ref: "dicts/<name>@vX.Y"
    embed:
      store: "faiss|annoy|milvus|custom"
      dim: 768
      metric: "cosine|l2"
      index_ref: "embeddings/<name>@vX.Y"
    aggregate:
      window: "PT1H|P1D"
      funcs: ["mean", "max", "count", "std"]
      fillna: {"method": "pad|zero|drop"}
    join:
      on: ["<entity_id>", "<ts?>"]
      how: "left|inner|asof"
    materialize:
      mode: "none|cache|persist"
      cache: {ttl: "P7D", max_gb: 128}
  idempotent: true
  schema_ref: "contracts/feat_<name>@vX.Y"
  feature_space:
    type: "<tabular|sequence|image|audio_spec|embedding>"
    shape: "<(…)>"
    dtype: "<float32|int32|...>"
    normalization: "<zscore|minmax|robust|unit-norm|none>"
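
To illustrate the point_in_time parameters, the following Python sketch shows one possible reading of lookback: a feature row is visible to a query only if its timestamp is not later than the query timestamp and not older than the lookback window. The helper names parse_duration and asof_lookup are hypothetical, the duration parser covers only the day/hour/minute/second subset of ISO 8601, and tolerance is left out because its exact semantics are not fixed in this section.

```python
import bisect
import re
from datetime import datetime, timedelta

# Hypothetical helper: accepts only the D/H/M/S subset of ISO 8601 durations
# (for example "P30D", "PT5M", "P1DT12H"). Years, months and weeks are not handled.
_DURATION = re.compile(
    r"^P(?:(?P<d>\d+)D)?(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?)?$"
)


def parse_duration(text):
    """Convert a duration string such as "P30D" into a timedelta."""
    match = _DURATION.match(text)
    if not match or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: int(v) for k, v in match.groupdict(default="0").items()}
    return timedelta(
        days=parts["d"], hours=parts["h"], minutes=parts["m"], seconds=parts["s"]
    )


def asof_lookup(rows, query_ts, lookback):
    """Return the latest value with query_ts - lookback <= ts <= query_ts.

    rows is a list of (timestamp, value) pairs sorted by timestamp.
    Rows stamped after query_ts are never read (assumed leakage guard).
    """
    timestamps = [ts for ts, _ in rows]
    idx = bisect.bisect_right(timestamps, query_ts) - 1
    if idx < 0:
        return None  # no row at or before the query time
    ts, value = rows[idx]
    if query_ts - ts > lookback:
        return None  # the newest visible row is older than the lookback window
    return value


rows = [
    (datetime(2024, 1, 1), 1.0),
    (datetime(2024, 1, 10), 2.0),
    (datetime(2024, 3, 1), 3.0),
]
window = parse_duration("P30D")
recent = asof_lookup(rows, datetime(2024, 1, 15), window)
stale = asof_lookup(rows, datetime(2024, 2, 20), window)
```

Returning None for a stale or missing row keeps the decision about imputation with the caller, for instance the fillna policy of a later aggregate stage.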


IV. Feature Operators & Postures


V. Reuse & Dependency Mapping


VI. Consistency & Point-in-Time (PIT) Alignment


VII. Dictionary & Embedding Management


VIII. Metrology & Units (SI)

  1. Performance: QPS (1/s), T_inf (ms {p50,p95,p99}), ρ (—); bandwidth net_mbps; storage/index volume size_bytes.
  2. metrology:{units:"SI", check_dim:true} is mandatory; normalize units before composition/aggregation.
  3. For path-quantity features (e.g., T_arr), register delta_form, path="gamma(ell)", measure="d ell", use one of the equivalences below, and pass check_dim:
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
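
The two forms of T_arr agree whenever c_ref is constant along the path, since a constant factor can be moved in or out of the integral. The Python sketch below checks this numerically with a trapezoid rule and tracks dimensions as (length, time) exponent pairs. It is a stand-in for illustration, not the Metrology chapter's check_dim; the value of C_REF, the n_eff profile, and all helper names are assumptions.

```python
import math


def trapezoid(ys, xs):
    """Trapezoid-rule integral of samples ys over the grid xs."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0 for i in range(len(xs) - 1)
    )


# Dimensions as (length exponent, time exponent); hypothetical mini dimension checker.
def dim_mul(a, b):
    return tuple(x + y for x, y in zip(a, b))


def dim_div(a, b):
    return tuple(x - y for x, y in zip(a, b))


DIMENSIONLESS = (0, 0)
LENGTH = (1, 0)
TIME = (0, 1)
SPEED = dim_div(LENGTH, TIME)

C_REF = 299_792_458.0  # m/s; assumed constant reference speed, for illustration only
ell = [i * 10.0 for i in range(101)]  # path coordinate from 0 m to 1000 m
n_eff = [1.0 + 0.5 * math.sin(x / 200.0) ** 2 for x in ell]  # made-up dimensionless profile

# Form 1: T_arr = (1 / c_ref) * (integral of n_eff d ell)
form_outside = (1.0 / C_REF) * trapezoid(n_eff, ell)
# Form 2: T_arr = integral of (n_eff / c_ref) d ell
form_inside = trapezoid([n / C_REF for n in n_eff], ell)

# Both forms must reduce to a time dimension.
dim_outside = dim_mul(dim_div(DIMENSIONLESS, SPEED), dim_mul(DIMENSIONLESS, LENGTH))
dim_inside = dim_mul(dim_div(DIMENSIONLESS, SPEED), LENGTH)

if dim_outside != TIME or dim_inside != TIME:
    raise ValueError("T_arr does not have the dimension of time")
```

If c_ref varied along the path, only the second form would remain valid, so a registered delta_form would need to record which assumption applies.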

IX. Machine-Readable Fragment (Drop-in)

layers:
  - name: "feature"
    stages:
      - name: "feat.map.stats"
        type: "feature.map"
        impl: "I16-4.feature_map"
        inputs: ["std_rows"]
        outputs: ["feat_rows"]
        params:
          key: ["entity_id", "ts"]
          point_in_time: {enabled: true, lookback: "P30D", tolerance: "PT5M"}
          aggregate: {window: "P1D", funcs: ["mean", "std", "count"], fillna: {method: "pad"}}
        idempotent: true
        schema_ref: "contracts/feat_stats@v1.1"
        feature_space: {type: "tabular", shape: "(N,D)", dtype: "float32", normalization: "zscore"}
      - name: "feat.encode.cat"
        type: "feature.encode"
        impl: "I16-4.encode"
        inputs: ["feat_rows"]
        outputs: ["feat_enc"]
        params:
          dict_ref: "dicts/category_voc@v2.0"
          encode: {vocab_ref: "dicts/category_voc@v2.0", unk: "<UNK>", pad: "<PAD>"}
        idempotent: true
        schema_ref: "contracts/feat_enc@v1.0"
      - name: "feat.materialize"
        type: "feature.materialize"
        impl: "I16-4.materialize"
        inputs: ["feat_enc"]
        outputs: ["feat_pkg"]
        params:
          materialize: {mode: "cache", cache: {ttl: "P7D", max_gb: 256}}
        idempotent: true
        schema_ref: "contracts/feat_pkg@v1.0"
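
The inputs and outputs of the three stages form a chain, which is the raw material for dependency mapping. The Python sketch below derives the stage-to-stage edges from such a fragment. It is a minimal illustration under stated assumptions: the function name dependency_edges is hypothetical, and "std_rows" is treated as an external input supplied by an upstream layer.

```python
# Stage names, inputs and outputs copied from the fragment; other keys omitted.
stages = [
    {"name": "feat.map.stats", "inputs": ["std_rows"], "outputs": ["feat_rows"]},
    {"name": "feat.encode.cat", "inputs": ["feat_rows"], "outputs": ["feat_enc"]},
    {"name": "feat.materialize", "inputs": ["feat_enc"], "outputs": ["feat_pkg"]},
]


def dependency_edges(stages, external):
    """Return (producer, consumer) stage pairs in declaration order.

    Raises ValueError when an input is neither produced by an earlier
    stage nor listed in external, or when two stages emit the same output.
    """
    producers = {}  # dataset name -> name of the stage that produces it
    edges = []
    for stage in stages:
        for dataset in stage["inputs"]:
            if dataset in producers:
                edges.append((producers[dataset], stage["name"]))
            elif dataset not in external:
                raise ValueError(
                    f"{stage['name']}: input {dataset!r} has no upstream producer"
                )
        for dataset in stage["outputs"]:
            if dataset in producers:
                raise ValueError(f"{dataset!r} is produced by more than one stage")
            producers[dataset] = stage["name"]
    return edges


edges = dependency_edges(stages, external={"std_rows"})
for producer, consumer in edges:
    print(f"{producer} -> {consumer}")
```

Because inputs are resolved only against earlier stages, the same pass also rejects a stage list that is declared out of order.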


X. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: FEAT.FS_REQUIRED
    when: "$.layers[*].stages[?(@.type^='feature.')]"
    assert: "has_key('feature_space')"
    level: error
  - id: FEAT.DICT_VERSIONED
    when: "$.layers[*].stages[?(@.type=='feature.encode')].params.dict_ref"
    assert: "matches('^dicts/[a-z0-9_\\-]+@v\\d+\\.\\d+$')"
    level: error
  - id: FEAT.PIT_PARAMS
    when: "$.layers[*].stages[*].params.point_in_time"
    assert: "value.enabled == true -> (has_key('lookback') and has_key('tolerance'))"
    level: error
  - id: FEAT.MATERIALIZE_POLICY
    when: "$.layers[*].stages[?(@.type=='feature.materialize')].params.materialize"
    assert: "value.mode in ['none','cache','persist']"
    level: error
  - id: FEAT.UNITS_CHECKDIM
    when: "$.pipeline.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
  - id: FEAT.LEAKAGE_GUARDS_FOR_TRAIN_EXPORT
    when: "$.layers[*].stages[*].outputs"
    assert: "produces_train_eval(outputs) -> has_leakage_guards()"
    level: error
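
As a rough illustration of how four of these rules could be evaluated, the Python sketch below hard-codes them as plain checks on a stage dictionary instead of interpreting the when/assert expressions. The function name lint_stage is hypothetical, the reading of "^=" as "starts with" and of "->" as implication is an assumption, and the regular expression is the FEAT.DICT_VERSIONED pattern with the string escaping removed.

```python
import re

DICT_REF = re.compile(r"^dicts/[a-z0-9_\-]+@v\d+\.\d+$")
MATERIALIZE_MODES = {"none", "cache", "persist"}


def lint_stage(stage):
    """Return the ids of the rules that a single stage dictionary violates."""
    errors = []
    stage_type = stage.get("type", "")
    params = stage.get("params", {})

    # FEAT.FS_REQUIRED: every feature.* stage declares a feature_space.
    if stage_type.startswith("feature.") and "feature_space" not in stage:
        errors.append("FEAT.FS_REQUIRED")

    # FEAT.DICT_VERSIONED: encode stages reference a versioned dictionary.
    if stage_type == "feature.encode" and "dict_ref" in params:
        if not DICT_REF.match(params["dict_ref"]):
            errors.append("FEAT.DICT_VERSIONED")

    # FEAT.PIT_PARAMS: an enabled point_in_time block needs lookback and tolerance.
    pit = params.get("point_in_time")
    if pit is not None and pit.get("enabled") is True:
        if not ("lookback" in pit and "tolerance" in pit):
            errors.append("FEAT.PIT_PARAMS")

    # FEAT.MATERIALIZE_POLICY: mode is one of the three allowed values.
    if stage_type == "feature.materialize" and "materialize" in params:
        if params["materialize"].get("mode") not in MATERIALIZE_MODES:
            errors.append("FEAT.MATERIALIZE_POLICY")

    return errors


# Made-up stage that breaks three rules at once.
bad_stage = {
    "type": "feature.encode",
    "params": {
        "dict_ref": "dicts/Category@2",
        "point_in_time": {"enabled": True, "lookback": "P30D"},
    },
}
print(lint_stage(bad_stage))
```

FEAT.UNITS_CHECKDIM and FEAT.LEAKAGE_GUARDS_FOR_TRAIN_EXPORT are omitted here because they depend on pipeline-level metadata and on predicates that this excerpt does not define.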


XI. Export Manifest & Audit

export_manifest:
  version: "v1.0"
  artifacts:
    - {path: "features/feat_view.yaml", sha256: "..."}
    - {path: "features/dict_category_v2.hash", sha256: "..."}
    - {path: "features/feat_pkg.manifest.json", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.ModelCards v1.0:Ch.6"
    - "EFT.WP.Data.ModelCards v1.0:Ch.9"
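
The sha256 fields of the artifacts list can be filled and audited with the standard library alone. The Python sketch below is one way to do so; the function names sha256_file, build_artifacts, and verify_artifacts are hypothetical, and the manifest is assumed to store lowercase hex digests of the raw file bytes.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_artifacts(root, relative_paths):
    """Build manifest entries of the form {path, sha256} for files under root."""
    return [
        {"path": rel, "sha256": sha256_file(Path(root) / rel)}
        for rel in relative_paths
    ]


def verify_artifacts(root, artifacts):
    """Return the paths whose current digest differs from the recorded one."""
    return [
        entry["path"]
        for entry in artifacts
        if sha256_file(Path(root) / entry["path"]) != entry["sha256"]
    ]
```

An audit step would call verify_artifacts on the exported directory and treat any non-empty result as a failed export.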


XII. Chapter Compliance Checklist