45-EFT.WP.Data.Pipeline v1.0 | Chapter 13 Performance, Cost & Scaling

Chapter 13 Performance, Cost & Scaling

I. Chapter Purpose & Scope

: profiling for batch/stream/micro-batch, horizontal/vertical scaling and autoscaling, capacity planning tied to SLAs, cost metrology and budget constraints, load testing and profiling methods, exports and audit; ensure consistency with orchestration/scheduling/resources, monitoring, and the Metrology chapter.scaling, and cost, performanceFix pipeline specifications for

II. Terminology & Dependencies

Terms: QPS (throughput), T_inf (per-sample latency), ρ (utilization), p50/p95/p99, H/V scaling (horizontal/vertical), micro-batch, autoscale, capacity planning, cost model, egress, spot/on-demand.
Dependencies: contracts/exports (Core.DataSpec v1.0); units/perf metrology (Core.Metrology v1.0); orchestration/scheduling/resources (Ch.10); monitoring & observability (Ch.12).
Math & symbols: wrap inline symbols (e.g., QPS, T_inf, ρ, p99, λ, μ) in backticks; any division/integral/composite operator must use parentheses; if path quantity T_arr appears, register gamma(ell) and d ell; no Chinese in formulas/symbols/definitions.

III. Fields & Structure (Normative)

performance:

workload:

mode: "batch|stream|micro-batch"

batch_size: 1024

parallelism: {workers: 16, threads_per_worker: 2}

targets:

qps: {value: 5000}

latency_ms: {p50: 5, p95: 20, p99: 50}

utilization_rho: {max: 0.75}

profiling:

tools: ["py-spy","perf","jfr","flamegraph"]

sampling_interval_ms: 50

hotspots: ["io","serialization","shuffle","network"]

pressure_test:

stages: ["ingest","transform","feature","export"]

ramp: {from_qps: 1000, to_qps: 8000, step: 500, dwell_s: 120}

saturation_criteria: ["latency_ms.p99>target*1.2","error_rate>0.01","ρ>0.85"]

optimizations:

batch_tuning: {enable: true, size_candidates: [256,512,1024,2048]}

micro_batch: {enable: true, window_ms: 200, max_rows: 50000}

io: {compression: "zstd", level: 3, page_size_kb: 256}

cpu: {pin_core: true, numa_aware: true}

gc: {strategy: "g1|zgc|shenandoah", heap_gb: 16}

scaling:

strategy: "horizontal|vertical|hybrid"

horizontal:

shard_key: "entity_id|time|partition"

rebalance: "consistent-hash|range"

vertical:

sku_ref: "c8m64|a2-highgpu"

max_sku: "c32m256"

autoscale:

enabled: true

metric: "qps|latency_ms.p95|cpu"

target: 0.7

min_replicas: 4

max_replicas: 64

cooldown_s: 120

cost:

model:

compute: {on_demand_usd_per_h: 0.48, spot_discount: 0.6}

storage: {usd_per_gb_mo: 0.023}

egress: {usd_per_gb: 0.09}

budget:

currency: "USD"

monthly_cap: 5000

alert_thresholds: {warn: 0.8, block: 1.0}

mix:

on_demand_ratio: 0.4

spot_ratio: 0.6

reporting:

window: "P30D"

breakdown: ["compute","storage","egress","observability"]

IV. Performance Modeling & Profiling

Queueing & service: use λ (arrival) and μ (service) to estimate ρ = ( λ / μ ); as ρ→1, latency grows exponentially—scale or throttle early.
Load testing & saturation: step-ramp (ramp), capture QPS–latency curves and p99 trends; identify bottlenecks (CPU/IO/network/locks/GC) via saturation criteria.
Hotspot localization: flamegraphs/traces + logs; inspect in order: CPU → memory → IO → network; prioritize serialization, shuffle, and RPC optimizations.
Batch/micro-batch tuning: balance throughput–latency via batch_size and window_ms; ensure DQ gates and SLAs remain intact.

V. Scaling Strategies & Elasticity

Horizontal (H): shard by shard_key; choose consistent-hash|range to reduce rebalancing cost; protect hot-keys and fix skew.
Vertical (V): select higher sku_ref (CPU/Mem/GPU/IOPS); assess marginal gains and avoid NUMA pitfalls.
Hybrid: choose H→V or V→H under SLA and budget constraints as a multi-objective optimization.
Autoscaling: trigger on metric (qps/latency_ms.p95/cpu) toward target; set min/max_replicas and cooldown_s for damping.

VI. Cost Metrology & Budget

Cost model: itemize compute/storage/egress/observability; mix on-demand/spot with recorded discounts and interruption policies.
Budget enforcement: alert at warn (80%); at block (100%) enforce throttle/no-scale/degrade policies.
Cost-effectiveness: publish usd_per_kqps, usd_per_mrow unit-cost KPIs for decision support.

VII. Metrology & Units (SI)

Perf/resources: QPS (1/s), T_inf (ms {p50,p95,p99}), ρ (—), net_mbps, size_bytes.
Mandatory: metrology:{units:"SI", check_dim:true}; normalize units first before composing/benchmarking; unify units across charts/reports.
Path quantities: if perf tests involve T_arr-related operators, register delta_form, path="gamma(ell)", measure="d ell", and use one of:
- T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
- T_arr = ( ∫ ( n_eff / c_ref ) d ell ), and pass check_dim.

VIII. Machine-Readable Fragment (Drop-in)

performance:

workload: {mode:"micro-batch", batch_size:2048, parallelism:{workers:32, threads_per_worker:2}}

targets: {qps:{value:9000}, latency_ms:{p50:5,p95:20,p99:40}, utilization_rho:{max:0.75}}

profiling:{tools:["perf","flamegraph"], sampling_interval_ms:50, hotspots:["serialization","network"]}

pressure_test:{stages:["transform","feature","export"], ramp:{from_qps:2000,to_qps:12000,step:1000,dwell_s:120},

saturation_criteria:["latency_ms.p99>48","ρ>0.85"]}

scaling:

strategy: "hybrid"

horizontal: {shard_key:"entity_id", rebalance:"consistent-hash"}

vertical: {sku_ref:"c16m128", max_sku:"c32m256"}

autoscale: {enabled:true, metric:"latency_ms.p95", target:18, min_replicas:8, max_replicas:64, cooldown_s:120}

cost:

model: {compute:{on_demand_usd_per_h:0.52, spot_discount:0.55}, storage:{usd_per_gb_mo:0.023}, egress:{usd_per_gb:0.09}}

budget:{currency:"USD", monthly_cap:8000, alert_thresholds:{warn:0.8, block:1.0}}

mix: {on_demand_ratio:0.5, spot_ratio:0.5}

reporting:{window:"P30D", breakdown:["compute","storage","egress","observability"]}

metrology:{units:"SI", check_dim:true}

IX. Lint Rules (Excerpt, Normative)

lint_rules:

- id: PERF.TARGETS_DEFINED

when: "$.performance.targets"

assert: "has_keys(qps, latency_ms, utilization_rho)"

level: error

- id: PERF.RAMP_VALID

when: "$.performance.pressure_test.ramp"

assert: "value.from_qps > 0 and value.to_qps > value.from_qps and value.step > 0"

level: error

- id: SCALE.AUTOSCALE_BOUNDS

when: "$.scaling.autoscale"

assert: "value.enabled == false or (value.min_replicas >= 1 and value.max_replicas >= value.min_replicas)"

level: error

- id: COST.BUDGET_DEFINED

when: "$.cost.budget"

assert: "has_keys(currency, monthly_cap) and value.monthly_cap > 0"

level: error

- id: METROLOGY.SI_AND_CHECKDIM

when: "$.metrology"

assert: "units == 'SI' and check_dim == true"

level: error

X. Export Manifest & Reports

export_manifest:

version: "v1.0"

artifacts:

- {path:"perf/qps_latency_curve.csv", sha256:"..."}

- {path:"perf/flamegraph.svg", sha256:"..."}

- {path:"scaling/autoscale_history.csv", sha256:"..."}

- {path:"cost/monthly_breakdown.csv", sha256:"..."}

- {path:"capacity/plan.yaml", sha256:"..."}

references:

- "EFT.WP.Core.DataSpec v1.0:EXPORT"

- "EFT.WP.Core.Metrology v1.0:check_dim"

XI. Chapter Compliance Checklist

Targets & profiles defined: QPS/latency_ms/ρ targets explicit; load test/profile reproducible; hotspots evidenced.
Scaling strategies & bounds complete: H/V/hybrid and autoscale parameters reasonable; coordinated with Ch.10 scheduling/resources.
Cost model, budget & alert thresholds clear; unit-cost KPIs and resource utilization traceable.
SI metrology with check_dim=true; if T_arr appears, delta_form/path/measure registered and validated.
Export manifest lists perf curves/flamegraphs/auto-scale history/cost breakdown/capacity plan with sha256, satisfying release gates.