Chapter 13 Performance, Cost & Scaling
I. Chapter Purpose & Scope
: profiling for batch/stream/micro-batch, horizontal/vertical scaling and autoscaling, capacity planning tied to SLAs, cost metrology and budget constraints, load testing and profiling methods, exports and audit; ensure consistency with orchestration/scheduling/resources, monitoring, and the Metrology chapter.scaling, and cost, performanceFix pipeline specifications forII. Terminology & Dependencies
- Terms: QPS (throughput), T_inf (per-sample latency), ρ (utilization), p50/p95/p99, H/V scaling (horizontal/vertical), micro-batch, autoscale, capacity planning, cost model, egress, spot/on-demand.
- Dependencies: contracts/exports (Core.DataSpec v1.0); units/perf metrology (Core.Metrology v1.0); orchestration/scheduling/resources (Ch.10); monitoring & observability (Ch.12).
- Math & symbols: wrap inline symbols (e.g., QPS, T_inf, ρ, p99, λ, μ) in backticks; any division/integral/composite operator must use parentheses; if path quantity T_arr appears, register gamma(ell) and d ell; no Chinese in formulas/symbols/definitions.
III. Fields & Structure (Normative)
performance:
workload:
mode: "batch|stream|micro-batch"
batch_size: 1024
parallelism: {workers: 16, threads_per_worker: 2}
targets:
qps: {value: 5000}
latency_ms: {p50: 5, p95: 20, p99: 50}
utilization_rho: {max: 0.75}
profiling:
tools: ["py-spy","perf","jfr","flamegraph"]
sampling_interval_ms: 50
hotspots: ["io","serialization","shuffle","network"]
pressure_test:
stages: ["ingest","transform","feature","export"]
ramp: {from_qps: 1000, to_qps: 8000, step: 500, dwell_s: 120}
saturation_criteria: ["latency_ms.p99>target*1.2","error_rate>0.01","ρ>0.85"]
optimizations:
batch_tuning: {enable: true, size_candidates: [256,512,1024,2048]}
micro_batch: {enable: true, window_ms: 200, max_rows: 50000}
io: {compression: "zstd", level: 3, page_size_kb: 256}
cpu: {pin_core: true, numa_aware: true}
gc: {strategy: "g1|zgc|shenandoah", heap_gb: 16}
scaling:
strategy: "horizontal|vertical|hybrid"
horizontal:
shard_key: "entity_id|time|partition"
rebalance: "consistent-hash|range"
vertical:
sku_ref: "c8m64|a2-highgpu"
max_sku: "c32m256"
autoscale:
enabled: true
metric: "qps|latency_ms.p95|cpu"
target: 0.7
min_replicas: 4
max_replicas: 64
cooldown_s: 120
cost:
model:
compute: {on_demand_usd_per_h: 0.48, spot_discount: 0.6}
storage: {usd_per_gb_mo: 0.023}
egress: {usd_per_gb: 0.09}
budget:
currency: "USD"
monthly_cap: 5000
alert_thresholds: {warn: 0.8, block: 1.0}
mix:
on_demand_ratio: 0.4
spot_ratio: 0.6
reporting:
window: "P30D"
breakdown: ["compute","storage","egress","observability"]
IV. Performance Modeling & Profiling
- Queueing & service: use λ (arrival) and μ (service) to estimate ρ = ( λ / μ ); as ρ→1, latency grows exponentially—scale or throttle early.
- Load testing & saturation: step-ramp (ramp), capture QPS–latency curves and p99 trends; identify bottlenecks (CPU/IO/network/locks/GC) via saturation criteria.
- Hotspot localization: flamegraphs/traces + logs; inspect in order: CPU → memory → IO → network; prioritize serialization, shuffle, and RPC optimizations.
- Batch/micro-batch tuning: balance throughput–latency via batch_size and window_ms; ensure DQ gates and SLAs remain intact.
V. Scaling Strategies & Elasticity
- Horizontal (H): shard by shard_key; choose consistent-hash|range to reduce rebalancing cost; protect hot-keys and fix skew.
- Vertical (V): select higher sku_ref (CPU/Mem/GPU/IOPS); assess marginal gains and avoid NUMA pitfalls.
- Hybrid: choose H→V or V→H under SLA and budget constraints as a multi-objective optimization.
- Autoscaling: trigger on metric (qps/latency_ms.p95/cpu) toward target; set min/max_replicas and cooldown_s for damping.
VI. Cost Metrology & Budget
- Cost model: itemize compute/storage/egress/observability; mix on-demand/spot with recorded discounts and interruption policies.
- Budget enforcement: alert at warn (80%); at block (100%) enforce throttle/no-scale/degrade policies.
- Cost-effectiveness: publish usd_per_kqps, usd_per_mrow unit-cost KPIs for decision support.
VII. Metrology & Units (SI)
- Perf/resources: QPS (1/s), T_inf (ms {p50,p95,p99}), ρ (—), net_mbps, size_bytes.
- Mandatory: metrology:{units:"SI", check_dim:true}; normalize units first before composing/benchmarking; unify units across charts/reports.
- Path quantities: if perf tests involve T_arr-related operators, register delta_form, path="gamma(ell)", measure="d ell", and use one of:
- T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
- T_arr = ( ∫ ( n_eff / c_ref ) d ell ), and pass check_dim.
VIII. Machine-Readable Fragment (Drop-in)
performance:
workload: {mode:"micro-batch", batch_size:2048, parallelism:{workers:32, threads_per_worker:2}}
targets: {qps:{value:9000}, latency_ms:{p50:5,p95:20,p99:40}, utilization_rho:{max:0.75}}
profiling:{tools:["perf","flamegraph"], sampling_interval_ms:50, hotspots:["serialization","network"]}
pressure_test:{stages:["transform","feature","export"], ramp:{from_qps:2000,to_qps:12000,step:1000,dwell_s:120},
saturation_criteria:["latency_ms.p99>48","ρ>0.85"]}
scaling:
strategy: "hybrid"
horizontal: {shard_key:"entity_id", rebalance:"consistent-hash"}
vertical: {sku_ref:"c16m128", max_sku:"c32m256"}
autoscale: {enabled:true, metric:"latency_ms.p95", target:18, min_replicas:8, max_replicas:64, cooldown_s:120}
cost:
model: {compute:{on_demand_usd_per_h:0.52, spot_discount:0.55}, storage:{usd_per_gb_mo:0.023}, egress:{usd_per_gb:0.09}}
budget:{currency:"USD", monthly_cap:8000, alert_thresholds:{warn:0.8, block:1.0}}
mix: {on_demand_ratio:0.5, spot_ratio:0.5}
reporting:{window:"P30D", breakdown:["compute","storage","egress","observability"]}
metrology:{units:"SI", check_dim:true}
IX. Lint Rules (Excerpt, Normative)
lint_rules:
- id: PERF.TARGETS_DEFINED
when: "$.performance.targets"
assert: "has_keys(qps, latency_ms, utilization_rho)"
level: error
- id: PERF.RAMP_VALID
when: "$.performance.pressure_test.ramp"
assert: "value.from_qps > 0 and value.to_qps > value.from_qps and value.step > 0"
level: error
- id: SCALE.AUTOSCALE_BOUNDS
when: "$.scaling.autoscale"
assert: "value.enabled == false or (value.min_replicas >= 1 and value.max_replicas >= value.min_replicas)"
level: error
- id: COST.BUDGET_DEFINED
when: "$.cost.budget"
assert: "has_keys(currency, monthly_cap) and value.monthly_cap > 0"
level: error
- id: METROLOGY.SI_AND_CHECKDIM
when: "$.metrology"
assert: "units == 'SI' and check_dim == true"
level: error
X. Export Manifest & Reports
export_manifest:
version: "v1.0"
artifacts:
- {path:"perf/qps_latency_curve.csv", sha256:"..."}
- {path:"perf/flamegraph.svg", sha256:"..."}
- {path:"scaling/autoscale_history.csv", sha256:"..."}
- {path:"cost/monthly_breakdown.csv", sha256:"..."}
- {path:"capacity/plan.yaml", sha256:"..."}
references:
- "EFT.WP.Core.DataSpec v1.0:EXPORT"
- "EFT.WP.Core.Metrology v1.0:check_dim"
XI. Chapter Compliance Checklist
- Targets & profiles defined: QPS/latency_ms/ρ targets explicit; load test/profile reproducible; hotspots evidenced.
- Scaling strategies & bounds complete: H/V/hybrid and autoscale parameters reasonable; coordinated with Ch.10 scheduling/resources.
- Cost model, budget & alert thresholds clear; unit-cost KPIs and resource utilization traceable.
- SI metrology with check_dim=true; if T_arr appears, delta_form/path/measure registered and validated.
- Export manifest lists perf curves/flamegraphs/auto-scale history/cost breakdown/capacity plan with sha256, satisfying release gates.