Appendix B — Data and Grid Formats
I. Scope and Object Types
- This appendix standardizes containers, grids and coordinates, time/frequency axes, metadata, and calibration fields for the density volume. It covers regular grids, unstructured grids, point-event tables, spectral objects, and probability-density containers.
- Arrays use zero-based indexing. Unless stated otherwise, memory layout is row-major and the base measure is Lebesgue. All variables, units, and dimensions obey Appendix A.
II. Domains, Coordinates, and Measures
- On a continuous domain D ⊂ R^d with base measure mu, the volume element is dV and the line (path) element is d ell. Write integrals as ( ∫_D • dV ) or, along a path, ( ∫_gamma • d ell ).
- Uniform axes: x_i = x0 + i * Delta_x, y_j = y0 + j * Delta_y, z_k = z0 + k * Delta_z, time t_n = t0 + n * Delta_t, frequency f_m = m * ( fs / N ).
- Nonuniform axes store coordinate arrays (x_coords[i], t_coords[n]); provide quadrature weights w_i to approximate ( ∫ g(x) dx ) ≈ ( ∑ g(x_i) * w_i ).
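The axis and quadrature conventions above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names (uniform_axis, trapezoid_weights, integrate) are hypothetical, and the trapezoid rule is only one possible choice of weights w_i.

```python
def uniform_axis(x0, delta, n):
    """Coordinates x_i = x0 + i * delta for i = 0..n-1."""
    return [x0 + i * delta for i in range(n)]


def trapezoid_weights(coords):
    """Trapezoid-rule weights w_i for a strictly increasing coordinate array.

    Assumption: the spec does not prescribe a quadrature rule; trapezoid
    weights are used here only as an example.
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two coordinates")
    w = [0.0] * n
    for i in range(n - 1):
        half = 0.5 * (coords[i + 1] - coords[i])
        w[i] += half       # each interval contributes half its width
        w[i + 1] += half   # to both of its end points
    return w


def integrate(values, weights):
    """Approximate the integral as sum_i g(x_i) * w_i."""
    return sum(v * w for v, w in zip(values, weights))


# Nonuniform axis on [0, 1]; g(x) = 2x has exact integral 1 on this interval.
x_coords = [0.0, 0.1, 0.3, 0.6, 1.0]
w = trapezoid_weights(x_coords)
approx = integrate([2.0 * x for x in x_coords], w)
```

The weights sum to the length of the interval, which is a cheap sanity check to run before storing them alongside x_coords.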
III. Regular Grids (Raster/Voxel)
- Geometry fields
- origin = (x0, y0, z0); spacing = (Delta_x, Delta_y, Delta_z); shape = (Nx, Ny, Nz); axis_order = ("x","y","z").
- cell_centered = true|false (if true, physical coordinates are x_i + 0.5 * Delta_x, etc.).
- Voxel volume and conservation
- V_cell = ( Delta_x * Delta_y * Delta_z ) (omit missing factors for lower-dimensional grids).
- Discrete total: mass_preserve = ( ∑ rho[i,j,k] * V_cell ), which must agree with the continuous form S92-2: M = ( ∫ rho dV ) (see Chapter 2).
- Edge vs. center consistency
Optional edge arrays edges_x[0..Nx], edges_y[0..Ny], edges_z[0..Nz]. If provided, these define Delta and V_cell.
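As a rough sketch of the discrete total, the following computes per-cell widths from edge arrays and accumulates rho[i,j,k] * V_cell. The function names are hypothetical and nested lists stand in for whatever array container an implementation actually uses.

```python
def deltas_from_edges(edges):
    """Per-cell widths from an edge array edges[0..N]."""
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]


def discrete_mass(rho, dx, dy, dz):
    """Sum rho[i][j][k] * V_cell, with V_cell = dx[i] * dy[j] * dz[k]."""
    total = 0.0
    for i, wx in enumerate(dx):
        for j, wy in enumerate(dy):
            for k, wz in enumerate(dz):
                total += rho[i][j][k] * wx * wy * wz
    return total


# Illustrative values only: a 2 x 1 x 3 grid covering a 2 x 0.5 x 6 box.
edges_x = [0.0, 1.0, 2.0]
edges_y = [0.0, 0.5]
edges_z = [0.0, 2.0, 4.0, 6.0]
dx, dy, dz = (deltas_from_edges(e) for e in (edges_x, edges_y, edges_z))

# Constant density rho = 3 over a volume of 6 should give a total of 18.
rho = [[[3.0 for _ in dz] for _ in dy] for _ in dx]
mass_preserve = discrete_mass(rho, dx, dy, dz)
```

For a uniform grid the same function applies with every entry of dx, dy, and dz equal to the stored spacing.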
IV. Unstructured Grids (Mesh/Unstructured)
- Topology and geometry
- nodes : float64[NumNodes, dim]; cells : int32[NumCells, k]; cell_type : enum{tri, quad, tet, hex, poly}.
- A per-cell Jacobian J_cell supports coordinate transforms and conservation: p_Y(y) = p_X( x(y) ) * | det( ∂x/∂y ) | (see S92-15, Chapter 9).
- Cell measures and integration
dV_cell[c] = ( ∫_{cell c} 1 dV ); discrete integration ( ∫_D rho dV ) ≈ ( ∑_c rho_c * dV_cell[c] ).
- Sampling location
sample_location = {"cell","node"}; choose consistently with the discrete operators (e.g., ∇•J) used (see Chapter 2).
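For a concrete picture of cell measures, the sketch below computes dV_cell for a two-triangle mesh of the unit square and applies the discrete integration rule with cell-sampled values. It assumes cell_type = tri in 2D and sample_location = "cell"; triangle_area is a hypothetical helper.

```python
def triangle_area(p0, p1, p2):
    """Area of a 2D triangle from the cross product of two edge vectors."""
    cross = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    return 0.5 * abs(cross)


# nodes: [NumNodes, dim]; cells: [NumCells, k] with k = 3 for triangles.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = [(0, 1, 2), (0, 2, 3)]

# dV_cell[c] is the measure of cell c (an area in 2D).
dV_cell = [triangle_area(*(nodes[n] for n in cell)) for cell in cells]

# Cell-sampled density values, one per cell (illustrative numbers).
rho_c = [2.0, 4.0]
total = sum(r * v for r, v in zip(rho_c, dV_cell))
```

The cell measures should sum to the measure of the whole domain (here 1.0), which gives a quick consistency check on the topology.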
V. Time and Frequency Axes
- Time axis
- t in seconds; Delta_t is mandatory. If arrival-time calibration is applied, record T_arr_form = {"constant-pulled","path-wise"} and
delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) | (global convention).
- Asynchrony/dropouts are indicated via time_mask[n] or valid_time_intervals. Time base aligns with Core.Threads Chapter 3.
- Frequency axis and spectral objects
- Store S_xx(f) as a one-sided PSD on f ∈ [0, fs/2]; record fs, window, U_w = ( 1 / N ) * ∑ w[n]^2, ENBW_Hz = fs * ( ∑ w[n]^2 ) / ( ∑ w[n] )^2, and effective degrees of freedom nu (see Chapter 6).
- Energy check: var(x) ≈ ( ∫_0^{fs/2} S_xx(f) df ), and report the deviation via Mx-95.
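The window metrics and the energy check can be illustrated with a small sketch. It uses a plain windowed periodogram with a direct DFT, which is an assumption made for illustration; Chapter 6 may prescribe a different estimator (for example, segment averaging). The function names are hypothetical.

```python
import cmath
import math


def window_metrics(w, fs):
    """Return (U_w, ENBW_Hz) for window samples w at sampling rate fs."""
    n = len(w)
    s1 = sum(w)
    s2 = sum(v * v for v in w)
    return s2 / n, fs * s2 / (s1 * s1)


def one_sided_psd(x, w, fs):
    """Windowed periodogram on f_m = m * fs / N, m = 0..N/2 (N assumed even)."""
    n = len(x)
    u_w, _ = window_metrics(w, fs)
    xw = [a * b for a, b in zip(x, w)]
    psd = []
    for m in range(n // 2 + 1):
        acc = sum(xw[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        s = abs(acc) ** 2 / (fs * n * u_w)
        if 0 < m < n // 2:
            s *= 2.0  # fold negative frequencies; DC and Nyquist are not doubled
        psd.append(s)
    return psd


fs, N = 64.0, 64
# Two bin-centered tones, so the signal has zero mean over the record.
x = [math.sin(2 * math.pi * 5.0 * k / fs) + 0.5 * math.cos(2 * math.pi * 12.0 * k / fs)
     for k in range(N)]

hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / N) for k in range(N)]  # periodic Hann
U_w, ENBW_Hz = window_metrics(hann, fs)

rect = [1.0] * N
psd = one_sided_psd(x, rect, fs)
energy = sum(psd) * (fs / N)          # Riemann sum of S_xx over [0, fs/2]
mean_x = sum(x) / N
var_x = sum((v - mean_x) ** 2 for v in x) / N
```

With a rectangular window and a zero-mean record the Riemann sum matches var(x) to rounding error by Parseval's theorem. With tapered windows or a nonzero mean the two quantities agree only approximately, which is the deviation the energy check is meant to report.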
VI. Probability Density and Histogram Containers
- Histogram density
- Fields: edges, counts, N, Delta, normalize=true|false.
- Estimator: p_hat = count / ( N * Delta ) (S92-10). If storing p_hat, persist Delta or edges.
- Kernel density estimation
- Fields: kde_values, grid, K, h, rule, CV(h); reference S92-5 (estimator) and S92-6 (error) in metadata.
- For multi-D KDE, the grid is a tensor mesh or sampled locations; specify measures dx, dy, … so that ( ∫ p(x) dx ) = 1 holds.
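The histogram estimator p_hat = count / ( N * Delta ) can be sketched for possibly nonuniform edges as below. Two assumptions are made here that the text does not state: N counts only the samples that fall inside the edges, and the last bin is closed on the right. hist_density is a hypothetical function name.

```python
def hist_density(samples, edges):
    """Return (counts, p_hat) with p_hat[b] = counts[b] / (N * Delta_b)."""
    nbins = len(edges) - 1
    counts = [0] * nbins
    for s in samples:
        for b in range(nbins):
            is_last = b == nbins - 1
            # Bins are half-open [lo, hi), except the last, which includes hi.
            if edges[b] <= s < edges[b + 1] or (is_last and s == edges[-1]):
                counts[b] += 1
                break
    n = sum(counts)  # assumption: N is the number of in-range samples
    if n == 0:
        raise ValueError("no samples fall inside the histogram edges")
    p_hat = [c / (n * (edges[b + 1] - edges[b])) for b, c in enumerate(counts)]
    return counts, p_hat


edges = [0.0, 0.5, 1.0, 2.0]
samples = [0.1, 0.2, 0.6, 0.7, 0.9, 1.5, 2.0, 1.1]
counts, p_hat = hist_density(samples, edges)

# Normalization check: sum_b p_hat[b] * Delta_b should equal 1.
integral = sum(p * (edges[b + 1] - edges[b]) for b, p in enumerate(p_hat))
```

The normalization check only works if Delta or edges is persisted with p_hat, which is why the estimator bullet requires one of them.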
VII. Spatial/Spatio-Temporal Intensity and Event Tables
- Event table (Poisson/Hawkes)
- Columns: x,y,z,t,weight,channel,id; optional meta for provenance/confidence.
- Intensity containers: lambda_grid (regular) or lambda_mesh (unstructured), with units aligned to lambda(x,t) (see Chapter 5).
- Expected counts over regions
Represent region A via polygon or voxel mask; publish Lambda(A) = ( ∫_A lambda dV ) (S92-7).
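A minimal sketch of Lambda(A) on a regular 2D grid with a voxel mask follows. It assumes a cell-sampled intensity and a mask where True means the cell belongs to A; expected_count is a hypothetical name and the numbers are illustrative.

```python
def expected_count(lam, mask, v_cell):
    """Approximate Lambda(A) as the sum of lambda * V_cell over masked cells."""
    total = 0.0
    for lam_row, mask_row in zip(lam, mask):
        for value, inside in zip(lam_row, mask_row):
            if inside:
                total += value * v_cell
    return total


# lambda_grid on a 2 x 3 raster with Delta_x = Delta_y = 0.5.
lambda_grid = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
region_mask = [[True, False, True],
               [False, True, False]]
v_cell = 0.5 * 0.5

Lambda_A = expected_count(lambda_grid, region_mask, v_cell)  # (1 + 3 + 5) * 0.25
```

A polygon region would first need to be rasterized to such a mask (or intersected with mesh cells), and that step determines how partially covered cells are treated.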
VIII. Normalization, Calibration, and Uncertainty Metadata
- Normalization
- scale, shift, mu_x, sigma_x, z_spec = {"none","zscore","minmax","custom"}. If normalized, retain the mapping between unit(x)/dim(x) and the transformed variable.
- For variable transforms, reference mapping and jacobian (see change_of_variables; S92-15, Chapter 9).
- Arrival time and medium
c_ref, n_eff_model, T_arr_form, delta_form; used for cross-domain time alignment (Chapter 9; Core.Sea Chapter 8).
- Uncertainty
u (standard), u_c, k, cov, CRLB; Fisher information I_F(theta) and its numerical evaluation should reference I90-7 (see Chapter 10).
IX. File Formats and Mapping Recommendations
- Foundational scientific containers
- HDF5/NetCDF4: organize /grid, /mesh, /signals, /spectra, /pdf, /events, /meta. Use zlib|zstd|lz4 compression and axis-aligned chunking chunks = (Nz, Ny, Nx).
- Zarr: hierarchy consistent with HDF5; cloud-friendly; store metadata in zattrs.
- Geospatial rasters
GeoTIFF: set GTModelType, GTRasterType, GeoTransform, EPSG; prefer AREA_OR_POINT=Area to match cell-centered convention.
- Events and tabular
- Parquet: columnar events; Arrow schemas carry unit/dimension annotations.
- CSV: for small examples only; must ship a sibling *.meta.json sidecar for units and measures.
- Lightweight arrays
NPY/NPZ: single objects or small batches; store metadata as *.meta.json or a meta key inside NPZ.
X. Encoding, Compression, and Numeric Precision
- Numeric types
- Default scalar fields: float32 (use float64 for very large ranges or extrapolated boolean masks). Counts/indices: int32.
- Endianness fixed to little-endian. Complex spectra as complex64/128 or real/imag twin channels; note this in metadata.
- Compression and chunking
Suggest zstd level=3 or lz4hc level=5; target 0.5–4 MB per chunk; prefer time-axis chunking to optimize streaming.
- Quantization and scaling
If integer quantization is used, persist scale_int, offset_int, and dtype_raw, with read-back as x = scale_int * (raw - offset_int).
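The read-back rule x = scale_int * (raw - offset_int) implies a matching forward quantizer. The sketch below is one possible choice, assuming dtype_raw = "int16", round-to-nearest, and clipping at the integer range; none of these are mandated by the text, and the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def quantize(x, scale_int, offset_int, lo=INT16_MIN, hi=INT16_MAX):
    """Forward map raw = round(x / scale_int) + offset_int, clipped to [lo, hi]."""
    raw = []
    for v in x:
        q = round(v / scale_int) + offset_int
        raw.append(min(hi, max(lo, q)))
    return raw


def dequantize(raw, scale_int, offset_int):
    """Read-back as specified: x = scale_int * (raw - offset_int)."""
    return [scale_int * (r - offset_int) for r in raw]


# Illustrative values; the metadata dict mirrors the fields to persist.
x = [-1.0, 0.0, 0.1234, 2.5]
meta = {"scale_int": 0.001, "offset_int": 100, "dtype_raw": "int16"}

raw = quantize(x, meta["scale_int"], meta["offset_int"])
x_back = dequantize(raw, meta["scale_int"], meta["offset_int"])
max_err = max(abs(a - b) for a, b in zip(x, x_back))
```

As long as no value is clipped, the round-trip error is bounded by half of scale_int, so scale_int effectively sets the stored resolution.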
XI. Missing Values, Masks, and Quality Flags
- Missingness
- nodata_value, valid_min, valid_max, mask (boolean, same shape); separate freq_mask for spectral domains.
- qflag is a bit-field: e.g., bit0=saturation, bit1=interpolated, bit2=extrapolated.
- Sampling windows and anti-aliasing
Record Delta_t and spatial spacings Delta_x, Delta_y, Delta_z, plus anti-alias filter parameters under meta/sampling (align with Chapter 7 and Appendix C).
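One way to combine nodata_value, the valid range, and the qflag bit-field into a boolean mask is sketched below. The bit assignments follow the example in this section; the convention that True means "usable" and the choice of which flags to reject are assumptions, since the text does not fix either.

```python
from enum import IntFlag


class QFlag(IntFlag):
    SATURATION = 1 << 0    # bit0
    INTERPOLATED = 1 << 1  # bit1
    EXTRAPOLATED = 1 << 2  # bit2


def usable(value, qflag, nodata_value, valid_min, valid_max,
           reject=QFlag.SATURATION | QFlag.EXTRAPOLATED):
    """True if the sample is present, in range, and carries no rejected flag."""
    if value == nodata_value:
        return False
    if not (valid_min <= value <= valid_max):
        return False
    return (int(qflag) & int(reject)) == 0


values = [1.0, -9999.0, 5.0, 2.0, 3.0]
qflags = [0, 0, 0, int(QFlag.INTERPOLATED), int(QFlag.SATURATION | QFlag.EXTRAPOLATED)]

usable_mask = [usable(v, q, nodata_value=-9999.0, valid_min=0.0, valid_max=4.0)
               for v, q in zip(values, qflags)]
```

Here interpolated samples are kept and saturated or extrapolated ones are dropped; a consumer with stricter requirements would pass a different reject set.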
XII. Multi-Resolution Pyramids
- For large rasters, build pyramid levels level=L with spacing_L = 2^L * spacing_0; store the resampling kernel as resample_K.
- Preserve conservation: downsampling must use sum-preserving aggregation to keep mass_preserve consistent (see S92-11).
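For a density field, sum-preserving aggregation of mass corresponds to averaging the density over each block of child cells, because V_cell grows by the same factor that the block contains. The 2D sketch below illustrates one pyramid step under that assumption, with even grid dimensions; the function names are hypothetical and the box kernel is only one possible resample_K.

```python
def downsample_density_2x(rho):
    """One pyramid step in 2D: coarse density = mean of each 2 x 2 block."""
    nx, ny = len(rho), len(rho[0])
    if nx % 2 or ny % 2:
        raise ValueError("grid dimensions must be even for a 2x step")
    return [
        [0.25 * (rho[i][j] + rho[i + 1][j] + rho[i][j + 1] + rho[i + 1][j + 1])
         for j in range(0, ny, 2)]
        for i in range(0, nx, 2)
    ]


def total_mass(rho, dx, dy):
    """Discrete total: sum of rho * V_cell on a uniform 2D grid."""
    return sum(sum(row) for row in rho) * dx * dy


rho_0 = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
dx_0 = dy_0 = 0.5

rho_1 = downsample_density_2x(rho_0)
dx_1, dy_1 = 2 * dx_0, 2 * dy_0   # spacing_L = 2^L * spacing_0 with L = 1

mass_0 = total_mass(rho_0, dx_0, dy_0)
mass_1 = total_mass(rho_1, dx_1, dy_1)
```

If the stored quantity is a count or a per-cell mass rather than a density, the block values would be summed instead of averaged.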
XIII. Interface Bindings (I90) and Object Naming
- Bindings
- I90-1: define_measure reference stored at /meta/measure.
- I90-2/4: kde_build, hist_density outputs go to /pdf, carrying K, h, CV(h), edges.
- I90-3: intensity_estimate, hawkes_fit outputs go to /intensity and /events.
- I90-6: spectral_density outputs go to /spectra, with ENBW_Hz and U_w.
- I90-7: fisher_information, crlb outputs go to /uncertainty.
- Naming
Scalar fields: rho, p, lambda, S_xx; vector/tensor fields use component suffixes: J_x,J_y,J_z or Sigma_xx,Sigma_xy,….
XIV. Minimum Sidecar Metadata (Suggested Keys)
- Identity: title, uuid, version="DEN-1.0", created, creator, license, provenance.
- Domain and coordinates: dim, origin, spacing or coords, shape, axis_order, cell_centered, measure="lebesgue".
- Units and dimensions: unit(var), dim(var); if normalized: scale, shift, mu_x, sigma_x.
- Sampling and window: fs, window, U_w, ENBW_Hz, nu, one_sided=true.
- Arrival time: c_ref, n_eff_model, T_arr_form, delta_form.
- Estimators: K, h, rule, CV(h); histograms: edges, N, Delta.
- Uncertainty: u, u_c, k, cov, CRLB.
- Grid/mesh: nodes, cells, cell_type, dV_cell.
- Missing and quality: nodata_value, mask, qflag, valid_min/max.
- Storage: dtype, endianness="LE", chunks, compressor.
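A sidecar carrying a subset of these keys for a regular grid might be assembled as follows. This is only a sketch: build_sidecar is a hypothetical helper, all values are placeholders, and the set of keys shown is not a complete or normative schema.

```python
import json
import uuid
from datetime import datetime, timezone


def build_sidecar(title, origin, spacing, shape):
    """Assemble a minimal *.meta.json dictionary for a regular grid."""
    dim = len(shape)
    return {
        # Identity
        "title": title,
        "uuid": str(uuid.uuid4()),
        "version": "DEN-1.0",
        "created": datetime.now(timezone.utc).isoformat(),
        # Domain and coordinates
        "dim": dim,
        "origin": list(origin),
        "spacing": list(spacing),
        "shape": list(shape),
        "axis_order": ["x", "y", "z"][:dim],
        "cell_centered": True,
        "measure": "lebesgue",
        # Storage (placeholder choices)
        "dtype": "float32",
        "endianness": "LE",
        "chunks": list(shape),
        "compressor": "zstd",
    }


meta = build_sidecar("example density volume", (0.0, 0.0, 0.0), (0.5, 0.5, 1.0), (64, 64, 32))
text = json.dumps(meta, indent=2, sort_keys=True)
restored = json.loads(text)
```

Writing text to a file named after the data object with a .meta.json suffix gives the sidecar layout described for CSV and NPY/NPZ.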
XV. Consistency and Pre-Publication Self-Checks (Mx-Rules)
- Mx-96: discrete–continuous conservation—mass_preserve vs. ( ∫ rho dV ) within tolerance.
- Mx-95: spectrum–energy check—var(x) vs. ( ∫ S_xx df ), consistent with the window’s ENBW_Hz.
- Mx-98: multi-source alignment—p_Y(y) = p_X(x(y)) * | det( ∂x/∂y ) | applied and recorded.
- KDE reports include K, h, CV(h); histograms include edges/Delta; point processes define the domain for Lambda(A).
- Arrival-time fields publish both forms and delta_form; unit/dimension checks pass check_dim(expr).
XVI. Versioning and Change Log
- Dataset version key CL-DEN-YYYYMMDD-###; any change must update /meta/version and provenance.
- Structural changes (new groups, field renames) must ship migration scripts and compatibility maps.
This appendix codifies the standardized constraints for density-volume data and grid formats. Implementations must adhere to these rules and remain consistent with Appendix A’s symbols/units and chapter references (S92-*, Mx-9*, I90-*).