Appendix B — Data and Grid Formats


I. Scope and Object Types


II. Domains, Coordinates, and Measures


III. Regular Grids (Raster/Voxel)

  1. Geometry fields
    • origin = (x0, y0, z0); spacing = (Delta_x, Delta_y, Delta_z); shape = (Nx, Ny, Nz); axis_order = ("x","y","z").
    • cell_centered = true|false. If true, sample i along x lies at the cell center x0 + (i + 0.5) * Delta_x (likewise for y and z); if false, it lies at the node x0 + i * Delta_x.
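    A minimal sketch of the coordinate rule. It assumes origin marks the lower edge of cell 0 when cell_centered = true. The helper axis_coords and the example grid are hypothetical.

```python
import numpy as np

def axis_coords(x0: float, delta: float, n: int, cell_centered: bool) -> np.ndarray:
    """Physical coordinates along one axis of a regular grid.

    cell_centered=True  -> x0 + (i + 0.5) * delta (cell centers)
    cell_centered=False -> x0 + i * delta         (nodes)
    """
    i = np.arange(n, dtype=np.float64)
    offset = 0.5 if cell_centered else 0.0
    return x0 + (i + offset) * delta

# Hypothetical grid: origin, spacing, shape, axis_order = ("x", "y", "z").
origin, spacing, shape = (0.0, 10.0, -5.0), (0.5, 2.0, 1.0), (4, 3, 2)
xs, ys, zs = (axis_coords(o, d, n, cell_centered=True)
              for o, d, n in zip(origin, spacing, shape))
```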
  2. Voxel volume and conservation
    • V_cell = ( Delta_x * Delta_y * Delta_z ) (omit missing factors for lower-dimensional grids).
    • Discrete total: mass_preserve = ( ∑ rho[i,j,k] * V_cell ), which must agree with the continuous form S92-2: M = ( ∫ rho dV ) (see Chapter 2).
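    A sketch of the conservation check. It uses a hypothetical linear test density on the unit cube, so the continuous integral M is known in closed form. Cell-center sampling (the midpoint rule) is exact for linear integrands, so mass_preserve should match M to rounding.

```python
import numpy as np

# Hypothetical cell-centered grid on the unit cube [0, 1]^3.
shape = (8, 8, 8)
spacing = (1.0 / 8, 1.0 / 8, 1.0 / 8)
V_cell = spacing[0] * spacing[1] * spacing[2]  # omit factors for lower-dimensional grids

# Cell centers; indexing="ij" keeps rho[i, j, k] in axis_order ("x", "y", "z").
centers = [(np.arange(n) + 0.5) * d for n, d in zip(shape, spacing)]
X, Y, Z = np.meshgrid(*centers, indexing="ij")

rho = 1.0 + X + 2.0 * Y + 3.0 * Z           # hypothetical test density, linear in x, y, z
mass_preserve = float(np.sum(rho) * V_cell)  # discrete total

# Continuous form M = ∫ rho dV over the unit cube: 1 + 1/2 + 1 + 3/2 = 4.
M = 4.0
```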
  3. Edge vs. center consistency
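    Optional edge arrays edges_x[0..Nx], edges_y[0..Ny], edges_z[0..Nz] hold Nx + 1, Ny + 1, and Nz + 1 entries. If provided, they take precedence. The per-cell spacings are Delta_x[i] = edges_x[i+1] - edges_x[i] (likewise for y and z), and V_cell is evaluated per cell from them.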
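    A sketch of deriving per-cell spacings and volumes from edge arrays. The helper cell_volumes_from_edges and the example edges are hypothetical. With non-uniform edges, V_cell becomes an array rather than a scalar.

```python
import numpy as np

def cell_volumes_from_edges(edges_x, edges_y, edges_z) -> np.ndarray:
    """Per-cell volumes, shape (Nx, Ny, Nz), from edge arrays of length N + 1."""
    dx, dy, dz = (np.diff(np.asarray(e, dtype=np.float64))
                  for e in (edges_x, edges_y, edges_z))
    if np.any(dx <= 0) or np.any(dy <= 0) or np.any(dz <= 0):
        raise ValueError("edge arrays must be strictly increasing")
    # Outer product of the per-axis spacings Delta_x[i] * Delta_y[j] * Delta_z[k].
    return dx[:, None, None] * dy[None, :, None] * dz[None, None, :]

edges_x = [0.0, 1.0, 3.0]        # Nx = 2, non-uniform
edges_y = [0.0, 0.5, 1.0, 1.5]   # Ny = 3, uniform
edges_z = [0.0, 2.0]             # Nz = 1
V_cell = cell_volumes_from_edges(edges_x, edges_y, edges_z)
```

    The volumes sum to the bounding-box volume (3 * 1.5 * 2 = 9 here), which is a cheap consistency check to run before publishing.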

IV. Unstructured Grids (Mesh/Unstructured)

  1. Topology and geometry
    • nodes : float64[NumNodes, dim]; cells : int32[NumCells, k]; cell_type : enum{tri, quad, tet, hex, poly}.
    • A per-cell Jacobian J_cell supports coordinate transforms and conservation: p_Y(y) = p_X( x(y) ) * | det( ∂x/∂y ) | (see S92-15, Chapter 9).
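    A one-dimensional sketch of the change-of-variables rule. In 1-D, |det( ∂x/∂y )| reduces to |dx/dy|. The example applies the map y = exp(x) to a standard normal X, which gives a log-normal Y. The map and the helper names are illustrative assumptions. The check is that p_Y still integrates to about 1.

```python
import numpy as np

def std_normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def transformed_pdf(p_X, x_of_y, dx_dy, y):
    """p_Y(y) = p_X(x(y)) * |dx/dy|, the 1-D case of the S92-15 formula."""
    return p_X(x_of_y(y)) * np.abs(dx_dy(y))

def dlog_dy(y):
    """Derivative of the inverse map x(y) = log(y)."""
    return 1.0 / y

y = np.linspace(1e-6, 60.0, 600_001)
p_Y = transformed_pdf(std_normal_pdf, np.log, dlog_dy, y)

# Trapezoidal check that probability mass is preserved (tail beyond y = 60 is about 2e-5).
total = float(np.sum(0.5 * (p_Y[1:] + p_Y[:-1]) * np.diff(y)))
```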
  2. Cell measures and integration
    dV_cell[c] = ( ∫_{cell c} 1 dV ); discrete integration ( ∫_D rho dV ) ≈ ( ∑_c rho_c * dV_cell[c] ).
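    A sketch for 2-D linear triangles, where dV_cell[c] is half the absolute determinant of the two edge vectors. The two-triangle mesh and the linear rho are hypothetical. Sampling rho at the centroids makes the sum exact for linear densities.

```python
import numpy as np

def tri_areas(nodes: np.ndarray, cells: np.ndarray) -> np.ndarray:
    """dV_cell[c] for 2-D linear triangles: 0.5 * |det([p1 - p0, p2 - p0])|."""
    p0, p1, p2 = (nodes[cells[:, k]] for k in range(3))
    e1, e2 = p1 - p0, p2 - p0
    return 0.5 * np.abs(e1[:, 0] * e2[:, 1] - e1[:, 1] * e2[:, 0])

# Hypothetical mesh: unit square split into two triangles.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
cells = np.array([[0, 1, 2], [0, 2, 3]], dtype=np.int32)
dV_cell = tri_areas(nodes, cells)

# sample_location = "cell": rho evaluated at cell centroids.
centroids = nodes[cells].mean(axis=1)
rho_c = 1.0 + centroids[:, 0]
integral = float(np.sum(rho_c * dV_cell))  # exact value of ∫ (1 + x) dA is 1.5
```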
  3. Sampling location
    sample_location = {"cell","node"}; choose it consistently with the discrete operators in use (e.g., ∇·J; see Chapter 2).

V. Time and Frequency Axes

  1. Time axis
    • t in seconds; Delta_t is mandatory. If arrival-time calibration is applied, record T_arr_form = {"constant-pulled","path-wise"} and
      delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) | (global convention).
    • Asynchrony and dropouts are indicated via time_mask[n] or valid_time_intervals. The time base is aligned with Core.Threads, Chapter 3.
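    A sketch of computing both arrival-time forms and delta_form on a sampled path. It assumes a constant c_ref and trapezoidal integration, and the path and n_eff profile are made up. With constant c_ref the two forms are mathematically identical, so delta_form only exposes numerical discrepancies.

```python
import numpy as np

def trapezoid(values, ell):
    """Trapezoidal rule along the path coordinate ell."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(ell)))

def arrival_time_forms(ell, n_eff, c_ref):
    """Return (constant-pulled, path-wise, delta_form) for one sampled path."""
    t_pulled = (1.0 / c_ref) * trapezoid(n_eff, ell)   # ( 1 / c_ref ) * ∫ n_eff d ell
    t_pathwise = trapezoid(n_eff / c_ref, ell)         # ∫ ( n_eff / c_ref ) d ell
    return t_pulled, t_pathwise, abs(t_pulled - t_pathwise)

ell = np.linspace(0.0, 1000.0, 10_001)        # hypothetical path samples, m
n_eff = 1.0003 + 1e-5 * np.sin(ell / 50.0)    # hypothetical effective index profile
c_ref = 299_792_458.0                         # m/s
t_pulled, t_pathwise, delta_form = arrival_time_forms(ell, n_eff, c_ref)
meta = {"T_arr_form": "path-wise", "c_ref": c_ref, "delta_form": delta_form}
```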
  2. Frequency axis and spectral objects
    • Store S_xx(f) as a one-sided PSD on f ∈ [0, fs/2]; record fs, window, U_w = ( 1 / N ) * ∑ w[n]^2, ENBW_Hz = fs * ( ∑ w[n]^2 ) / ( ∑ w[n] )^2, and effective degrees of freedom nu (see Chapter 6).
    • Energy check: var(x) ≈ ( ∫_0^{fs/2} S_xx(f) df ), and report the deviation via Mx-95.
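    A sketch of a single-segment windowed periodogram. It stores the one-sided S_xx together with U_w and ENBW_Hz, and it evaluates the energy check as a Riemann sum. The sketch assumes an even number of samples and removes the mean before transforming. Welch averaging (and hence nu > 2) is not shown. For a rectangular window the check is exact by Parseval's theorem; tapered windows satisfy it only approximately.

```python
import numpy as np

def one_sided_psd(x, fs, w):
    """Windowed periodogram, one-sided on [0, fs/2]; assumes even len(x)."""
    x = np.asarray(x, dtype=np.float64)
    N = x.size
    if N % 2:
        raise ValueError("this sketch assumes an even number of samples")
    U_w = np.sum(w**2) / N                      # window power normalization
    X = np.fft.rfft((x - x.mean()) * w)
    S = np.abs(X) ** 2 / (fs * N * U_w)         # density for bins 0..N/2
    S[1:-1] *= 2.0                              # fold negative frequencies (not DC or Nyquist)
    f = np.fft.rfftfreq(N, d=1.0 / fs)
    ENBW_Hz = fs * np.sum(w**2) / np.sum(w) ** 2
    return f, S, U_w, ENBW_Hz

def energy_check(x, f, S):
    """Relative deviation between var(x) and the Riemann sum of S_xx over [0, fs/2]."""
    df = f[1] - f[0]
    return abs(np.sum(S) * df - np.var(x)) / np.var(x)

rng = np.random.default_rng(0)
fs, N = 100.0, 1024
x = rng.standard_normal(N)
f, S_rect, U_rect, enbw_rect = one_sided_psd(x, fs, np.ones(N))
f, S_hann, U_hann, enbw_hann = one_sided_psd(x, fs, np.hanning(N))
dev_rect = energy_check(x, f, S_rect)   # exact for the rectangular window
dev_hann = energy_check(x, f, S_hann)   # approximate for a tapered window
```

    For a Hann window, ENBW_Hz comes out near 1.5 * fs / N, the usual textbook value.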

VI. Probability Density and Histogram Containers

  1. Histogram density
    • Fields: edges, counts, N, Delta, normalize=true|false.
    • Estimator: p_hat = count / ( N * Delta ) (S92-10). If storing p_hat, persist Delta or edges.
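    A sketch of the histogram estimator with per-bin widths taken from edges. The helper name is hypothetical. Here N counts only the samples that fall inside the edges, which is one assumed convention. It makes ∑ p_hat * Delta = 1 and matches numpy.histogram(..., density=True).

```python
import numpy as np

def hist_density(samples, edges):
    """p_hat per bin = counts / (N * Delta), with Delta taken per bin from edges."""
    counts, edges = np.histogram(samples, bins=edges)
    N = counts.sum()                  # samples inside the edges
    Delta = np.diff(edges)
    p_hat = counts / (N * Delta)
    return p_hat, counts, int(N), Delta

rng = np.random.default_rng(1)
samples = rng.normal(size=10_000)
edges = np.array([-10.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 10.0])  # non-uniform bins
p_hat, counts, N, Delta = hist_density(samples, edges)
```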
  2. Kernel density estimation
    • Fields: kde_values, grid, K, h, rule, CV(h); reference S92-5 (estimator) and S92-6 (error) in metadata.
    • For multi-D KDE, the grid is a tensor mesh or sampled locations; specify measures dx, dy, … so that ( ∫ p(x) dx ) = 1 holds.
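    A 1-D Gaussian-kernel sketch on a uniform grid, with the normalization check written as a Riemann sum over dx. Silverman's rule of thumb is used as one possible value of the rule field. The document does not prescribe it, and the helper names are hypothetical.

```python
import numpy as np

def silverman_h(x):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=np.float64)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(np.std(x, ddof=1), iqr / 1.34) * x.size ** (-0.2)

def kde_gaussian(samples, grid, h):
    """p(x) = (1 / (n h)) * sum_i K((x - x_i) / h) with a Gaussian kernel K."""
    u = (grid[:, None] - samples[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / (samples.size * h)

rng = np.random.default_rng(2)
samples = rng.normal(loc=1.0, scale=2.0, size=500)
h = silverman_h(samples)
grid = np.linspace(-15.0, 17.0, 3201)
dx = grid[1] - grid[0]                  # uniform measure on the 1-D grid
kde_values = kde_gaussian(samples, grid, h)
mass = float(np.sum(kde_values) * dx)   # should be close to 1
```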

VII. Spatial/Spatio-Temporal Intensity and Event Tables

  1. Event table (Poisson/Hawkes)
    • Columns: x,y,z,t,weight,channel,id; optional meta for provenance/confidence.
    • Intensity containers: lambda_grid (regular) or lambda_mesh (unstructured), with units aligned to lambda(x,t) (see Chapter 5).
  2. Expected counts over regions
    Represent region A via polygon or voxel mask; publish Lambda(A) = ( ∫_A lambda dV ) (S92-7).
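    A sketch of publishing Lambda(A) from a regular lambda_grid and a voxel mask, assuming cell-centered samples. The constant intensity and the disc-shaped region are hypothetical. They are chosen so that the answer is close to 20 * π, up to voxelization error.

```python
import numpy as np

# Hypothetical 2-D intensity grid (events per m^2), sampled at cell centers.
Nx, Ny, dx, dy = 50, 40, 0.1, 0.1
xc = (np.arange(Nx) + 0.5) * dx
yc = (np.arange(Ny) + 0.5) * dy
X, Y = np.meshgrid(xc, yc, indexing="ij")
lambda_grid = np.full((Nx, Ny), 20.0)   # constant intensity for easy checking
V_cell = dx * dy

# Region A as a voxel mask: disc of radius 1 centered at (2.5, 2.0).
mask_A = (X - 2.5) ** 2 + (Y - 2.0) ** 2 <= 1.0
Lambda_A = float(np.sum(lambda_grid[mask_A]) * V_cell)   # ∫_A lambda dV
```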

VIII. Normalization, Calibration, and Uncertainty Metadata

  1. Normalization
    • scale, shift, mu_x, sigma_x, z_spec = {"none","zscore","minmax","custom"}. If normalized, retain the mapping between unit(x)/dim(x) and the transformed variable.
    • For variable transforms, reference the mapping and jacobian (see change_of_variables; S92-15, Chapter 9).
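    A sketch of the "zscore" mapping with its inverse and the matching density transform, p_Z(z) = p_X(x(z)) * sigma_x. The function names are hypothetical. In the meta dictionary, the keys other than mu_x, sigma_x, and z_spec are also hypothetical.

```python
import numpy as np

def zscore_forward(x, mu_x, sigma_x):
    return (x - mu_x) / sigma_x

def zscore_inverse(z, mu_x, sigma_x):
    return mu_x + sigma_x * z

def density_in_z(p_x_values, sigma_x):
    """p_Z(z) = p_X(x(z)) * |dx/dz|, with dx/dz = sigma_x."""
    return p_x_values * abs(sigma_x)

def normal_pdf(v, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.array([3.0, 5.0, 9.0])
mu_x, sigma_x = float(np.mean(x)), float(np.std(x))
z = zscore_forward(x, mu_x, sigma_x)
p_z = density_in_z(normal_pdf(x, mu_x, sigma_x), sigma_x)  # equals normal_pdf(z)
meta = {"z_spec": "zscore", "mu_x": mu_x, "sigma_x": sigma_x, "unit_x": "m"}  # unit_x is hypothetical
```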
  2. Arrival time and medium
    c_ref, n_eff_model, T_arr_form, delta_form; used for cross-domain time alignment (Chapter 9; Core.Sea Chapter 8).
  3. Uncertainty
    u (standard uncertainty), u_c (combined standard uncertainty), k (coverage factor), cov (covariance), CRLB. The Fisher information I_F(theta) and its numerical evaluation should reference I90-7 (see Chapter 10).
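    A sketch of evaluating the Fisher information numerically as a central second difference of the negative log-likelihood. It uses a hypothetical Gaussian-mean model with known sigma, for which the analytic value N / sigma^2 is available for comparison. The data, step size, and helper names are assumptions and are not the I90-7 interface.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, N = 0.5, 100
data = 2.0 + sigma * rng.standard_normal(N)   # hypothetical measurements

def nll(theta):
    """Negative log-likelihood (up to a constant) for a Gaussian mean, known sigma."""
    return 0.5 * np.sum((data - theta) ** 2) / sigma**2

def observed_fisher(f, theta, step=1e-2):
    """Central second difference of the negative log-likelihood at theta."""
    return (f(theta + step) - 2.0 * f(theta) + f(theta - step)) / step**2

theta_hat = float(np.mean(data))        # maximum-likelihood estimate
I_F = observed_fisher(nll, theta_hat)   # analytic value: N / sigma**2 = 400
CRLB = 1.0 / I_F                        # lower bound on var(theta_hat) for unbiased estimators
u = float(np.sqrt(CRLB))                # standard uncertainty implied by the bound
```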

IX. File Formats and Mapping Recommendations

  1. Foundational scientific containers
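    • HDF5/NetCDF4: organize data under /grid, /mesh, /signals, /spectra, /pdf, /events, /intensity, /uncertainty, /meta. Use zlib|zstd|lz4 compression and axis-aligned chunking, with chunk dimensions listed in (Nz, Ny, Nx) storage order and sized per Section X.2.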
    • Zarr: use a hierarchy consistent with HDF5; cloud-friendly; store metadata as Zarr attributes (.zattrs in Zarr v2).
  2. Geospatial rasters
    GeoTIFF: set GTModelType, GTRasterType, GeoTransform, EPSG; prefer AREA_OR_POINT=Area to match the cell_centered = true convention (Section III).
  3. Events and tabular
    • Parquet: columnar event tables; Arrow schemas carry unit/dimension annotations as field metadata.
    • CSV: for small examples only; must ship a sibling *.meta.json sidecar for units and measures.
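    A sketch of writing a small CSV event table together with its sidecar, using only the standard library. The sidecar name <stem>.meta.json and its keys (columns, units, measure) are assumptions for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path

COLUMNS = ["x", "y", "z", "t", "weight", "channel", "id"]

def write_events_csv(path, rows, units, measure):
    """Write an event table and a sibling <stem>.meta.json sidecar (naming assumed)."""
    path = Path(path)
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    sidecar = path.parent / (path.stem + ".meta.json")
    meta = {"columns": COLUMNS, "units": units, "measure": measure}  # hypothetical keys
    sidecar.write_text(json.dumps(meta, indent=2))
    return sidecar

rows = [
    {"x": 0.1, "y": 0.2, "z": 0.0, "t": 1.5, "weight": 1.0, "channel": 0, "id": 1},
    {"x": 0.4, "y": 0.9, "z": 0.0, "t": 2.25, "weight": 1.0, "channel": 1, "id": 2},
]
units = {"x": "m", "y": "m", "z": "m", "t": "s", "weight": "1", "channel": "1", "id": "1"}
out_dir = Path(tempfile.mkdtemp())
sidecar = write_events_csv(out_dir / "events.csv", rows, units, measure="dx dy dz")
meta = json.loads(sidecar.read_text())
```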
  4. Lightweight arrays
    NPY/NPZ: single objects or small batches; store metadata in a *.meta.json sidecar or as a meta entry inside the NPZ (e.g., a JSON string, which loads without pickling).

X. Encoding, Compression, and Numeric Precision

  1. Numeric types
    • Default scalar fields: float32 (use float64 for very large ranges or extrapolated boolean masks). Counts/indices: int32.
    • Endianness is fixed to little-endian. Store complex spectra as complex64/complex128 or as twin real/imag channels, and record the choice in metadata.
  2. Compression and chunking
    Suggested settings: zstd level=3 or lz4hc level=5; target 0.5–4 MB per chunk; prefer chunking along the time axis to optimize streaming.
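    A sketch of sizing time-axis chunks toward the middle of the 0.5–4 MB window. It assumes time is axis 0, the other axes stay whole, and MB means 10^6 bytes. The helper name is hypothetical.

```python
import math

def time_chunk_shape(shape, itemsize, target_mb=2.0, lo_mb=0.5, hi_mb=4.0):
    """Chunk along axis 0 (time), keeping the remaining axes whole."""
    n_time, rest = shape[0], tuple(shape[1:])
    step_bytes = itemsize * math.prod(rest)           # bytes per time step
    n_per_chunk = max(1, min(n_time, round(target_mb * 1e6 / step_bytes)))
    chunk = (n_per_chunk, *rest)
    chunk_mb = n_per_chunk * step_bytes / 1e6
    return chunk, chunk_mb, lo_mb <= chunk_mb <= hi_mb

# float32 (4 bytes) series of 10000 frames of 128 x 256.
chunk, chunk_mb, in_window = time_chunk_shape((10_000, 128, 256), itemsize=4)
```

    If a single time step already exceeds 4 MB, the spatial axes must also be split. The sketch reports that case only through the returned in_window flag.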
  3. Quantization and scaling
    If integer quantization is used, persist scale_int, offset_int, and dtype_raw, with read-back as x = scale_int * (raw - offset_int).
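    A sketch of linear integer quantization that follows the read-back rule x = scale_int * (raw - offset_int). Mapping the data range onto the full int16 range is an assumed choice. Note that this convention differs from CF-style packing, where unpacked = packed * scale_factor + add_offset.

```python
import numpy as np

def quantize(x, dtype_raw="int16"):
    """Encode floats so that x ≈ scale_int * (raw - offset_int)."""
    info = np.iinfo(dtype_raw)
    x = np.asarray(x, dtype=np.float64)
    lo, hi = float(x.min()), float(x.max())
    scale_int = (hi - lo) / (info.max - info.min) if hi > lo else 1.0
    offset_int = info.min - lo / scale_int          # raw value that decodes to 0.0
    raw = np.clip(np.rint(x / scale_int + offset_int), info.min, info.max).astype(dtype_raw)
    meta = {"scale_int": scale_int, "offset_int": offset_int, "dtype_raw": str(np.dtype(dtype_raw))}
    return raw, meta

def dequantize(raw, meta):
    """Read-back rule: x = scale_int * (raw - offset_int)."""
    return meta["scale_int"] * (raw.astype(np.float64) - meta["offset_int"])

x = np.linspace(-3.0, 7.0, 1001)
raw, meta = quantize(x)
x_back = dequantize(raw, meta)   # round-trip error is at most about scale_int / 2
```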

XI. Missing Values, Masks, and Quality Flags

  1. Missingness
    • nodata_value, valid_min, valid_max, mask (boolean, same shape); separate freq_mask for spectral domains.
    • qflag is a bit field, e.g., bit0 = saturation, bit1 = interpolated, bit2 = extrapolated.
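    A sketch of setting and testing qflag bits with NumPy, using the bit assignments above. The uint8 storage type is an assumption.

```python
import numpy as np

# Bit assignments from the qflag examples.
QF_SATURATION = 1 << 0
QF_INTERPOLATED = 1 << 1
QF_EXTRAPOLATED = 1 << 2

qflag = np.zeros(6, dtype=np.uint8)
qflag[[1, 4]] |= QF_SATURATION       # samples 1 and 4 saturated
qflag[[2, 4]] |= QF_INTERPOLATED     # samples 2 and 4 interpolated
qflag[5] |= QF_EXTRAPOLATED          # sample 5 extrapolated

is_interpolated = (qflag & QF_INTERPOLATED) != 0
clean = qflag == 0                   # no quality bits set
```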
  2. Sampling windows and anti-aliasing
    Record Delta_t, the spatial spacings Delta_x, Delta_y, Delta_z, and the anti-alias filter parameters under /meta/sampling (align with Chapter 7 and Appendix C).

XII. Multi-Resolution Pyramids


XIII. Interface Bindings (I90) and Object Naming

  1. Bindings
    • I90-1: define_measure reference stored at /meta/measure.
    • I90-2/4: kde_build, hist_density outputs go to /pdf, carrying K, h, CV(h), edges.
    • I90-3: intensity_estimate, hawkes_fit outputs go to /intensity and /events.
    • I90-6: spectral_density outputs go to /spectra, with ENBW_Hz and U_w.
    • I90-7: fisher_information, crlb outputs go to /uncertainty.
  2. Naming
    Scalar fields: rho, p, lambda, S_xx; vector/tensor fields use component suffixes: J_x, J_y, J_z or Sigma_xx, Sigma_xy, ….

XIV. Minimum Sidecar Metadata (Suggested Keys)


XV. Consistency and Pre-Publication Self-Checks (Mx-Rules)


XVI. Versioning and Change Log

This appendix codifies the standardized constraints for density-volume data and grid formats. Implementations must adhere to these rules and remain consistent with Appendix A’s symbols/units and chapter references (S92-*, Mx-9*, I90-*).