Sampling Utilities (Draft)

The nemora.sampling module provides helpers for converting registered PDFs into CDFs, drawing random variates, sampling fitted mixtures, and bootstrapping stand tables from nemora.fit results.

Convert a PDF to a CDF

import numpy as np

from nemora.sampling import pdf_to_cdf

cdf = pdf_to_cdf("weibull", {"a": 2.5, "beta": 12.0, "s": 1.0}, method="numeric")
x = np.linspace(0.0, 40.0, 100)
y = cdf(x)

When a distribution exposes an analytic CDF, pdf_to_cdf(..., method="analytic") delegates to it; otherwise the helper falls back to numeric integration.
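For intuition, the numeric fallback can be sketched in plain NumPy as a cumulative trapezoid integral of the PDF over a fixed grid. This is a simplified model of the behavior described above, not Nemora's implementation; the grid bounds and point count are assumptions:

```python
import numpy as np

def numeric_cdf(pdf, lo, hi, grid_points=2048):
    """Approximate a CDF by trapezoid-integrating the PDF on a fixed grid."""
    grid = np.linspace(lo, hi, grid_points)
    dens = pdf(grid)
    # Cumulative trapezoid rule: running integral of the density.
    steps = np.diff(grid) * (dens[1:] + dens[:-1]) / 2.0
    cum = np.concatenate([[0.0], np.cumsum(steps)])
    cum /= cum[-1]  # normalize so the CDF reaches 1 at the upper bound
    return lambda x: np.interp(x, grid, cum)

# Example: Weibull density with shape 2.5 and scale 12.
k, lam = 2.5, 12.0
pdf = lambda x: (k / lam) * (x / lam) ** (k - 1) * np.exp(-((x / lam) ** k))
cdf = numeric_cdf(pdf, 0.0, 60.0)
```

At x = 12 (one scale unit) the analytic Weibull CDF is 1 − e⁻¹ ≈ 0.6321, and the grid approximation agrees to well under 1e-3.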

Configuring numeric integration

pdf_to_cdf accepts a SamplingConfig so you can control grid density, integration backend, and quadrature tolerances:

from nemora.sampling import SamplingConfig, pdf_to_cdf

cfg = SamplingConfig(grid_points=2048, integration_method="quad", quad_rel_tol=1e-7)
cdf = pdf_to_cdf(
    "gamma",
    {"beta": 4.0, "p": 3.0, "s": 1.0},
    method="numeric",
    config=cfg,
)

The default uses a trapezoid grid; switching to "quad" delegates to scipy.integrate.quad with the tolerances above, and integration_method="simpson" integrates via Simpson's rule. Set cache_numeric_cdf=True if you plan to evaluate the same numeric CDF repeatedly: Nemora caches the computed grid/integral pair in memory so sampling calls avoid rerunning the integrator.
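The accuracy trade-off between a fixed grid and adaptive quadrature can be illustrated directly with SciPy (a standalone sketch, not Nemora code; the gamma parameterization with shape p and rate beta is an assumption):

```python
import numpy as np
from scipy import integrate, stats

# Gamma with shape p=3 and rate beta=4, i.e. scale = 1/beta (assumed parameterization).
dist = stats.gamma(a=3.0, scale=0.25)

def cdf_trapezoid(x, grid_points=256):
    """Grid-based CDF estimate: trapezoid rule over [0, x]."""
    grid = np.linspace(0.0, x, grid_points)
    dens = dist.pdf(grid)
    return float(np.sum(np.diff(grid) * (dens[1:] + dens[:-1]) / 2.0))

def cdf_quad(x, rel_tol=1e-7):
    """Quadrature-based CDF estimate: adaptive integration with an explicit tolerance."""
    value, _abs_err = integrate.quad(dist.pdf, 0.0, x, epsrel=rel_tol)
    return value

exact = dist.cdf(1.0)
```

The quadrature estimate tracks the analytic CDF to roughly the requested tolerance, while the grid estimate's error is governed by grid density, which is why grid_points and the quad tolerances are exposed as separate knobs.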

Sample from a distribution

from nemora.sampling import sample_distribution

draws = sample_distribution("gamma", {"beta": 4.0, "p": 3.0, "s": 1.0}, size=500)

Distributions with closed-form inverse CDFs (Weibull, exponential, Pareto, uniform, lognormal) use analytic inversion internally for improved accuracy. Logistic/Fisk currently fall back to the numeric pathway described above; this is documented in notes/sampling_inverse_matrix.md and will only change once synthesis requires a closed-form helper.
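The analytic-inversion pathway is ordinary inverse transform sampling. For the Weibull case it can be sketched in pure NumPy as follows (parameter names are illustrative, not Nemora's):

```python
import numpy as np

def sample_weibull_inverse(shape, scale, size, rng):
    """Inverse transform sampling: F^-1(u) = scale * (-ln(1 - u))**(1/shape)."""
    u = rng.random(size)
    # log1p(-u) computes ln(1 - u) accurately for u near 0.
    return scale * (-np.log1p(-u)) ** (1.0 / shape)

rng = np.random.default_rng(0)
draws = sample_weibull_inverse(2.5, 12.0, 100_000, rng)
```

Because each draw is an exact transform of a uniform variate, there is no grid or interpolation error, which is the accuracy advantage over the numeric pathway.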

Sample from a mixture fit

import numpy as np

from nemora.fit import MixtureComponentFit, MixtureFitResult
from nemora.sampling import sample_mixture_fit

components = [
    MixtureComponentFit(name="gamma", weight=0.6, parameters={"beta": 3.0, "p": 2.0}),
    MixtureComponentFit(name="gamma", weight=0.4, parameters={"beta": 8.0, "p": 5.0}),
]
mixture = MixtureFitResult(
    distribution="mixture",
    components=components,
    log_likelihood=-100.0,
    iterations=10,
    converged=True,
)
draws = sample_mixture_fit(mixture, size=1000, random_state=np.random.default_rng(42))

Pass a numpy.random.Generator (or integer seed) via random_state to obtain reproducible mixture draws.
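Conceptually, mixture sampling picks a component per draw according to the weights and then samples that component. A minimal sketch with NumPy generators standing in for the fitted components (the two-stage algorithm is standard; the gamma parameters mirror the example above):

```python
import numpy as np

def sample_mixture(components, weights, size, rng):
    """Two-stage mixture sampling: choose a component per draw, then sample it."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    idx = rng.choice(len(components), size=size, p=weights)
    out = np.empty(size)
    for i, sampler in enumerate(components):
        mask = idx == i
        out[mask] = sampler(mask.sum(), rng)  # vectorized draw per component
    return out

rng = np.random.default_rng(42)
components = [
    lambda n, r: r.gamma(shape=2.0, scale=1.0 / 3.0, size=n),  # gamma(p=2, beta=3)
    lambda n, r: r.gamma(shape=5.0, scale=1.0 / 8.0, size=n),  # gamma(p=5, beta=8)
]
draws = sample_mixture(components, [0.6, 0.4], 10_000, rng)
```

Threading one Generator through both stages is what makes a single random_state sufficient for reproducibility.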

Use the optional lower/upper parameters to truncate the draws; rejection sampling ensures all values fall inside the interval. Pass weight_overrides=[...] when you need to re-weight components dynamically (e.g., mixture-of-experts routing):

draws = sample_mixture_fit(
    mixture,
    size=250,
    random_state=123,
    lower=5.0,
    upper=25.0,
    weight_overrides=[1.0, 0.0],  # ignore the second component for this scenario
)

Invalid overrides (negative weights, wrong lengths, or zero totals) raise ValueError so calling code can fall back gracefully.
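The truncation behaviour amounts to rejection sampling: draw batches and discard values outside [lower, upper] until enough survive. A standalone sketch of that loop (batch sizing and iteration cap are assumptions):

```python
import numpy as np

def truncated_draws(sampler, size, lower, upper, rng, max_iter=1000):
    """Rejection sampling: redraw until `size` values land inside [lower, upper]."""
    kept, remaining = [], size
    for _ in range(max_iter):
        batch = sampler(max(remaining * 2, 16), rng)  # oversample to cut iterations
        batch = batch[(batch >= lower) & (batch <= upper)]
        accepted = batch[:remaining]
        kept.append(accepted)
        remaining -= len(accepted)
        if remaining == 0:
            return np.concatenate(kept)
    raise ValueError("acceptance region too small; rejection sampling did not converge")

rng = np.random.default_rng(123)
draws = truncated_draws(lambda n, r: r.normal(15.0, 10.0, n), 250, 5.0, 25.0, rng)
```

Note the failure mode: a very narrow [lower, upper] interval makes the acceptance rate collapse, which is why tight truncation bounds can slow sampling noticeably.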

Bootstrap a fitted inventory

import numpy as np
from nemora.core import FitResult
from nemora.sampling import BootstrapResult, bootstrap_inventory

fit = FitResult(distribution="gamma", parameters={"beta": 5.0, "p": 2.5, "s": 1.0})
bins = np.array([10.0, 20.0, 30.0])
tallies = np.array([5, 3, 2], dtype=float)
result: BootstrapResult = bootstrap_inventory(
    fit,
    bins,
    tallies,
    resamples=5,
    sample_size=25,
    return_result=True,
)
samples = result.samples
stacked = result.stacked()

Passing return_result=True yields a BootstrapResult containing the sampled arrays and metadata (distribution, parameters, bins, tallies, RNG seed).
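One plausible mental model of the bootstrap (an assumption for illustration, not Nemora's exact algorithm): for each resample, choose bins with probability proportional to their tallies and pair each chosen bin with a draw from the fitted distribution.

```python
import numpy as np

def bootstrap_stand_table(bins, tallies, draw_fn, resamples, sample_size, rng):
    """Per resample: choose bins weighted by tallies, pair each with a random draw."""
    probs = np.asarray(tallies, dtype=float)
    probs = probs / probs.sum()
    samples = []
    for _ in range(resamples):
        chosen_bins = rng.choice(bins, size=sample_size, p=probs)
        draws = draw_fn(sample_size, rng)
        samples.append((chosen_bins, draws))
    return samples

rng = np.random.default_rng(7)
bins = np.array([10.0, 20.0, 30.0])
tallies = np.array([5.0, 3.0, 2.0])
# Stand-in for the fitted gamma: shape p=2.5, scale chosen to give DBH-sized draws.
samples = bootstrap_stand_table(
    bins, tallies,
    lambda n, r: r.gamma(shape=2.5, scale=4.0, size=n),
    resamples=5, sample_size=25, rng=rng,
)
```

Each element of samples is a (bin, draw) pair of equal-length arrays, matching the shape BootstrapResult.samples is documented to hold.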

.. warning:: These APIs are experimental. Expect refinements (additional configuration, performance tuning) as we integrate them with downstream modules.

Bootstrap result metadata

bootstrap_inventory(..., return_result=True) returns a BootstrapResult containing:

  • samples: list of (bin, draw) arrays

  • distribution / parameters: metadata from the fit

  • bins, tallies, resamples, sample_size, rng_seed

  • Convenience helpers: stacked() to concatenate samples and to_dataframe() to return a DataFrame with resample, bin, and draw columns.

Use the metadata when passing bootstrap outputs into synthesis or the simulation module.
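The stacked()/to_dataframe() helpers can be pictured as a simple concatenation of the per-resample (bin, draw) pairs into one long-form table. A sketch with pandas (the real helpers live on BootstrapResult; only the column names come from the description above):

```python
import numpy as np
import pandas as pd

def stack_samples(samples):
    """Concatenate per-resample (bin, draw) pairs into one long-form table."""
    rows = [
        pd.DataFrame({"resample": i, "bin": b, "draw": d})
        for i, (b, d) in enumerate(samples)
    ]
    return pd.concat(rows, ignore_index=True)

samples = [
    (np.array([10.0, 20.0]), np.array([11.2, 19.5])),
    (np.array([10.0, 30.0]), np.array([9.8, 28.4])),
]
frame = stack_samples(samples)
```

Long form is deliberate: downstream code can group by resample for uncertainty bands or by bin for per-class summaries without reshaping.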

Using bootstrap outputs downstream

  • Synthesis (nemora.synthesis) expects paired (bin, draw) arrays plus the originating fit metadata so stem or stand generators can report provenance. Use nemora.synthesis.helpers.bootstrap_payload(result) to obtain both the stacked array and a pandas.DataFrame with metadata attached in frame.attrs["nemora_bootstrap"]. Prefer the CLI helper (nemora sampling-describe-bootstrap <stand-table.csv>) when you want to preview metadata from an existing stand table or emit JSON for automation.

  • Simulation (nemora.simulation) can persist the entire BootstrapResult (including RNG seed) to regenerate uncertainty studies or re-run Monte Carlo workflows deterministically.

  • DBH helpers use nemora.sampling.helpers.bootstrap_dbh_vectors(...) (or the CLI shown below) to produce per-stand DBH arrays + metadata suitable for synthesis/simulation. See docs/examples/faib_manifest_parquet.md for an end-to-end manifest walkthrough.

  • Ingest + benchmarking notebooks should write the stacked output to Parquet so future steps can slice by distribution, BAF, or inventory metadata without re-running the bootstrap sampling step.

Downstream modules should rely on the metadata provided here rather than reconstructing provenance manually.
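Attaching provenance through DataFrame.attrs, the pattern bootstrap_payload uses, looks like this; the "nemora_bootstrap" key and the metadata fields are taken from the descriptions above, while the example values are placeholders:

```python
import pandas as pd

frame = pd.DataFrame({"resample": [0, 0], "bin": [10.0, 20.0], "draw": [11.2, 19.5]})
frame.attrs["nemora_bootstrap"] = {
    "distribution": "gamma",
    "parameters": {"beta": 5.0, "p": 2.5, "s": 1.0},
    "rng_seed": 2025,
}

# Downstream code reads provenance instead of reconstructing it.
meta = frame.attrs["nemora_bootstrap"]
```

Be aware that DataFrame.attrs is an experimental pandas feature and is not preserved by every operation, so read the metadata off the frame before heavy transformations.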

Export DBH vectors for synthesis

Turn a BootstrapResult into per-resample DBH vectors (and an optional long-form table) via the helper:

from nemora.sampling import bootstrap_dbh_vectors

payload = bootstrap_dbh_vectors(result, stand_id="psp-stand-001")
print(payload.dbh_vectors[0][:5])
payload.frame.head()

The helper preserves metadata (distribution, parameters, bins, tallies, RNG seed) and annotates each row with the stand identifier plus the original tally-derived weights. When you prefer a no-code path, the CLI mirrors this workflow:

nemora sampling-export-bootstrap-dbh tests/fixtures/hps_psp_stand_table.csv \
    --stand-id psp-stand-001 \
    --output tmp/bootstrap_dbh.json \
    --table-output tmp/bootstrap_dbh.parquet \
    --resamples 3 \
    --sample-size 25 \
    --seed 2025

The JSON file captures per-resample DBH arrays and metadata, while the table export (Parquet or CSV) stores every (resample, bin, dbh) row for downstream analysis.

Sampling directly from ingest-created manifests

After generating manifests via nemora faib-manifest (CSV + Parquet by default; pass --no-parquet for CSV-only outputs), select an entry, fit a distribution, and draw samples while tuning the numeric integration settings. docs/examples/faib_manifest_parquet.md contains a complete example that:

  1. Loads the Parquet manifest and resolves the stand-table path.

  2. Wraps the DBH bins/tallies in an InventorySpec.

  3. Fits a distribution via fit_inventory.

  4. Calls sample_distribution(..., config=SamplingConfig(...)) to test trapezoid/Simpson/quad integration modes, grid densities, and cache settings.

Use this pattern when validating ingest outputs or when you need to benchmark how numeric integration tolerances impact downstream sampling accuracy.