Sampling Utilities (Draft)

The nemora.sampling module provides helpers for converting registered PDFs into CDFs, drawing random variates, sampling fitted mixtures, and bootstrapping stand tables from nemora.fit results.

Convert a PDF to a CDF

import numpy as np

from nemora.sampling import pdf_to_cdf

cdf = pdf_to_cdf("weibull", {"a": 2.5, "beta": 12.0, "s": 1.0}, method="numeric")
x = np.linspace(0.0, 40.0, 100)
y = cdf(x)

When a distribution exposes an analytic CDF, pdf_to_cdf(..., method="analytic") delegates to it; otherwise the helper falls back to numeric integration.
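For intuition, the numeric fallback can be sketched in plain NumPy as a cumulative trapezoid integral of the PDF over a fixed grid. This is a simplified model of the behavior described above, not Nemora's implementation; the grid bounds and point count are assumptions:

```python
import numpy as np

def numeric_cdf(pdf, lo, hi, grid_points=2048):
    """Approximate a CDF by trapezoid-integrating the PDF on a fixed grid."""
    grid = np.linspace(lo, hi, grid_points)
    dens = pdf(grid)
    # Cumulative trapezoid rule: running integral of the density.
    steps = np.diff(grid) * (dens[1:] + dens[:-1]) / 2.0
    cum = np.concatenate([[0.0], np.cumsum(steps)])
    cum /= cum[-1]  # normalize so the CDF reaches 1 at the upper bound
    return lambda x: np.interp(x, grid, cum)

# Example: Weibull density with shape 2.5 and scale 12.
k, lam = 2.5, 12.0
pdf = lambda x: (k / lam) * (x / lam) ** (k - 1) * np.exp(-((x / lam) ** k))
cdf = numeric_cdf(pdf, 0.0, 60.0)
```

At x = 12 (one scale unit) the analytic Weibull CDF is 1 − e⁻¹ ≈ 0.6321, and the grid approximation agrees to well under 1e-3.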

Configuring numeric integration

pdf_to_cdf accepts a SamplingConfig so you can control grid density, integration backend, and quadrature tolerances:

from nemora.sampling import SamplingConfig, pdf_to_cdf

cfg = SamplingConfig(grid_points=2048, integration_method="quad", quad_rel_tol=1e-7)
cdf = pdf_to_cdf(
    "gamma",
    {"beta": 4.0, "p": 3.0, "s": 1.0},
    method="numeric",
    config=cfg,
)

The default uses a trapezoid grid; switching to "quad" delegates to scipy.integrate.quad with the tolerances above, and integration_method="simpson" integrates via Simpson's rule. Set cache_numeric_cdf=True if you plan to evaluate the same numeric CDF repeatedly: Nemora caches the computed grid/integral pair in memory so sampling calls avoid rerunning the integrator.
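The accuracy trade-off between a fixed grid and adaptive quadrature can be illustrated directly with SciPy (a standalone sketch, not Nemora code; the gamma parameterization with shape p and rate beta is an assumption):

```python
import numpy as np
from scipy import integrate, stats

# Gamma with shape p=3 and rate beta=4, i.e. scale = 1/beta (assumed parameterization).
dist = stats.gamma(a=3.0, scale=0.25)

def cdf_trapezoid(x, grid_points=256):
    """Grid-based CDF estimate: trapezoid rule over [0, x]."""
    grid = np.linspace(0.0, x, grid_points)
    dens = dist.pdf(grid)
    return float(np.sum(np.diff(grid) * (dens[1:] + dens[:-1]) / 2.0))

def cdf_quad(x, rel_tol=1e-7):
    """Quadrature-based CDF estimate: adaptive integration with an explicit tolerance."""
    value, _abs_err = integrate.quad(dist.pdf, 0.0, x, epsrel=rel_tol)
    return value

exact = dist.cdf(1.0)
```

The quadrature estimate tracks the analytic CDF to roughly the requested tolerance, while the grid estimate's error is governed by grid density, which is why grid_points and the quad tolerances are exposed as separate knobs.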

Sample from a distribution

from nemora.sampling import sample_distribution

draws = sample_distribution("gamma", {"beta": 4.0, "p": 3.0, "s": 1.0}, size=500)

Distributions with closed-form inverse CDFs (Weibull, exponential, Pareto, uniform, lognormal) use analytic inversion internally for improved accuracy. Logistic/Fisk currently fall back to the numeric pathway described above; this is documented in notes/sampling_inverse_matrix.md and will only change once synthesis requires a closed-form helper.
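The analytic-inversion pathway is ordinary inverse transform sampling. For the Weibull case it can be sketched in pure NumPy as follows (parameter names are illustrative, not Nemora's):

```python
import numpy as np

def sample_weibull_inverse(shape, scale, size, rng):
    """Inverse transform sampling: F^-1(u) = scale * (-ln(1 - u))**(1/shape)."""
    u = rng.random(size)
    # log1p(-u) computes ln(1 - u) accurately for u near 0.
    return scale * (-np.log1p(-u)) ** (1.0 / shape)

rng = np.random.default_rng(0)
draws = sample_weibull_inverse(2.5, 12.0, 100_000, rng)
```

Because each draw is an exact transform of a uniform variate, there is no grid or interpolation error, which is the accuracy advantage over the numeric pathway.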

Sample from a mixture fit

import numpy as np

from nemora.fit import MixtureComponentFit, MixtureFitResult
from nemora.sampling import sample_mixture_fit

components = [
    MixtureComponentFit(name="gamma", weight=0.6, parameters={"beta": 3.0, "p": 2.0}),
    MixtureComponentFit(name="gamma", weight=0.4, parameters={"beta": 8.0, "p": 5.0}),
]
mixture = MixtureFitResult(
    distribution="mixture",
    components=components,
    log_likelihood=-100.0,
    iterations=10,
    converged=True,
)
draws = sample_mixture_fit(mixture, size=1000, random_state=np.random.default_rng(42))

Pass a numpy.random.Generator (or integer seed) via random_state to obtain reproducible mixture draws.
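Conceptually, mixture sampling picks a component per draw according to the weights and then samples that component. A minimal sketch with NumPy generators standing in for the fitted components (the two-stage algorithm is standard; the gamma parameters mirror the example above):

```python
import numpy as np

def sample_mixture(components, weights, size, rng):
    """Two-stage mixture sampling: choose a component per draw, then sample it."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    idx = rng.choice(len(components), size=size, p=weights)
    out = np.empty(size)
    for i, sampler in enumerate(components):
        mask = idx == i
        out[mask] = sampler(mask.sum(), rng)  # vectorized draw per component
    return out

rng = np.random.default_rng(42)
components = [
    lambda n, r: r.gamma(shape=2.0, scale=1.0 / 3.0, size=n),  # gamma(p=2, beta=3)
    lambda n, r: r.gamma(shape=5.0, scale=1.0 / 8.0, size=n),  # gamma(p=5, beta=8)
]
draws = sample_mixture(components, [0.6, 0.4], 10_000, rng)
```

Threading one Generator through both stages is what makes a single random_state sufficient for reproducibility.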

Use the optional lower/upper parameters to truncate the draws; rejection sampling ensures all values fall inside the interval. Pass weight_overrides=[...] when you need to re-weight components dynamically (e.g., mixture-of-experts routing):

draws = sample_mixture_fit(
    mixture,
    size=250,
    random_state=123,
    lower=5.0,
    upper=25.0,
    weight_overrides=[1.0, 0.0],  # ignore the second component for this scenario
)

Invalid overrides (negative weights, wrong lengths, or zero totals) raise ValueError so calling code can fall back gracefully.
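The truncation behaviour amounts to rejection sampling: draw batches and discard values outside [lower, upper] until enough survive. A standalone sketch of that loop (batch sizing and iteration cap are assumptions):

```python
import numpy as np

def truncated_draws(sampler, size, lower, upper, rng, max_iter=1000):
    """Rejection sampling: redraw until `size` values land inside [lower, upper]."""
    kept, remaining = [], size
    for _ in range(max_iter):
        batch = sampler(max(remaining * 2, 16), rng)  # oversample to cut iterations
        batch = batch[(batch >= lower) & (batch <= upper)]
        accepted = batch[:remaining]
        kept.append(accepted)
        remaining -= len(accepted)
        if remaining == 0:
            return np.concatenate(kept)
    raise ValueError("acceptance region too small; rejection sampling did not converge")

rng = np.random.default_rng(123)
draws = truncated_draws(lambda n, r: r.normal(15.0, 10.0, n), 250, 5.0, 25.0, rng)
```

Note the failure mode: a very narrow [lower, upper] interval makes the acceptance rate collapse, which is why tight truncation bounds can slow sampling noticeably.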

Bootstrap a fitted inventory

import numpy as np
from nemora.core import FitResult
from nemora.sampling import BootstrapResult, bootstrap_inventory

fit = FitResult(distribution="gamma", parameters={"beta": 5.0, "p": 2.5, "s": 1.0})
bins = np.array([10.0, 20.0, 30.0])
tallies = np.array([5, 3, 2], dtype=float)
result: BootstrapResult = bootstrap_inventory(
    fit,
    bins,
    tallies,
    resamples=5,
    sample_size=25,
    return_result=True,
)
samples = result.samples
stacked = result.stacked()

Passing return_result=True yields a BootstrapResult containing the sampled arrays and metadata (distribution, parameters, bins, tallies, RNG seed).
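One plausible mental model of the bootstrap (an assumption for illustration, not Nemora's exact algorithm): for each resample, choose bins with probability proportional to their tallies and pair each chosen bin with a draw from the fitted distribution.

```python
import numpy as np

def bootstrap_stand_table(bins, tallies, draw_fn, resamples, sample_size, rng):
    """Per resample: choose bins weighted by tallies, pair each with a random draw."""
    probs = np.asarray(tallies, dtype=float)
    probs = probs / probs.sum()
    samples = []
    for _ in range(resamples):
        chosen_bins = rng.choice(bins, size=sample_size, p=probs)
        draws = draw_fn(sample_size, rng)
        samples.append((chosen_bins, draws))
    return samples

rng = np.random.default_rng(7)
bins = np.array([10.0, 20.0, 30.0])
tallies = np.array([5.0, 3.0, 2.0])
# Stand-in for the fitted gamma: shape p=2.5, scale chosen to give DBH-sized draws.
samples = bootstrap_stand_table(
    bins, tallies,
    lambda n, r: r.gamma(shape=2.5, scale=4.0, size=n),
    resamples=5, sample_size=25, rng=rng,
)
```

Each element of samples is a (bin, draw) pair of equal-length arrays, matching the shape BootstrapResult.samples is documented to hold.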

.. warning:: These APIs are experimental. Expect refinements (additional configuration, performance tuning) as we integrate them with downstream modules.

Bootstrap result metadata

bootstrap_inventory(..., return_result=True) returns a BootstrapResult containing:

  • samples: list of (bin, draw) arrays

  • distribution / parameters: metadata from the fit

  • bins, tallies, resamples, sample_size, rng_seed

  • Convenience helpers: stacked() to concatenate samples and to_dataframe() to return a DataFrame with resample, bin, and draw columns.

Use the metadata when passing bootstrap outputs into synthesis or the simulation module.
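The stacked()/to_dataframe() helpers can be pictured as a simple concatenation of the per-resample (bin, draw) pairs into one long-form table. A sketch with pandas (the real helpers live on BootstrapResult; only the column names come from the description above):

```python
import numpy as np
import pandas as pd

def stack_samples(samples):
    """Concatenate per-resample (bin, draw) pairs into one long-form table."""
    rows = [
        pd.DataFrame({"resample": i, "bin": b, "draw": d})
        for i, (b, d) in enumerate(samples)
    ]
    return pd.concat(rows, ignore_index=True)

samples = [
    (np.array([10.0, 20.0]), np.array([11.2, 19.5])),
    (np.array([10.0, 30.0]), np.array([9.8, 28.4])),
]
frame = stack_samples(samples)
```

Long form is deliberate: downstream code can group by resample for uncertainty bands or by bin for per-class summaries without reshaping.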

Using bootstrap outputs downstream

  • Synthesis (nemora.synthesis) expects paired (bin, draw) arrays plus the originating fit metadata so stem or stand generators can report provenance. Use nemora.synthesis.helpers.bootstrap_payload(result) to obtain both the stacked array and a pandas.DataFrame with metadata attached in frame.attrs["nemora_bootstrap"]. Prefer the CLI helper (nemora sampling-describe-bootstrap <stand-table.csv>) when you want to preview metadata from an existing stand table or emit JSON for automation.

  • Simulation (nemora.simulation) can persist the entire BootstrapResult (including RNG seed) to regenerate uncertainty studies or re-run Monte Carlo workflows deterministically.

  • DBH helpers use nemora.sampling.helpers.bootstrap_dbh_vectors(...) (or the CLI shown below) to produce per-stand DBH arrays + metadata suitable for synthesis/simulation. See docs/examples/faib_manifest_parquet.md for an end-to-end manifest walkthrough.

  • Ingest + benchmarking notebooks should write the stacked output to Parquet so future steps can slice by distribution, BAF, or inventory metadata without re-running the bootstrap sampling step.

Downstream modules should rely on the metadata provided here rather than reconstructing provenance manually.
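Attaching provenance through DataFrame.attrs, the pattern bootstrap_payload uses, looks like this; the "nemora_bootstrap" key and the metadata fields are taken from the descriptions above, while the example values are placeholders:

```python
import pandas as pd

frame = pd.DataFrame({"resample": [0, 0], "bin": [10.0, 20.0], "draw": [11.2, 19.5]})
frame.attrs["nemora_bootstrap"] = {
    "distribution": "gamma",
    "parameters": {"beta": 5.0, "p": 2.5, "s": 1.0},
    "rng_seed": 2025,
}

# Downstream code reads provenance instead of reconstructing it.
meta = frame.attrs["nemora_bootstrap"]
```

Be aware that DataFrame.attrs is an experimental pandas feature and is not preserved by every operation, so read the metadata off the frame before heavy transformations.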

Export DBH vectors for synthesis

Turn a BootstrapResult into per-resample DBH vectors (and an optional long-form table) via the helper:

from nemora.sampling import bootstrap_dbh_vectors

payload = bootstrap_dbh_vectors(result, stand_id="psp-stand-001")
print(payload.dbh_vectors[0][:5])
payload.frame.head()

The helper preserves metadata (distribution, parameters, bins, tallies, RNG seed) and annotates each row with the stand identifier plus the original tally-derived weights. When you prefer a no-code path, the CLI mirrors this workflow:

nemora sampling-export-bootstrap-dbh tests/fixtures/hps_psp_stand_table.csv \
    --stand-id psp-stand-001 \
    --output tmp/bootstrap_dbh.json \
    --table-output tmp/bootstrap_dbh.parquet \
    --resamples 3 \
    --sample-size 25 \
    --seed 2025

The JSON file captures per-resample DBH arrays and metadata, while the table export (Parquet or CSV) stores every (resample, bin, dbh) row for downstream analysis.

Sampling directly from ingest-created manifests

After generating manifests via nemora faib-manifest (CSV + Parquet by default; pass --no-parquet for CSV-only outputs), select an entry, fit a distribution, and draw samples while tuning the numeric integration settings. docs/examples/faib_manifest_parquet.md contains a complete example that:

  1. Loads the Parquet manifest and resolves the stand-table path.

  2. Wraps the DBH bins/tallies in an InventorySpec.

  3. Fits a distribution via fit_inventory.

  4. Calls sample_distribution(..., config=SamplingConfig(...)) to test trapezoid/Simpson/quad integration modes, grid densities, and cache settings.

Use this pattern when validating ingest outputs or when you need to benchmark how numeric integration tolerances impact downstream sampling accuracy.