Sampling Utilities (Draft)
The nemora.sampling module provides helpers for converting registered PDFs
into CDFs, drawing random variates, sampling fitted mixtures, and bootstrapping
stand tables from nemora.fit results.
Convert a PDF to a CDF
import numpy as np
from nemora.sampling import pdf_to_cdf
cdf = pdf_to_cdf("weibull", {"a": 2.5, "beta": 12.0, "s": 1.0}, method="numeric")
x = np.linspace(0.0, 40.0, 100)
y = cdf(x)
When a distribution exposes an analytic CDF, pdf_to_cdf(..., method="analytic")
delegates to it; otherwise the helper falls back to numeric integration.
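Conceptually, the numeric fallback amounts to cumulative trapezoid integration of the PDF on a grid, followed by interpolation. The sketch below illustrates the idea with plain numpy and a hand-written Weibull density (the `c`/`scale` parameter names are illustrative, not Nemora's registry keys):

```python
import numpy as np

def weibull_pdf(x, c=2.5, scale=12.0):
    """Two-parameter Weibull density (illustrative stand-in for a registered PDF)."""
    z = np.asarray(x, dtype=float) / scale
    return (c / scale) * z ** (c - 1.0) * np.exp(-(z ** c))

def numeric_cdf(pdf, lo, hi, grid_points=1024):
    """Approximate a CDF by cumulative trapezoid integration of the PDF."""
    grid = np.linspace(lo, hi, grid_points)
    density = pdf(grid)
    steps = np.diff(grid) * (density[:-1] + density[1:]) / 2.0
    mass = np.concatenate([[0.0], np.cumsum(steps)])
    return grid, mass / mass[-1]  # normalise so the CDF ends at 1

grid, mass = numeric_cdf(weibull_pdf, 0.0, 60.0)
cdf = lambda x: np.interp(x, grid, mass)
```

A denser grid tightens the approximation at the cost of more PDF evaluations, which is what the `grid_points` setting in the next section controls.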
Configuring numeric integration
pdf_to_cdf accepts a SamplingConfig so you can control grid density,
integration backend, and quadrature tolerances:
from nemora.sampling import SamplingConfig, pdf_to_cdf
cfg = SamplingConfig(grid_points=2048, integration_method="quad", quad_rel_tol=1e-7)
cdf = pdf_to_cdf(
"gamma",
{"beta": 4.0, "p": 3.0, "s": 1.0},
method="numeric",
config=cfg,
)
The default uses a trapezoid grid; switching to "quad" delegates to
scipy.integrate.quad with the tolerances above. You can also set
integration_method="simpson" to integrate via Simpson’s rule.
Set cache_numeric_cdf=True if you plan to evaluate the same numeric CDF repeatedly; Nemora caches
the computed grid/integral pair in memory so sampling calls avoid rerunning the integrator.
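As a rough mental model of that cache (the names and key layout here are hypothetical, not Nemora internals), memoising the grid/integral pair keyed on the distribution name and parameters might look like:

```python
import numpy as np

_CDF_CACHE: dict = {}  # (name, params, lo, hi, grid_points) -> (grid, mass)

def cached_numeric_cdf(name, params, pdf, lo, hi, grid_points=1024):
    """Memoise the grid/integral pair so repeated calls skip the integrator."""
    key = (name, tuple(sorted(params.items())), lo, hi, grid_points)
    if key not in _CDF_CACHE:
        grid = np.linspace(lo, hi, grid_points)
        density = pdf(grid)
        steps = np.diff(grid) * (density[:-1] + density[1:]) / 2.0
        mass = np.concatenate([[0.0], np.cumsum(steps)])
        _CDF_CACHE[key] = (grid, mass / mass[-1])
    grid, mass = _CDF_CACHE[key]
    return lambda x: np.interp(x, grid, mass)
```

The key must include every input that affects the integral (bounds, grid density, parameters); otherwise stale entries would be served for different configurations.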
Sample from a distribution
from nemora.sampling import sample_distribution
draws = sample_distribution("gamma", {"beta": 4.0, "p": 3.0, "s": 1.0}, size=500)
Distributions with closed-form inverse CDFs (Weibull, exponential, Pareto,
uniform, lognormal) use analytic inversion internally for improved accuracy.
Logistic/Fisk currently fall back to the numeric pathway described above; this is documented in
notes/sampling_inverse_matrix.md and will only change once synthesis requires a closed-form helper.
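For intuition, the Weibull case of analytic inversion reduces to pushing uniforms through the closed-form inverse CDF, F^-1(u) = scale * (-ln(1 - u))**(1/c). A numpy-only sketch (parameter names are illustrative rather than Nemora's registry keys):

```python
import numpy as np

def sample_weibull(c, scale, size, seed=None):
    """Inverse-transform sampling via the closed-form Weibull inverse CDF."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    # log1p(-u) evaluates log(1 - u) accurately for small u
    return scale * (-np.log1p(-u)) ** (1.0 / c)

draws = sample_weibull(2.5, 12.0, size=10_000, seed=42)
```

Because no grid or integrator is involved, analytic inversion avoids both the discretisation error and the cost of the numeric pathway.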
Sample from a mixture fit
import numpy as np
from nemora.fit import MixtureComponentFit, MixtureFitResult
from nemora.sampling import sample_mixture_fit
components = [
MixtureComponentFit(name="gamma", weight=0.6, parameters={"beta": 3.0, "p": 2.0}),
MixtureComponentFit(name="gamma", weight=0.4, parameters={"beta": 8.0, "p": 5.0}),
]
mixture = MixtureFitResult(
distribution="mixture",
components=components,
log_likelihood=-100.0,
iterations=10,
converged=True,
)
draws = sample_mixture_fit(mixture, size=1000, random_state=np.random.default_rng(42))
Pass a numpy.random.Generator (or integer seed) via random_state to obtain reproducible mixture draws.
Use the optional lower/upper parameters to truncate the draws (rejection sampling ensures all values fall inside the interval). Pass weight_overrides=[...] when you need to re-weight components dynamically (e.g., mixture-of-experts routing):
draws = sample_mixture_fit(
mixture,
size=250,
random_state=123,
lower=5.0,
upper=25.0,
weight_overrides=[1.0, 0.0], # ignore the second component for this scenario
)
Invalid overrides (negative weights, wrong lengths, or zero totals) raise ValueError so calling code can fall back gracefully.
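To make the truncation and override semantics concrete, here is a hedged numpy-only sketch; the component layout and keys are invented for illustration, and the real signature lives in nemora.sampling.sample_mixture_fit:

```python
import numpy as np

def sample_mixture(components, size, seed=None, lower=None, upper=None,
                   weight_overrides=None):
    """components: dicts with 'weight', 'p' (shape), and 'beta' (scale) gamma params."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(
        weight_overrides if weight_overrides is not None
        else [c["weight"] for c in components], dtype=float)
    # reject negative weights, wrong lengths, and zero totals up front
    if weights.shape != (len(components),) or (weights < 0).any() or weights.sum() == 0:
        raise ValueError("invalid weight overrides")
    weights = weights / weights.sum()
    draws = np.empty(0)
    while draws.size < size:  # rejection-sample until enough draws fall in bounds
        idx = rng.choice(len(components), size=size, p=weights)
        batch = np.array([rng.gamma(components[i]["p"], components[i]["beta"])
                          for i in idx])
        if lower is not None:
            batch = batch[batch >= lower]
        if upper is not None:
            batch = batch[batch <= upper]
        draws = np.concatenate([draws, batch])
    return draws[:size]

components = [{"weight": 0.6, "p": 2.0, "beta": 3.0},
              {"weight": 0.4, "p": 5.0, "beta": 8.0}]
draws = sample_mixture(components, 250, seed=123, lower=5.0, upper=25.0,
                       weight_overrides=[1.0, 0.0])
```

Note the rejection loop oversamples and filters rather than inverting a truncated CDF; this is the simplest correct strategy as long as the truncation interval retains non-trivial probability mass.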
Bootstrap a fitted inventory
import numpy as np
from nemora.core import FitResult
from nemora.sampling import BootstrapResult, bootstrap_inventory
fit = FitResult(distribution="gamma", parameters={"beta": 5.0, "p": 2.5, "s": 1.0})
bins = np.array([10.0, 20.0, 30.0])
tallies = np.array([5, 3, 2], dtype=float)
result: BootstrapResult = bootstrap_inventory(
fit,
bins,
tallies,
resamples=5,
sample_size=25,
return_result=True,
)
samples = result.samples
stacked = result.stacked()
Passing return_result=True yields a BootstrapResult containing the sampled
arrays and metadata (distribution, parameters, bins, tallies, RNG seed).
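As a hedged illustration of the bootstrap idea (not Nemora's implementation): each resample draws sample_size DBH values from the fitted distribution, then labels every draw with its stand-table bin, yielding the per-resample (bin, draw) arrays described above.

```python
import numpy as np

def bootstrap_sketch(rvs, bins, resamples, sample_size, seed=None):
    """Per resample: draw DBH values, then label each with a stand-table bin edge."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(resamples):
        draws = rvs(sample_size, rng)
        # np.digitize indexes each draw by the first bin edge above it;
        # clip keeps oversized draws in the last bin
        labels = bins[np.clip(np.digitize(draws, bins), 0, len(bins) - 1)]
        samples.append(np.column_stack([labels, draws]))
    return samples

bins = np.array([10.0, 20.0, 30.0])
samples = bootstrap_sketch(lambda n, rng: rng.gamma(2.5, 5.0, size=n),
                           bins, resamples=5, sample_size=25, seed=0)
```

Threading a single Generator through all resamples is what makes the stored RNG seed sufficient to reproduce the whole bootstrap.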
.. warning:: These APIs are experimental. Expect refinements (additional configuration, performance tuning) as we integrate them with downstream modules.
Bootstrap result metadata
bootstrap_inventory(..., return_result=True) returns a BootstrapResult
containing:
- samples: list of (bin, draw) arrays
- distribution / parameters: metadata from the fit
- bins, tallies, resamples, sample_size, rng_seed
- Convenience helpers: stacked() to concatenate samples, and to_dataframe() to return a DataFrame with resample, bin, and draw columns.
Use the metadata when passing bootstrap outputs into synthesis or the simulation module.
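To visualise the shapes involved, here is a tiny standalone sketch with fabricated toy numbers (plain numpy/pandas, not a Nemora call) of what concatenated samples and a long-form resample/bin/draw table look like:

```python
import numpy as np
import pandas as pd

# two toy resamples, each an array of (bin, draw) rows
samples = [np.array([[10.0, 9.4], [20.0, 18.2]]),
           np.array([[20.0, 21.7], [30.0, 28.9]])]

stacked = np.vstack(samples)  # every (bin, draw) row, resamples concatenated

frame = pd.concat(
    [pd.DataFrame(s, columns=["bin", "draw"]).assign(resample=i)
     for i, s in enumerate(samples)],
    ignore_index=True,
)[["resample", "bin", "draw"]]
```

The long-form layout is what makes downstream slicing (by resample, bin, or metadata column) a simple groupby rather than array bookkeeping.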
Using bootstrap outputs downstream
- Synthesis (nemora.synthesis) expects paired (bin, draw) arrays plus the originating fit metadata so stem or stand generators can report provenance. Use nemora.synthesis.helpers.bootstrap_payload(result) to obtain both the stacked array and a pandas.DataFrame with metadata attached in frame.attrs["nemora_bootstrap"]. Prefer the CLI helper (nemora sampling-describe-bootstrap <stand-table.csv>) when you want to preview metadata from an existing stand table or emit JSON for automation.
- Simulation (nemora.simulation) can persist the entire BootstrapResult (including the RNG seed) to regenerate uncertainty studies or re-run Monte Carlo workflows deterministically.
- DBH helpers: use nemora.sampling.helpers.bootstrap_dbh_vectors(...) (or the CLI shown below) to produce per-stand DBH arrays plus metadata suitable for synthesis/simulation. See docs/examples/faib_manifest_parquet.md for an end-to-end manifest walkthrough.
- Ingest and benchmarking notebooks should write the stacked output to Parquet so future steps can slice by distribution, BAF, or inventory metadata without re-running the bootstrap sampling step.
Downstream modules should rely on the metadata provided here rather than reconstructing provenance manually.
Export DBH vectors for synthesis
Turn a BootstrapResult into per-resample DBH vectors (and an optional long-form table) via the helper:
from nemora.sampling import bootstrap_dbh_vectors
payload = bootstrap_dbh_vectors(result, stand_id="psp-stand-001")
print(payload.dbh_vectors[0][:5])
payload.frame.head()
The helper preserves metadata (distribution, parameters, bins, tallies, RNG seed) and annotates each row with the stand identifier plus the original tally-derived weights. When you prefer a no-code path, the CLI mirrors this workflow:
nemora sampling-export-bootstrap-dbh tests/fixtures/hps_psp_stand_table.csv \
--stand-id psp-stand-001 \
--output tmp/bootstrap_dbh.json \
--table-output tmp/bootstrap_dbh.parquet \
--resamples 3 \
--sample-size 25 \
--seed 2025
The JSON file captures per-resample DBH arrays and metadata, while the table export (Parquet or CSV) stores every (resample, bin, dbh) row for downstream analysis.
Sampling directly from ingest-created manifests
After generating manifests (CSV + Parquet by default) via nemora faib-manifest,
select an entry, fit a distribution, and draw samples while tuning the numeric
integration settings. Use --no-parquet if you prefer CSV-only outputs.
docs/examples/faib_manifest_parquet.md contains a complete example that:
1. Loads the Parquet manifest and resolves the stand-table path.
2. Wraps the DBH bins/tallies in an InventorySpec.
3. Fits a distribution via fit_inventory.
4. Calls sample_distribution(..., config=SamplingConfig(...)) to test trapezoid/Simpson/quad integration modes, grid densities, and cache settings.
Use this pattern when validating ingest outputs or when you need to benchmark how numeric integration tolerances impact downstream sampling accuracy.
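As a standalone illustration of that benchmarking idea (using scipy's gamma distribution as ground truth rather than the Nemora API), you can compare trapezoid grids of different densities against an analytic CDF:

```python
import numpy as np
from scipy.stats import gamma

dist = gamma(a=2.5, scale=5.0)  # ground truth with an analytic CDF
errors = {}
for grid_points in (128, 512, 2048):
    grid = np.linspace(0.0, 80.0, grid_points)
    density = dist.pdf(grid)
    steps = np.diff(grid) * (density[:-1] + density[1:]) / 2.0
    mass = np.concatenate([[0.0], np.cumsum(steps)])
    # max absolute deviation of the trapezoid CDF from the analytic one
    errors[grid_points] = float(np.max(np.abs(mass - dist.cdf(grid))))
```

Plotting errors against grid_points (or wall-clock time) makes the accuracy/cost trade-off of each integration setting explicit before committing to a configuration.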