# FAIB Manifest Parquet Workflow
Nemora emits FAIB manifest summaries as both CSV and Parquet by default. Parquet
provides columnar storage and faster downstream analytics, and is recommended for
notebook or Spark pipelines. Pass `--no-parquet` if you only need CSV outputs.
## CLI examples
Fetch PSP extracts, auto-select BAFs, and generate manifests/stats:

```shell
nemora faib-manifest data/external/faib/manifest_psp --auto-bafs --auto-count 3
```

Reuse cached downloads, limit rows, and emit Parquet alongside CSV (the default):

```shell
nemora faib-manifest examples/faib_manifest --source tests/fixtures/faib --no-fetch --baf 12 --max-rows 200
```

Produce CSV only when downstream tooling cannot read Parquet:

```shell
nemora faib-manifest examples/faib_manifest --source tests/fixtures/faib --no-fetch --baf 12 --max-rows 200 --no-parquet
```
## Loading the Parquet manifest
```python
import pandas as pd

manifest = pd.read_parquet("examples/faib_manifest/faib_manifest.parquet")
print(manifest.head())
```
The Parquet file mirrors the CSV schema (`dataset`, `baf`, `rows`, `path`,
`truncated`). Use `--no-parquet` to skip the columnar output or to keep
storage requirements minimal.
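Because the manifest is an ordinary DataFrame, you can filter it with the schema columns listed above before touching any stand tables. A quick sketch, using a hand-built stand-in for `faib_manifest.parquet` (the values are illustrative; your real manifest comes from a `faib-manifest` run):

```python
import pandas as pd

# Hand-built stand-in with the documented manifest schema.
manifest = pd.DataFrame(
    {
        "dataset": ["psp", "psp", "vri"],
        "baf": [8, 12, 12],
        "rows": [200, 150, 90],
        "path": [
            "examples/faib_manifest/psp_baf8.csv",
            "examples/faib_manifest/psp_baf12.csv",
            "examples/faib_manifest/vri_baf12.csv",
        ],
        "truncated": [True, False, False],
    }
)

# Keep only complete (non-truncated) BAF-12 extracts.
subset = manifest.query("baf == 12 and not truncated")
print(subset["path"].tolist())
```

The same filter works unchanged on the real manifest loaded with `pd.read_parquet`.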
## Feed manifest entries into sampling workflows
Once a manifest exists you can select an individual stand table, fit a distribution, and draw samples while tuning the numeric integration settings:
```python
from pathlib import Path

import numpy as np
import pandas as pd

from nemora.core import InventorySpec
from nemora.fit import fit_inventory
from nemora.sampling import SamplingConfig, sample_distribution

manifest = pd.read_parquet("examples/faib_manifest/faib_manifest.parquet")
stand_csv = Path(manifest.loc[0, "path"])  # CSV path captured in the manifest
stand_table = pd.read_csv(stand_csv)

bins = stand_table["dbh_cm"].to_numpy()
tallies = stand_table["tally"].to_numpy(dtype=float)

inventory = InventorySpec(
    name=stand_csv.stem,
    sampling="hps",
    bins=bins,
    tallies=tallies,
    metadata={"grouped": True},
)
fit = fit_inventory(inventory, ["weibull"], configs={})[0]

config = SamplingConfig(
    grid_points=4096,
    support_multiplier=12.0,
    integration_method="quad",
    quad_abs_tol=1e-9,
    quad_rel_tol=1e-8,
    cache_numeric_cdf=True,
)
draws = sample_distribution(
    fit.distribution,
    fit.parameters,
    size=500,
    random_state=123,
    config=config,
)
print(draws[:5])
```
This script loads the Parquet manifest, pinpoints the original stand-table CSV,
fits a Weibull distribution, and samples DBH draws using a high-resolution CDF
grid. Adjust `SamplingConfig` to benchmark how different grid densities or
integration methods affect accuracy and performance, and swap in
`nemora.sampling.bootstrap_inventory` when you need the richer metadata tracked
by `BootstrapResult`.
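To build intuition for what a grid-based numeric CDF does, here is a minimal, library-agnostic sketch of inverse-transform sampling from a tabulated Weibull CDF in plain NumPy. It illustrates the general technique behind settings like `grid_points` and `support_multiplier`, not Nemora's internal implementation; the shape/scale parameters are illustrative only:

```python
import numpy as np

# Illustrative Weibull parameters (shape c, scale lam) for a DBH-like variable.
c, lam = 2.0, 15.0

def weibull_cdf(x):
    """Closed-form Weibull CDF: F(x) = 1 - exp(-(x / lam)**c)."""
    return 1.0 - np.exp(-((x / lam) ** c))

# Tabulate the CDF once on a dense grid; the upper bound plays the role of a
# support_multiplier-style cutoff (scale * 12), and 4096 mirrors grid_points.
grid = np.linspace(0.0, lam * 12.0, 4096)
cdf = weibull_cdf(grid)

# Inverse transform: map uniform draws through the tabulated CDF by interpolation.
rng = np.random.default_rng(123)
u = rng.uniform(size=500)
draws = np.interp(u, cdf, grid)

print(draws[:5])
```

A denser grid tracks the exact quantiles more closely at the cost of memory and setup time, which is the accuracy/performance trade-off the `SamplingConfig` knobs expose.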
## Export bootstrap DBH vectors
After calling `bootstrap_inventory(..., return_result=True)` you can convert the result into per-resample DBH vectors (plus an optional long-form table) using `nemora.sampling.bootstrap_dbh_vectors`. The Typer CLI wraps this workflow so you can export JSON + Parquet artifacts without writing code:
```shell
nemora sampling-export-bootstrap-dbh "$(python - <<'PY'
import pandas as pd

manifest = pd.read_parquet('examples/faib_manifest/faib_manifest.parquet')
print(manifest.loc[0, 'path'])
PY
)" \
  --stand-id faib-demo-001 \
  --output examples/faib_manifest/faib_demo_dbh.json \
  --table-output examples/faib_manifest/faib_demo_dbh.parquet \
  --resamples 3 \
  --sample-size 25
```
The JSON file captures metadata (distribution, fitted parameters, bins/tallies, RNG seed) alongside per-resample DBH arrays, while the Parquet export stores every `(resample, bin, dbh)` row plus tally-derived weights. Feed either artifact directly into upcoming synthesis/simulation tooling, or archive it with your sampling experiment logs.
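As a sketch of downstream consumption, here is a weighted per-resample summary over a long-form table shaped like the Parquet export. Only the `(resample, bin, dbh)` columns and tally-derived weights are described in the text; the `weight` column name and the numbers below are assumptions for illustration:

```python
import pandas as pd

# Synthetic stand-in for the long-form faib_demo_dbh.parquet export.
table = pd.DataFrame(
    {
        "resample": [0, 0, 0, 1, 1, 1],
        "bin": [10.0, 15.0, 20.0, 10.0, 15.0, 20.0],
        "dbh": [10.2, 15.4, 19.8, 9.9, 15.1, 20.3],
        "weight": [0.5, 0.3, 0.2, 0.5, 0.3, 0.2],  # tally-derived; name assumed
    }
)

# Weighted mean DBH per bootstrap resample: sum(w * dbh) / sum(w) within groups.
summary = (
    table.assign(wd=table["dbh"] * table["weight"])
    .groupby("resample")[["wd", "weight"]]
    .sum()
)
summary["wmean_dbh"] = summary["wd"] / summary["weight"]
print(summary["wmean_dbh"])
```

The same pattern extends to variance or quantile summaries across resamples when you compare bootstrap runs.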