# FAIB Manifest Parquet Workflow

Nemora emits FAIB manifest summaries as both CSV and Parquet by default. Parquet provides columnar storage and faster downstream analytics, making it the recommended format for notebook or Spark pipelines. Pass `--no-parquet` if you only need CSV outputs.

## CLI examples

- Fetch PSP extracts, auto-select BAFs, and generate manifests/stats:
  `nemora faib-manifest data/external/faib/manifest_psp --auto-bafs --auto-count 3`
- Reuse cached downloads, limit rows, and emit Parquet alongside CSV (default):
  `nemora faib-manifest examples/faib_manifest --source tests/fixtures/faib --no-fetch --baf 12 --max-rows 200`
- Produce CSV only when downstream tooling cannot read Parquet:
  `nemora faib-manifest examples/faib_manifest --source tests/fixtures/faib --no-fetch --baf 12 --max-rows 200 --no-parquet`

## Loading the Parquet manifest

```python
import pandas as pd

manifest = pd.read_parquet("examples/faib_manifest/faib_manifest.parquet")
print(manifest.head())
```

The Parquet file mirrors the CSV schema (`dataset`, `baf`, `rows`, `path`, `truncated`). Use `--no-parquet` to skip the columnar output or keep storage requirements minimal.
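Once the manifest is loaded, ordinary pandas filtering is enough to locate specific extracts. Here is a minimal sketch against the documented schema; the row values below are made up for illustration:

```python
import pandas as pd

# Illustrative rows matching the documented manifest schema
# (dataset, baf, rows, path, truncated); the values are made up.
manifest = pd.DataFrame(
    {
        "dataset": ["psp", "psp"],
        "baf": [12, 16],
        "rows": [200, 150],
        "path": [
            "examples/faib_manifest/psp_baf12.csv",
            "examples/faib_manifest/psp_baf16.csv",
        ],
        "truncated": [True, False],
    }
)

# Pick out the stand tables captured for one BAF and flag truncated
# extracts, e.g. to decide whether to re-run without --max-rows.
baf12 = manifest[manifest["baf"] == 12]
print(baf12[["path", "truncated"]])
```

The same filter works on a manifest read with `pd.read_parquet`, since the Parquet file carries identical columns.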
## Feed manifest entries into sampling workflows

Once a manifest exists you can select an individual stand table, fit a distribution, and draw samples while tuning the numeric integration settings:

```python
from pathlib import Path

import pandas as pd

from nemora.core import InventorySpec
from nemora.fit import fit_inventory
from nemora.sampling import SamplingConfig, sample_distribution

manifest = pd.read_parquet("examples/faib_manifest/faib_manifest.parquet")
stand_csv = Path(manifest.loc[0, "path"])  # CSV path captured in the manifest
stand_table = pd.read_csv(stand_csv)

bins = stand_table["dbh_cm"].to_numpy()
tallies = stand_table["tally"].to_numpy(dtype=float)

inventory = InventorySpec(
    name=stand_csv.stem,
    sampling="hps",
    bins=bins,
    tallies=tallies,
    metadata={"grouped": True},
)

fit = fit_inventory(inventory, ["weibull"], configs={})[0]

config = SamplingConfig(
    grid_points=4096,
    support_multiplier=12.0,
    integration_method="quad",
    quad_abs_tol=1e-9,
    quad_rel_tol=1e-8,
    cache_numeric_cdf=True,
)

draws = sample_distribution(
    fit.distribution,
    fit.parameters,
    size=500,
    random_state=123,
    config=config,
)
print(draws[:5])
```

This script loads the Parquet manifest, pinpoints the original stand-table CSV, fits a Weibull distribution, and samples DBH draws using a high-resolution CDF grid. Adjust `SamplingConfig` to benchmark how different grid densities or integration methods affect accuracy and performance, and swap in `nemora.sampling.bootstrap_inventory` when you need the richer metadata tracked by `BootstrapResult`.

## Export bootstrap DBH vectors

After calling `bootstrap_inventory(..., return_result=True)` you can convert the result into per-resample DBH vectors (plus an optional long-form table) using `nemora.sampling.bootstrap_dbh_vectors`.
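For intuition about what a per-resample DBH vector looks like, the sketch below bootstraps a toy binned stand table with plain NumPy. This is a generic illustration of resampling binned tallies, not Nemora's `bootstrap_inventory` implementation; the bin midpoints and tallies are invented:

```python
import numpy as np

# Toy binned stand table: DBH bin midpoints (cm) and tree tallies.
# Generic bootstrap illustration, not Nemora's actual implementation.
bins = np.array([12.5, 17.5, 22.5, 27.5])
tallies = np.array([40, 25, 10, 5])

rng = np.random.default_rng(42)
n_resamples, sample_size = 3, 25
probs = tallies / tallies.sum()

dbh_vectors = []
for _ in range(n_resamples):
    # Draw bin counts proportional to the observed tallies, then expand
    # each drawn bin into its midpoint DBH to form one resample's vector.
    counts = rng.multinomial(sample_size, probs)
    dbh_vectors.append(np.repeat(bins, counts))

for i, vec in enumerate(dbh_vectors):
    print(i, len(vec), vec.mean())
```

The spread of the per-resample means gives a quick feel for the sampling uncertainty that `BootstrapResult` tracks with richer metadata.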
The Typer CLI wraps this workflow so you can export JSON + Parquet artifacts without writing code:

```bash
nemora sampling-export-bootstrap-dbh "$(python - <<'PY'
import pandas as pd

manifest = pd.read_parquet('examples/faib_manifest/faib_manifest.parquet')
print(manifest.loc[0, 'path'])
PY
)" \
  --stand-id faib-demo-001 \
  --output examples/faib_manifest/faib_demo_dbh.json \
  --table-output examples/faib_manifest/faib_demo_dbh.parquet \
  --resamples 3 \
  --sample-size 25
```

The JSON file captures metadata (`distribution`, fitted parameters, bins/tallies, RNG seed) alongside per-resample DBH arrays, while the Parquet export stores every `(resample, bin, dbh)` row plus tally-derived weights. Feed either artifact directly into upcoming synthesis/simulation tooling, or archive them with your sampling experiment logs.
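Downstream, the long-form Parquet export is convenient to summarize with pandas. The sketch below assumes columns named `resample`, `bin`, `dbh`, and `weight` matching the documented `(resample, bin, dbh)` layout plus tally-derived weights; the exact column names and the sample rows are assumptions, not a verified schema:

```python
import pandas as pd

# Illustrative rows standing in for the exported long-form table;
# column names and values are assumptions for this sketch.
table = pd.DataFrame(
    {
        "resample": [0, 0, 1, 1, 2, 2],
        "bin": [12.5, 17.5, 12.5, 17.5, 12.5, 17.5],
        "dbh": [12.9, 17.1, 13.4, 18.0, 12.2, 17.6],
        "weight": [0.6, 0.4, 0.6, 0.4, 0.6, 0.4],
    }
)

# Per-resample plain mean DBH.
mean_dbh = table.groupby("resample")["dbh"].mean()

# Per-resample tally-weighted mean DBH.
sums = (
    table.assign(wd=table["dbh"] * table["weight"])
    .groupby("resample")[["wd", "weight"]]
    .sum()
)
weighted_mean = sums["wd"] / sums["weight"]

print(mean_dbh)
print(weighted_mean)
```

With a real export you would replace the inline DataFrame with `pd.read_parquet("examples/faib_manifest/faib_demo_dbh.parquet")` and keep the same groupby summaries.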