# BC PSP HPS Data
This note outlines how to assemble publicly available horizontal point sampling datasets from the BC Forest Analysis and Inventory Branch (FAIB) compilations. The goal is to obtain a clean, reproducible subset that mirrors the BAF 12 HPS workflow used in the Vegetation Resource Inventory (VRI).
## Source

FTP:

- `ftp://ftp.for.gov.bc.ca/HTS/external/!publish/ground_plot_compilations/psp/` – Provincial Vegetation Resource Inventory permanent sample plots.
- `non_psp/` – Related compilations for non-PSP programmes.

Metadata:

- `PSP_data_dictionary_20250514.xlsx`, `non_PSP_data_dictionary_20250514.xlsx` (download and store checksums alongside scripts).
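Storing those checksums can be scripted in a few lines. This is a minimal sketch, assuming the dictionaries have already been downloaded; the `sha256  filename` line format is an assumption that mirrors common checksum-file conventions:

```python
import hashlib
from pathlib import Path

def record_checksums(files, out_path):
    """Write one 'sha256  filename' line per downloaded dictionary."""
    lines = []
    for f in map(Path, files):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        lines.append(f"{digest}  {f.name}")
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

Re-running the helper after a fresh download and diffing the output against the committed `CHECKSUMS` file gives a cheap integrity check.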
## Relevant Tables

| File | Purpose | Key Fields |
|---|---|---|
| `faib_plot_header.csv` | Plot descriptors (one per plot/visit) | `CLSTR_ID` |
| `faib_sample_byvisit` | Plot visit metadata | `CLSTR_ID`, `VISIT_NUMBER`, `FIRST_MSMT`, `SAMP_TYP` |
| `faib_tree_detail.csv` | Per-tree measurements (large; chunked download) | `CLSTR_ID`, `STATUS_CD`, `DBH` |
Additional summary tables (`faib_compiled_*`) provide aggregated basal area and
heights but are optional for the initial HPS tally pipeline.
## Extraction Recipe

1. **Mirror metadata**: save the data dictionaries and record SHA256 hashes in `data/external/psp/CHECKSUMS`.
2. **Filter plots**: load `faib_plot_header.csv` and retain rows that correspond to the desired PSP visit(s). The compilations do not store the BAF explicitly, so the workflow records the assumed value (BAF 12) alongside each plot.
3. **Join visit context**: merge `faib_sample_byvisit` on `(CLSTR_ID, VISIT_NUMBER)` to identify active measurement cycles (e.g., `FIRST_MSMT == "Y"` for baseline PSP visits).
4. **Build tallies**: stream `faib_tree_detail.csv` with `pandas.read_csv(..., chunksize=...)`, selecting the columns above; filter to plots discovered in step 2, keep live trees (`STATUS_CD == "L"`), and bin DBH to centimetre midpoints. Output per plot: `dbh_cm` bin centre, `tally` counts, `baf` (12), and optional species/stratum attributes for future use. Store under `data/examples/hps_baf12/<plot_id>.csv`.
5. **Document lineage**: create `data/examples/hps_baf12/README.md` summarising the selection criteria, transformation script, and citation requirements.
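The tally-building step can be sketched as follows. This is a minimal sketch, not the shipped script: the column names (`CLSTR_ID`, `STATUS_CD`, `DBH`) are assumptions based on the fields referenced in this note, so verify them against the data dictionary before running:

```python
import pandas as pd

BAF = 12.0  # assumed basal area factor, recorded alongside each plot

def build_tallies(tree_csv, keep_plots, bin_width=1.0):
    """Stream the tree-detail CSV and return per-plot DBH tallies.

    Column names (CLSTR_ID, STATUS_CD, DBH) are assumptions; check the
    PSP data dictionary for the exact spellings.
    """
    parts = []
    for chunk in pd.read_csv(tree_csv,
                             usecols=["CLSTR_ID", "STATUS_CD", "DBH"],
                             chunksize=100_000):
        # Keep live trees on the selected plots only.
        live = chunk[chunk["STATUS_CD"].eq("L") & chunk["CLSTR_ID"].isin(keep_plots)]
        # Bin DBH to centimetre midpoints: 12.0-13.0 cm -> 12.5, etc.
        live = live.assign(dbh_cm=(live["DBH"] // bin_width) * bin_width + bin_width / 2)
        parts.append(live.groupby(["CLSTR_ID", "dbh_cm"]).size().rename("tally"))
    tallies = pd.concat(parts).groupby(level=[0, 1]).sum().reset_index()
    tallies["baf"] = BAF
    return tallies
```

Writing each `CLSTR_ID` group to its own CSV under `data/examples/hps_baf12/` then completes step 4.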
## Command-line helper
Use scripts/prepare_hps_dataset.py to automate the recipe above. The script
downloads (or reuses cached) PSP CSVs, filters to first-measurement BAF 12 plots,
and writes per-plot tallies plus a manifest, following the data preparation steps
documented in the EarthArXiv preprint by Paradis (2025).
```bash
python scripts/prepare_hps_dataset.py \
  --output-dir data/examples/hps_baf12 \
  --cache-dir data/external/psp/raw \
  --baf 12 \
  --max-plots 25
```
Key options:

- `--include-all-visits`: keep every measurement instead of first-measurement plots.
- `--sample-type F`: restrict to specific `SAMP_TYP` codes if required.
- `--status L --status I`: define which tree status codes count as "live".
- `--dry-run`: inspect how many plots would be produced without writing files.
## DataLad shortcut
If you prefer to mirror the manuscript dataset directly, the CLI exposes a helper that prints the DataLad commands required to install the reference data:
```bash
nemora fetch-reference-data --dry-run
```

Run with `--no-dry-run` (and a working DataLad installation) to automatically install the dataset.
If DataLad is not present:

- From a source checkout, use `pip install -e ".[data]"` to pull in the optional extra.
- From PyPI, use `pip install --upgrade "nemora[data]"` (which installs `datalad[full]`).
The command also attempts to enable the `arbutus-s3` sibling by default. Pass
`--enable-remote ""` to skip, or another sibling name if your configuration differs.
### Installing with DataLad
```bash
pip install "nemora[data]"
nemora fetch-reference-data --no-dry-run
# if the remote requires enabling manually:
cd reference-data
datalad siblings
datalad siblings enable --name arbutus-s3
datalad get -r .
```
The dataset is a standard git-annex repository. The top-level tree contains `examples/data`
artifacts used by the parity notebooks (e.g. `reference_hps/binned_meta_plots.csv`, the meta-plot
table referenced below). You can point the notebooks (or scripted workflows) directly at the
files under `reference-data/` once they are present locally.
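A small helper can resolve the table path regardless of whether the DataLad copy or the committed copy is available. This is a sketch; the directory layout is assumed from the paths mentioned above:

```python
from pathlib import Path

def locate_meta_table(root="reference-data"):
    """Prefer the DataLad copy; fall back to the copy committed under examples/."""
    candidates = [
        Path(root) / "examples" / "data" / "reference_hps" / "binned_meta_plots.csv",
        Path("examples/data/reference_hps/binned_meta_plots.csv"),
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        "binned_meta_plots.csv not found; run `nemora fetch-reference-data` first"
    )
```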
## Sample bundle
The repository ships a small bundle generated with:
```bash
PYTHONPATH=src python scripts/prepare_hps_dataset.py \
  --output-dir examples/hps_baf12 \
  --manifest examples/hps_baf12_manifest.csv \
  --cache-dir data/external/psp/raw \
  --baf 12 \
  --max-plots 5
```
Outputs:

- Tallies: `examples/hps_baf12/*.csv`
- Manifest: `examples/hps_baf12_manifest.csv`
- Raw downloads cached (gitignored) under `data/external/psp/raw`
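The bundle can be read back for analysis with a short helper (a sketch; deriving `plot_id` from the file name is an assumption that matches the `<plot_id>.csv` naming convention above):

```python
import glob
from pathlib import Path

import pandas as pd

def load_bundle(pattern="examples/hps_baf12/*.csv"):
    """Concatenate per-plot tally files, tagging each row with its plot id."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        df["plot_id"] = Path(path).stem  # file name doubles as the plot id
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no tally files match {pattern}")
    return pd.concat(frames, ignore_index=True)
```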
## Worked censored workflow
The censored/two-stage regression in `tests/test_censored_workflow.py` loads the
`binned_meta_plots.csv` file shipped with the DataLad dataset (or the copy committed in `examples/`).
Reuse that test as a template for exploratory analysis:
```python
import pandas as pd

from nemora.workflows.censoring import fit_censored_inventory

full_meta = pd.read_csv("examples/data/reference_hps/binned_meta_plots.csv")
censored = (
    full_meta[full_meta["dbh_cm"] >= 20.0]
    .groupby("dbh_cm", as_index=False)
    .agg({"tally": "sum", "expansion_factor": "mean"})
)
dbh = censored["dbh_cm"].to_numpy()
stand_table = censored["tally"].to_numpy() * censored["expansion_factor"].to_numpy()
results = fit_censored_inventory(dbh, stand_table, support=(20.0, float("inf")))
```
The resulting `FitResult` objects expose the same GOF metrics and residual summaries used in the PSP
examples. Combine them with the reporting pattern described in the programmatic HPS analysis guide
or the parity notebook to regenerate the manuscript figures.
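For context when interpreting `expansion_factor` values: in horizontal point sampling, each tallied tree represents `BAF / g` stems per hectare, where `g = π (DBH/200)²` is the tree's basal area in m² for DBH in cm. This is standard HPS theory rather than anything specific to the nemora API:

```python
import math

def hps_expansion_factor(dbh_cm, baf=12.0):
    """Stems per hectare represented by one tallied tree of the given DBH."""
    basal_area_m2 = math.pi * (dbh_cm / 200.0) ** 2  # DBH in cm -> radius in m
    return baf / basal_area_m2
```

Note the factor shrinks rapidly with DBH, which is why large trees dominate HPS tallies but each contributes fewer stems per hectare.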
## Automation Status
- [x] Scripted pipeline (`scripts/prepare_hps_dataset.py`) with caching and binning controls.
- [x] Pytest fixtures covering selection + tally logic (`tests/fixtures/hps`).
- [x] PSP sample bundle committed under `examples/hps_baf12` with manifest and provenance notes.
- [x] Regression guard for the reference Weibull fit (`tests/test_hps_parity.py`).
- [x] Censored meta-plot fixture + regression (`tests/fixtures/hps/meta_censored.csv`, `tests/test_censored_workflow.py`).

.. todo:: Update this section once the nemora.ingest / sampling / synthesis modules land to reflect the broader workflow.