# BC PSP HPS Data
This note outlines how to assemble publicly available horizontal point sampling datasets from the BC Forest Analysis and Inventory Branch (FAIB) compilations. The goal is to obtain a clean, reproducible subset that mirrors the BAF 12 HPS workflow used in the Vegetation Resource Inventory (VRI).
## Source

FTP:

- `ftp://ftp.for.gov.bc.ca/HTS/external/!publish/ground_plot_compilations/psp/` – Provincial Vegetation Resource Inventory permanent sample plots.
- `non_psp/` – Related compilations for non-PSP programmes.

Metadata:

- `PSP_data_dictionary_20250514.xlsx`, `non_PSP_data_dictionary_20250514.xlsx` (download and store checksums alongside scripts).
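Storing those checksums can be scripted in a few lines. This is a minimal sketch, assuming the dictionaries have already been downloaded; the `sha256  filename` line format is an assumption that mirrors common checksum-file conventions:

```python
import hashlib
from pathlib import Path

def record_checksums(files, out_path):
    """Write one 'sha256  filename' line per downloaded dictionary."""
    lines = []
    for f in map(Path, files):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        lines.append(f"{digest}  {f.name}")
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

Re-running the helper after a fresh download and diffing the output against the committed `CHECKSUMS` file gives a cheap integrity check.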
## Relevant Tables

| File | Purpose | Key Fields |
|---|---|---|
| `faib_plot_header.csv` | Plot descriptors (one per plot/visit) | `CLSTR_ID` |
| `faib_sample_byvisit` | Plot visit metadata | `CLSTR_ID`, `VISIT_NUMBER`, `FIRST_MSMT`, `SAMP_TYP` |
| `faib_tree_detail.csv` | Per-tree measurements (large; chunked download) | `CLSTR_ID`, `STATUS_CD`, `DBH` |
Additional summary tables (`faib_compiled_*`) provide aggregated basal area and
heights but are optional for the initial HPS tally pipeline.
## Extraction Recipe

1. **Mirror metadata**: save the data dictionaries and record SHA256 hashes in `data/external/psp/CHECKSUMS`.
2. **Filter plots**: load `faib_plot_header.csv` and retain rows that correspond to the desired PSP visit(s). The compilations do not store the BAF explicitly, so the workflow records the assumed value (BAF 12) alongside each plot.
3. **Join visit context**: merge `faib_sample_byvisit` on `(CLSTR_ID, VISIT_NUMBER)` to identify active measurement cycles (e.g., `FIRST_MSMT == "Y"` for baseline PSP visits).
4. **Build tallies**: stream `faib_tree_detail.csv` with `pandas.read_csv(..., chunksize=...)`, selecting the columns above; filter to plots discovered in step 2, keep live trees (`STATUS_CD == "L"`), and bin DBH to centimetre midpoints. Output per plot: `dbh_cm` bin centre, `tally` counts, `baf` (12), and optional species/stratum attributes for future use. Store under `data/examples/hps_baf12/<plot_id>.csv`.
5. **Document lineage**: create `data/examples/hps_baf12/README.md` summarising the selection criteria, transformation script, and citation requirements.
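The tally-building step can be sketched as follows. This is a minimal sketch, not the shipped script: the column names (`CLSTR_ID`, `STATUS_CD`, `DBH`) are assumptions based on the fields referenced in this note, so verify them against the data dictionary before running:

```python
import pandas as pd

BAF = 12.0  # assumed basal area factor, recorded alongside each plot

def build_tallies(tree_csv, keep_plots, bin_width=1.0):
    """Stream the tree-detail CSV and return per-plot DBH tallies.

    Column names (CLSTR_ID, STATUS_CD, DBH) are assumptions; check the
    PSP data dictionary for the exact spellings.
    """
    parts = []
    for chunk in pd.read_csv(tree_csv,
                             usecols=["CLSTR_ID", "STATUS_CD", "DBH"],
                             chunksize=100_000):
        # Keep live trees on the selected plots only.
        live = chunk[chunk["STATUS_CD"].eq("L") & chunk["CLSTR_ID"].isin(keep_plots)]
        # Bin DBH to centimetre midpoints: 12.0-13.0 cm -> 12.5, etc.
        live = live.assign(dbh_cm=(live["DBH"] // bin_width) * bin_width + bin_width / 2)
        parts.append(live.groupby(["CLSTR_ID", "dbh_cm"]).size().rename("tally"))
    tallies = pd.concat(parts).groupby(level=[0, 1]).sum().reset_index()
    tallies["baf"] = BAF
    return tallies
```

Writing each `CLSTR_ID` group to its own CSV under `data/examples/hps_baf12/` then completes step 4.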
## Command-line helper
Use scripts/prepare_hps_dataset.py to automate the recipe above. The script
downloads (or reuses cached) PSP CSVs, filters to first-measurement BAF 12 plots,
and writes per-plot tallies plus a manifest, following the data preparation steps
documented in the EarthArXiv preprint by Paradis (2025).
```bash
python scripts/prepare_hps_dataset.py \
  --output-dir data/examples/hps_baf12 \
  --cache-dir data/external/psp/raw \
  --baf 12 \
  --max-plots 25
```
Key options:

- `--include-all-visits`: keep every measurement instead of first-measurement plots.
- `--sample-type F`: restrict to specific `SAMP_TYP` codes if required.
- `--status L --status I`: define which tree status codes count as "live".
- `--dry-run`: inspect how many plots would be produced without writing files.
## DataLad shortcut
If you prefer to mirror the manuscript dataset directly, the CLI exposes a helper that prints the DataLad commands required to install the reference data:
```bash
nemora fetch-reference-data --dry-run
```

Run with `--no-dry-run` (and a working DataLad installation) to automatically install the dataset.
If DataLad is not present:

- From a source checkout, use `pip install -e ".[data]"` to pull in the optional extra.
- From PyPI, use `pip install --upgrade "nemora[data]"` (which installs `datalad[full]`).
The command also attempts to enable the `arbutus-s3` sibling by default. Pass
`--enable-remote ""` to skip, or another sibling name if your configuration differs.
### Installing with DataLad
```bash
pip install "nemora[data]"
nemora fetch-reference-data --no-dry-run
# if the remote requires enabling manually:
cd reference-data
datalad siblings
datalad siblings enable --name arbutus-s3
datalad get -r .
```
The dataset is a standard git-annex repository. The top-level tree contains `examples/data`
artifacts used by the parity notebooks (e.g. `reference_hps/binned_meta_plots.csv`, the meta-plot
table referenced below). You can point the notebooks (or scripted workflows) directly at the
files under `reference-data/` once they are present locally.
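A small helper can resolve the table path regardless of whether the DataLad copy or the committed copy is available. This is a sketch; the directory layout is assumed from the paths mentioned above:

```python
from pathlib import Path

def locate_meta_table(root="reference-data"):
    """Prefer the DataLad copy; fall back to the copy committed under examples/."""
    candidates = [
        Path(root) / "examples" / "data" / "reference_hps" / "binned_meta_plots.csv",
        Path("examples/data/reference_hps/binned_meta_plots.csv"),
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        "binned_meta_plots.csv not found; run `nemora fetch-reference-data` first"
    )
```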
## Sample bundle
The repository ships a small bundle generated with:
```bash
PYTHONPATH=src python scripts/prepare_hps_dataset.py \
  --output-dir examples/hps_baf12 \
  --manifest examples/hps_baf12_manifest.csv \
  --cache-dir data/external/psp/raw \
  --baf 12 \
  --max-plots 5
```
Outputs:

- Tallies: `examples/hps_baf12/*.csv`
- Manifest: `examples/hps_baf12_manifest.csv`
- Raw downloads cached (gitignored) under `data/external/psp/raw`
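The bundle can be read back for analysis with a short helper (a sketch; deriving `plot_id` from the file name is an assumption that matches the `<plot_id>.csv` naming convention above):

```python
import glob
from pathlib import Path

import pandas as pd

def load_bundle(pattern="examples/hps_baf12/*.csv"):
    """Concatenate per-plot tally files, tagging each row with its plot id."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        df["plot_id"] = Path(path).stem  # file name doubles as the plot id
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no tally files match {pattern}")
    return pd.concat(frames, ignore_index=True)
```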
## Worked censored workflow
The censored/two-stage regression in `tests/test_censored_workflow.py` loads the
`binned_meta_plots.csv` file shipped with the DataLad dataset (or the copy committed in `examples/`).
Reuse that test as a template for exploratory analysis:
```python
import pandas as pd

from nemora.workflows.censoring import fit_censored_inventory

full_meta = pd.read_csv("examples/data/reference_hps/binned_meta_plots.csv")
censored = (
    full_meta[full_meta["dbh_cm"] >= 20.0]
    .groupby("dbh_cm", as_index=False)
    .agg({"tally": "sum", "expansion_factor": "mean"})
)
dbh = censored["dbh_cm"].to_numpy()
stand_table = censored["tally"].to_numpy() * censored["expansion_factor"].to_numpy()
results = fit_censored_inventory(dbh, stand_table, support=(20.0, float("inf")))
```
The resulting `FitResult` objects expose the same GOF metrics and residual summaries used in the PSP
examples. Combine them with the reporting pattern described in the programmatic HPS analysis guide
or the parity notebook to regenerate the manuscript figures.
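For context when interpreting `expansion_factor` values: in horizontal point sampling, each tallied tree represents `BAF / g` stems per hectare, where `g = π (DBH/200)²` is the tree's basal area in m² for DBH in cm. This is standard HPS theory rather than anything specific to the nemora API:

```python
import math

def hps_expansion_factor(dbh_cm, baf=12.0):
    """Stems per hectare represented by one tallied tree of the given DBH."""
    basal_area_m2 = math.pi * (dbh_cm / 200.0) ** 2  # DBH in cm -> radius in m
    return baf / basal_area_m2
```

Note the factor shrinks rapidly with DBH, which is why large trees dominate HPS tallies but each contributes fewer stems per hectare.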
## Automation Status
- [x] Scripted pipeline (`scripts/prepare_hps_dataset.py`) with caching and binning controls.
- [x] Pytest fixtures covering selection + tally logic (`tests/fixtures/hps`).
- [x] PSP sample bundle committed under `examples/hps_baf12` with manifest and provenance notes.
- [x] Regression guard for the reference Weibull fit (`tests/test_hps_parity.py`).
- [x] Censored meta-plot fixture + regression (`tests/fixtures/hps/meta_censored.csv`, `tests/test_censored_workflow.py`).

.. todo:: Update this section once the nemora.ingest / sampling / synthesis modules land to reflect the broader workflow.