nemora.ingest
=============
Nemora’s ingest package wraps reusable dataset abstractions and pipelines that
convert raw inventory releases (FAIB PSP, FIA Datamart, etc.) into the tidy stand
tables consumed by ``nemora.fit``, ``nemora.sampling``, and downstream tooling.

The key entry points mirror the concepts introduced in the
Ingest Module how-to:

- ``DatasetSource`` / ``DatasetFetcher`` describe how to locate and cache raw files.
- ``TransformPipeline`` orchestrates composable DataFrame transformations.
- Submodules (``faib``, ``fia``, ``hps``) provide dataset-specific helpers and
  CLI-ready pipelines for stand-table and HPS tally generation.

.. seealso::

   - ``docs/howto/ingest.md`` for step-by-step ingest workflows.
   - ``nemora.cli`` for Typer commands such as ``ingest-faib``, ``faib-manifest``,
     and ``ingest-faib-hps`` that wrap these helpers.

Package API
-----------

.. py:module:: nemora.ingest

Ingestion/ETL scaffolding for Nemora.

This module defines lightweight abstractions that describe raw inventory
datasets (:class:`DatasetSource`) and the transformation pipelines
(:class:`TransformPipeline`) that convert them into the tidy stand tables
consumed by other Nemora modules.
Concrete connectors for BC FAIB, FIA, and other inventories will extend these
primitives in upcoming revisions.

.. py:class:: DatasetFetcher(*args, **kwargs)
   :module: nemora.ingest

   Bases: :py:class:`~typing.Protocol`

   Callable that retrieves one or more artifacts for a dataset source.

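The call signature of ``DatasetFetcher`` is not spelled out above. A minimal sketch, assuming a fetcher is a zero-argument callable that returns artifact paths (consistent with ``DatasetSource.fetch()`` returning ``Iterable[Path]``); ``FetcherLike`` and ``local_csv_fetcher`` are hypothetical names, not part of ``nemora.ingest``:

```python
import tempfile
from pathlib import Path
from typing import Iterable, Protocol


class FetcherLike(Protocol):
    """Hypothetical stand-in for DatasetFetcher: returns artifact paths when called."""

    def __call__(self) -> Iterable[Path]: ...


def local_csv_fetcher(root: Path) -> FetcherLike:
    """Build a fetcher that 'retrieves' CSV files already present under root."""

    def fetch() -> Iterable[Path]:
        # Sort so repeated calls return artifacts in a stable order.
        return sorted(root.glob("*.csv"))

    return fetch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "faib_tree_detail.csv").write_text("DBH_CM\n12.5\n")
        print([path.name for path in local_csv_fetcher(root)()])
```

A structural ``Protocol`` lets any plain function with a matching ``__call__`` serve as a fetcher without subclassing.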
.. py:class:: DatasetSource(name, description, uri=None, metadata=

   Bases: :py:class:`object`

   Describe a raw inventory dataset that can be ingested by Nemora.

   :param name: Human-readable identifier for the dataset.
   :type name: str
   :param description: Short summary of the dataset contents (region, sampling design, etc.).
   :type description: str
   :param uri: Optional canonical URI (open data portal link, DataLad URL, etc.).
   :type uri: str | None
   :param metadata: Arbitrary extra fields (licensing, citation info, cache preferences).
   :type metadata: dict[str, ~typing.Any]
   :param fetcher: Optional callable able to retrieve the dataset artifacts when invoked.
   :type fetcher: ~nemora.ingest.DatasetFetcher | None

.. py:attribute:: DatasetSource.description
   :module: nemora.ingest
   :type: str

.. py:method:: DatasetSource.fetch()
   :module: nemora.ingest

   Return artifacts for this dataset via the configured fetcher.

   :rtype: ~collections.abc.Iterable[~pathlib.Path]

.. py:attribute:: DatasetSource.fetcher
   :module: nemora.ingest
   :type: ~nemora.ingest.DatasetFetcher | None

.. py:attribute:: DatasetSource.metadata
   :module: nemora.ingest
   :type: dict[str, ~typing.Any]

.. py:attribute:: DatasetSource.name
   :module: nemora.ingest
   :type: str

.. py:attribute:: DatasetSource.uri
   :module: nemora.ingest
   :type: str | None

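A stdlib-only sketch of how ``DatasetSource.fetch()`` might delegate to its fetcher. ``SourceSketch`` is a hypothetical mirror of the documented fields; raising ``RuntimeError`` when no fetcher is configured is an assumption, since the behaviour in that case is not documented here.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclass
class SourceSketch:
    """Hypothetical mirror of the documented DatasetSource fields."""

    name: str
    description: str
    uri: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    fetcher: Optional[Callable[[], Iterable[Path]]] = None

    def fetch(self) -> List[Path]:
        # Assumption: a source without a fetcher cannot retrieve anything.
        if self.fetcher is None:
            raise RuntimeError(f"dataset {self.name!r} has no fetcher configured")
        return list(self.fetcher())


source = SourceSketch(
    name="faib-psp-demo",
    description="Toy FAIB PSP source for illustration",
    fetcher=lambda: [Path("raw/faib_tree_detail.csv")],
)
print(source.fetch())
```

Using ``default_factory=dict`` gives each source its own ``metadata`` mapping rather than one shared default.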
.. py:class:: TransformPipeline(name, steps=

   Bases: :py:class:`object`

   A sequence of callables that transform raw dataframes into Nemora tables.

.. py:method:: TransformPipeline.add_step(step)
   :module: nemora.ingest

   Append a transformation step to the pipeline.

   :rtype: None

.. py:attribute:: TransformPipeline.metadata
   :module: nemora.ingest
   :type: dict[str, ~typing.Any]

.. py:attribute:: TransformPipeline.name
   :module: nemora.ingest
   :type: str

.. py:method:: TransformPipeline.run(frame)
   :module: nemora.ingest

   Apply every transformation step to the supplied dataframe.

   :rtype: ~pandas.DataFrame

.. py:attribute:: TransformPipeline.steps
   :module: nemora.ingest
   :type: list[~collections.abc.Callable[[~pandas.DataFrame], ~pandas.DataFrame]]

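The sketch below illustrates the documented ``add_step``/``run`` contract: steps are applied in list order, each receiving the previous step's output. ``PipelineSketch`` is hypothetical, and to stay stdlib-only it is generic over the frame type, with a list of dicts standing in for a pandas ``DataFrame``.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Generic, List, TypeVar

Frame = TypeVar("Frame")


@dataclass
class PipelineSketch(Generic[Frame]):
    """Hypothetical mirror of TransformPipeline: an ordered list of frame -> frame steps."""

    name: str
    steps: List[Callable[[Frame], Frame]] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def add_step(self, step: Callable[[Frame], Frame]) -> None:
        self.steps.append(step)

    def run(self, frame: Frame) -> Frame:
        # Each step receives the previous step's output, in insertion order.
        for step in self.steps:
            frame = step(frame)
        return frame


Rows = List[Dict[str, Any]]  # stands in for a pandas DataFrame


def keep_live(rows: Rows) -> Rows:
    return [row for row in rows if row["status"] == "L"]


def add_dbh_class(rows: Rows) -> Rows:
    # 2 cm classes labelled by their lower edge.
    return [{**row, "dbh_class": int(row["dbh_cm"] // 2) * 2} for row in rows]


pipeline: PipelineSketch[Rows] = PipelineSketch("demo")
pipeline.add_step(keep_live)
pipeline.add_step(add_dbh_class)
result = pipeline.run([{"status": "L", "dbh_cm": 13.4}, {"status": "D", "dbh_cm": 20.0}])
print(result)  # [{'status': 'L', 'dbh_cm': 13.4, 'dbh_class': 12}]
```

Keeping each step a pure frame-to-frame function makes steps easy to test in isolation and reorder.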
Dataset helpers
---------------

FAIB (nemora.ingest.faib)
~~~~~~~~~~~~~~~~~~~~~~~~~

.. py:module:: nemora.ingest.faib

Helpers for working with BC FAIB ground sample datasets.

.. py:class:: DataDictionary(sheets)
   :module: nemora.ingest.faib

   Bases: :py:class:`object`

   Structured representation of FAIB data dictionary entries.

.. py:method:: DataDictionary.get_table_schema(table)
   :module: nemora.ingest.faib

   Return the schema for a specific table.

   :rtype: ~pandas.DataFrame

.. py:attribute:: DataDictionary.sheets
   :module: nemora.ingest.faib
   :type: ~collections.abc.Mapping[str, ~pandas.DataFrame]

.. py:class:: FAIBManifestResult(manifest_path, tables, bafs, truncated_flags, downloaded)
   :module: nemora.ingest.faib

   Bases: :py:class:`object`

   Summary of outputs produced by :func:`generate_faib_manifest`.

.. py:attribute:: FAIBManifestResult.bafs
   :module: nemora.ingest.faib
   :type: list[float]

.. py:attribute:: FAIBManifestResult.downloaded
   :module: nemora.ingest.faib
   :type: list[~pathlib.Path]

.. py:attribute:: FAIBManifestResult.manifest_path
   :module: nemora.ingest.faib
   :type: ~pathlib.Path

.. py:attribute:: FAIBManifestResult.tables
   :module: nemora.ingest.faib
   :type: list[~pathlib.Path]

.. py:attribute:: FAIBManifestResult.truncated_flags
   :module: nemora.ingest.faib
   :type: dict[~pathlib.Path, bool]

.. py:function:: aggregate_stand_table(tree_detail, plot_info, *, baf, dbh_col='DBH_CM', expansion_col='TREE_EXP', baf_col='BAF', group_keys=('CLSTR_ID', 'VISIT_NUMBER', 'PLOT'))
   :module: nemora.ingest.faib

   Aggregate tree detail records into a stand table for a given BAF.

   :param tree_detail: Raw FAIB tree detail records.
   :type tree_detail: ~pandas.DataFrame
   :param plot_info: Plot-level records containing BAF metadata (sample-by-visit or plot header).
   :type plot_info: ~pandas.DataFrame
   :param baf: Target basal area factor to filter on (e.g., 12).
   :type baf: float
   :param dbh_col: Column containing diameter at breast height in centimetres.
   :type dbh_col: str
   :param expansion_col: Column representing tree expansion weights.
   :type expansion_col: str
   :param group_keys: Keys used to join tree detail with sample-by-visit metadata.
   :type group_keys: tuple[str, ...]
   :rtype: ~pandas.DataFrame

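A rough sketch of BAF-filtered aggregation, assuming plots are matched on ``group_keys``, trees fall into 1 cm DBH classes by flooring, and expansion weights are summed per class. The real function returns a ``DataFrame`` whose columns are not documented here; the hypothetical ``stand_table_sketch`` returns a plain dict keyed by DBH class, and the rows below are toy data.

```python
import math
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]


def stand_table_sketch(
    tree_detail: Iterable[Row],
    plot_info: Iterable[Row],
    *,
    baf: float,
    dbh_col: str = "DBH_CM",
    expansion_col: str = "TREE_EXP",
    baf_col: str = "BAF",
    group_keys: Tuple[str, ...] = ("CLSTR_ID", "VISIT_NUMBER", "PLOT"),
) -> Dict[int, float]:
    """Hypothetical: sum expansion weights per 1 cm DBH class for plots measured at `baf`."""
    # Join keys of the plots/visits sampled with the requested BAF.
    keep = {
        tuple(plot[key] for key in group_keys)
        for plot in plot_info
        if math.isclose(float(plot[baf_col]), baf)
    }
    table: Dict[int, float] = defaultdict(float)
    for tree in tree_detail:
        if tuple(tree[key] for key in group_keys) not in keep:
            continue  # tree belongs to a plot measured with another BAF
        dbh_class = math.floor(float(tree[dbh_col]))  # assumed 1 cm classes
        table[dbh_class] += float(tree[expansion_col])
    return dict(sorted(table.items()))


plots = [
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "PLOT": "I", "BAF": 12},
    {"CLSTR_ID": "B", "VISIT_NUMBER": 1, "PLOT": "I", "BAF": 8},
]
trees = [
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "PLOT": "I", "DBH_CM": 20.4, "TREE_EXP": 3.5},
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "PLOT": "I", "DBH_CM": 20.9, "TREE_EXP": 2.0},
    {"CLSTR_ID": "B", "VISIT_NUMBER": 1, "PLOT": "I", "DBH_CM": 31.0, "TREE_EXP": 9.0},
]
print(stand_table_sketch(trees, plots, baf=12))  # {20: 5.5}
```

``math.isclose`` is used for the BAF comparison so that values read as ``12`` or ``12.0`` match alike.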
.. py:function:: auto_select_bafs(root, count=3, *, plot_file='faib_plot_header.csv', sample_file='faib_sample_byvisit.csv')
   :module: nemora.ingest.faib

   Select representative BAF values from FAIB metadata.

   :param root: Directory containing FAIB CSV extracts.
   :type root: str | ~pathlib.Path
   :param count: Number of representative BAF values to return.
   :type count: int
   :param plot_file: Plot header CSV filename inspected for BAF metadata (the preferred source).
   :type plot_file: str
   :param sample_file: Sample-by-visit CSV filename inspected for BAF metadata.
   :type sample_file: str
   :rtype: list[float]

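How "representative" is defined is not documented. One plausible heuristic, shown here as an assumption rather than the library's rule, is to take the ``count`` most frequent BAF values in the plot header; ``select_bafs_sketch`` is hypothetical and reads from any text stream.

```python
import csv
import io
from collections import Counter
from typing import List, TextIO


def select_bafs_sketch(handle: TextIO, count: int = 3, baf_col: str = "BAF") -> List[float]:
    """Hypothetical heuristic: the `count` most frequent BAFs, returned in ascending order."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        value = (row.get(baf_col) or "").strip()
        if value:  # skip plots with no recorded BAF
            counts[float(value)] += 1
    # most_common() breaks ties by first occurrence in the file.
    return sorted(baf for baf, _ in counts.most_common(count))


plot_header = "CLSTR_ID,BAF\nA,12\nB,12\nC,8\nD,\nE,12\nF,5\nG,8\nH,20\n"
print(select_bafs_sketch(io.StringIO(plot_header), count=2))  # [8.0, 12.0]
```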
.. py:function:: build_faib_dataset_source(dataset='psp', *, destination, filenames=None, overwrite=False)
   :module: nemora.ingest.faib

   Create a :class:`~nemora.ingest.DatasetSource` for FAIB CSV extracts.

   :rtype: ~nemora.ingest.DatasetSource

.. py:function:: build_faib_stand_table_pipeline(plot_info, *, baf, dbh_col, expansion_col, baf_col, group_keys=('CLSTR_ID', 'VISIT_NUMBER', 'PLOT'))
   :module: nemora.ingest.faib

   Create a :class:`~nemora.ingest.TransformPipeline` that aggregates FAIB stand tables.

   :rtype: ~nemora.ingest.TransformPipeline

.. py:function:: build_stand_table_from_csvs(root, baf, *, tree_file='faib_tree_detail.csv', sample_file='faib_sample_byvisit.csv', plot_file=None, dbh_col=None, expansion_col=None, baf_col=None)
   :module: nemora.ingest.faib

   Load FAIB CSV extracts from ``root`` and build a stand table for ``baf``.

   :param root: Directory containing the FAIB CSV extracts.
   :type root: str | ~pathlib.Path
   :param baf: Desired basal area factor to filter on.
   :type baf: float
   :param tree_file: Filename for the tree detail CSV within ``root``.
   :type tree_file: str
   :param sample_file: Filename for the sample-by-visit CSV within ``root``.
   :type sample_file: str | None
   :rtype: ~pandas.DataFrame

.. py:function:: download_faib_csvs(destination, dataset='psp', *, overwrite=False, filenames=None)
   :module: nemora.ingest.faib

   Download FAIB CSV extracts via FTP into ``destination``.

   :param destination: Directory where files will be written.
   :type destination: str | ~pathlib.Path
   :param dataset: Either ``"psp"`` or ``"non_psp"``.
   :type dataset: str
   :rtype: list[~pathlib.Path]

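The ``overwrite`` flag suggests that existing files are skipped unless a re-download is forced (as ``generate_faib_manifest`` describes for its own ``overwrite`` parameter). The sketch below shows that logic with a hypothetical injected ``fetch_one`` callable in place of the FTP transfer, so it runs offline; whether the real function also lists skipped files in its return value is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def download_missing(
    destination: Path,
    filenames: Iterable[str],
    fetch_one: Callable[[str, Path], None],
    *,
    overwrite: bool = False,
) -> List[Path]:
    """Hypothetical: fetch only files that are absent, unless overwrite is set."""
    destination.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in filenames:
        target = destination / name
        if overwrite or not target.exists():
            fetch_one(name, target)  # a real implementation would transfer via FTP here
        paths.append(target)
    return paths


def fake_fetch(name: str, target: Path) -> None:
    # Offline stand-in that writes a placeholder file.
    target.write_text("placeholder\n")
    print(f"fetched {name}")


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp)
    (dest / "faib_plot_header.csv").write_text("already here\n")
    # Only faib_tree_detail.csv is fetched; the plot header already exists.
    download_missing(dest, ["faib_plot_header.csv", "faib_tree_detail.csv"], fake_fetch)
```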
.. py:function:: generate_faib_manifest(destination, *, dataset='psp', source=None, fetch=False, overwrite=False, bafs=None, auto_count=None, max_rows=None, write_parquet=True)
   :module: nemora.ingest.faib

   Fetch FAIB extracts, build stand tables, and emit a manifest.

   :param destination: Directory where the manifest and stand tables will be written.
   :type destination: str | ~pathlib.Path
   :param dataset: FAIB dataset to process (``"psp"`` or ``"non_psp"``).
   :type dataset: str
   :param source: Optional directory containing pre-downloaded FAIB CSV files. When omitted, files will be fetched into ``destination / "raw"`` if ``fetch`` is true.
   :type source: str | ~pathlib.Path | None
   :param fetch: When set, download the FAIB CSV files before building stand tables.
   :type fetch: bool
   :param overwrite: Force re-download of CSV files even when they already exist locally.
   :type overwrite: bool
   :param bafs: Explicit BAF values to build stand tables for.
   :type bafs: ~collections.abc.Sequence[float] | None
   :param auto_count: When provided, automatically select ``auto_count`` representative BAF values instead of using ``bafs``.
   :type auto_count: int | None
   :param max_rows: Optional limit on the number of rows retained in each stand table.
   :type max_rows: int | None
   :param write_parquet: When true (default), persist a Parquet copy of the manifest alongside the CSV.
   :type write_parquet: bool
   :rtype: ~nemora.ingest.faib.FAIBManifestResult

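A simplified sketch of the manifest step, assuming one CSV per BAF and a manifest CSV with one row per stand table (Parquet output is omitted to stay stdlib-only). The file names and manifest columns are hypothetical; the returned path and per-file truncation flags loosely mirror ``FAIBManifestResult.manifest_path`` and ``FAIBManifestResult.truncated_flags``.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple


def write_manifest_sketch(
    destination: Path,
    tables: Dict[float, List[Dict[str, Any]]],
    *,
    max_rows: Optional[int] = None,
) -> Tuple[Path, Dict[Path, bool]]:
    """Hypothetical: write one CSV per BAF, then a manifest describing each file."""
    destination.mkdir(parents=True, exist_ok=True)
    truncated: Dict[Path, bool] = {}
    records = []
    for baf, rows in sorted(tables.items()):
        kept = rows if max_rows is None else rows[:max_rows]
        path = destination / f"stand_table_baf{baf:g}.csv"  # naming is an assumption
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(kept[0]) if kept else [])
            writer.writeheader()
            writer.writerows(kept)
        truncated[path] = len(kept) < len(rows)
        records.append({"baf": baf, "path": path.name, "rows": len(kept), "truncated": truncated[path]})
    manifest = destination / "manifest.csv"
    with manifest.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["baf", "path", "rows", "truncated"])
        writer.writeheader()
        writer.writerows(records)
    return manifest, truncated


with tempfile.TemporaryDirectory() as tmp:
    tables = {12.0: [{"dbh_cm": 20, "tally": 5.5}, {"dbh_cm": 21, "tally": 1.0}], 8.0: [{"dbh_cm": 31, "tally": 9.0}]}
    manifest, flags = write_manifest_sketch(Path(tmp), tables, max_rows=1)
    print(manifest.read_text())
```

Recording a truncation flag per file lets downstream consumers tell a capped table (``max_rows``) from a complete one.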
.. py:function:: load_data_dictionary(url)
   :module: nemora.ingest.faib

   Download and parse a FAIB data dictionary XLSX file.

   :rtype: ~nemora.ingest.faib.DataDictionary

.. py:function:: load_non_psp_dictionary()
   :module: nemora.ingest.faib

   Convenience wrapper for the non-PSP data dictionary.

   :rtype: ~nemora.ingest.faib.DataDictionary

.. py:function:: load_psp_dictionary()
   :module: nemora.ingest.faib

   Convenience wrapper for the PSP data dictionary.

   :rtype: ~nemora.ingest.faib.DataDictionary

FIA (nemora.ingest.fia)
~~~~~~~~~~~~~~~~~~~~~~~

.. py:module:: nemora.ingest.fia

Helpers for working with USDA FIA datasets.

.. py:class:: FIATables(tree, condition, plot)
   :module: nemora.ingest.fia

   Bases: :py:class:`object`

   Container for FIA plot/condition/tree tables.

.. py:attribute:: FIATables.condition
   :module: nemora.ingest.fia
   :type: ~pandas.DataFrame

.. py:attribute:: FIATables.plot
   :module: nemora.ingest.fia
   :type: ~pandas.DataFrame

.. py:attribute:: FIATables.tree
   :module: nemora.ingest.fia
   :type: ~pandas.DataFrame

.. py:function:: aggregate_plot_stand_table(tables, *, plot_cn=None, plot_number=None, live_status_codes=(1,), condition_status_codes=(1,), dbh_bin_cm=1.0)
   :module: nemora.ingest.fia

   Aggregate FIA tree records into a stand table summarised by DBH bins.

   :rtype: ~pandas.DataFrame

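A sketch of the filtering and binning implied by the parameters, assuming FIADB column names (``PLT_CN``, ``CONDID``, ``STATUSCD``, ``DIA`` in inches and ``TPA_UNADJ`` from TREE; ``COND_STATUS_CD`` from COND), where code 1 means a live tree or an accessible forest condition. Which columns ``nemora`` actually reads, and how it weights trees, are assumptions here; ``fia_stand_table_sketch`` is hypothetical and the rows are toy data.

```python
import math
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]


def fia_stand_table_sketch(
    trees: Iterable[Row],
    conds: Iterable[Row],
    *,
    plot_cn: str,
    live_status_codes: Tuple[int, ...] = (1,),
    condition_status_codes: Tuple[int, ...] = (1,),
    dbh_bin_cm: float = 1.0,
) -> Dict[float, float]:
    """Hypothetical: trees per acre by DBH bin (lower edge, cm) for one FIA plot."""
    # Condition IDs on this plot whose status is accepted (1 = accessible forest land).
    ok_conds = {
        cond["CONDID"]
        for cond in conds
        if cond["PLT_CN"] == plot_cn and cond["COND_STATUS_CD"] in condition_status_codes
    }
    table: Dict[float, float] = defaultdict(float)
    for tree in trees:
        if tree["PLT_CN"] != plot_cn or tree["CONDID"] not in ok_conds:
            continue
        if tree["STATUSCD"] not in live_status_codes:  # 1 = live tree
            continue
        dbh_cm = tree["DIA"] * 2.54  # FIADB stores DIA in inches
        lower = math.floor(dbh_cm / dbh_bin_cm) * dbh_bin_cm
        table[lower] += tree["TPA_UNADJ"]
    return dict(sorted(table.items()))


conds = [
    {"PLT_CN": "p1", "CONDID": 1, "COND_STATUS_CD": 1},
    {"PLT_CN": "p1", "CONDID": 2, "COND_STATUS_CD": 2},
]
trees = [
    {"PLT_CN": "p1", "CONDID": 1, "STATUSCD": 1, "DIA": 10.0, "TPA_UNADJ": 6.0},
    {"PLT_CN": "p1", "CONDID": 1, "STATUSCD": 2, "DIA": 12.0, "TPA_UNADJ": 6.0},
    {"PLT_CN": "p1", "CONDID": 2, "STATUSCD": 1, "DIA": 8.0, "TPA_UNADJ": 6.0},
]
print(fia_stand_table_sketch(trees, conds, plot_cn="p1", dbh_bin_cm=5.0))  # {25.0: 6.0}
```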
.. py:function:: build_fia_dataset_source(state, *, destination, tables=('TREE', 'PLOT', 'COND'), overwrite=False)
   :module: nemora.ingest.fia

   Create a :class:`~nemora.ingest.DatasetSource` for downloading FIA CSV extracts.

   :rtype: ~nemora.ingest.DatasetSource

.. py:function:: build_stand_table_from_csvs(root, *, plot_cn=None, plot_number=None, tree_file='TREE.csv', plot_file='PLOT.csv', cond_file='COND.csv', live_status_codes=(1,), condition_status_codes=(1,), dbh_bin_cm=1.0)
   :module: nemora.ingest.fia

   Convenience wrapper that loads FIA tables and aggregates a stand table.

   :rtype: ~pandas.DataFrame

.. py:function:: download_fia_tables(destination, state, tables=('TREE', 'PLOT', 'COND'), *, overwrite=False)
   :module: nemora.ingest.fia

   Download FIA CSV extracts for a given state.

   :param destination: Directory where the CSV files will be written.
   :type destination: str | ~pathlib.Path
   :param state: Two-letter state or territory code (e.g., ``HI``, ``OR``).
   :type state: str
   :param tables: Names of the tables to download (default: ``TREE``, ``PLOT``, ``COND``).
   :type tables: ~collections.abc.Sequence[str]
   :param overwrite: When ``True``, re-download files even if they already exist locally.
   :type overwrite: bool
   :rtype: list[~pathlib.Path]

.. py:function:: load_fia_tables(root, *, tree_file='TREE.csv', plot_file='PLOT.csv', cond_file='COND.csv', columns_tree=None, columns_plot=None, columns_cond=None)
   :module: nemora.ingest.fia

   Load FIA CSV extracts from ``root`` and return trimmed dataframes.

   :rtype: ~nemora.ingest.fia.FIATables

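"Trimmed" presumably means that only the requested columns are kept. A stdlib sketch of that behaviour for a single table, assuming ``columns=None`` keeps every column and unknown column names raise ``KeyError``; ``read_trimmed`` is hypothetical and the CSV text is toy data.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, TextIO


def read_trimmed(handle: TextIO, columns: Optional[Sequence[str]] = None) -> List[Dict[str, str]]:
    """Hypothetical: load one CSV table, keeping only `columns` when given."""
    reader = csv.DictReader(handle)
    if columns is None:
        return list(reader)
    # Accessing fieldnames reads the header row before any data rows.
    missing = set(columns) - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    return [{name: row[name] for name in columns} for row in reader]


tree_csv = "CN,PLT_CN,STATUSCD,DIA,SPCD\n11,p1,1,10.0,202\n12,p1,2,12.0,122\n"
print(read_trimmed(io.StringIO(tree_csv), ["PLT_CN", "DIA"]))
# [{'PLT_CN': 'p1', 'DIA': '10.0'}, {'PLT_CN': 'p1', 'DIA': '12.0'}]
```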
HPS (nemora.ingest.hps)
~~~~~~~~~~~~~~~~~~~~~~~

.. py:module:: nemora.ingest.hps

Pipelines for preparing HPS tallies from FAIB PSP compilations.

.. py:class:: HPSPipelineResult(tallies, manifest, tallies_frame)
   :module: nemora.ingest.hps

   Bases: :py:class:`object`

   Container produced by :func:`run_hps_pipeline`.

.. py:attribute:: HPSPipelineResult.manifest
   :module: nemora.ingest.hps
   :type: ~pandas.DataFrame

.. py:attribute:: HPSPipelineResult.tallies
   :module: nemora.ingest.hps
   :type: dict[str, ~pandas.DataFrame]

.. py:attribute:: HPSPipelineResult.tallies_frame
   :module: nemora.ingest.hps
   :type: ~pandas.DataFrame

.. py:class:: SelectionCriteria(first_visit_only=True, allowed_sample_types=None, max_plots=None)
   :module: nemora.ingest.hps
   :canonical: nemora.dataprep.hps.SelectionCriteria

   Bases: :py:class:`object`

   Options that control which plots are retained from the PSP compilations.

.. py:attribute:: SelectionCriteria.allowed_sample_types
   :module: nemora.ingest.hps
   :type: tuple[str, ...] | None
   :value: None

.. py:attribute:: SelectionCriteria.first_visit_only
   :module: nemora.ingest.hps
   :type: bool
   :value: True

.. py:attribute:: SelectionCriteria.max_plots
   :module: nemora.ingest.hps
   :type: int | None
   :value: None

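A sketch of how the three ``SelectionCriteria`` options could be applied to plot header rows. ``CriteriaSketch`` mirrors the documented fields and defaults; ``apply_criteria``, the reading of "first visit" as the lowest ``VISIT_NUMBER`` per ``CLSTR_ID``, the ``SAMPLE_TYPE`` key, and the order in which the filters run are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class CriteriaSketch:
    """Mirrors the documented SelectionCriteria fields and defaults."""

    first_visit_only: bool = True
    allowed_sample_types: Optional[Tuple[str, ...]] = None
    max_plots: Optional[int] = None


def apply_criteria(plots: List[Row], criteria: CriteriaSketch) -> List[Row]:
    """Hypothetical filter over plot rows; the filter order is an assumption."""
    kept = plots
    if criteria.first_visit_only:
        # Earliest visit number per cluster.
        first: Dict[Any, int] = {}
        for plot in kept:
            cid = plot["CLSTR_ID"]
            first[cid] = min(plot["VISIT_NUMBER"], first.get(cid, plot["VISIT_NUMBER"]))
        kept = [p for p in kept if p["VISIT_NUMBER"] == first[p["CLSTR_ID"]]]
    if criteria.allowed_sample_types is not None:
        kept = [p for p in kept if p["SAMPLE_TYPE"] in criteria.allowed_sample_types]
    if criteria.max_plots is not None:
        kept = kept[: criteria.max_plots]
    return kept


plots = [
    {"CLSTR_ID": "A", "VISIT_NUMBER": 2, "SAMPLE_TYPE": "M"},
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "SAMPLE_TYPE": "M"},
    {"CLSTR_ID": "B", "VISIT_NUMBER": 1, "SAMPLE_TYPE": "Y"},
    {"CLSTR_ID": "C", "VISIT_NUMBER": 1, "SAMPLE_TYPE": "M"},
]
picked = apply_criteria(plots, CriteriaSketch(allowed_sample_types=("M",)))
print([(p["CLSTR_ID"], p["VISIT_NUMBER"]) for p in picked])  # [('A', 1), ('C', 1)]
```

Applying ``max_plots`` last means the cap counts only plots that already passed the other filters.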
.. py:function:: build_hps_pipeline(tree_detail_source, selections, *, dbh_column='DBH', status_column='LV_D', live_status=('L',), bin_width=1.0, bin_origin=0.0, chunk_size=200000, encoding='latin1')
   :module: nemora.ingest.hps

   Return a pipeline that aggregates HPS tallies for ``selections``.

   :rtype: ~nemora.ingest.TransformPipeline

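A sketch of the streaming tally implied by ``chunk_size``, ``live_status``, ``bin_width`` and ``bin_origin``. It assumes each DBH is assigned to the bin whose lower edge is ``bin_origin + k * bin_width`` with ``k = floor((dbh - bin_origin) / bin_width)``, and that trees carry a hypothetical ``PLOT_ID`` column in place of the real ``PlotSelection`` objects. Chunks here are lists of CSV rows rather than DataFrame chunks; ``hps_tally_sketch`` is hypothetical.

```python
import csv
import io
import math
from collections import Counter, defaultdict
from itertools import islice
from typing import Dict, Iterable, TextIO, Tuple


def hps_tally_sketch(
    handle: TextIO,
    plot_ids: Iterable[str],
    *,
    dbh_column: str = "DBH",
    status_column: str = "LV_D",
    live_status: Tuple[str, ...] = ("L",),
    bin_width: float = 1.0,
    bin_origin: float = 0.0,
    chunk_size: int = 200_000,
) -> Dict[str, Counter]:
    """Hypothetical: count live trees per DBH bin for each selected plot, chunk by chunk."""
    wanted = set(plot_ids)
    tallies: Dict[str, Counter] = defaultdict(Counter)
    rows = csv.DictReader(handle)
    # Hold at most chunk_size rows in memory at a time.
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        for row in chunk:
            if row["PLOT_ID"] not in wanted or row[status_column] not in live_status:
                continue
            dbh = float(row[dbh_column])
            # Lower bin edge: origin + k * width with k = floor((dbh - origin) / width).
            lower = bin_origin + math.floor((dbh - bin_origin) / bin_width) * bin_width
            tallies[row["PLOT_ID"]][lower] += 1
    return dict(tallies)


trees_csv = "PLOT_ID,DBH,LV_D\np1,12.3,L\np1,12.9,L\np1,30.1,D\np2,7.5,L\np3,9.0,L\n"
result = hps_tally_sketch(io.StringIO(trees_csv), ["p1", "p2"], bin_width=2.0, chunk_size=2)
print({plot: dict(bins) for plot, bins in result.items()})  # {'p1': {12.0: 2}, 'p2': {6.0: 1}}
```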
.. py:function:: export_hps_outputs(tallies, manifest, *, output_dir, manifest_path, quiet=False)
   :module: nemora.ingest.hps

   Write per-plot tallies and accompanying manifest to disk.

   :rtype: None

.. py:function:: load_plot_selections(plot_header_source, sample_byvisit_source, *, baf, criteria=None, encoding='latin1')
   :module: nemora.ingest.hps

   Load PSP plot metadata and filter to the subset needed for HPS tallies.

   :rtype: list[~nemora.dataprep.hps.PlotSelection]

.. py:function:: run_hps_pipeline(tree_detail_source, selections, *, dbh_column='DBH', status_column='LV_D', live_status=('L',), bin_width=1.0, bin_origin=0.0, chunk_size=200000, encoding='latin1')
   :module: nemora.ingest.hps

   Execute the HPS pipeline and return the tallies and manifest dataframes.

   :rtype: ~nemora.ingest.hps.HPSPipelineResult

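``HPSPipelineResult`` exposes per-plot tallies (``tallies``) alongside a combined ``tallies_frame`` and a ``manifest``. A sketch of how a combined long table and a simple per-plot manifest could be derived from per-plot tallies; ``combine_tallies`` and its column names (``plot_id``, ``dbh``, ``tally``, ``n_bins``, ``n_trees``) are hypothetical, and the real manifest's contents are not documented here.

```python
from typing import Any, Dict, List, Mapping, Tuple


def combine_tallies(
    tallies: Mapping[str, Mapping[float, int]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Hypothetical: per-plot tallies -> one long table plus a one-row-per-plot manifest."""
    long_rows: List[Dict[str, Any]] = []
    manifest: List[Dict[str, Any]] = []
    for plot_id in sorted(tallies):
        bins = tallies[plot_id]
        # One long-format row per (plot, DBH bin), bins in ascending order.
        for dbh, count in sorted(bins.items()):
            long_rows.append({"plot_id": plot_id, "dbh": dbh, "tally": count})
        manifest.append({"plot_id": plot_id, "n_bins": len(bins), "n_trees": sum(bins.values())})
    return long_rows, manifest


long_rows, manifest = combine_tallies({"p2": {6.0: 1}, "p1": {14.0: 1, 12.0: 2}})
print(long_rows[0])  # {'plot_id': 'p1', 'dbh': 12.0, 'tally': 2}
print(manifest)
```

A long ("tidy") layout keeps every plot in one table, which is convenient for grouped fitting, while the per-plot mapping suits writing one file per plot.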