nemora.ingest

Nemora’s ingest package wraps reusable dataset abstractions and pipelines that convert raw inventory releases (FAIB PSP, FIA Datamart, etc.) into the tidy stand tables consumed by nemora.fit, nemora.sampling, and downstream tooling. The key entry points mirror the concepts introduced in the Ingest Module how-to:

  • DatasetSource / DatasetFetcher describe how to locate and cache raw files.

  • TransformPipeline orchestrates composable DataFrame transformations.

  • Submodules (faib, fia, hps) provide dataset-specific helpers and CLI-ready pipelines for stand-table and HPS tally generation.

See also

  • docs/howto/ingest.md for step-by-step ingest workflows.

  • nemora.cli for Typer commands such as ingest-faib, faib-manifest, and ingest-faib-hps that wrap these helpers.

Package API

.. py:module:: nemora.ingest

Ingestion/ETL scaffolding for Nemora.

This module defines lightweight abstractions that describe raw inventory datasets (DatasetSource) and the transformation pipelines (TransformPipeline) that convert them into the tidy stand tables consumed by other Nemora modules. Concrete connectors for BC FAIB, FIA, and other inventories will extend these primitives in upcoming revisions.

.. py:class:: DatasetFetcher(*args, **kwargs) :module: nemora.ingest

Bases: :py:class:`~typing.Protocol`

Callable that retrieves one or more artifacts for a dataset source.

.. py:class:: DatasetSource(name, description, uri=None, metadata=, fetcher=None) :module: nemora.ingest

Bases: :py:class:`object`

Describe a raw inventory dataset that can be ingested by Nemora.

:type name: str
:param name: Human-readable identifier for the dataset.
:type description: str
:param description: Short summary of the dataset contents (region, sampling design, etc.).
:type uri: str | None
:param uri: Optional canonical URI (open data portal link, DataLad URL, etc.).
:type metadata: dict[str, ~typing.Any]
:param metadata: Arbitrary extra fields (licensing, citation info, cache preferences).
:type fetcher: ~nemora.ingest.DatasetFetcher | None
:param fetcher: Optional callable able to retrieve the dataset artifacts when invoked.

.. py:attribute:: DatasetSource.description :module: nemora.ingest :type: str

.. py:method:: DatasetSource.fetch() :module: nemora.ingest

  Return artifacts for this dataset via the configured fetcher.


  :rtype: :sphinx_autodoc_typehints_type:`\:py\:class\:\`\~collections.abc.Iterable\`\\ \\\[\:py\:class\:\`\~pathlib.Path\`\]`

.. py:attribute:: DatasetSource.fetcher :module: nemora.ingest :type: ~nemora.ingest.DatasetFetcher | None

.. py:attribute:: DatasetSource.metadata :module: nemora.ingest :type: dict[str, ~typing.Any]

.. py:attribute:: DatasetSource.name :module: nemora.ingest :type: str

.. py:attribute:: DatasetSource.uri :module: nemora.ingest :type: str | None
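
The relationship between a source and its fetcher can be sketched with a local stand-in. This is not the ``nemora.ingest`` implementation: ``SourceSketch`` and ``cached_files`` are hypothetical names, and two behaviours are assumptions made for illustration only, namely that the fetcher is called with the source itself and that a missing fetcher raises an error.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class SourceSketch:
    """Local stand-in mirroring the documented DatasetSource fields."""

    name: str
    description: str
    uri: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumption: the fetcher receives the source and returns local paths.
    fetcher: Callable[[SourceSketch], Iterable[Path]] | None = None

    def fetch(self) -> Iterable[Path]:
        """Delegate artifact retrieval to the configured fetcher."""
        if self.fetcher is None:
            # Assumption: a missing fetcher is reported as an error.
            raise RuntimeError(f"no fetcher configured for {self.name!r}")
        return self.fetcher(self)


def cached_files(source: SourceSketch) -> list[Path]:
    """Hypothetical fetcher: build paths from cache hints stored in metadata."""
    cache_dir = Path(source.metadata["cache_dir"])
    return [cache_dir / filename for filename in source.metadata["filenames"]]


source = SourceSketch(
    name="faib-psp",
    description="BC FAIB permanent sample plots (illustrative entry)",
    metadata={"cache_dir": "data/raw", "filenames": ["faib_tree_detail.csv"]},
    fetcher=cached_files,
)
artifacts = list(source.fetch())
```

Keeping retrieval behind a callable means the same source description can be paired with a downloader, a cache lookup, or a test double without changing the code that consumes the returned paths.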

.. py:class:: TransformPipeline(name, steps=, metadata=) :module: nemora.ingest

Bases: :py:class:`object`

A sequence of callables that transform raw dataframes into Nemora tables.

.. py:method:: TransformPipeline.add_step(step) :module: nemora.ingest

  Append a transformation step to the pipeline.


  :rtype: :sphinx_autodoc_typehints_type:`\:py\:obj\:\`None\``

.. py:attribute:: TransformPipeline.metadata :module: nemora.ingest :type: dict[str, ~typing.Any]

.. py:attribute:: TransformPipeline.name :module: nemora.ingest :type: str

.. py:method:: TransformPipeline.run(frame) :module: nemora.ingest

  Apply every transformation step to the supplied dataframe.


  :rtype: :sphinx_autodoc_typehints_type:`\:py\:class\:\`\~pandas.DataFrame\``

.. py:attribute:: TransformPipeline.steps :module: nemora.ingest :type: list[~collections.abc.Callable[[~pandas.DataFrame], ~pandas.DataFrame]]
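
The pipeline idea (an ordered list of frame-to-frame callables) can be illustrated without Nemora or pandas installed. In the sketch below, ``PipelineSketch`` is a hypothetical stand-in, rows held as a list of dicts stand in for a ``pandas.DataFrame``, and running the steps in insertion order is an assumption rather than documented behaviour.

```python
from __future__ import annotations

import math
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any

# Rows as a list of dicts stand in for a pandas DataFrame in this sketch.
Frame = list[dict[str, Any]]
Step = Callable[[Frame], Frame]


@dataclass
class PipelineSketch:
    """Local stand-in mirroring the documented TransformPipeline fields."""

    name: str
    steps: list[Step] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)

    def add_step(self, step: Step) -> None:
        """Append a transformation step to the pipeline."""
        self.steps.append(step)

    def run(self, frame: Frame) -> Frame:
        """Feed the frame through every step in turn."""
        # Assumption: steps run in insertion order, each consuming the previous result.
        for step in self.steps:
            frame = step(frame)
        return frame


def drop_missing_dbh(rows: Frame) -> Frame:
    """Remove records without a diameter measurement."""
    return [row for row in rows if row.get("DBH_CM") is not None]


def add_basal_area(rows: Frame) -> Frame:
    """Add basal area in square metres from DBH in centimetres."""
    return [{**row, "BA_M2": math.pi * (row["DBH_CM"] / 200.0) ** 2} for row in rows]


pipeline = PipelineSketch(name="demo")
pipeline.add_step(drop_missing_dbh)
pipeline.add_step(add_basal_area)
result = pipeline.run([{"DBH_CM": 20.0}, {"DBH_CM": None}])
```

Because each step only takes and returns a frame, steps can be unit-tested in isolation and reordered or reused across dataset-specific pipelines.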

Dataset helpers

FAIB (nemora.ingest.faib)

.. py:module:: nemora.ingest.faib

Helpers for working with BC FAIB ground sample datasets.

.. py:class:: DataDictionary(sheets) :module: nemora.ingest.faib

Bases: :py:class:`object`

Structured representation of FAIB data dictionary entries.

.. py:method:: DataDictionary.get_table_schema(table) :module: nemora.ingest.faib

  Return the schema for a specific table.


  :rtype: :sphinx_autodoc_typehints_type:`\:py\:class\:\`\~pandas.DataFrame\``

.. py:attribute:: DataDictionary.sheets :module: nemora.ingest.faib :type: ~collections.abc.Mapping[str, ~pandas.DataFrame]

.. py:class:: FAIBManifestResult(manifest_path, tables, bafs, truncated_flags, downloaded) :module: nemora.ingest.faib

Bases: :py:class:`object`

Summary of outputs produced by :func:`generate_faib_manifest`.

.. py:attribute:: FAIBManifestResult.bafs :module: nemora.ingest.faib :type: list[float]

.. py:attribute:: FAIBManifestResult.downloaded :module: nemora.ingest.faib :type: list[~pathlib.Path]

.. py:attribute:: FAIBManifestResult.manifest_path :module: nemora.ingest.faib :type: ~pathlib.Path

.. py:attribute:: FAIBManifestResult.tables :module: nemora.ingest.faib :type: list[~pathlib.Path]

.. py:attribute:: FAIBManifestResult.truncated_flags :module: nemora.ingest.faib :type: dict[~pathlib.Path, bool]

.. py:function:: aggregate_stand_table(tree_detail, plot_info, *, baf, dbh_col='DBH_CM', expansion_col='TREE_EXP', baf_col='BAF', group_keys=('CLSTR_ID', 'VISIT_NUMBER', 'PLOT')) :module: nemora.ingest.faib

Aggregate tree detail records into a stand table for a given BAF.

:type tree_detail: ~pandas.DataFrame
:param tree_detail: Raw FAIB tree detail records.
:type plot_info: ~pandas.DataFrame
:param plot_info: Plot-level records containing BAF metadata (sample-by-visit or plot header).
:type baf: float
:param baf: Target basal area factor to filter (e.g., 12).
:type dbh_col: str
:param dbh_col: Column containing diameter at breast height in centimetres.
:type expansion_col: str
:param expansion_col: Column representing tree expansion weights.
:type group_keys: tuple[str, ...]
:param group_keys: Keys used to join tree detail with sample-by-visit metadata.

:rtype: ~pandas.DataFrame
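
The filter, join, and group steps implied by these parameters can be sketched with plain Python records. This is an illustration only, not the library's code: ``stand_table_sketch`` and the output keys ``dbh_class_cm`` and ``tally`` are hypothetical, and the 1 cm DBH classes are an assumption, since the class width used by ``aggregate_stand_table`` is not stated here.

```python
import math
from collections import defaultdict

GROUP_KEYS = ("CLSTR_ID", "VISIT_NUMBER", "PLOT")


def stand_table_sketch(tree_rows, plot_rows, baf, *, dbh_col="DBH_CM",
                       expansion_col="TREE_EXP", baf_col="BAF",
                       group_keys=GROUP_KEYS):
    """Sum expansion weights per DBH class for plots measured with ``baf``."""
    # Keep only the plots whose recorded BAF matches the requested value.
    selected = {
        tuple(plot[key] for key in group_keys)
        for plot in plot_rows
        if math.isclose(float(plot[baf_col]), baf)
    }
    totals = defaultdict(float)
    for tree in tree_rows:
        # Join trees to plots on the shared group keys.
        if tuple(tree[key] for key in group_keys) not in selected:
            continue
        # Assumption: 1 cm classes labelled by their lower bound.
        dbh_class = math.floor(float(tree[dbh_col]))
        totals[dbh_class] += float(tree[expansion_col])
    return [{"dbh_class_cm": c, "tally": totals[c]} for c in sorted(totals)]


plots = [
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "PLOT": 1, "BAF": 12},
    {"CLSTR_ID": "B", "VISIT_NUMBER": 1, "PLOT": 1, "BAF": 8},
]
trees = [
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "PLOT": 1, "DBH_CM": 20.4, "TREE_EXP": 10.0},
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "PLOT": 1, "DBH_CM": 20.9, "TREE_EXP": 5.0},
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "PLOT": 1, "DBH_CM": 31.2, "TREE_EXP": 2.5},
    {"CLSTR_ID": "B", "VISIT_NUMBER": 1, "PLOT": 1, "DBH_CM": 20.1, "TREE_EXP": 99.0},
]
table = stand_table_sketch(trees, plots, baf=12)
```

The tree from cluster ``B`` is excluded because its plot was measured with a different BAF, which is the role the ``baf`` filter plays in the documented signature.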

.. py:function:: auto_select_bafs(root, count=3, *, plot_file='faib_plot_header.csv', sample_file='faib_sample_byvisit.csv') :module: nemora.ingest.faib

Select representative BAF values from FAIB metadata.

:type root: str | ~pathlib.Path
:param root: Directory containing FAIB CSV extracts.
:type count: int
:param count: Number of representative BAF values to return.
:type plot_file: str
:param plot_file: CSV filenames to inspect for BAF metadata (plot header preferred).
:type sample_file: str
:param sample_file: CSV filenames to inspect for BAF metadata (plot header preferred).

:rtype: list[float]

.. py:function:: build_faib_dataset_source(dataset='psp', *, destination, filenames=None, overwrite=False) :module: nemora.ingest.faib

Create a :class:`DatasetSource` for FAIB CSV extracts.

:rtype: ~nemora.ingest.DatasetSource

.. py:function:: build_faib_stand_table_pipeline(plot_info, *, baf, dbh_col, expansion_col, baf_col, group_keys=('CLSTR_ID', 'VISIT_NUMBER', 'PLOT')) :module: nemora.ingest.faib

Create a :class:`TransformPipeline` that aggregates FAIB stand tables.

:rtype: ~nemora.ingest.TransformPipeline

.. py:function:: build_stand_table_from_csvs(root, baf, *, tree_file='faib_tree_detail.csv', sample_file='faib_sample_byvisit.csv', plot_file=None, dbh_col=None, expansion_col=None, baf_col=None) :module: nemora.ingest.faib

Load FAIB CSV extracts from ``root`` and build a stand table for ``baf``.

:type root: str | ~pathlib.Path
:param root: Directory containing the FAIB CSV extracts.
:type baf: float
:param baf: Desired basal area factor to filter.
:type tree_file: str
:param tree_file: Filename for the tree detail CSV within ``root``.
:type sample_file: str | None
:param sample_file: Filename for the sample-by-visit CSV within ``root``.

:rtype: ~pandas.DataFrame

.. py:function:: download_faib_csvs(destination, dataset='psp', *, overwrite=False, filenames=None) :module: nemora.ingest.faib

Download FAIB CSV extracts via FTP into ``destination``.

:type destination: str | ~pathlib.Path
:param destination: Directory where files will be written.
:type dataset: str
:param dataset: Either ``"psp"`` or ``"non_psp"``.

:rtype: list[~pathlib.Path]

.. py:function:: generate_faib_manifest(destination, *, dataset='psp', source=None, fetch=False, overwrite=False, bafs=None, auto_count=None, max_rows=None, write_parquet=True) :module: nemora.ingest.faib

Fetch FAIB extracts, build stand tables, and emit a manifest.

:type destination: str | ~pathlib.Path
:param destination: Directory where the manifest and stand tables will be written.
:type dataset: str
:param dataset: FAIB dataset to process (``"psp"`` or ``"non_psp"``).
:type source: str | ~pathlib.Path | None
:param source: Optional directory containing pre-downloaded FAIB CSV files. When omitted, files will be fetched into ``destination / "raw"`` if ``fetch`` is true.
:type fetch: bool
:param fetch: When set, download the FAIB CSV files before building stand tables.
:type overwrite: bool
:param overwrite: Force re-download of CSV files even when they already exist locally.
:type bafs: ~collections.abc.Sequence[float] | None
:param bafs: Explicit BAF values to build stand tables for.
:type auto_count: int | None
:param auto_count: When provided, automatically select ``auto_count`` representative BAF values instead of using ``bafs``.
:type max_rows: int | None
:param max_rows: Optional limit on the number of rows retained in each stand table.
:type write_parquet: bool
:param write_parquet: When true (default), persist a Parquet copy of the manifest alongside the CSV.

:rtype: ~nemora.ingest.faib.FAIBManifestResult
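
How ``source``, ``fetch``, ``bafs``, and ``auto_count`` interact can be pictured with two small helpers. They follow the parameter descriptions for ``generate_faib_manifest``, but ``resolve_raw_dir`` and ``resolve_bafs`` are hypothetical names, and the error raised when nothing usable is supplied is an assumption about edge cases the descriptions leave open.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from pathlib import Path


def resolve_raw_dir(destination: str | Path, source: str | Path | None = None,
                    fetch: bool = False) -> Path:
    """Pick the directory that holds the raw FAIB CSV files."""
    if source is not None:
        # Pre-downloaded files are used as-is.
        return Path(source)
    if fetch:
        # Documented fallback: fetch into ``destination / "raw"``.
        return Path(destination) / "raw"
    # Assumption: neither a source nor a fetch request is treated as an error.
    raise ValueError("provide `source` or set `fetch=True`")


def resolve_bafs(bafs: Sequence[float] | None, auto_count: int | None,
                 auto_select: Callable[[int], Sequence[float]]) -> list[float]:
    """Choose the BAF values to build stand tables for."""
    if auto_count is not None:
        # Documented precedence: auto-selection replaces explicit ``bafs``.
        return [float(value) for value in auto_select(auto_count)]
    if bafs is None:
        # Assumption: at least one of the two options must be supplied.
        raise ValueError("provide `bafs` or `auto_count`")
    return [float(value) for value in bafs]


raw_dir = resolve_raw_dir("out", fetch=True)
chosen = resolve_bafs([12, 8], None, auto_select=lambda n: [])
```

In this sketch ``auto_select`` stands in for a selector such as ``auto_select_bafs``; passing it as a callable keeps the precedence rule testable without touching the filesystem.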

.. py:function:: load_data_dictionary(url) :module: nemora.ingest.faib

Download and parse a FAIB data dictionary XLSX file.

:rtype: ~nemora.ingest.faib.DataDictionary

.. py:function:: load_non_psp_dictionary() :module: nemora.ingest.faib

Convenience wrapper for the non-PSP data dictionary.

:rtype: ~nemora.ingest.faib.DataDictionary

.. py:function:: load_psp_dictionary() :module: nemora.ingest.faib

Convenience wrapper for the PSP data dictionary.

:rtype: ~nemora.ingest.faib.DataDictionary

FIA (nemora.ingest.fia)

.. py:module:: nemora.ingest.fia

Helpers for working with USDA FIA datasets.

.. py:class:: FIATables(tree, condition, plot) :module: nemora.ingest.fia

Bases: :py:class:`object`

Container for FIA plot/condition/tree tables.

.. py:attribute:: FIATables.condition :module: nemora.ingest.fia :type: ~pandas.DataFrame

.. py:attribute:: FIATables.plot :module: nemora.ingest.fia :type: ~pandas.DataFrame

.. py:attribute:: FIATables.tree :module: nemora.ingest.fia :type: ~pandas.DataFrame

.. py:function:: aggregate_plot_stand_table(tables, *, plot_cn=None, plot_number=None, live_status_codes=(1,), condition_status_codes=(1,), dbh_bin_cm=1.0) :module: nemora.ingest.fia

Aggregate FIA tree records into a stand table summarised by DBH bins.

:rtype: ~pandas.DataFrame
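
The roles of ``live_status_codes`` and ``dbh_bin_cm`` can be sketched on plain records. The column names ``STATUSCD``, ``DIA`` (inches), and ``TPA_UNADJ`` come from the FIA TREE table; that ``aggregate_plot_stand_table`` uses exactly these columns, converts inches to centimetres, and labels bins by their lower bound are assumptions. ``fia_stand_table_sketch`` is a hypothetical name, and condition-status filtering is left out for brevity.

```python
import math
from collections import defaultdict

INCH_TO_CM = 2.54


def fia_stand_table_sketch(tree_rows, *, live_status_codes=(1,), dbh_bin_cm=1.0):
    """Sum per-acre expansion factors of live trees by DBH bin (cm)."""
    totals = defaultdict(float)
    for row in tree_rows:
        # Keep only trees whose status code marks them as live.
        if row["STATUSCD"] not in live_status_codes:
            continue
        dbh_cm = row["DIA"] * INCH_TO_CM
        # Assumption: bins are labelled by their lower bound in centimetres.
        bin_lower = math.floor(dbh_cm / dbh_bin_cm) * dbh_bin_cm
        totals[bin_lower] += row["TPA_UNADJ"]
    return dict(sorted(totals.items()))


fia_trees = [
    {"STATUSCD": 1, "DIA": 10.0, "TPA_UNADJ": 6.0},
    {"STATUSCD": 1, "DIA": 10.2, "TPA_UNADJ": 6.0},
    {"STATUSCD": 1, "DIA": 12.0, "TPA_UNADJ": 6.0},
    {"STATUSCD": 2, "DIA": 15.0, "TPA_UNADJ": 6.0},  # dead tree, skipped
]
fine_bins = fia_stand_table_sketch(fia_trees)
coarse_bins = fia_stand_table_sketch(fia_trees, dbh_bin_cm=10.0)
```

Widening ``dbh_bin_cm`` merges neighbouring classes without changing the total expansion, which is a quick sanity check when comparing stand tables built at different resolutions.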

.. py:function:: build_fia_dataset_source(state, *, destination, tables=('TREE', 'PLOT', 'COND'), overwrite=False) :module: nemora.ingest.fia

Create a :class:`DatasetSource` for downloading FIA CSV extracts.

:rtype: ~nemora.ingest.DatasetSource

.. py:function:: build_stand_table_from_csvs(root, *, plot_cn=None, plot_number=None, tree_file='TREE.csv', plot_file='PLOT.csv', cond_file='COND.csv', live_status_codes=(1,), condition_status_codes=(1,), dbh_bin_cm=1.0) :module: nemora.ingest.fia

Convenience wrapper that loads FIA tables and aggregates a stand table.

:rtype: ~pandas.DataFrame

.. py:function:: download_fia_tables(destination, state, tables=('TREE', 'PLOT', 'COND'), *, overwrite=False) :module: nemora.ingest.fia

Download FIA CSV extracts for a given state.

:type destination: str | ~pathlib.Path
:param destination: Directory where the CSV files will be written.
:type state: str
:param state: Two-letter state or territory code (e.g., ``HI``, ``OR``).
:type tables: ~collections.abc.Sequence[str]
:param tables: Iterable of table names to download (default: TREE, PLOT, COND).
:type overwrite: bool
:param overwrite: When True, re-download files even if they already exist locally.

:rtype: list[~pathlib.Path]

.. py:function:: load_fia_tables(root, *, tree_file='TREE.csv', plot_file='PLOT.csv', cond_file='COND.csv', columns_tree=None, columns_plot=None, columns_cond=None) :module: nemora.ingest.fia

Load FIA CSV extracts from ``root`` and return trimmed dataframes.

:rtype: ~nemora.ingest.fia.FIATables

HPS (nemora.ingest.hps)

.. py:module:: nemora.ingest.hps

Pipelines for preparing HPS tallies from FAIB PSP compilations.

.. py:class:: HPSPipelineResult(tallies, manifest, tallies_frame) :module: nemora.ingest.hps

Bases: :py:class:`object`

Container produced by :func:`run_hps_pipeline`.

.. py:attribute:: HPSPipelineResult.manifest :module: nemora.ingest.hps :type: ~pandas.DataFrame

.. py:attribute:: HPSPipelineResult.tallies :module: nemora.ingest.hps :type: dict[str, ~pandas.DataFrame]

.. py:attribute:: HPSPipelineResult.tallies_frame :module: nemora.ingest.hps :type: ~pandas.DataFrame

.. py:class:: SelectionCriteria(first_visit_only=True, allowed_sample_types=None, max_plots=None) :module: nemora.ingest.hps :canonical: nemora.dataprep.hps.SelectionCriteria

Bases: :py:class:`object`

Options that control which plots are retained from the PSP compilations.

.. py:attribute:: SelectionCriteria.allowed_sample_types :module: nemora.ingest.hps :type: tuple[str, ...] | None :value: None

.. py:attribute:: SelectionCriteria.first_visit_only :module: nemora.ingest.hps :type: bool :value: True

.. py:attribute:: SelectionCriteria.max_plots :module: nemora.ingest.hps :type: int | None :value: None
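
One way to read the three options is as successive filters over plot records. The sketch below uses a local stand-in, ``CriteriaSketch``, with the documented field names and defaults. The record keys are illustrative (``SAMPLE_TYPE`` and its codes are hypothetical), and treating visit number 1 as the first visit and ``max_plots`` as a simple truncation are assumptions rather than documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CriteriaSketch:
    """Local stand-in mirroring the documented SelectionCriteria fields."""

    first_visit_only: bool = True
    allowed_sample_types: tuple[str, ...] | None = None
    max_plots: int | None = None


def select_plots(plots: list[dict], criteria: CriteriaSketch) -> list[dict]:
    """Apply each enabled criterion in turn and return the retained plots."""
    kept = []
    for plot in plots:
        # Assumption: the first visit is the one numbered 1.
        if criteria.first_visit_only and plot["VISIT_NUMBER"] != 1:
            continue
        if (criteria.allowed_sample_types is not None
                and plot["SAMPLE_TYPE"] not in criteria.allowed_sample_types):
            continue
        kept.append(plot)
    if criteria.max_plots is not None:
        # Assumption: the cap keeps the first ``max_plots`` matches in input order.
        kept = kept[: criteria.max_plots]
    return kept


plot_records = [
    {"CLSTR_ID": "A", "VISIT_NUMBER": 1, "SAMPLE_TYPE": "G"},
    {"CLSTR_ID": "A", "VISIT_NUMBER": 2, "SAMPLE_TYPE": "G"},
    {"CLSTR_ID": "B", "VISIT_NUMBER": 1, "SAMPLE_TYPE": "R"},
    {"CLSTR_ID": "C", "VISIT_NUMBER": 1, "SAMPLE_TYPE": "G"},
]
defaults = select_plots(plot_records, CriteriaSketch())
narrow = select_plots(
    plot_records, CriteriaSketch(allowed_sample_types=("G",), max_plots=1)
)
```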

.. py:function:: build_hps_pipeline(tree_detail_source, selections, *, dbh_column='DBH', status_column='LV_D', live_status=('L',), bin_width=1.0, bin_origin=0.0, chunk_size=200000, encoding='latin1') :module: nemora.ingest.hps

Return a pipeline that aggregates HPS tallies for ``selections``.

:rtype: ~nemora.ingest.TransformPipeline

.. py:function:: export_hps_outputs(tallies, manifest, *, output_dir, manifest_path, quiet=False) :module: nemora.ingest.hps

Write per-plot tallies and accompanying manifest to disk.

:rtype: None

.. py:function:: load_plot_selections(plot_header_source, sample_byvisit_source, *, baf, criteria=None, encoding='latin1') :module: nemora.ingest.hps

Load PSP plot metadata and filter to the subset needed for HPS tallies.

:rtype: list[~nemora.dataprep.hps.PlotSelection]

.. py:function:: run_hps_pipeline(tree_detail_source, selections, *, dbh_column='DBH', status_column='LV_D', live_status=('L',), bin_width=1.0, bin_origin=0.0, chunk_size=200000, encoding='latin1') :module: nemora.ingest.hps

Execute the HPS pipeline and return tallies/manifest dataframes.

:rtype: ~nemora.ingest.hps.HPSPipelineResult
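
The binning controlled by ``bin_width`` and ``bin_origin`` can be illustrated with the standard library alone. ``tally_live_trees`` is a hypothetical helper, not part of ``nemora.ingest.hps``. The sketch assumes that each live tree counts once, that bins are labelled by their lower edge, and it streams rows one at a time as a stand-in for the chunked reads suggested by ``chunk_size``.

```python
import csv
import io
import math
from collections import Counter


def tally_live_trees(lines, *, dbh_column="DBH", status_column="LV_D",
                     live_status=("L",), bin_width=1.0, bin_origin=0.0):
    """Count live trees per DBH bin, keyed by the bin's lower edge."""
    tallies = Counter()
    for row in csv.DictReader(lines):
        # Skip records whose status is not in the live set.
        if row[status_column] not in live_status:
            continue
        dbh = float(row[dbh_column])
        # Bin index relative to the origin, then back to a lower-edge label.
        index = math.floor((dbh - bin_origin) / bin_width)
        tallies[bin_origin + index * bin_width] += 1
    return dict(sorted(tallies.items()))


SAMPLE_CSV = "DBH,LV_D\n12.4,L\n12.9,L\n13.0,L\n30.5,D\n"
tallies = tally_live_trees(io.StringIO(SAMPLE_CSV))
wide = tally_live_trees(io.StringIO(SAMPLE_CSV), bin_width=5.0)
```

Shifting ``bin_origin`` moves every bin edge by the same amount, which matters when tallies must line up with class limits used by an existing inventory compilation.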