HPS Pipeline Benchmarking
nemora ingest-benchmark measures how long the FAIB→HPS pipeline takes for a
given set of plots, without writing any outputs. Use it to sanity-check performance
before running large batch jobs or after modifying the pipeline.
Running the benchmark
# Reuse local PSP extracts and run three iterations (default)
nemora ingest-benchmark data/external/faib --no-fetch
# Download PSP files to a cache directory and run five iterations
nemora ingest-benchmark data/external/faib --fetch --cache-dir data/external/psp/raw --iterations 5
Example output:
Iteration 1/3: 1.842s
Iteration 2/3: 1.807s
Iteration 3/3: 1.815s
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Runs ┃ Average (s) ┃ Fastest (s) ┃ Slowest (s) ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ 3 │ 1.821 │ 1.807 │ 1.842 │
└──────────────┴──────────────┴──────────────┴──────────────┘
Tree total: 12,408 (plots=3, live_status=L)
Interpreting results
The Average/Fastest/Slowest columns help you spot variability (e.g., a cold cache versus a warm one).
The tree total and plot count confirm that the benchmark ran against the expected subset.
Record typical timings in your project notes; if nightly ingest monitoring reports a significant deviation, rerun this benchmark to determine whether the slowdown is a local regression or an upstream change.
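The summary row is just basic statistics over the per-iteration timings. A minimal Python sketch using the timings from the example output above (the summarize helper is illustrative, not part of the CLI):

```python
def summarize(timings):
    """Summarize per-iteration benchmark timings (seconds)."""
    return {
        "runs": len(timings),
        "average_seconds": round(sum(timings) / len(timings), 3),
        "fastest_seconds": round(min(timings), 3),
        "slowest_seconds": round(max(timings), 3),
    }

# Timings from the example run above.
print(summarize([1.842, 1.807, 1.815]))
# → {'runs': 3, 'average_seconds': 1.821, 'fastest_seconds': 1.807, 'slowest_seconds': 1.842}
```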
Capturing JSONL telemetry
Use --report-path to append JSON lines that mirror the console summary. These logs power the
nightly ingest monitoring workflow and are handy when reviewing performance-sensitive pull requests.
nemora ingest-benchmark data/external/faib --no-fetch --iterations 3 \
--report-path logs/ingest_benchmark.jsonl
tail -n 1 logs/ingest_benchmark.jsonl
{"timestamp":"2025-11-08T08:12:02.345Z","iterations":3,"average_seconds":1.82,"tree_total":12408,...}
Check the nightly GitHub Actions artifact (ingest-benchmark-report) if you need the latest trends
without rerunning the benchmark locally.
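Since each report line is a self-contained JSON object, the latest record can be pulled programmatically instead of with tail. A small sketch, assuming the field names shown in the example record above (latest_record is a hypothetical helper, not part of nemora):

```python
import json
from pathlib import Path

def latest_record(report_path):
    """Return the most recent benchmark record from a JSONL report, or None."""
    path = Path(report_path)
    if not path.exists():
        return None
    lines = path.read_text().splitlines()
    return json.loads(lines[-1]) if lines else None

record = latest_record("logs/ingest_benchmark.jsonl")
if record:
    print(f"{record['timestamp']}: avg {record['average_seconds']}s "
          f"over {record['iterations']} iterations")
```

Because the file is append-only, the same loop over every line yields the full timing history for trend plots.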
Nightly automation
The Nightly Ingest Integration workflow runs the benchmark every night, parses the JSONL output
into Markdown/text summaries (reports/ingest_benchmark_summary.md / .txt), and enforces
INGEST_BENCHMARK_AVG_THRESHOLD=3.0 seconds for the average runtime. When the threshold is
exceeded, the job fails and the auto-created issue includes the summary table, making it easy to spot
regressions without digging through raw logs. Download the artifact if you need the full JSONL and
summary history for deeper analysis.
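The threshold check amounts to scanning the JSONL report for records whose average exceeds the budget. A sketch of that logic in Python (the function name and exit handling are illustrative; the actual workflow script may differ):

```python
import json
import os

def averages_within_threshold(report_path, threshold):
    """Return (ok, offending_records) for a JSONL benchmark report."""
    offending = []
    with open(report_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            if rec["average_seconds"] > threshold:
                offending.append(rec)
    return (not offending, offending)

# Mirror the workflow's environment-variable override, defaulting to 3.0s.
threshold = float(os.environ.get("INGEST_BENCHMARK_AVG_THRESHOLD", "3.0"))
report = "logs/ingest_benchmark.jsonl"
if os.path.exists(report):
    ok, slow = averages_within_threshold(report, threshold)
    if not ok:
        raise SystemExit(f"{len(slow)} run(s) exceeded {threshold}s average")
```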