Chunk probe walkthrough

Explore badc chunk probe and badc chunk split on the public bogus dataset. These cells are lightweight so they can run on a laptop without GPUs.

Configure paths

Update DATASET_ROOT if your checkout lives elsewhere. The notebook assumes you already ran badc data connect bogus so audio lives under data/datalad/bogus.

[ ]:
from pathlib import Path

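# Root of the connected bogus dataset, relative to this notebook.
DATASET_ROOT = Path("..") / "data" / "datalad" / "bogus"
AUDIO_FILE = DATASET_ROOT / "audio" / "GNWT-290_20230331_235938.wav"
MANIFEST_DIR = DATASET_ROOT / "manifests"
# Create the manifest folder if needed; DATASET_ROOT itself must already exist.
MANIFEST_DIR.mkdir(exist_ok=True)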
print(AUDIO_FILE.resolve())
print("Exists:", AUDIO_FILE.exists())
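Before probing, it can help to know how long the recording is and roughly how many chunks to expect. The sketch below is not part of badc: `wav_duration_seconds` and `expected_chunk_count` are hypothetical helpers built on the standard-library `wave` module, which reads uncompressed PCM WAV files only. The chunk count assumes fixed-length chunks with a shorter final chunk kept; how `badc chunk split` treats the remainder is not confirmed here.

```python
import math
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def expected_chunk_count(duration_s: float, chunk_duration_s: float) -> int:
    """Chunks needed to cover the recording, assuming a shorter final chunk is kept."""
    return math.ceil(duration_s / chunk_duration_s)


# Same location as AUDIO_FILE in the configuration cell; adjust if your checkout differs.
audio_file = (
    Path("..") / "data" / "datalad" / "bogus" / "audio" / "GNWT-290_20230331_235938.wav"
)
if audio_file.exists():
    duration = wav_duration_seconds(audio_file)
    print(f"Duration: {duration:.1f} s")
    print("Expected chunks at 60 s:", expected_chunk_count(duration, 60))
else:
    print("Audio file not found; connect the dataset first.")
```

If the printed count disagrees with the manifest produced later, the difference most likely comes from how the final partial chunk is handled.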

Probe chunk duration

The probe currently runs placeholder logic. Once HawkEars probing lands, rerun this cell to capture the updated notes.

[ ]:
import subprocess

probe_cmd = [
    "badc",
    "chunk",
    "probe",
    str(AUDIO_FILE),
    "--initial-duration",
    "60",
]
subprocess.run(probe_cmd, check=True)
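To keep the probe notes for later comparison, the command output can be captured as a string instead of being left on the process's standard output. The following is a minimal sketch: `run_cli` is a hypothetical helper, not a badc function, and it only wraps `subprocess.run` with the same probe arguments used above.

```python
import shutil
import subprocess
from pathlib import Path


def run_cli(cmd: list[str]) -> str:
    """Run a command and return its stdout; print stderr before re-raising on failure."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as exc:
        print(exc.stderr)
        raise
    return result.stdout


# Same location as AUDIO_FILE in the configuration cell; adjust if your checkout differs.
audio_file = (
    Path("..") / "data" / "datalad" / "bogus" / "audio" / "GNWT-290_20230331_235938.wav"
)
if shutil.which("badc") is None:
    print("badc is not on PATH; activate the environment where it is installed.")
elif not audio_file.exists():
    print("Audio file not found:", audio_file)
else:
    probe_notes = run_cli(
        ["badc", "chunk", "probe", str(audio_file), "--initial-duration", "60"]
    )
    print(probe_notes)
```

Holding the output in `probe_notes` makes it straightforward to write it to a file and compare it against a later run.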

Plan chunks and write a manifest

[ ]:
manifest_path = MANIFEST_DIR / "GNWT-290_chunk_probe.csv"
split_cmd = [
    "badc",
    "chunk",
    "split",
    str(AUDIO_FILE),
    "--chunk-duration",
    "60",
    "--manifest",
    str(manifest_path),
]
subprocess.run(split_cmd, check=True)
print("Manifest written to", manifest_path)
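A quick way to check the result is to load the manifest and look at its first rows. The sketch below assumes only that the manifest is a CSV file with a header row, which the `.csv` extension suggests but this notebook does not confirm. It does not assume any column names, and `preview_manifest` is a hypothetical helper rather than part of badc.

```python
import csv
from pathlib import Path


def preview_manifest(path: Path, limit: int = 5) -> tuple[list[str], list[dict[str, str]]]:
    """Return the header names and up to `limit` rows of a CSV manifest."""
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        # zip stops after `limit` rows without reading the rest of the file.
        rows = [row for _, row in zip(range(limit), reader)]
        return list(reader.fieldnames or []), rows


# Same location as manifest_path in the split cell; adjust if your checkout differs.
manifest_file = (
    Path("..") / "data" / "datalad" / "bogus" / "manifests" / "GNWT-290_chunk_probe.csv"
)
if manifest_file.exists():
    columns, rows = preview_manifest(manifest_file)
    print("Columns:", columns)
    for row in rows:
        print(row)
else:
    print("Manifest not found; run the split cell first.")
```

Printing the column names first shows what the manifest actually records before any downstream code depends on specific fields.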