Chunk probe walkthrough
Explore the "badc chunk probe" and "badc chunk split" commands on the public "bogus" dataset. The cells are lightweight enough to run on a laptop without a GPU.
Configure paths
Update DATASET_ROOT if your checkout lives elsewhere. The notebook assumes you have already run badc data connect bogus, so the audio files live under data/datalad/bogus.
[ ]:
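from pathlib import Path

# Paths are relative to the notebook's working directory; ".." assumes the
# notebook runs one level below the directory that contains data/.
DATASET_ROOT = Path("..") / "data" / "datalad" / "bogus"
AUDIO_FILE = DATASET_ROOT / "audio" / "GNWT-290_20230331_235938.wav"
MANIFEST_DIR = DATASET_ROOT / "manifests"

# Create the manifest directory if needed (raises if DATASET_ROOT is missing).
MANIFEST_DIR.mkdir(exist_ok=True)

print(AUDIO_FILE.resolve())
print("Exists:", AUDIO_FILE.exists())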
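Before chunking, it can help to confirm the recording's length and format. The sketch below uses Python's standard-library wave module and assumes the file is uncompressed PCM WAV (the wave module cannot read compressed formats). The helper name wav_summary is hypothetical and not part of badc; the path is repeated so the cell runs on its own.

```python
import wave
from pathlib import Path


def wav_summary(path):
    """Return basic properties of a PCM WAV file (hypothetical helper, not part of badc)."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "frames": frames,
            "duration_s": frames / rate,  # length of the recording in seconds
        }


# Same file as AUDIO_FILE in the configuration cell.
audio = Path("..") / "data" / "datalad" / "bogus" / "audio" / "GNWT-290_20230331_235938.wav"
if audio.exists():
    print(wav_summary(audio))
else:
    print("Audio file not found:", audio)
```

Knowing the duration up front makes it easier to judge whether the chunk count in the manifest later on looks plausible.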
Probe chunk duration
This cell currently exercises placeholder probe logic; once HawkEars-based probing lands, rerun it to capture the updated notes.
[ ]:
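import subprocess

# Run the chunk-duration probe on the file, starting from an initial duration of 60.
probe_cmd = [
    "badc", "chunk", "probe",
    str(AUDIO_FILE),
    "--initial-duration", "60",
]
# check=True raises CalledProcessError if badc exits with a non-zero status.
subprocess.run(probe_cmd, check=True)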
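If you want to keep the probe's notes next to the manifests (for example, to compare them after HawkEars probing lands), you can capture the command's output instead of letting it print. This is a minimal sketch using subprocess.run with capture_output; it makes no assumption about what badc prints. The helper run_and_capture and the notes filename in the usage comment are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def run_and_capture(cmd, notes_path=None):
    """Run a command, fail loudly on errors, and optionally save its stdout.

    Hypothetical helper for this walkthrough; not part of badc.
    """
    # Give a clear error if the executable (e.g. "badc") is not installed.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} is not on PATH")
    # On failure, CalledProcessError carries the captured stderr in its .stderr attribute.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    if notes_path is not None:
        Path(notes_path).write_text(result.stdout)
    return result.stdout


# Usage in the notebook (probe_cmd and MANIFEST_DIR come from earlier cells;
# the notes filename is hypothetical):
# notes = run_and_capture(probe_cmd, MANIFEST_DIR / "GNWT-290_probe_notes.txt")
# print(notes)
```

Because stderr is captured too, inspect the exception's stderr attribute when the probe fails rather than expecting it in the cell output.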
Plan chunks and write a manifest
[ ]:
manifest_path = MANIFEST_DIR / "GNWT-290_chunk_probe.csv"

# Split the file into 60-unit chunks and record the plan in a CSV manifest.
split_cmd = [
    "badc", "chunk", "split",
    str(AUDIO_FILE),
    "--chunk-duration", "60",
    "--manifest", str(manifest_path),
]
subprocess.run(split_cmd, check=True)
print("Manifest written to", manifest_path)
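To sanity-check the split, you can read the manifest back with the standard-library csv module. The sketch below does not assume any particular column names, since those depend on badc's manifest format; it only reports the header and the number of rows. The expected_chunk_count comparison assumes one manifest row per chunk and back-to-back, non-overlapping chunks with a shorter final chunk; that is an assumption, not documented badc behavior. Both helper names are hypothetical.

```python
import csv
import math


def summarize_manifest(path):
    """Return (column names, number of data rows) for a CSV manifest."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file.
        return reader.fieldnames or [], len(rows)


def expected_chunk_count(total_seconds, chunk_seconds):
    """Chunks needed to cover a recording, ASSUMING back-to-back, non-overlapping chunks."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    return math.ceil(total_seconds / chunk_seconds)


# Usage in the notebook (manifest_path comes from the cell above):
# columns, n_rows = summarize_manifest(manifest_path)
# print("Columns:", columns)
# print("Rows:", n_rows)

# Under the assumption above, a 125-second recording with 60-second chunks needs 3 chunks.
print(expected_chunk_count(125, 60))  # → 3
```

If the row count differs from the estimate, check whether split overlaps or pads chunks before treating it as an error.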