{ "cells": [ { "cell_type": "markdown", "id": "5448b473", "metadata": {}, "source": [ "# Chunk probe walkthrough\n", "\n", "Explore `badc chunk probe` and `badc chunk split` on the public `bogus` dataset. These cells are\n", "lightweight enough to run on a laptop without a GPU." ] }, { "cell_type": "markdown", "id": "0466c865", "metadata": {}, "source": [ "## Configure paths\n", "`DATASET_ROOT` is resolved relative to the notebook's working directory, one level below the\n", "repository root. Update it if your checkout lives elsewhere. The notebook assumes you already ran\n", "`badc data connect bogus`, so the audio lives under `data/datalad/bogus`." ] }, { "cell_type": "code", "execution_count": null, "id": "8a1e31db", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "# Paths are relative to the notebook's working directory (one level below the repository root).\n", "DATASET_ROOT = Path(\"..\") / \"data\" / \"datalad\" / \"bogus\"\n", "AUDIO_FILE = DATASET_ROOT / \"audio\" / \"GNWT-290_20230331_235938.wav\"\n", "MANIFEST_DIR = DATASET_ROOT / \"manifests\"\n", "\n", "# Fail early with a clear message if the dataset has not been connected yet.\n", "if not DATASET_ROOT.exists():\n", "    raise FileNotFoundError(f\"{DATASET_ROOT.resolve()} not found; run `badc data connect bogus` first.\")\n", "MANIFEST_DIR.mkdir(exist_ok=True)\n", "print(AUDIO_FILE.resolve())\n", "print(\"Exists:\", AUDIO_FILE.exists())" ] }, { "cell_type": "markdown", "id": "d94808f3", "metadata": {}, "source": [ "## Probe chunk duration\n", "`badc chunk probe` currently runs placeholder probe logic, starting from the `--initial-duration`\n", "value passed in the next cell. Once HawkEars probing lands, rerun that cell to capture the updated\n", "output." 
] }, { "cell_type": "code", "execution_count": null, "id": "ca657153", "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "\n", "probe_cmd = [\n", "    \"badc\",\n", "    \"chunk\",\n", "    \"probe\",\n", "    str(AUDIO_FILE),\n", "    \"--initial-duration\",\n", "    \"60\",\n", "]\n", "# check=True raises CalledProcessError if the command exits with a non-zero status.\n", "subprocess.run(probe_cmd, check=True)" ] }, { "cell_type": "markdown", "id": "1a90eddf", "metadata": {}, "source": [ "## Plan chunks and write a manifest" ] }, { "cell_type": "code", "execution_count": null, "id": "25507e9d", "metadata": {}, "outputs": [], "source": [ "manifest_path = MANIFEST_DIR / \"GNWT-290_chunk_probe.csv\"\n", "# Use the same duration as the probe step and record the planned chunks in a CSV manifest.\n", "split_cmd = [\n", "    \"badc\",\n", "    \"chunk\",\n", "    \"split\",\n", "    str(AUDIO_FILE),\n", "    \"--chunk-duration\",\n", "    \"60\",\n", "    \"--manifest\",\n", "    str(manifest_path),\n", "]\n", "subprocess.run(split_cmd, check=True)\n", "print(\"Manifest written to\", manifest_path.resolve())" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 5 }