# Object boxes JSONL

Each line is one JSON object describing **one detection in one frame**. A frame with several detections produces several lines.

## Record format

```json
{
  "video": "sample_01_shuttle_tube_packaging.mp4",
  "frame_idx": 100,
  "timestamp_sec": 3.333,
  "track_id": "tube_3",
  "class": "shuttle_tube",
  "bbox_type": "axis_aligned",
  "bbox": {
    "x_min": 0.42,
    "y_min": 0.18,
    "x_max": 0.71,
    "y_max": 0.65
  },
  "confidence": 0.87,
  "source": "auto_v1"
}
```

## Fields

| Field | Required | Description |
|-------|----------|-------------|
| `video` | yes | Matches MP4 filename in `videos/` |
| `frame_idx` | yes | Zero-based frame index |
| `timestamp_sec` | yes | `frame_idx / fps` |
| `class` | yes | Manipulation object from segments / `object_taxonomy.md` (objects only; hands live in `*_hand_boxes.jsonl`) |
| `bbox` | yes | Object with `x_min`, `y_min`, `x_max`, `y_max`, normalized to 0–1 (origin top-left) |
| `bbox_type` | yes | `axis_aligned` or `oriented` |
| `track_id` | recommended | Stable ID across frames |
| `confidence` | yes | Detection score 0–1 |
| `source` | yes | `auto_v1`, `auto_guided_v1`, or `human_reviewed` |
| `rotation_deg` | if oriented | Clockwise rotation of the object's major axis, in degrees |
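
The field rules above can be checked mechanically. The following is a minimal sketch, not the project's validator: the function names (`validate_record`, `load_records`, `bbox_to_pixels`) are hypothetical, and the check that each `*_min` does not exceed its `*_max` is an assumption this schema does not state.

```python
import json

# Fields marked "yes" in the table; track_id is only recommended.
REQUIRED = ("video", "frame_idx", "timestamp_sec", "class",
            "bbox", "bbox_type", "confidence", "source")
BBOX_KEYS = ("x_min", "y_min", "x_max", "y_max")
BBOX_TYPES = {"axis_aligned", "oriented"}
SOURCES = {"auto_v1", "auto_guided_v1", "human_reviewed"}


def validate_record(rec):
    """Return a list of problems for one record (empty list means OK)."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in rec]
    if problems:
        return problems
    if not isinstance(rec["frame_idx"], int) or rec["frame_idx"] < 0:
        problems.append("frame_idx must be a non-negative integer")
    if not 0.0 <= rec["confidence"] <= 1.0:
        problems.append("confidence must be in [0, 1]")
    if rec["bbox_type"] not in BBOX_TYPES:
        problems.append("unknown bbox_type")
    if rec["source"] not in SOURCES:
        problems.append("unknown source")
    if rec["bbox_type"] == "oriented" and "rotation_deg" not in rec:
        problems.append("oriented boxes need rotation_deg")
    bbox = rec["bbox"]
    if not isinstance(bbox, dict) or any(k not in bbox for k in BBOX_KEYS):
        problems.append("bbox must have x_min, y_min, x_max, y_max")
    else:
        if not all(0.0 <= bbox[k] <= 1.0 for k in BBOX_KEYS):
            problems.append("bbox coords must be in [0, 1]")
        # Assumption: min <= max on both axes (not stated in the schema).
        if bbox["x_min"] > bbox["x_max"] or bbox["y_min"] > bbox["y_max"]:
            problems.append("bbox min exceeds max")
    return problems


def load_records(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def bbox_to_pixels(bbox, width, height):
    """Scale a normalized bbox to pixel coordinates (origin top-left)."""
    return (bbox["x_min"] * width, bbox["y_min"] * height,
            bbox["x_max"] * width, bbox["y_max"] * height)
```

`bbox_to_pixels` takes the frame size as arguments because the records do not carry it; read it from the video itself.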

## Oriented box (optional)

When `bbox_type` is `oriented`, `bbox` is still the axis-aligned **enclosing** rectangle; add `rotation_deg` to give the orientation of the object's major axis.

## Sampling cadence

**Object detections** are exported on **sampled frames** (roughly one frame per 1–1.5 seconds; see `sample_every` in `object_boxes_summary.json`), not on every frame. `frame_idx` is the exact source-video frame, so timestamps remain precise.
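
To see the actual cadence in a given file, you can measure the gaps between consecutive sampled frames. This is a rough sketch: `sample_gaps_sec` is a hypothetical helper, and `fps` must come from the source video because the records do not store it.

```python
def sample_gaps_sec(records, fps):
    """Return the gaps, in seconds, between consecutive sampled frames."""
    # Several detections can share a frame, so de-duplicate frame indices.
    frames = sorted({rec["frame_idx"] for rec in records})
    return [(b - a) / fps for a, b in zip(frames, frames[1:])]
```

For example, with frames 100, 130, and 175 at an assumed 30 fps, the gaps are 1.0 and 1.5 seconds.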

This file contains **objects only**. Hand bounding boxes are a separate, denser stream — see `hand_boxes_schema.md` / `*_hand_boxes.jsonl`. Keeping them separate avoids mixing the sparse object cadence with the per-frame hand cadence.

For sparse object rows, interpolate or hold the last box across the gap between consecutive `frame_idx` values. The full-length overlay videos in `previews/` hold by **class** (not `track_id`) to avoid ghost accumulation when the detector assigns new track IDs every sample.
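
Holding by class could be sketched as below. This is an illustration, not the code that renders the previews: `hold_by_class` is a hypothetical name, and `num_frames` is the video's frame count, which you must supply. This schema does not say how a class missing from a later sample is handled; the sketch assumes its last boxes stay held.

```python
from collections import defaultdict


def hold_by_class(records, num_frames):
    """Return one list of boxes per frame, holding the latest sample per class."""
    # frame_idx -> class -> detections of that class on that sampled frame
    by_frame = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_frame[rec["frame_idx"]][rec["class"]].append(rec)

    held = {}  # class -> boxes from the most recent sample containing it
    per_frame = []
    for idx in range(num_frames):
        if idx in by_frame:
            # A new sample replaces all held boxes of the same class, so
            # boxes with stale track IDs do not pile up as ghosts.
            held.update(by_frame[idx])
        per_frame.append([rec for boxes in held.values() for rec in boxes])
    return per_frame
```

Keying the held state on `track_id` instead would keep every old box on screen whenever the detector issues a fresh ID, which is the ghost accumulation described above.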

## Files

- `sample_XX_*_object_boxes.jsonl` — per-video **object** detections + tracks
- `sample_XX_*_hand_boxes.jsonl` — per-frame `left_hand` / `right_hand` boxes (see `hand_boxes_schema.md`)
- `object_boxes_summary.json` — aggregate object counts, classes, frames covered (`frames_with_boxes`, `sample_every`)
- `hand_boxes_summary.json` — aggregate hand-box counts, classes, frames covered
- `previews/sample_XX_*_boxes_preview.mp4` — full-length overlay video showing **both** object and hand boxes