# Hand boxes JSONL

Each line is one JSON object describing **one hand in one frame**. The records are 2D bounding boxes for the operator's hands. They are kept separate from `object_boxes` because hand boxes are **dense** (one row for every frame in which a hand is visible), whereas object detections are **sparse** (sampled).

## Record format

```json
{
  "video": "sample_07_car_detailing.mp4",
  "frame_idx": 0,
  "timestamp_sec": 0.0,
  "track_id": "right_hand",
  "class": "right_hand",
  "bbox_type": "axis_aligned",
  "bbox": {
    "x_min": 0.675974,
    "y_min": 0.065518,
    "x_max": 0.862654,
    "y_max": 0.477378
  },
  "confidence": 0.95,
  "source": "auto_v1"
}
```

## Fields

| Field | Required | Description |
|-------|----------|-------------|
| `video` | yes | Matches MP4 filename in `videos/` |
| `frame_idx` | yes | Zero-based frame index |
| `timestamp_sec` | yes | `frame_idx / fps` |
| `class` | yes | `left_hand` or `right_hand` |
| `track_id` | yes | `left_hand` / `right_hand` (one instance per side) |
| `bbox` | yes | Object with `x_min`, `y_min`, `x_max`, `y_max`, normalized to 0–1 (origin top-left) |
| `bbox_type` | yes | `axis_aligned` |
| `confidence` | yes | Constant derived value, not a detector score, because boxes are computed from keypoints |
| `source` | yes | `auto_v1` |

## Derivation

Each box is the **axis-aligned enclosing rectangle of the 21 hand landmarks** from `*_hand_keypoints.jsonl`, expanded by `0.02` of padding in normalized units. These boxes are a convenience layer for box-only consumers; the keypoints file remains the richer representation, so use `*_hand_keypoints.jsonl` when you need full hand pose.
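
The derivation above could be reproduced roughly as follows. This is a sketch under stated assumptions, not the `auto_v1` implementation: landmarks are taken to be normalized `(x, y)` pairs (the keypoints schema is not shown here), the padding is applied on every side, the result is clamped to 0–1, and values are rounded to six decimals to mirror the sample record. The function name `box_from_landmarks` is hypothetical.

```python
def box_from_landmarks(landmarks, pad=0.02):
    """Enclose normalized (x, y) landmarks in a padded axis-aligned box."""
    if not landmarks:
        raise ValueError("at least one landmark is required")
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]

    def clamp(value):
        # Assumption: padded boxes are clipped to the frame.
        return round(min(1.0, max(0.0, value)), 6)

    return {
        "x_min": clamp(min(xs) - pad),
        "y_min": clamp(min(ys) - pad),
        "x_max": clamp(max(xs) + pad),
        "y_max": clamp(max(ys) + pad),
    }


# 21 synthetic landmarks along a diagonal, for illustration only.
points = [(0.5 + 0.01 * i, 0.4 + 0.005 * i) for i in range(21)]
print(box_from_landmarks(points))
```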

## Cadence

Dense: one row per frame per visible hand, so a frame in which a hand is not visible has no row for that hand (see `hand_boxes_summary.json` for coverage). Unlike object boxes, no interpolation is needed.
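
Because the rows are dense, a per-frame lookup is enough for most consumers. The sketch below indexes rows by `frame_idx` and hand `class`; the function name `index_by_frame` and the in-memory demo rows are assumptions for illustration.

```python
import json
from collections import defaultdict


def index_by_frame(lines):
    """Map frame_idx -> {class: bbox} from an iterable of JSONL lines."""
    frames = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        frames[rec["frame_idx"]][rec["class"]] = rec["bbox"]
    return dict(frames)


# Synthetic rows: both hands in frame 0, only the right hand in frame 2.
demo_box = {"x_min": 0.1, "y_min": 0.2, "x_max": 0.3, "y_max": 0.4}
demo_lines = [
    json.dumps({"frame_idx": 0, "class": "left_hand", "bbox": demo_box}),
    json.dumps({"frame_idx": 0, "class": "right_hand", "bbox": demo_box}),
    "",
    json.dumps({"frame_idx": 2, "class": "right_hand", "bbox": demo_box}),
]
frames = index_by_frame(demo_lines)

# A missing entry means the hand was not visible; nothing is interpolated.
print(frames.get(1, {}).get("left_hand"))
```

An open file handle can be passed in place of `demo_lines`, since the function only iterates over lines.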

## Files

- `sample_XX_*_hand_boxes.jsonl` — per-frame hand boxes
- `hand_boxes_summary.json` — aggregate counts, classes, frames covered