# Delivery overview — Master_Sample_v1

Master egocentric manipulation pack — full annotation stack for evaluation.

See also: **`DATACARD.md`** (dataset card for buyers / Hugging Face).

## Pipeline

1. **Capture** — monocular smartphone ego video in real workplaces
2. **Consent** — commercial AI-training consent collected before capture (DPDP 2023–aligned)
3. **Anonymize** — audio stripped; human review confirmed no identifiable faces (clips 01–09)
4. **Annotate** — temporal action segments, captions, 2D hand keypoints, 2D object boxes
5. **Verify** — automated QA (resolution, duration, schema, alignment) + numeric quality scores
6. **Deliver** — MP4 + JSON metadata + JSONL annotations

## Package contents

- `videos/` — final-cut ego MP4s (no audio)
- `overlays/` — hand skeleton preview videos
- `previews/` — full-length object + hand bounding-box preview MP4s
- `metadata/` — per-clip JSON (capture, QA, annotations summary, approximate UTC)
- `DATACARD.md` — buyer-facing dataset card
- `camera_config.json` — monocular device config per clip
- `delivery_manifest.json` — pack version, clip count, integrity hash
- `manifest.json` / `manifest.csv` — provenance, cut times, SHA256
- `qa/qa_report.json` + `qa_report.md` — structural + numeric QA
- `consent/worker_consent_summary.md`
- `annotations/` — action_segments, captions, hand keypoints, object boxes
- `annotations/timestamps/` — per-clip `frame_idx` + `timestamp_sec` (clip-relative)
- `sessions/` — per-session JSON + `session_index.json`
- `schema/` — field definitions

**Not shipped:** `annotations/archive/` (local rollback only), `annotations/review/`, `previews/review/`.

## Object boxes layer

2D axis-aligned bounding boxes in `annotations/*_object_boxes.jsonl`:

- **Sampling:** ~1 frame per 1–1.5 seconds (`frame_idx` is exact source frame)
- **Classes:** per-clip manipulation objects — see `schema/object_taxonomy.md`
- **Hands:** `left_hand` / `right_hand` derived from keypoints (dense per-frame)
- **Provenance:** `source: auto_v1`
- **Previews:** `previews/*_boxes_preview.mp4` — full-length, native fps

Clip **09** may carry interim boxes until gold refresh completes.

## Face privacy (clips 01–09)

Human review confirmed **no identifiable faces**. Metadata fields:

| Field | Value |
|-------|-------|
| `identifiable_faces_present` | `false` |
| `face_blur_required` | `false` |
| `face_blur_applied` | `false` |
| `face_privacy_review` | `manual_pass` |

Pixel blur was not applied because no faces required it.

## Wall-clock time (approximate UTC)

Deliverable MP4s use **clip-relative** timestamps. Metadata includes approximate capture start:

- `capture_start_ist_approx` — minute resolution, **12:00–17:00 IST** window
- `capture_start_utc_approx` — same instant in UTC
- `utc_precision: minute_approximate`

For robotics fusion, combine clip-relative `timestamp_sec` with approximate session start.

## QA flags (per clip)

| Flag | Meaning |
|------|---------|
| `continuous_integration_qa_pass` | Validator + alignment checks passed |
| `resolution_verified` | Delivered resolution matches metadata |
| `duration_alignment_verified` | video ≈ overlay ≈ segments |
| `ptp_synchronized` | **not_applicable** — see below |
| `blur_score_p10` | Laplacian sharpness (higher = sharper) — in `qa/qa_report.json` |
| `fps_stddev_ms` | Frame timing stability |
| `hands_visible_pct` | Hand visibility from keypoints |

### PTP (`ptp_synchronized: not_applicable`)

**Precision Time Protocol** synchronizes clocks across multiple sensors. This pack uses a **single monocular head-mounted smartphone** — there is no multi-camera or IMU hardware sync. Clip-relative timestamps in `annotations/timestamps/` are authoritative. PTP is correctly marked not applicable, not missing.

## Verify integrity

Compare file hashes against `delivery_manifest.json` and per-clip `metadata/*.json` integrity fields.
