# DATACARD — Egocentric Manipulation Sample (Master_Sample_v1)

**Dataset:** `ggn-egocentric-data-sample` / `sample_data_june/`  
**Version:** June 2026 evaluation sample  
**License:** Commercial AI training & evaluation (enterprise terms on request)

---

## Dataset description

Nine egocentric video clips (~5 minutes each) of real manual work in Indian workplaces: factory packaging, sewing, heat-shrink batching, garment ironing, commercial catering, cane weaving, car detailing, auto-body primer/painting, and denting/filing. Each clip ships with the seven annotation layers described below, aimed at robotics and embodied-AI evaluation.

**Intended use:** Training and evaluating vision-language-action models, manipulation policies, hand-object interaction research, and egocentric video understanding in industrial and service settings.

**Out of scope:** Surveillance, worker performance scoring, biometric identification of individuals, or any use that re-identifies participants.

---

## Collection methodology

| Aspect | Detail |
|--------|--------|
| Camera | Monocular smartphone, head-mounted via headband |
| Devices | Samsung Galaxy S24, Apple iPhone 16 Pro Max |
| Environment | Real workplaces (factory, restaurant, roadside shop, showroom, repair shop) |
| Duration | ~4.5–6 min per clip after hero-cut selection |
| Audio | Stripped from all deliverable MP4s |
| Consent | Commercial AI-training consent collected before capture (see `consent/worker_consent_summary.md`) |
| Time reference | Clip-relative `frame_idx` / `timestamp_sec`; approximate wall-clock start time in metadata (minute resolution; captures fall within 12:00–17:00 IST) |
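Because `timestamp_sec` is clip-relative and the metadata start time is only minute-accurate, any wall-clock time derived for a frame is approximate. Below is a minimal sketch of that mapping. It assumes the start is available as an `HH:MM` string plus a separately known capture date. Both inputs, the toy values, and the function name `approx_wall_clock` are hypothetical, because the metadata field names are not specified in this card.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is a fixed UTC+05:30 offset with no daylight saving.
IST = timezone(timedelta(hours=5, minutes=30))


def approx_wall_clock(capture_date: str, start_hhmm: str, timestamp_sec: float) -> datetime:
    """Approximate IST wall-clock time of a frame.

    capture_date ("YYYY-MM-DD") and start_hhmm ("HH:MM") are hypothetical inputs;
    because the start is minute-resolution, the result is only good to about +/- 1 min.
    """
    start = datetime.strptime(f"{capture_date} {start_hhmm}", "%Y-%m-%d %H:%M")
    return start.replace(tzinfo=IST) + timedelta(seconds=timestamp_sec)


if __name__ == "__main__":
    t = approx_wall_clock("2026-06-10", "14:05", 125.5)  # toy values
    print(t.isoformat())                           # 2026-06-10T14:07:05.500000+05:30
    print(t.astimezone(timezone.utc).isoformat())  # 2026-06-10T08:37:05.500000+00:00
```

The result inherits the ±1 min uncertainty of the start time. That makes it suitable for coarse time-of-day grouping, but not for cross-device synchronization.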

---

## Annotation layers (7)

| Layer | Format | Description |
|-------|--------|-------------|
| Action segments | JSONL | Temporal verb–noun segments from human review (~5–15 s spans) |
| Captions | JSONL | One natural-language paragraph per clip |
| Hand keypoints | JSONL | 2D hand landmarks per frame (21 points per hand) |
| Object boxes | JSONL | 2D axis-aligned boxes sampled at ~1 Hz; also includes `left_hand` / `right_hand` boxes derived from the hand keypoints |
| Hand–object contact | JSONL | Hand–object overlap samples derived from the hand and object boxes |
| Timestamps | CSV | Per-clip `frame_idx` + `timestamp_sec` |
| Overlays | MP4 | Hand skeleton previews + full-length object-box previews |
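The `left_hand` / `right_hand` boxes and the contact layer both reduce to simple box geometry. The sketch below shows one plausible path: from a hand's landmarks to a hand box, and from there to an overlap test against an object box. The padding, the `min_hand_fraction` threshold, and the contact rule itself are assumptions for illustration. The rule actually used to derive the contact layer is not documented in this card. Both layers must also share one coordinate frame (pixels or normalized) for the arithmetic to be meaningful.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right, same units as keypoints."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def hand_box(keypoints: Iterable[tuple[float, float]], pad: float = 0.0) -> Box:
    """Tightest box around one hand's 2D landmarks (21 per hand here), optionally padded."""
    pts = list(keypoints)
    if not pts:
        raise ValueError("no keypoints for this hand")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return Box(min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)


def intersection_area(a: Box, b: Box) -> float:
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union; 0.0 for disjoint or degenerate boxes."""
    inter = intersection_area(a, b)
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def is_contact(hand: Box, obj: Box, min_hand_fraction: float = 0.1) -> bool:
    """Hypothetical contact rule: the object covers >= min_hand_fraction of the hand box.

    Normalizing by hand area (rather than IoU) keeps a small hand on a large object,
    such as a palm on a car panel, from scoring near zero.
    """
    hand_area = hand.area()
    return hand_area > 0 and intersection_area(hand, obj) / hand_area >= min_hand_fraction


if __name__ == "__main__":
    hb = hand_box([(100, 100), (140, 100), (100, 160), (140, 160)])  # toy landmarks
    panel = Box(120, 130, 300, 300)
    print(intersection_area(hb, panel), is_contact(hb, panel))  # 600.0 True
```

For the toy boxes, the IoU is only about 0.019 (600 / 32400), even though a quarter of the hand box overlaps the object. This is why the sketch thresholds on hand-normalized overlap instead.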

Schema definitions: `schema/`
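Every annotation layer except timestamps and overlays is JSONL, with one JSON object per line. Below is a minimal reader, plus a lookup of the action segment covering a given clip time. The field names `start_sec`, `end_sec`, `verb`, and `noun` are hypothetical placeholders, so map them to the actual keys in `schema/` before use.

```python
import io
import json
from typing import Iterator, Optional, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-blank line; report the line number on bad JSON."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: {exc}") from exc
        yield record


def segment_at(segments: list[dict], t: float) -> Optional[dict]:
    """Return the segment whose [start_sec, end_sec) interval covers t, else None.

    start_sec / end_sec are hypothetical key names; check schema/ for the real ones.
    """
    for seg in segments:
        if seg["start_sec"] <= t < seg["end_sec"]:
            return seg
    return None


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the dataset.
    toy = io.StringIO(
        '{"start_sec": 0.0, "end_sec": 8.2, "verb": "fold", "noun": "carton"}\n'
        "\n"
        '{"start_sec": 8.2, "end_sec": 19.0, "verb": "tape", "noun": "carton"}\n'
    )
    segments = list(read_jsonl(toy))
    hit = segment_at(segments, 10.0)
    if hit is not None:
        print(hit["verb"], hit["noun"])  # tape carton
```

To read a real layer, pass an open text file (`open(path, encoding="utf-8")`) instead of the `StringIO`. A linear scan is adequate here: a 5-minute clip cut into 5–15 s segments yields a few dozen segments. Sort by start time and bisect if you need faster lookups.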

---

## Known limitations

- **Monocular only** — no stereo depth, no wrist camera.
- **PTP sync** — `not_applicable`; single-camera capture, no multi-sensor hardware sync.
- **UTC** — no per-frame UTC timestamps; only approximate capture-start times in metadata (nothing embedded in the MP4s).
- **Object boxes** — labels exist on sampled frames (~1 Hz), not on every frame; clip 09 may still carry interim boxes until the gold refresh completes.
- **Hand–object contact** — derived layer; may lag behind gold updates to the object boxes.
- **Geography** — India workplaces only; not representative of global factory conditions.
- **Language** — action notes may include speech-to-text (STT) artifacts; segments are human-reviewed.
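Object boxes exist only on ~1 Hz sampled frames, so per-frame consumers have to decide how to treat unlabeled frames. One conservative option is nearest-sample lookup with a maximum time gap, sketched below. The function `nearest_sample` and its 0.5 s default (half the nominal sampling period) are hypothetical choices, not part of the dataset tooling.

```python
from bisect import bisect_left
from typing import Optional


def nearest_sample(sample_times: list[float], t: float, max_gap: float = 0.5) -> Optional[int]:
    """Index of the box sample closest to clip time t, or None if none is within max_gap.

    sample_times: clip-relative seconds of the ~1 Hz box samples, sorted ascending.
    max_gap: hypothetical tolerance; 0.5 s is half the nominal 1 s sampling period.
    """
    if not sample_times:
        return None
    i = bisect_left(sample_times, t)
    # The nearest sample is either the first one at/after t or the one just before it.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sample_times)]
    best = min(candidates, key=lambda j: abs(sample_times[j] - t))
    return best if abs(sample_times[best] - t) <= max_gap else None


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]  # toy sample times
    print(nearest_sample(times, 1.4), nearest_sample(times, 7.0))  # 1 None
```

Nearest-sample lookup deliberately avoids interpolating between samples. With boxes one second apart, linear interpolation would invent object motion that was never labeled.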

---

## Quality assurance

Per-clip numeric QA in `qa/qa_report.json`:

- `blur_score_p10` — Laplacian sharpness (higher = sharper)
- `fps_stddev_ms` — standard deviation of frame intervals, in ms (lower = more stable timing)
- `hands_visible_pct` — fraction of frames with visible hands
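The exact formulas behind `qa/qa_report.json` are not spelled out in this card. The sketch below gives one reasonable reading of each metric, with its assumptions stated in the docstrings:

- **Sharpness:** the variance of a 4-neighbour Laplacian per grayscale frame, summarized by a nearest-rank 10th percentile.
- **Timing stability:** the standard deviation of inter-frame intervals.
- **Hand visibility:** the share of frames with at least one detected hand.

Frames are plain nested lists to keep the sketch dependency-free. Treat its outputs as comparable only to numbers produced by the same code, not to the values in the report.

```python
import statistics


def laplacian_variance(gray: list[list[float]]) -> float:
    """Per-frame sharpness: variance of the 4-neighbour Laplacian over interior pixels.

    gray is a 2D grayscale frame (at least 3x3). Higher = sharper; blur flattens
    edges, which shrinks the Laplacian response and therefore its variance.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return statistics.pvariance(responses)


def p10(values: list[float]) -> float:
    """10th percentile, nearest-rank method (one of several common definitions)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (len(ordered) + 9) // 10  # ceil(0.10 * n) using integer arithmetic
    return ordered[rank - 1]


def fps_stddev_ms(timestamps_sec: list[float]) -> float:
    """Population std-dev of inter-frame intervals in ms; needs at least two timestamps."""
    deltas_ms = [(b - a) * 1000.0 for a, b in zip(timestamps_sec, timestamps_sec[1:])]
    return statistics.pstdev(deltas_ms)


def hands_visible_pct(hands_per_frame: list[int]) -> float:
    """Percentage (0-100) of frames with at least one detected hand.

    The report may store this as a 0-1 fraction instead; check before comparing.
    """
    if not hands_per_frame:
        return 0.0
    visible = sum(1 for n in hands_per_frame if n > 0)
    return 100.0 * visible / len(hands_per_frame)
```

A clip-level score would then be `p10([laplacian_variance(f) for f in frames])`. A low percentile rather than the mean presumably keeps the blurriest stretches (for example, fast head turns) visible instead of averaging them away.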

Structural CI checks: resolution, duration alignment, schema validation, and audio removal.
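A structural check on the per-clip timestamps CSV is easy to reproduce locally. This sketch uses the `frame_idx` and `timestamp_sec` columns named above. It assumes "duration alignment" means the last frame timestamp falls within a tolerance of the container duration. `check_timestamps` and the 0.1 s tolerance are hypothetical, and the video duration must come from your own probe of the MP4.

```python
import csv
import io


def check_timestamps(csv_text: str, video_duration_sec: float, tol_sec: float = 0.1) -> list[str]:
    """Return problems found in a per-clip timestamps CSV; an empty list means it passed.

    Assumes columns frame_idx and timestamp_sec; tol_sec is a hypothetical tolerance.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return ["no rows"]
    frames = [int(r["frame_idx"]) for r in rows]
    times = [float(r["timestamp_sec"]) for r in rows]
    problems = []
    if frames != list(range(frames[0], frames[0] + len(frames))):
        problems.append("frame_idx is not contiguous")
    if any(b <= a for a, b in zip(times, times[1:])):
        problems.append("timestamp_sec is not strictly increasing")
    # The last frame starts one frame interval before the end, so allow some slack.
    if abs(video_duration_sec - times[-1]) > tol_sec:
        problems.append(f"last timestamp {times[-1]:.3f} s vs duration {video_duration_sec:.3f} s")
    return problems


if __name__ == "__main__":
    toy_csv = "frame_idx,timestamp_sec\n0,0.0\n1,0.0333\n2,0.0667\n"
    print(check_timestamps(toy_csv, video_duration_sec=0.1))  # []
```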

---

## Privacy & consent

- **Faces:** Human review confirmed no identifiable faces in delivered clips; pixel blur not required.
- **Audio:** Removed from all deliverables.
- **Consent:** Collection aligned with India's Digital Personal Data Protection (DPDP) Act, 2023; consent covers commercial AI training and evaluation. See `consent/worker_consent_summary.md`.

---

## Bias & coverage notes

- Skews toward manual labor and small-business settings in India.
- Strong bimanual and tool-use diversity (weaving, detailing, cooking, auto repair).
- Does not include scripted lab bench tasks or robot teleop wrist cameras.
- Gender, age, and ethnicity of collectors are not exhaustively balanced across clips.

---

## Download

Public S3 (no credentials):

```bash
aws s3 sync s3://ggn-egocentric-data-sample/sample_data_june ./Master_Sample_v1 --no-sign-request
```
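After syncing, a quick check that the paths this datacard refers to actually arrived can save debugging later. The sketch assumes `schema/`, `qa/qa_report.json`, and `consent/worker_consent_summary.md` sit directly under the synced root (`./Master_Sample_v1` in the command above). The function name and that layout are assumptions, and the check does not validate file contents.

```python
from pathlib import Path

# Paths referenced in this datacard, assumed relative to the synced root.
EXPECTED_PATHS = [
    "schema",
    "qa/qa_report.json",
    "consent/worker_consent_summary.md",
]


def missing_paths(root="Master_Sample_v1") -> list[str]:
    """Return the expected paths that do not exist under root (empty list = all present)."""
    base = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing after sync:", ", ".join(missing))
    else:
        print("All referenced paths present.")
```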

Landing page: https://ggn-egocentric-data-sample.s3.ap-south-1.amazonaws.com/sample_data_june/index.html

---

## Citation & contact

GGN / Egocentric-100K evaluation sample. For an enterprise license or full dataset access, contact the data collection partner.
