# How to Postprocess the Data
This section covers the offline processing pipeline that turns raw recordings into full-body SMPL pose estimates.
> **Tip**
>
> A sample recording is available for testing the full pipeline. Download `sample_data.zip` and extract it into `received_recordings/`:
>
> ```shell
> unzip sample_data.zip -d received_recordings/
> ```
>
> This provides a complete raw session (video, metadata, IMU, Aria VRS + MPS outputs) that you can use to follow along with every step below.
## Step 1: Add Aria Data
The Aria VRS recording and MPS outputs must be added to the session directory manually after the recording is complete:

1. Download the `.vrs` file and its JSON sidecar from the Aria companion app.
2. Submit the VRS file to Aria MPS for SLAM and hand-tracking processing.
3. Place the outputs into the session directory:
```
received_recordings/<session_name>/
├── ...                      # (existing files from recording)
├── <recording>.vrs          # Aria VRS recording
├── <recording>.vrs.json     # VRS metadata sidecar
└── mps_<recording>_vrs/     # Aria MPS outputs
    ├── slam/
    │   ├── closed_loop_trajectory.csv
    │   └── semidense_points.csv.gz
    └── hand_tracking/
        └── hand_tracking_results.csv
```
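Before running the calibration pipeline, it can be useful to verify that all expected Aria artifacts are in place. A minimal sketch (the helper name `check_aria_files` is illustrative, not part of the pipeline; file names follow the layout above):

```python
from pathlib import Path

def check_aria_files(session_dir: str, recording: str) -> list[str]:
    """Return the expected Aria files that are missing from a session.

    Paths follow the directory layout documented above; adjust the
    `recording` stem to match your actual VRS file name.
    """
    session = Path(session_dir)
    mps = session / f"mps_{recording}_vrs"
    expected = [
        session / f"{recording}.vrs",
        session / f"{recording}.vrs.json",
        mps / "slam" / "closed_loop_trajectory.csv",
        mps / "slam" / "semidense_points.csv.gz",
        mps / "hand_tracking" / "hand_tracking_results.csv",
    ]
    return [str(p) for p in expected if not p.exists()]
```

An empty return value means the session is ready for Step 2.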
## Step 2: Calibration Pipeline
The calibration pipeline can run automatically when the receiver finishes uploading, or be triggered manually on an existing session:

```shell
# Run the full calibration pipeline (sub-steps 1–4 below) on an existing session
python src/pipeline/01_receiver.py --session <session_dir>
```
This runs four sub-steps:

1. **Prepare session** — extract video frames, camera intrinsics, and an AprilTag summary.
2. **SAM-3D-Body** — estimate 3D body parameters from third-person RGB frames.
3. **MHR → SMPL-X** — convert MHR outputs to SMPL-X format.
4. **Calibration solve** — compute bone-to-sensor rotation offsets (`imu_calibration.json`). Can also be re-run independently via `python src/pipeline/02_calibrate.py <session_dir>`.
> **Note**
>
> The calibration pipeline automatically detects which sub-steps have already been completed and skips them. If you need to re-run a specific step (e.g. after updating a model checkpoint), delete its output directory first.
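The calibration solve in sub-step 4 can be viewed as an orthogonal Procrustes problem: given paired world-frame bone orientations (from the SMPL-X fits) and sensor orientations (from the IMUs) over the calibration window, find the constant offset `B_R_S` that best maps one onto the other. A minimal sketch of that idea — not the pipeline's actual solver, and the function name is illustrative:

```python
import numpy as np

def solve_offset(R_wb: np.ndarray, R_ws: np.ndarray) -> np.ndarray:
    """Best-fit constant bone-to-sensor rotation B_R_S.

    R_wb: (T, 3, 3) world-from-bone rotations (e.g. from SMPL-X fits)
    R_ws: (T, 3, 3) world-from-sensor rotations (from the IMU)
    Solves argmin_R sum_t || R_wb[t] @ R - R_ws[t] ||_F^2.
    """
    # Cross-covariance M = sum_t R_wb[t].T @ R_ws[t]
    M = np.einsum("tij,tik->jk", R_wb, R_ws)
    U, _, Vt = np.linalg.svd(M)
    # Project onto SO(3), fixing a possible reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```

With noise-free inputs this recovers the exact offset; with noisy fits it returns the least-squares rotation.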
After this step, the session directory contains:
```
received_recordings/<session_name>/
├── ...                              # (existing files)
├── meta/
│   ├── camera.json                  # Camera intrinsics extracted from metadata
│   └── calibration_segment.json     # Auto-detected calibration time window
├── color/                           # Extracted video frames (PNG)
├── frames.csv                       # Frame index, UTC timestamp, image path
├── color_apriltag/
│   └── detection_summary.json       # Per-frame AprilTag detection counts
├── body_data/                       # SAM-3D-Body MHR outputs (*.npz per frame)
├── smpl_output/                     # SMPL-X conversion results
│   ├── smpl_parameters.npz          # Joints, rotations, betas across all frames
│   ├── smpl_vertices.npy            # Mesh vertices (memory-mapped)
│   └── per_frame/                   # Individual SMPL-X fits
└── imu_calibration.json             # Bone-to-sensor rotation offsets (B_R_S per joint)
```
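The offsets in `imu_calibration.json` let you recover a bone orientation from a raw sensor orientation as `W_R_B = W_R_S @ B_R_S.T`. A sketch of how these could be loaded and applied — the JSON schema (joint name mapped to a 3×3 nested list) is an assumption for illustration, not a documented format:

```python
import json
import numpy as np

def load_offsets(path: str) -> dict[str, np.ndarray]:
    """Load per-joint B_R_S matrices (assumed schema: name -> 3x3 list)."""
    with open(path) as f:
        raw = json.load(f)
    return {joint: np.asarray(R, dtype=float) for joint, R in raw.items()}

def bone_from_sensor(W_R_S: np.ndarray, B_R_S: np.ndarray) -> np.ndarray:
    """World-from-bone rotation from a world-from-sensor rotation.

    Since W_R_S = W_R_B @ B_R_S and B_R_S is a rotation,
    W_R_B = W_R_S @ B_R_S.T.
    """
    return W_R_S @ B_R_S.T
```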
## Step 3: Synchronization
The sync pipeline aligns IMU, third-person RGB, and (optionally) Aria egocentric data to a shared UTC timeline:

```shell
python src/pipeline/03_sync.py <session_dir>
```
This produces a `sync/` folder with calibrated, UTC-aligned data ready for inference:

```
received_recordings/<session_name>/
├── ...                  # (existing files)
└── sync/
    ├── frames.csv       # UTC-mapped third-person RGB
    ├── color/           # Symlinked RGB images
    ├── imu_info.csv     # Calibrated IMU rotations (9 rows per timestamp)
    ├── imu_info.pkl     # Same data as pickle: {utc_ns: {imu_id: 3x3 rotation matrix}}
    ├── vrs_frames.csv   # (optional) UTC-mapped Aria RGB
    └── vrs_color/       # (optional) Extracted Aria RGB frames
```
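The pickle layout documented above (`{utc_ns: {imu_id: 3x3 rotation matrix}}`) makes nearest-timestamp lookups straightforward, e.g. to pair each RGB frame with the closest IMU sample. A sketch, assuming only that documented structure (the helper names are illustrative):

```python
import pickle
import numpy as np

def load_imu_info(path: str):
    """Load sync/imu_info.pkl: {utc_ns: {imu_id: 3x3 rotation matrix}}.

    Returns the sorted timestamp array plus the raw dict.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    ts = np.array(sorted(data))  # UTC timestamps in ns, ascending
    return ts, data

def nearest_sample(ts: np.ndarray, data: dict, query_ns: int):
    """Return (timestamp, sample) of the IMU reading closest to query_ns."""
    i = np.searchsorted(ts, query_ns)
    # Step back if the previous timestamp is closer (or we ran off the end).
    if i > 0 and (i == len(ts) or query_ns - ts[i - 1] < ts[i] - query_ns):
        i -= 1
    return int(ts[i]), data[int(ts[i])]
```

In practice you would iterate over `sync/frames.csv` and call `nearest_sample` with each frame's UTC timestamp.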
## Step 4: Pose Estimation
Run EgoAllo diffusion-based pose estimation conditioned on head trajectory. The `--guidance-mode` flag selects which signals guide the diffusion process:

```shell
# Default: RoSHI (diffusion + IMU guidance)
python src/pipeline/04_inference.py --traj-root <session_dir>

# Or specify a different mode
python src/pipeline/04_inference.py --traj-root <session_dir> --guidance-mode egoallo
```
Available guidance modes:

| Mode | Diffusion | IMU | Aria Hand | Description |
|---|---|---|---|---|
| `egoallo` | yes | no | no | Pure EgoAllo baseline (foot skating constraint only) |
|  | yes | no | wrist only | EgoAllo + Aria wrist pose guidance (no full hand, no IMU) |
| (default) | yes | yes | no | RoSHI: diffusion guided by IMU bone orientations |
|  | yes | yes | yes | RoSHI + Aria hand tracking |
For details on the IMU guidance constraints and optimizer parameters, see Guidance Parameters.
Results are saved to `<session_dir>/egoallo_outputs/` as NPZ files containing:

| Key | Description |
|---|---|
|  | CPF (central pupil frame) poses in world frame `(T, 7)` |
|  | Root joint (pelvis) pose in world frame `(T, 7)` |
|  | Local body joint quaternions, wxyz `(samples, T, 21, 4)` |
|  | Left hand joint quaternions, wxyz `(samples, T, 15, 4)` |
|  | Right hand joint quaternions, wxyz `(samples, T, 15, 4)` |
|  | Foot contact predictions per body joint `(samples, T, 21)` |
|  | SMPL-H body shape parameters `(samples, T, 16)` |
|  | Tracking timestamps in nanoseconds `(T,)` |
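To check what a given run actually produced, you can list every array in an output file without assuming specific key names. A small inspection helper (the function name is illustrative):

```python
import numpy as np

def summarize_npz(path: str) -> dict[str, tuple]:
    """Map each key in an NPZ file to its (shape, dtype)."""
    with np.load(path) as data:
        return {key: (data[key].shape, str(data[key].dtype)) for key in data.files}
```

Comparing the reported shapes against the table above tells you which array holds which quantity.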
## Step 5: Visualization

Compare methods side-by-side with third-person RGB:

- **SAM-3D** — third-person video body estimation from the calibration pipeline
- **IMU FK** — pure forward kinematics from calibrated IMUs, no diffusion model; missing joints are assumed to be at the identity rotation
- **RoSHI** — diffusion + IMU guidance (from `04_inference.py`)
- **EgoAllo** — diffusion baseline outputs (from `04_inference.py --guidance-mode egoallo`)

```shell
python src/pipeline/05_visualize.py <session_dir>
```
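The "IMU FK" baseline above can be illustrated with a toy forward-kinematics pass: given world-frame bone rotations (from the calibrated IMUs) and fixed bone offsets, joint positions are accumulated down the kinematic chain, with any joint lacking an IMU falling back to the identity rotation. This is a sketch of the idea only — the chain topology, offsets, and function name are made up, not the pipeline's skeleton:

```python
import numpy as np

def fk_chain(parents, offsets, world_rots):
    """Toy forward kinematics with world-frame per-bone rotations.

    parents[i]: parent joint index (-1 for the root)
    offsets[i]: bone vector from parent to joint i, in the parent's rest frame
    world_rots: {joint index: 3x3 world rotation}; missing entries use identity
    Parents must precede children in the index order.
    """
    n = len(parents)
    pos = np.zeros((n, 3))
    for i in range(n):
        p = parents[i]
        if p < 0:
            continue  # root stays at the origin
        R = world_rots.get(p, np.eye(3))  # missing joints -> identity
        pos[i] = pos[p] + R @ offsets[i]
    return pos
```

With an empty `world_rots` dict, every joint sits at its rest-pose offset — which is exactly the "missing joints at identity" behavior described above.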
For evaluation against OptiTrack ground truth, see Evaluation.