How to Postprocess the Data

This section covers the offline processing pipeline that turns raw recordings into full-body SMPL pose estimates.

Tip

A sample recording is available for testing the full pipeline. Download sample_data.zip and extract it into received_recordings/:

unzip sample_data.zip -d received_recordings/

This provides a complete raw session (video, metadata, IMU, Aria VRS + MPS outputs) that you can use to follow along with every step below.

Step 1: Add Aria Data

The Aria VRS recording and MPS outputs must be added to the session directory manually after the recording is complete:

1. Download the .vrs file and its JSON sidecar from the Aria companion app.
2. Submit the VRS to Aria MPS for SLAM and hand tracking processing.
3. Place the outputs into the session directory:

received_recordings/<session_name>/
├── ...                              # (existing files from recording)
├── <recording>.vrs                  # Aria VRS recording
├── <recording>.vrs.json             # VRS metadata sidecar
└── mps_<recording>_vrs/             # Aria MPS outputs
    ├── slam/
    │   ├── closed_loop_trajectory.csv
    │   └── semidense_points.csv.gz
    └── hand_tracking/
        └── hand_tracking_results.csv
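
Before moving on, it can help to confirm that the Aria files landed where the layout above expects them. The following is a minimal sketch rather than part of the pipeline: `missing_aria_files` and `MPS_FILES` are hypothetical names introduced here, and the check assumes the MPS folder is named `mps_<recording>_vrs` after the VRS file stem, as shown in the tree.

```python
from pathlib import Path

# Files expected inside each mps_<recording>_vrs/ folder, copied from the layout above.
MPS_FILES = (
    "slam/closed_loop_trajectory.csv",
    "slam/semidense_points.csv.gz",
    "hand_tracking/hand_tracking_results.csv",
)


def missing_aria_files(session_dir):
    """Return the expected Aria paths that are absent (hypothetical helper)."""
    session = Path(session_dir)
    recordings = sorted(session.glob("*.vrs"))
    if not recordings:
        # No VRS recording at all: report the placeholder name from the layout.
        return ["<recording>.vrs"]

    missing = []
    for vrs in recordings:
        # JSON sidecar sits next to the recording as <recording>.vrs.json.
        sidecar = session / f"{vrs.name}.json"
        if not sidecar.is_file():
            missing.append(sidecar.name)
        # MPS outputs are assumed to live in mps_<recording>_vrs/.
        mps_dir = session / f"mps_{vrs.stem}_vrs"
        for rel in MPS_FILES:
            if not (mps_dir / rel).is_file():
                missing.append(f"{mps_dir.name}/{rel}")
    return missing
```

An empty return value means every file in the layout was found; anything else lists the paths still to be copied into the session directory.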

Step 2: Calibration Pipeline

The calibration pipeline can run automatically when the receiver finishes uploading, or be triggered manually on an existing session:

# Run the full calibration pipeline (the four sub-steps listed below) on an existing session
python src/pipeline/01_receiver.py --session <session_dir>

This runs four sub-steps:

1. Prepare session: extract video frames, camera intrinsics, and the AprilTag summary.
2. SAM-3D-Body: estimate 3D body parameters from third-person RGB frames.
3. MHR → SMPL-X: convert MHR outputs to SMPL-X format.
4. Calibration solve: compute bone-to-sensor rotation offsets (imu_calibration.json). This sub-step can also be re-run independently via python src/pipeline/02_calibrate.py <session_dir>.

Note

The calibration pipeline automatically detects which sub-steps have already been completed and skips them. If you need to re-run a specific step (e.g. after updating a model checkpoint), delete its output directory first.
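
To see which sub-steps would likely be skipped, or to clear one before re-running it, a small helper along the following lines may be useful. This is only a sketch: the mapping from sub-step to output paths is an assumption inferred from the session layout documented for this step, the sub-step names are made up for illustration, and the pipeline's actual skip check may look at different files.

```python
import shutil
from pathlib import Path

# ASSUMED mapping from sub-step to the outputs it leaves behind. The names on
# the left are hypothetical labels; the real skip logic may differ.
SUBSTEP_OUTPUTS = {
    "prepare_session": ["meta/camera.json", "frames.csv", "color"],
    "sam_3d_body": ["body_data"],
    "mhr_to_smplx": ["smpl_output"],
    "calibration_solve": ["imu_calibration.json"],
}


def completed_substeps(session_dir):
    """Report, per sub-step, whether all of its assumed outputs exist."""
    session = Path(session_dir)
    return {
        name: all((session / rel).exists() for rel in outputs)
        for name, outputs in SUBSTEP_OUTPUTS.items()
    }


def reset_substep(session_dir, name, dry_run=True):
    """List the assumed outputs of one sub-step; delete them if dry_run is False."""
    session = Path(session_dir)
    targets = [session / rel for rel in SUBSTEP_OUTPUTS[name] if (session / rel).exists()]
    if not dry_run:
        for path in targets:
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)  # real directory: remove recursively
            else:
                path.unlink()  # regular file or symlink
    return targets
```

Calling `reset_substep(session_dir, "sam_3d_body")` with the default `dry_run=True` only lists what would be removed, which is a safer first step than deleting outputs directly.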

After this step, the session directory contains:

received_recordings/<session_name>/
├── ...                              # (existing files)
├── meta/
│   ├── camera.json                  # Camera intrinsics extracted from metadata
│   └── calibration_segment.json     # Auto-detected calibration time window
├── color/                           # Extracted video frames (PNG)
├── frames.csv                       # Frame index, UTC timestamp, image path
├── color_apriltag/
│   └── detection_summary.json       # Per-frame AprilTag detection counts
├── body_data/                       # SAM-3D-Body MHR outputs (*.npz per frame)
├── smpl_output/                     # SMPL-X conversion results
│   ├── smpl_parameters.npz          # Joints, rotations, betas across all frames
│   ├── smpl_vertices.npy            # Mesh vertices (memory-mapped)
│   └── per_frame/                   # Individual SMPL-X fits
└── imu_calibration.json             # Bone-to-sensor rotation offsets (B_R_S per joint)

Step 3: Synchronization

The sync pipeline aligns IMU, third-person RGB, and (optionally) Aria egocentric data to a shared UTC timeline:

python src/pipeline/03_sync.py <session_dir>

This produces a sync/ folder with calibrated, UTC-aligned data ready for inference:

received_recordings/<session_name>/
├── ...                              # (existing files)
└── sync/
    ├── frames.csv                   # UTC-mapped third-person RGB
    ├── color/                       # Symlinked RGB images
    ├── imu_info.csv                 # Calibrated IMU rotations (9 rows per timestamp)
    ├── imu_info.pkl                 # Same data as pickle: {utc_ns: {imu_id: 3x3 rotation matrix}}
    ├── vrs_frames.csv               # (optional) UTC-mapped Aria RGB
    └── vrs_color/                   # (optional) Extracted Aria RGB frames
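
As an illustration of how the synced IMU data might be consumed, the sketch below loads sync/imu_info.pkl and looks up the IMU sample closest to a given frame timestamp. It relies only on the structure stated in the listing above ({utc_ns: {imu_id: 3x3 rotation matrix}}); the function names are hypothetical, and the keys are assumed to be integer nanosecond timestamps.

```python
import bisect
import pickle


def load_imu_rotations(pkl_path):
    """Load the {utc_ns: {imu_id: 3x3 rotation matrix}} mapping from the pickle."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def nearest_imu_sample(imu_by_time, query_utc_ns):
    """Return (utc_ns, {imu_id: rotation}) for the timestamp closest to the query.

    Raises ValueError if the mapping is empty.
    """
    times = sorted(imu_by_time)
    i = bisect.bisect_left(times, query_utc_ns)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = times[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda t: abs(t - query_utc_ns))
    return best, imu_by_time[best]
```

For many lookups (for example, one per row of sync/frames.csv), sorting the timestamps once outside the function would avoid repeating that work on every call. As with any pickle, only load files from sessions you trust.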

Step 4: Pose Estimation

Run EgoAllo diffusion-based pose estimation conditioned on head trajectory. The --guidance-mode flag selects which signals guide the diffusion process:

# Default: RoSHI (diffusion + IMU guidance)
python src/pipeline/04_inference.py --traj-root <session_dir>

# Or specify a different mode
python src/pipeline/04_inference.py --traj-root <session_dir> --guidance-mode egoallo

Available guidance modes:

Mode	Diffusion	IMU	Aria Hand	Description
`egoallo`	yes	no	no	Pure EgoAllo baseline (foot skating constraint only)
`egoallo_ariawrist`	yes	no	wrist only	EgoAllo + Aria wrist pose guidance (no full hand, no IMU)
`roshi` (default)	yes	yes	no	RoSHI: diffusion guided by IMU bone orientations
`roshi_ariahand`	yes	yes	yes	RoSHI + Aria hand tracking

For details on the IMU guidance constraints and optimizer parameters, see Guidance Parameters.
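
When comparing guidance modes, it may be convenient to drive 04_inference.py from a short script. The sketch below only assembles the command lines shown above and rejects mode names that are not in the table; `inference_command` and `run_all_modes` are hypothetical helpers, and the script path assumes you run from the repository root.

```python
import subprocess
import sys

# Mode names copied from the guidance-mode table above.
GUIDANCE_MODES = ("egoallo", "egoallo_ariawrist", "roshi", "roshi_ariahand")


def inference_command(session_dir, guidance_mode="roshi"):
    """Build the argument list for one 04_inference.py run (hypothetical helper)."""
    if guidance_mode not in GUIDANCE_MODES:
        raise ValueError(
            f"unknown guidance mode {guidance_mode!r}; expected one of {GUIDANCE_MODES}"
        )
    return [
        sys.executable,
        "src/pipeline/04_inference.py",
        "--traj-root", str(session_dir),
        "--guidance-mode", guidance_mode,
    ]


def run_all_modes(session_dir):
    """Run every guidance mode in turn, stopping at the first failure."""
    for mode in GUIDANCE_MODES:
        subprocess.run(inference_command(session_dir, mode), check=True)
```

This section does not say whether runs with different modes write to separate files under egoallo_outputs/, so check for overwritten results before looping over modes on a session you care about.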

Note

Before saving, 04_inference.py applies a hard clamp to the knee and ankle joints (SMPL-H body indices 3, 4, 6, 7), so the exported body_quats are guaranteed to stay within biomechanical limits. Knees cannot hyperextend (Euler X ≥ 0), and the Y/Z Euler angles are clipped to the same bounds used by the soft penalty in the optimizer (see Lower-Body Joint Angle Limits). The console prints how many joint-frames were adjusted.
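
The idea of a hard clamp with an adjustment count can be sketched as follows. This is not the code in 04_inference.py: it starts from Euler angles that are assumed to have been extracted from the quaternions already, it applies one set of bounds to all four joints for simplicity, and the bounds passed in are placeholders rather than the values from Lower-Body Joint Angle Limits.

```python
import numpy as np

# Knee and ankle body indices named in the note above.
LOWER_BODY_INDICES = (3, 4, 6, 7)


def clamp_lower_body(euler_xyz, lower, upper):
    """Clip knee/ankle Euler angles and count the adjusted joint-frames.

    euler_xyz: (T, 21, 3) per-joint Euler angles in radians (assumed input).
    lower, upper: (3,) placeholder bounds for the X, Y, Z angles.
    Returns the clamped copy and the number of (frame, joint) pairs changed.
    """
    idx = list(LOWER_BODY_INDICES)
    clamped = euler_xyz.copy()
    # Only the four lower-body joints are clipped; all others pass through.
    clamped[:, idx] = np.clip(euler_xyz[:, idx], lower, upper)
    # A joint-frame counts as adjusted if any of its three angles changed.
    changed = np.any(clamped != euler_xyz, axis=-1)
    return clamped, int(changed.sum())
```

Setting the lower X bound to 0 reproduces the "knees cannot hyperextend" rule in this simplified form; converting the clamped angles back to wxyz quaternions is left out of the sketch.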

Results are saved to <session_dir>/egoallo_outputs/ as NPZ files containing:

Key	Description
`Ts_world_cpf`	CPF (central pupil frame) poses in world frame (T, 7)
`Ts_world_root`	Root joint (pelvis) pose in world frame (T, 7)
`body_quats`	Local body joint quaternions, wxyz (samples, T, 21, 4)
`left_hand_quats`	Left hand joint quaternions, wxyz (samples, T, 15, 4)
`right_hand_quats`	Right hand joint quaternions, wxyz (samples, T, 15, 4)
`contacts`	Foot contact predictions per body joint (samples, T, 21)
`betas`	SMPL-H body shape parameters (samples, T, 16)
`timestamps_ns`	Tracking timestamps in nanoseconds (T,)
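
A result file can be inspected with NumPy alone. The sketch below loads every array from one NPZ file and reorders quaternions from the wxyz layout listed above to xyzw, which is the scalar-last order that some libraries (for example SciPy's Rotation.from_quat by default) expect. The helper names are hypothetical.

```python
import numpy as np


def load_outputs(npz_path):
    """Read all arrays from one result file into a plain dict."""
    with np.load(npz_path) as data:
        return {key: data[key] for key in data.files}


def wxyz_to_xyzw(quats):
    """Move the scalar component from the first to the last position."""
    return np.roll(quats, shift=-1, axis=-1)
```

For example, `load_outputs(path)["body_quats"]` should have shape (samples, T, 21, 4) according to the table, and `wxyz_to_xyzw` can be applied to it directly because the reordering acts on the last axis only.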

Step 5: Visualization

Visualize IMU forward-kinematics (FK) and SAM-3D-Body results (no localization):

python src/pipeline/05_visualize.py <session_dir>

Visualize RoSHI pipeline results:

python src/pipeline/05_visualize_roshi.py <session_dir>

For evaluation against OptiTrack ground truth, see Evaluation.