Installation#

This section covers the four components you need to install and set up before using RoSHI:

  1. Environment & Codebase — software environments, dependencies, model files

  2. Calibration App — iOS companion app for sensor-to-bone calibration

  3. Hardware – IMUs — 9 body-worn IMU trackers with AprilTags

  4. Aria Glasses — Project Aria for egocentric SLAM and hand tracking

1. Environment & Codebase#

Prerequisites#

  • NVIDIA GPU with CUDA support

  • Python 3.12+

We recommend using conda to manage the environment and dependencies.
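If conda is not installed yet, a minimal Miniconda install is sufficient (the Linux x86_64 installer is shown below; pick the installer matching your platform):

```shell
# Download and run the Miniconda installer (Linux x86_64 shown;
# other platforms have analogous installers on the same server).
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```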

Clone the main repository:

git clone https://github.com/Jirl-upenn/RoSHI-MoCap.git
cd RoSHI-MoCap

Conda Environments#

RoSHI uses three separate conda environments for different pipeline components.

roshi — main pipeline + EgoAllo inference (Python 3.12)

This environment handles IMU data reception, calibration, pose reconstruction, synchronization, and EgoAllo-based diffusion pose estimation with our guidance optimizer. EgoAllo is integrated directly into the codebase under src/egoallo/.

conda create -n roshi python=3.12
conda activate roshi

# Install the package (includes egoallo and all dependencies)
pip install -e .

# JAX with CUDA support (required for guidance optimization)
pip install "jax[cuda12]>=0.6"
pip install git+https://github.com/brentyi/jaxls.git
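After installing, a quick sanity check (our suggestion, not part of the official setup) confirms that JAX actually picked up the CUDA backend:

```shell
# Run inside the activated roshi environment. A CUDA-enabled JAX
# reports "gpu" as its default backend; "cpu" means the CUDA wheels
# did not take effect and guidance optimization will be very slow.
python -c "import jax; print(jax.default_backend())"
```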

sam_3d_body — 3D body estimation from video (Python 3.11)

This environment is used by the sam-3d-body module for extracting SMPL-X body parameters from third-person video frames as part of our calibration pipeline. See the installation guide for detailed setup instructions. Model checkpoints should be placed under model/sam3d/ (see Model Files below).

mhr — Momentum Human Rig conversion (Python 3.12)

This environment is used by the MHR module to convert MHR format to SMPL-X as part of our calibration pipeline. Refer to its README for detailed installation instructions. Model checkpoints should be placed under model/mhr/ (see Model Files below).
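The two auxiliary environments are created the same way as the main one; the dependency installs are covered by each module's own instructions, so the commands below are only the common starting point:

```shell
# Create the auxiliary environments; install each module's
# dependencies afterwards per its own README / installation guide.
conda create -n sam_3d_body python=3.11
conda create -n mhr python=3.12
```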

Model Files#

All model checkpoints are placed under model/. Some models are licensed separately and must be downloaded manually.

model/
├── egoallo/                           # EgoAllo diffusion checkpoint
│   └── checkpoints_3000000/
│       ├── model.safetensors
│       └── ...
├── mhr/                               # MHR model assets (full bundle)
│   ├── mhr_model.pt
│   ├── lod*.fbx
│   ├── compact_v6_1.model
│   ├── corrective_activation.npz
│   └── corrective_blendshapes_lod*.npz
├── sam3d/                             # SAM 3D Body checkpoint
│   └── sam-3d-body-dinov3/
│       ├── model.ckpt
│       ├── model_config.yaml
│       └── assets/
│           └── mhr_model.pt
├── smplh/                             # SMPL-H body model
│   └── neutral/model.npz
└── smplx/                             # SMPL-X body model
    └── SMPLX_NEUTRAL.npz
  • SAM 3D Body: download the sam-3d-body-dinov3 checkpoint from Hugging Face (access request required). Place it under model/sam3d/:

    huggingface-cli download facebook/sam-3d-body-dinov3 \
        --local-dir model/sam3d/sam-3d-body-dinov3
    
  • MHR: download the full assets bundle from the MHR GitHub release. The MHR→SMPL-X conversion requires the complete assets (LOD meshes, corrective blendshapes, etc.), not just the TorchScript model:

    curl -OL https://github.com/facebookresearch/MHR/releases/download/v1.0.0/assets.zip
    unzip assets.zip -d model/mhr/ && rm assets.zip
    # Flatten: move files from model/mhr/assets/ up to model/mhr/
    mv model/mhr/assets/* model/mhr/ && rmdir model/mhr/assets
    
  • SMPL-H (16 shape parameters, “Extended SMPL+H model”): download from the MANO project page. Used by EgoAllo for diffusion-based pose estimation.

  • SMPL-X: download from the SMPL-X project page. Used by the calibration pipeline and IMU pose reconstruction.

  • EgoAllo checkpoint: run bash scripts/download_checkpoint_and_data.sh, or download manually from Google Drive.
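Once everything is downloaded, the layout can be verified with a small shell check; the paths below are the key files from the tree above (adjust if your layout differs):

```shell
# Report which of the key model files are present under model/.
for f in \
    model/egoallo/checkpoints_3000000/model.safetensors \
    model/mhr/mhr_model.pt \
    model/sam3d/sam-3d-body-dinov3/model.ckpt \
    model/smplh/neutral/model.npz \
    model/smplx/SMPLX_NEUTRAL.npz
do
    if [ -f "$f" ]; then
        echo "ok       $f"
    else
        echo "MISSING  $f"
    fi
done
```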

Sample Data#

A sample recording is available for testing the full postprocessing pipeline. Download from Google Drive and extract into received_recordings/:

# Download sample_data.zip from the link above, then:
unzip sample_data.zip -d received_recordings/

This provides a complete raw session (video, metadata, IMU, Aria VRS + MPS outputs) ready for the postprocessing pipeline.

Project Structure#

After setup, the repository is organized as follows:

RoSHI-MoCap/
├── src/
│   ├── egoallo/          # EgoAllo: diffusion model + IMU guidance optimizer
│   ├── pipeline/         # End-to-end pipeline scripts
│   │   ├── 01_receiver.py      # Receive calibration data from iOS app
│   │   ├── 02_calibrate.py     # Calibrate bone-to-sensor rotation offsets
│   │   ├── 03_sync.py          # Synchronize RGB + calibrated IMU data
│   │   ├── 04_inference.py     # Run EgoAllo diffusion-based pose estimation
│   │   ├── 05_visualize.py     # Multi-method visualization
│   │   └── 06_evaluate.py      # Evaluate against OptiTrack ground truth
│   └── utils/            # Shared utilities
├── sam-3d-body/           # SAM 3D Body
├── MHR/                   # Momentum Human Rig
├── hardware/              # IMU hardware driver (ESP32 serial reader)
├── evaluation/            # Evaluation scripts and ground truth
├── scripts/               # Download scripts
├── model/                 # All model files (checkpoints, SMPL-X, etc.)
├── received_recordings/   # Raw + processed session data
│   └── <session_name>/    # One directory per recording session
├── pyproject.toml         # Package configuration
└── requirements_roshi.txt # Pip requirements
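The numbered scripts in src/pipeline/ suggest the end-to-end run order for one session. The sketch below is illustrative only: per-script CLI arguments are omitted, since they are defined in the scripts themselves.

```shell
# Typical postprocessing order (arguments omitted; illustrative only).
conda activate roshi
python src/pipeline/01_receiver.py     # receive calibration data from the iOS app
python src/pipeline/02_calibrate.py    # bone-to-sensor rotation offsets
python src/pipeline/03_sync.py         # synchronize RGB + calibrated IMU data
python src/pipeline/04_inference.py    # EgoAllo diffusion-based pose estimation
python src/pipeline/05_visualize.py    # multi-method visualization
python src/pipeline/06_evaluate.py     # optional: compare to OptiTrack ground truth
```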

2. Calibration App#

The RoSHI calibration app runs on an iOS device and is used to record a short video for sensor-to-bone calibration. See the RoSHI-App repository for build instructions, and Calibration App Setup for detailed setup and receiver configuration.

3. Hardware – IMUs#

TODO: Add hardware setup instructions.

4. Aria Glasses#

TODO: Add detailed Aria setup walkthrough and MPS submission instructions.