MIKASA-Robo-VLA Documentation

Render previews: Find Imposter Shape And Color · Shell Game Color Lamp Touch · Shell Game Shuffle Color Lamp Touch

Quick Links: Installation · Quick Start · Benchmarking · Datasets · Cite

MIKASA-Robo-VLA significantly extends MIKASA-Robo to the VLA setting. It preserves the original benchmark’s focus on memory-intensive tabletop manipulation, while broadening the task suite, introducing language-conditioned evaluation, and providing standardized data export for modern VLA training pipelines.

What changed from MIKASA-Robo (RL release)

The task set grows from 32 to 90 registered environments covering 10 memory types (vs. 4 in the RL release).
Every task ships a natural-language LANGUAGE_INSTRUCTION for VLA conditioning.
Episodes are grouped into three horizon splits (Short / Medium / Long) so multi-task training and evaluation are tractable.
22,500 oracle trajectories (more than 6 million transitions), collected with PPO policies and motion planning, are released on Hugging Face in RLDS and LeRobotDataset v3 formats, so no further conversion is needed.
Dense and normalised-dense rewards are calibrated for every task, enabling both offline imitation learning and online RL.
The original 32-task RL implementation is available from the mikasa-robo-rl branch and remains under mikasa_robo_suite/rl/ for backwards compatibility.

Pick your path

“I want to evaluate my VLA model” → Benchmarking (CLI, JSON output, Python API) and the canonical Evaluation Protocol.
“I want to fine-tune a VLA model” → Datasets (RLDS, LeRobotDataset v3) and Observation and Action Space.
“I want to explore tasks” → Environments & Tasks (per-task pages with previews, language instructions, horizons, and setup parameters).
“I want to know what makes the benchmark important” → Core Concepts (memory taxonomy, episode structure).

Key Features

90 memory tasks across 10 memory types, with horizons of 25–2160 steps and multiple difficulty levels.
A language instruction for every task; the public benchmark grows from 32 tasks (RL release) to 90.
Three horizon splits (Short / Medium / Long) for structured multi-task evaluation.
Trajectory collection via PPO oracles and motion planning.
22,500 trajectories (>6 M timesteps) in RLDS and LeRobotDataset v3 formats on Hugging Face.
Physics fixes, dense / normalised-dense rewards, and full GPU-parallelised simulation via ManiSkill.
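
To give a rough sense of dataset scale, the headline numbers above can be turned into per-task and per-trajectory averages. This is only a back-of-the-envelope sketch: it assumes the 22,500 trajectories are spread evenly over the 90 tasks, which this page does not state, and it treats the 6 million figure as a lower bound, so the mean length is a lower bound too. The function name `dataset_scale` is hypothetical and not part of the package.

```python
# Back-of-the-envelope dataset scale from the figures quoted on this page.
NUM_TASKS = 90                 # registered environments
NUM_TRAJECTORIES = 22_500      # released oracle trajectories
MIN_TRANSITIONS = 6_000_000    # stated only as a lower bound (">6 M")


def dataset_scale(num_tasks, num_trajectories, min_transitions):
    """Return (trajectories per task, minimum mean transitions per trajectory).

    The per-task figure assumes an even split across tasks (an assumption,
    not something the documentation confirms).
    """
    per_task = num_trajectories / num_tasks
    mean_len = min_transitions / num_trajectories
    return per_task, mean_len


per_task, mean_len = dataset_scale(NUM_TASKS, NUM_TRAJECTORIES, MIN_TRANSITIONS)
print(f"{per_task:.0f} trajectories per task if spread evenly")
print(f"at least {mean_len:.1f} transitions per trajectory on average")
```

Because horizons range from 25 to 2160 steps, individual trajectory lengths will differ widely from this average.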

Guides

Wrappers Cookbook

Citation

If you use MIKASA-Robo-VLA in your research, please cite:

@inproceedings{cherepanov2026memory,
  title     = {Memory, Benchmark \& Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
  author    = {Egor Cherepanov and Nikita Kachaev and Alexey Kovalev and Aleksandr Panov},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=9cLPurIZMj}
}

Legacy RL Version

Note

If you need the original RL benchmark from the MIKASA-Robo paper (arXiv:2502.10550), install mikasa-robo-suite==0.0.5 from PyPI or use the mikasa-robo-rl branch. New development targets MIKASA-Robo-VLA. The previous 32-environment RL implementation is still kept under mikasa_robo_suite/rl/ for compatibility.
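
When reproducing results from the RL release, it can help to confirm that the pinned legacy version is the one actually installed. The sketch below uses only the standard-library `importlib.metadata` module; the helper name `legacy_release_status` is hypothetical and not part of the package, and the distribution name and version are taken from the note above.

```python
from importlib import metadata

LEGACY_PACKAGE = "mikasa-robo-suite"   # PyPI distribution named in the note above
LEGACY_VERSION = "0.0.5"               # pinned legacy RL release


def legacy_release_status(package=LEGACY_PACKAGE, expected=LEGACY_VERSION):
    """Return 'missing', 'match', or 'mismatch' for the installed distribution."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return "missing"
    return "match" if installed == expected else "mismatch"


if __name__ == "__main__":
    print(legacy_release_status())
```

A result of "mismatch" suggests that a newer release is installed, in which case the RL code would be expected under mikasa_robo_suite/rl/ rather than at the legacy import paths.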