MIKASA-Robo-VLA Documentation

[Task render previews: Batteries Checker Easy, Batteries Checker Hard, Blink Count Button Press, Bunch Of Colors, Chain Of Colors, Find Imposter Color, Find Imposter Shape, Find Imposter Shape And Color, Gather And Recall, Intercept, Intercept Grab, Remember Color, Remember Shape, Remember Shape And Color, Rotate Lenient, Rotate Strict, Seq Of Colors, Shell Game Color Lamp Touch, Shell Game Push, Shell Game Shuffle Color Lamp Touch, Shell Game Shuffle Touch, Shell Game Touch, Take It Back, Timed Transfer, Trace Shape, Trace Shape Seq]

Quick Links: Installation · Quick Start · Benchmarking · Datasets · Cite

MIKASA-Robo-VLA extends MIKASA-Robo to the vision-language-action (VLA) setting. It preserves the original benchmark’s focus on memory-intensive tabletop manipulation while broadening the task suite, introducing language-conditioned evaluation, and providing standardized data export for modern VLA training pipelines.

What changed from MIKASA-Robo (RL release)
  • Task set grows from 32 → 90 registered environments covering 10 memory types (vs 4 in the RL release).

  • Every task ships a natural-language LANGUAGE_INSTRUCTION for VLA conditioning.

  • Episodes are grouped into three horizon splits (Short / Medium / Long) so multi-task training and evaluation are tractable.

  • 22,500 PPO / motion-planning oracle trajectories (more than 6 million transitions) are released on Hugging Face in RLDS and LeRobotDataset v3 formats, so no further conversion is needed.

  • Dense and normalised-dense rewards are calibrated for every task, enabling both offline imitation learning and online RL.

  • The original 32-task RL implementation is available from the mikasa-robo-rl branch and remains under mikasa_robo_suite/rl/ for backwards compatibility.
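
As a rough sanity check on the dataset figures in the list above, the arithmetic can be worked out directly. This is only a sketch: it assumes the 22,500 trajectories are spread evenly over the 90 tasks (the text does not state this) and reads the transition count as a lower bound of 6,000,000. The variable names are illustrative.

```python
# Figures quoted in the list above.
NUM_TASKS = 90
NUM_TRAJECTORIES = 22_500
MIN_TRANSITIONS = 6_000_000  # "more than 6 million", read as a lower bound

# Assumption (not stated in the source): trajectories are split evenly across tasks.
trajectories_per_task = NUM_TRAJECTORIES / NUM_TASKS

# Lower bound on the mean trajectory length, measured in transitions.
min_mean_length = MIN_TRANSITIONS / NUM_TRAJECTORIES

print(f"trajectories per task (if even): {trajectories_per_task:.0f}")
print(f"mean trajectory length: at least {min_mean_length:.1f} transitions")
```

Under these assumptions each task would have 250 trajectories, and the average trajectory would be at least about 267 transitions long.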

Key Features

  • 90 memory tasks across 10 memory types, horizons 25–2160 steps, multiple difficulty levels.

  • Language instructions for every task; the public benchmark grows from 32 tasks (RL release) to 90.

  • Three horizon splits (Short / Medium / Long) for structured multi-task evaluation.

  • Trajectory collection via PPO oracles and motion planning.

  • 22,500 trajectories (>6 M timesteps) in RLDS and LeRobotDataset v3 formats on Hugging Face.

  • Physics fixes, dense / normalised-dense rewards, and full GPU-parallelised simulation via ManiSkill.
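
One way to picture the horizon splits is as a bucketing of tasks by episode length. The sketch below is illustrative only: the cut-offs `SHORT_MAX` and `MEDIUM_MAX` and the function `horizon_split` are hypothetical and are not the benchmark's actual split boundaries. Only the 25–2160 step range is taken from the list above.

```python
# Horizon range stated in the feature list.
MIN_HORIZON, MAX_HORIZON = 25, 2160

# Hypothetical cut-offs, chosen purely for illustration.
SHORT_MAX = 100
MEDIUM_MAX = 500


def horizon_split(horizon: int) -> str:
    """Map an episode horizon (in steps) to a split name."""
    if not MIN_HORIZON <= horizon <= MAX_HORIZON:
        raise ValueError(f"horizon {horizon} is outside {MIN_HORIZON}-{MAX_HORIZON}")
    if horizon <= SHORT_MAX:
        return "Short"
    if horizon <= MEDIUM_MAX:
        return "Medium"
    return "Long"


# Group a few made-up horizons by split.
example_horizons = [25, 90, 300, 2160]
groups = {}
for h in example_horizons:
    groups.setdefault(horizon_split(h), []).append(h)
print(groups)
```

Grouping tasks this way would let an evaluation report one aggregate score per split instead of 90 separate per-task numbers.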

Citation

If you use MIKASA-Robo-VLA in your research, please cite:

@inproceedings{cherepanov2026memory,
  title     = {Memory, Benchmark \& Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
  author    = {Egor Cherepanov and Nikita Kachaev and Alexey Kovalev and Aleksandr Panov},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=9cLPurIZMj}
}

Legacy RL Version

Note

If you need the original RL benchmark from the MIKASA-Robo paper (arXiv:2502.10550), install mikasa-robo-suite==0.0.5 from PyPI or use the mikasa-robo-rl branch. New development targets MIKASA-Robo-VLA. The previous 32-environment RL implementation is still kept under mikasa_robo_suite/rl/ for compatibility.
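
To confirm which release is present in an environment, the installed distribution version can be queried with the standard library. This is a minimal sketch: the helpers `installed_version` and `is_legacy_release` are hypothetical names, and it assumes the package is installed under the PyPI distribution name `mikasa-robo-suite` given in the note above.

```python
from importlib.metadata import PackageNotFoundError, version

LEGACY_VERSION = "0.0.5"  # pinned RL release named in the note above


def installed_version(dist_name: str = "mikasa-robo-suite"):
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_legacy_release(dist_name: str = "mikasa-robo-suite") -> bool:
    """True only if the installed version matches the pinned legacy RL release."""
    return installed_version(dist_name) == LEGACY_VERSION


print("installed version:", installed_version())
print("legacy RL release:", is_legacy_release())
```

A check like this can guard legacy RL scripts so that they fail early when run against a newer release.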