MIKASA-Robo-VLA Documentation
Quick Links: Installation · Quick Start · Benchmarking · Datasets · Cite
MIKASA-Robo-VLA significantly extends MIKASA-Robo to the VLA setting. It preserves the original benchmark’s focus on memory-intensive tabletop manipulation, while broadening the task suite, introducing language-conditioned evaluation, and providing standardized data export for modern VLA training pipelines.
- What changed from MIKASA-Robo (RL release)
Task set grows from 32 → 90 registered environments covering 10 memory types (vs 4 in the RL release).
Every task ships a natural-language LANGUAGE_INSTRUCTION for VLA conditioning.
Episodes are grouped into three horizon splits (Short / Medium / Long) so multi-task training and evaluation are tractable.
22,500 PPO / motion-planning oracle trajectories (more than 6 million transitions) are released on Hugging Face in RLDS and LeRobotDataset v3 formats, so no further conversion is needed.
Dense and normalised-dense rewards are calibrated for every task, enabling both offline imitation learning and online RL.
The original 32-task RL implementation is available from the mikasa-robo-rl branch and remains under mikasa_robo_suite/rl/ for backwards compatibility.
Pick your path
“I want to evaluate my VLA model” → Benchmarking (CLI, JSON output, Python API) and the canonical Evaluation Protocol.
“I want to fine-tune a VLA model” → Datasets (RLDS, LeRobotDataset v3) and Observation and Action Space.
“I want to explore tasks” → Environments & Tasks (per-task pages with previews, language instructions, horizons, and setup parameters).
“I want to know what makes the benchmark important” → Core Concepts (memory taxonomy, episode structure).
Key Features
90 memory tasks across 10 memory types, horizons 25–2160 steps, multiple difficulty levels.
The public benchmark grows from 32 (RL release) to 90 tasks with language instructions for every task.
Three horizon splits (Short / Medium / Long) for structured multi-task evaluation.
Trajectory collection via PPO oracles and motion planning.
22,500 trajectories (>6 M timesteps) in RLDS and LeRobotDataset v3 formats on Hugging Face.
Physics fixes, dense / normalised-dense rewards, and full GPU-parallelised simulation via ManiSkill.
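To give a feel for the dataset scale, the short Python sketch below works through the arithmetic implied by the figures quoted above (90 tasks, 22,500 trajectories, more than 6 million timesteps). The even split of trajectories across tasks is an assumption made only for illustration; the source does not state the per-task distribution.

```python
# Back-of-the-envelope dataset scale, using only the figures quoted above.
NUM_TASKS = 90
NUM_TRAJECTORIES = 22_500
MIN_TIMESTEPS = 6_000_000  # quoted as a lower bound (">6 M")

# Assumption (not stated in the docs): trajectories are split evenly across tasks.
trajectories_per_task = NUM_TRAJECTORIES // NUM_TASKS

# Floor of the lower bound on the mean trajectory length, in steps.
min_mean_length = MIN_TIMESTEPS // NUM_TRAJECTORIES

print(f"{trajectories_per_task} trajectories per task (if evenly split)")
print(f"mean trajectory length > {min_mean_length} steps")
```

Under that assumption this comes to 250 trajectories per task, and the mean trajectory length exceeds 266 steps, which sits inside the quoted 25–2160 step horizon range.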
Citation
If you use MIKASA-Robo-VLA in your research, please cite:
@inproceedings{cherepanov2026memory,
  title     = {Memory, Benchmark \& Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
  author    = {Egor Cherepanov and Nikita Kachaev and Alexey Kovalev and Aleksandr Panov},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=9cLPurIZMj}
}
Legacy RL Version
Note
If you need the original RL benchmark from the MIKASA-Robo paper
(arXiv:2502.10550), install
mikasa-robo-suite==0.0.5 from PyPI or use the
mikasa-robo-rl branch.
New development targets MIKASA-Robo-VLA. The previous 32-environment RL
implementation is still kept under mikasa_robo_suite/rl/ for
compatibility.
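If it is unclear which release is present in an environment, a small check along the following lines may help. This is a minimal sketch that relies only on the standard-library importlib.metadata module and the distribution name and version given in the note; the helper name installed_release is hypothetical and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version

# The legacy RL release named in the note above; install it with:
#   pip install mikasa-robo-suite==0.0.5
LEGACY_VERSION = "0.0.5"


def installed_release(dist_name="mikasa-robo-suite"):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not part of mikasa-robo-suite.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_release()
if found is None:
    print("mikasa-robo-suite is not installed")
elif found == LEGACY_VERSION:
    print("legacy RL release (0.0.5) is installed")
else:
    print(f"version {found} is installed")
```

The sketch only reports the version string; it makes no claim about which other version numbers correspond to MIKASA-Robo-VLA.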