Environments API

All VLA environments live under mikasa_robo_suite.vla.memory_envs and are registered with Gymnasium via @register_env decorators. Import the package to make the IDs available:

import mikasa_robo_suite.vla.memory_envs
import gymnasium as gym
from mikasa_robo_suite.vla.utils.apply_wrappers import apply_mikasa_vla_wrappers

env = gym.make(
    "RememberColor3-VLA-v0",
    num_envs=1,
    obs_mode="rgb",
    control_mode="pd_ee_delta_pose",
)
env = apply_mikasa_vla_wrappers(env)  # canonical per-task wrapper chain

Common Patterns

Every environment exposes a class-level LANGUAGE_INSTRUCTION attribute and two instance attributes (task_cue and oracle_info) that VLA wrappers rely on:

class MyEnv(BaseEnv):
    LANGUAGE_INSTRUCTION: str = "..."  # natural-language task description
    # self.task_cue        — cue tensor (or None)
    # self.oracle_info   — privileged hint (or None)

LANGUAGE_INSTRUCTION is stored in mikasa_robo_vla_envs.csv (column language_instruction) for all 90 registered tasks.
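As a minimal sketch of looking up an instruction from that file, the snippet below parses a small in-memory stand-in for mikasa_robo_vla_envs.csv. Only the language_instruction column name comes from this page; the env_id column name, the ShellGameTouch-VLA-v0 ID, and the pairing of IDs with instructions are assumptions made for illustration, so check the real header before relying on them.

```python
import csv
import io

# Stand-in for mikasa_robo_vla_envs.csv. The `env_id` column, the second
# task ID, and the row pairing are hypothetical; only the
# `language_instruction` column is named in this reference.
SAMPLE_CSV = """env_id,language_instruction
RememberColor3-VLA-v0,"Observe the cube's color, wait, then touch the cube of the same color."
ShellGameTouch-VLA-v0,"Observe which cup hides the ball, wait, then touch that cup."
"""


def load_instructions(csv_text: str, id_column: str = "env_id") -> dict:
    """Map each task ID to its language instruction."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row["language_instruction"] for row in reader}


instructions = load_instructions(SAMPLE_CSV)
print(instructions["RememberColor3-VLA-v0"])
```

For the real file, the same function could be fed the file contents read with open(...).read(), once the ID column name is confirmed.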

Note

The full task list and split labels are in the Environments & Tasks section. The pages below are an API reference, not a task catalogue.

Shell Game Environments

Shell-game touch tasks for the VLA memory benchmark.

class ShellGameTouchVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: BaseEnv

Remember which cup hides the ball and select that cup later.

This is the simplest shell-game variant in the suite. The ball location is shown once, then cups cover the scene, and the robot must touch the correct cup based only on the remembered location.

Episode flow:
- The ball is visible during the cue phase.
- Cups cover the candidate locations.
- The robot touches the cup it believes hides the ball.

Success (success=True):
- The robot must touch the cup covering the memorized ball location.

How to customize:
- CUE_PHASE_STEPS changes the amount of observation time before action.
- MIN_DIST changes the spacing between the cup positions.
- BALL_RADIUS changes the hidden object size.
- GOAL_THRESH changes how close the robot must get for the touch to count.
- MUG_SCALE changes the cup size.
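The customization knobs above are plain class attributes, so a variant can be sketched by subclassing and overriding them. The stand-in base class below only mirrors the documented defaults so that the snippet runs without the simulator; in practice you would presumably subclass ShellGameTouchVLABaseEnv itself and register the result under a new ID. The subclass name, the helper, and the chosen values are hypothetical.

```python
class ShellGameTouchStandIn:
    """Stand-in that copies the documented defaults (not the real class)."""

    BALL_RADIUS = 0.02
    CUE_PHASE_STEPS = [1, 5]
    GOAL_THRESH = 0.08
    MIN_DIST = 0.2
    MUG_SCALE = 1.3


class HarderShellGameTouch(ShellGameTouchStandIn):
    """Hypothetical variant: tighter cup spacing and a stricter touch."""

    MIN_DIST = 0.15     # cups closer together
    GOAL_THRESH = 0.05  # the touch must be more precise


def overridden(cls, base):
    """Return the upper-case constants whose values differ from the base."""
    names = [n for n in vars(base) if n.isupper()]
    return {n: getattr(cls, n) for n in names if getattr(cls, n) != getattr(base, n)}


print(overridden(HarderShellGameTouch, ShellGameTouchStandIn))
```

Keeping the variant as a subclass leaves the base defaults untouched, so both versions can coexist in one experiment.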

BALL_RADIUS = 0.02#
CUE_PHASE_STEPS = [1, 5]#
GOAL_THRESH = 0.08#
HEIGHT_OFFSET = 1000#
LANGUAGE_INSTRUCTION = 'Observe which cup hides the ball, wait, then touch that cup.'#
MIN_DIST = 0.2#
MUG_DISPLACEMENT_PENALTY_COEF = 0.1#
MUG_DISPLACEMENT_SUCCESS_THRESH = 0.05#
MUG_SCALE = 1.3#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.
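To make this contract concrete, here is a minimal sketch of an evaluate-style function for a touch task. It assumes that success means the tool center point (TCP) lies within GOAL_THRESH of the target cup, measured as Euclidean distance. The real environments operate on batched tensors and may add further conditions, so the function name, inputs, and extra key are illustrative only.

```python
import math

GOAL_THRESH = 0.08  # documented default for ShellGameTouchVLABaseEnv


def evaluate_touch(tcp_xyz, target_xyz, goal_thresh=GOAL_THRESH):
    """Return a dict with a "success" key plus reusable extra data."""
    dist = math.dist(tcp_xyz, target_xyz)
    # The distance is returned too, so a reward function could reuse it.
    return {"success": dist <= goal_thresh, "tcp_to_target_dist": dist}


near = evaluate_touch((0.10, 0.00, 0.05), (0.12, 0.00, 0.05))
far = evaluate_touch((0.10, 0.00, 0.05), (0.30, 0.00, 0.05))
print(near["success"], far["success"])
```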

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent
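The clipping behaviour can be pictured with a few lines of plain Python. This sketch assumes a box-shaped action space with bounds of [-1, 1] in every dimension; the actual bounds depend on the control mode, and the helper name is hypothetical.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp every action dimension into [low, high]."""
    return [min(max(a, low), high) for a in action]


raw = [0.3, -1.7, 2.5, 0.0]
clipped = clip_action(raw)
print(clipped)
```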

class ShellGameTouchVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGameTouchVLABaseEnv

CUE_PHASE_STEPS = [1, 5]#

Shell-game push tasks for the VLA memory benchmark.

class ShellGamePushVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: BaseEnv

Remember where the hidden ball is and recover it by pushing.

The cue reveals which cup position contains the ball. Once the cups are in place, the robot must use memory to move to the correct location and push the hidden ball into a nearby goal region. The task is about spatial memory plus directional contact, not grasping.

Episode flow:
- The ball is visible during the cue phase.
- Cups cover the candidate locations.
- The robot pushes the ball from the correct hidden location toward the goal.

Success (success=True):
- The ball center must finish inside the goal region.

How to customize:
- CUE_PHASE_STEPS changes how long the agent can observe the target location.
- MIN_DIST changes spacing between the three cup positions.
- BALL_RADIUS changes contact geometry and pushing difficulty.
- GOAL_THRESH changes how forgiving the final goal region is.
- MUG_SCALE changes the size of the cup assets used as shells.

BALL_RADIUS = 0.02#
CUE_PHASE_STEPS = [1, 5]#
GOAL_THRESH = 0.06#
HEIGHT_OFFSET = 1000#
LANGUAGE_INSTRUCTION = 'Observe which cup hides the ball, wait, then push that cup forward.'#
MIN_DIST = 0.2#
MUG_SCALE = 1.3#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

class ShellGamePushVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGamePushVLABaseEnv

CUE_PHASE_STEPS = [1, 5]#

Shell-game pick tasks for the VLA memory benchmark.

class ShellGamePickVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: BaseEnv

Remember where the hidden ball is and retrieve it by grasping.

The task begins by showing which cup position contains the ball. Then the cups cover the scene and the robot must act from memory: it has to recover the ball from the correct location and place it onto the goal marker.

Episode flow:
- The ball is visible during the cue phase.
- Cups are positioned over the possible ball locations.
- The robot picks the ball from the correct location and lifts it to the goal.

Success (success=True):
- The ball must be placed inside the goal region above the correct cup site.

How to customize:
- CUE_PHASE_STEPS changes how long the ball is visible before action starts.
- MIN_DIST changes spacing between the cup positions.
- BALL_RADIUS changes the size of the ball to be grasped.
- GOAL_THRESH changes the size of the placement goal region.
- MUG_SCALE changes the size of the mug assets used as cups.

BALL_RADIUS = 0.02#
CUE_PHASE_STEPS = [1, 5]#
GOAL_THRESH = 0.05#
HANDLE_OFFSET_X = 0.045#
HANDLE_OFFSET_Y = 0.043#
HEIGHT_OFFSET = 1000#
LANGUAGE_INSTRUCTION = 'Observe which cup hides the ball, wait, then pick up that cup and lift it.'#
MIN_DIST = 0.2#
MUG_SCALE = 1.3#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

class ShellGamePickVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGamePickVLABaseEnv

CUE_PHASE_STEPS = [1, 5]#

Shell-game tasks with lamp color cues and no cup shuffling.

class ShellGameColorLampTouchVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: BaseEnv

Select a fixed cup using a lamp color cue.

Three cups hide three different colored balls in fixed left, center, and right slots. During the cue phase the balls are visible, so the agent can observe which color is in which slot. After that the cups cover the balls, the lamp turns to one target color, and the agent must touch the cup hiding the ball of that color.

Episode flow:
- Cue: the three colored balls are visible in fixed cup slots.
- Manipulation: cups cover the balls and the lamp reveals the target color.

Success (success=True):
- The robot must touch the cup covering the ball whose color matches the lamp, and the robot must be static.

How to customize:
- CUE_PHASE_STEPS changes how long the color-to-slot mapping is visible.
- MIN_DIST changes spacing between the three cup positions.
- BALL_RADIUS changes the object size under each cup.
- GOAL_THRESH changes how close the TCP must get for the touch to count.
- MUG_SCALE changes the size of the mug assets used as cups.
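The selection rule for this task can be sketched as a lookup from lamp color to slot. The color names, slot labels, and helper below are assumptions for illustration; the environment assigns its own colors to slots in each episode.

```python
SLOTS = ("left", "center", "right")


def target_slot(colors_by_slot, lamp_color):
    """Return the slot whose hidden ball matches the lamp color."""
    return SLOTS[colors_by_slot.index(lamp_color)]


# Cue phase: the agent sees which color sits in which slot.
colors_by_slot = ["green", "red", "blue"]
# Manipulation phase: cups hide the balls and the lamp shows one color.
print(target_slot(colors_by_slot, "blue"))
```

Because the cups do not move in this variant, the agent only needs to remember the color-to-slot mapping from the cue phase.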

BALL_RADIUS = 0.02#
CUE_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.08#
HEIGHT_OFFSET = 1000.0#
LAMP_BEHIND_OFFSET_X = 0.25#
LAMP_HEIGHT = 0.06#
LANGUAGE_INSTRUCTION = 'Observe which color is under each cup, then touch the cup matching the lamp color.'#
MIN_DIST = 0.2#
MUG_SCALE = 1.3#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

class ShellGameColorLampTouchVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGameColorLampTouchVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#

Shell-game shuffle-and-touch tasks for the VLA benchmark.

class ShellGameShuffleTouchLongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGameShuffleTouchVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
NUM_SWAPS: List[int] = [5, 15]#
SHUFFLE_PHASE_STEPS: List[int] = [100, 400]#
class ShellGameShuffleTouchVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: BaseEnv

Track one target cup through a shell-game shuffle.

The robot first observes which cup hides the ball. The cups then swap places several times, and the robot must keep track of the target cup through the entire motion sequence before making its final selection.

Episode flow:
- The target cup is visible before the shuffle starts.
- Cups swap positions multiple times.
- The robot touches the cup it believes still hides the ball.

Success (success=True):
- The robot must touch the final cup position that contains the hidden ball.

How to customize:
- CUE_PHASE_STEPS changes the observation time before the shuffle begins.
- SHUFFLE_PHASE_STEPS changes the overall duration of the shuffle.
- NUM_SWAPS changes how many swaps the agent must track.
- SWAP_ARC_HEIGHT changes the vertical arc used during swapping.
- MIN_DIST changes spacing between cup slots.
- BALL_RADIUS and GOAL_THRESH affect object geometry and touch tolerance.
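The tracking problem the agent has to solve can be sketched in a few lines. This example assumes three slots, that NUM_SWAPS is an inclusive [min, max] range sampled uniformly, and that each swap exchanges two distinct slots; none of these details is confirmed by this page, and both helper names are hypothetical.

```python
import random


def track_target(target_slot, swaps):
    """Follow the target cup through a sequence of (i, j) slot swaps."""
    for i, j in swaps:
        if target_slot == i:
            target_slot = j
        elif target_slot == j:
            target_slot = i
    return target_slot


def sample_swaps(num_swaps_range, num_slots=3, seed=0):
    """Draw a swap count from an inclusive [min, max] range, then the swaps."""
    rng = random.Random(seed)
    count = rng.randint(*num_swaps_range)
    return [tuple(rng.sample(range(num_slots), 2)) for _ in range(count)]


swaps = sample_swaps([2, 4])  # documented default NUM_SWAPS
print(swaps, track_target(0, swaps))
print(track_target(0, [(0, 1), (1, 2), (0, 2)]))
```

Under these assumptions, the Long variant's NUM_SWAPS of [5, 15] would simply produce a longer swap list for the same tracking loop.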

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
BALL_RADIUS = 0.02#
CUE_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.08#
HEIGHT_OFFSET = 1000#
LANGUAGE_INSTRUCTION = 'Observe which cup hides the ball, track the cups as they shuffle, then touch the correct cup.'#
MIN_DIST = 0.2#
MUG_DISPLACEMENT_PENALTY_COEF = 0.1#
MUG_DISPLACEMENT_SUCCESS_THRESH = 0.05#
MUG_SCALE = 1.3#
NUM_SWAPS: List[int] = [2, 4]#
QVEL_L2_COEF = 0.0#
SHUFFLE_PHASE_STEPS: List[int] = [20, 35]#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

SWAP_ARC_HEIGHT = 0.06#
agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

class ShellGameShuffleTouchVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGameShuffleTouchVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
NUM_SWAPS: List[int] = [2, 4]#
SHUFFLE_PHASE_STEPS: List[int] = [20, 35]#

Shell-game shuffle tasks with lamp color cues for VLA.

class ShellGameShuffleColorLampTouchLongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGameShuffleColorLampTouchVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
NUM_SWAPS: List[int] = [5, 15]#
SHUFFLE_PHASE_STEPS: List[int] = [100, 400]#
class ShellGameShuffleColorLampTouchVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: BaseEnv

Track shuffled cups, then use a color cue to choose the right one.

Three cups initially hide three differently colored balls. After the cups shuffle, a lamp tells the robot which ball color is the target. The robot therefore has to solve two subproblems: track the shuffle, then map the lamp color to the cup that currently hides the matching ball.

Episode flow:
- The initial ball-to-cup mapping is shown.
- Cups swap positions several times during the shuffle phase.
- The lamp reveals the target color and the robot selects a cup.

Success (success=True):
- The robot must touch the cup that hides the ball whose color matches the final lamp cue.

How to customize:
- CUE_PHASE_STEPS changes how long the initial mapping is visible.
- SHUFFLE_PHASE_STEPS changes how much time the swaps take in total.
- NUM_SWAPS changes how many swaps the agent must track.
- SWAP_ARC_HEIGHT changes how high cups lift while moving during swaps.
- MIN_DIST changes the spacing between the three cup slots.
- BALL_RADIUS and GOAL_THRESH change object size and touch tolerance.
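The two subproblems described above compose naturally: first apply the swaps to the whole color layout, then resolve the lamp cue against the result. The sketch below assumes swaps are given as pairs of slot indices and uses color names for the three COLOR_RGBA rows (red, green, blue); the helper names and the example swap sequence are hypothetical.

```python
def apply_swaps(colors_by_slot, swaps):
    """Return the color-to-slot layout after a sequence of (i, j) swaps."""
    layout = list(colors_by_slot)
    for i, j in swaps:
        layout[i], layout[j] = layout[j], layout[i]
    return layout


def slot_for_lamp(colors_by_slot, swaps, lamp_color):
    """Track every cup through the shuffle, then resolve the lamp cue."""
    return apply_swaps(colors_by_slot, swaps).index(lamp_color)


initial = ["red", "green", "blue"]  # slots 0, 1, 2 during the cue
swaps = [(0, 2), (1, 2)]
print(apply_swaps(initial, swaps))
print(slot_for_lamp(initial, swaps, "red"))
```

Unlike the single-target shuffle task, the agent cannot know which cup matters until the lamp turns on, so it has to track all three cups.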

BALL_RADIUS = 0.02#
COLOR_RGBA = tensor([[1., 0., 0., 1.],         [0., 1., 0., 1.],         [0., 0., 1., 1.]])#
CUE_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.08#
HEIGHT_OFFSET = 1000.0#
LAMP_BEHIND_OFFSET_X = 0.25#
LAMP_HEIGHT = 0.06#
LAMP_RADIUS = 0.018#
LANGUAGE_INSTRUCTION = 'Observe which color is under each cup, track the cups as they shuffle, then touch the cup matching the lamp color.'#
MIN_DIST = 0.2#
MUG_SCALE = 1.3#
NUM_SWAPS: List[int] = [2, 4]#
SHUFFLE_PHASE_STEPS: List[int] = [20, 35]#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

SWAP_ARC_HEIGHT = 0.06#
agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

class ShellGameShuffleColorLampTouchVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, num_envs=1, reconfiguration_freq=None, **kwargs)[source]#

Bases: ShellGameShuffleColorLampTouchVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
NUM_SWAPS: List[int] = [2, 4]#
SHUFFLE_PHASE_STEPS: List[int] = [20, 35]#

Remember Environments

Remember-color tasks for the VLA memory benchmark.

class RememberColor3LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberColorVLABaseEnv

COLORS = 3#
CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
class RememberColor3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberColorVLABaseEnv

COLORS = 3#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
class RememberColor5LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberColorVLABaseEnv

COLORS = 5#
CUE_PHASE_STEPS: List[int] = [100, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
class RememberColor5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberColorVLABaseEnv

COLORS = 5#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
class RememberColor9LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberColorVLABaseEnv

COLORS = 9#
CUE_PHASE_STEPS: List[int] = [100, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
class RememberColor9VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberColorVLABaseEnv

COLORS = 9#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
class RememberColorVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Remember one target color and pick it out after a delay.

The environment briefly shows a single target cube color. Then all cubes are hidden, and finally all candidate cubes reappear in randomized positions. The robot must remember only the color identity and ignore the new spatial arrangement during selection.

Episode flow:
- One target color is shown in the center as the cue.
- All cubes disappear during the memory phase.
- All candidate cubes reappear and the robot selects the correct one.

Success (success=True):
- The robot must reach the cube whose color matches the cue and satisfy the environment reach threshold.

How to customize:
- COLORS changes how many candidate colors compete with the target.
- CUE_PHASE_STEPS changes how long the cue stays visible.
- EMPTY_PHASE_STEPS changes the length of the memory delay.
- GOAL_THRESH changes how strict the final selection reach criterion is.
- CUBE_HALFSIZE changes cube size and indirectly affects spacing.
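The three-phase episode structure can be sketched as a simple step-to-phase mapping. This example assumes that CUE_PHASE_STEPS and EMPTY_PHASE_STEPS are inclusive [min, max] ranges from which one length is drawn per episode, and that the phases follow each other without gaps; both helper names are hypothetical.

```python
import random


def sample_phase_lengths(cue_range, empty_range, seed=0):
    """Draw cue and empty phase lengths from inclusive [min, max] ranges."""
    rng = random.Random(seed)
    return rng.randint(*cue_range), rng.randint(*empty_range)


def phase_at(step, cue_steps, empty_steps):
    """Name the phase active at a zero-based step index."""
    if step < cue_steps:
        return "cue"
    if step < cue_steps + empty_steps:
        return "empty"
    return "manipulation"


cue_steps, empty_steps = sample_phase_lengths([1, 5], [1, 5])
print(cue_steps, empty_steps)
print([phase_at(t, 2, 3) for t in range(7)])
```

Read this way, the Long variants (for example EMPTY_PHASE_STEPS of [50, 450]) mainly stretch the delay the agent must bridge before the manipulation phase begins.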

ACTION_DELTA_L2_COEF = 0.05#
ACTION_L2_COEF = 0.02#
COLORS = 3#
COLOR_MAPPING = {0: ('Red', [255, 0, 0, 255]), 1: ('Lime', [0, 255, 0, 255]), 2: ('Blue', [0, 0, 255, 255]), 3: ('Yellow', [255, 255, 0, 255]), 4: ('Magenta', [255, 0, 255, 255]), 5: ('Cyan', [0, 255, 255, 255]), 6: ('Maroon', [128, 0, 0, 255]), 7: ('Olive', [255, 128, 0, 255]), 8: ('Teal', [0, 128, 128, 255])}#
CUBE_HALFSIZE = 0.02#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = "Observe the cube's color, wait, then touch the cube of the same color."#
MANIP_MIN_CUBE_DISTANCE = 0.09#
MANIP_WIDTH_AXIS = 1#
MANIP_WIDTH_CLAMP = 0.5#
MANIP_WIDTH_SCALE = 2#
QVEL_L2_COEF = 0.01#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Remember-shape tasks for the VLA memory benchmark.

class RememberShape3LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
SHAPES = 3#
class RememberShape3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 3#
class RememberShape5LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
SHAPES = 5#
class RememberShape5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 5#
class RememberShape9LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
SHAPES = 9#
class RememberShape9VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 9#
class RememberShapeVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Remember one target shape and find it again after a delay.

The cue presents a single target geometry. After a memory phase, several shapes reappear in new positions and the robot must identify the matching geometry regardless of where it was originally shown.

Episode flow:
- One target shape is shown at the center as the cue.
- All shapes disappear during the memory phase.
- All shapes reappear and the robot selects the matching one.

Success (success=True):
- The robot must reach the object whose geometry matches the cue.

How to customize:
- SHAPES changes how many different shapes appear in the scene.
- SHAPE_MAPPING changes which procedural geometries are actually used.
- CUE_PHASE_STEPS and EMPTY_PHASE_STEPS control cue duration and memory delay.
- GOAL_THRESH changes how strict the final reach criterion is.
- SHAPE_SCALE changes the size of the generated objects.
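One way to picture how SHAPES and SHAPE_MAPPING interact is the sketch below. It assumes that the first SHAPES entries of SHAPE_MAPPING form the candidate pool and that the target is drawn uniformly from that pool; the actual selection logic is not described on this page, and build_episode is a hypothetical helper.

```python
import random

# Copied from the documented SHAPE_MAPPING default.
SHAPE_MAPPING = {0: "cube", 1: "sphere", 2: "cylinder", 3: "cross", 4: "torus",
                 5: "star", 6: "pyramide", 7: "t_shape", 8: "crescent"}


def build_episode(num_shapes, seed=0):
    """Pick a target shape and a shuffled line-up of candidates."""
    rng = random.Random(seed)
    # Assumption: the first `num_shapes` entries form the candidate pool.
    candidates = [SHAPE_MAPPING[i] for i in range(num_shapes)]
    target = rng.choice(candidates)
    lineup = candidates[:]
    rng.shuffle(lineup)  # shapes reappear in new positions
    return target, lineup


target, lineup = build_episode(num_shapes=5)
print(target, lineup, lineup.index(target))
```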

ACTION_DELTA_L2_COEF = 0.05#
ACTION_L2_COEF = 0.02#
APPEAR_SETTLE_STEPS = 3#
COLOR = [0, 0, 255, 255]#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = "Observe the object's shape, wait, then touch the object of the same shape."#
MANIP_MIN_SHAPE_DISTANCE = 0.09#
MANIP_WIDTH_AXIS = 1#
MANIP_WIDTH_CLAMP = 0.5#
MANIP_WIDTH_SCALE = 2#
QVEL_L2_COEF = 0.01#
SHAPES = 3#
SHAPE_MAPPING = {0: 'cube', 1: 'sphere', 2: 'cylinder', 3: 'cross', 4: 'torus', 5: 'star', 6: 'pyramide', 7: 't_shape', 8: 'crescent'}#
SHAPE_SCALE = 0.02#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Remember-shape-and-color tasks for the VLA memory benchmark.

class RememberShapeAndColor3x2LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
SHAPES = 6#
class RememberShapeAndColor3x2VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 6#
class RememberShapeAndColor3x3LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
SHAPES = 9#
class RememberShapeAndColor3x3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 9#
class RememberShapeAndColor5x3LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 450]#
SHAPES = 15#
class RememberShapeAndColor5x3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: RememberShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 15#
class RememberShapeAndColorVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Remember a full object identity defined by both shape and color.

The cue shows one target object, but the later scene contains many objects that may share only the shape or only the color. The robot must therefore retain the full conjunction, not just a single attribute.

Episode flow:
- One target object with a specific shape and color is shown.
- All objects disappear for the memory delay.
- All objects reappear in randomized positions and the robot selects one.

Success (success=True):
- The robot must reach the object whose shape and color both match the cue.

How to customize:
- SHAPES changes how many shape-color combinations appear in the episode.
- BASE_SHAPES changes the geometry vocabulary available to the task.
- COLOR_PALETTE changes the color vocabulary combined with those shapes.
- CUE_PHASE_STEPS and EMPTY_PHASE_STEPS change cue duration and memory delay.
- GOAL_THRESH changes how strict the final reach criterion is.
- SHAPE_SCALE changes the size of all generated objects.
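The conjunction requirement can be illustrated by enumerating shape-color pairs and counting the distractors that match only one attribute. The sketch reads a name such as 3x2 as "3 shapes by 2 colors", which is an assumption (it is at least consistent with 5x3 giving SHAPES = 15 from five BASE_SHAPES and three COLOR_PALETTE entries). The color names stand in for the RGBA arrays, and both helpers are hypothetical.

```python
from itertools import product

BASE_SHAPES = ["cube", "sphere", "t_shape", "cross", "torus"]
COLORS = ["red", "green", "blue"]  # names for the three COLOR_PALETTE entries


def combos(num_shapes, num_colors):
    """Enumerate shape-color pairs for an assumed N-shapes x M-colors task."""
    return list(product(BASE_SHAPES[:num_shapes], COLORS[:num_colors]))


def partial_matches(target, objects):
    """Objects that share exactly one attribute with the target."""
    return [o for o in objects
            if (o[0] == target[0]) != (o[1] == target[1])]


objects = combos(3, 2)  # would correspond to SHAPES = 6
target = ("sphere", "green")
print(len(objects), partial_matches(target, objects))
```

Every object returned by partial_matches would fool an agent that remembered only the shape or only the color, which is what makes this family harder than the single-attribute tasks.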

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
APPEAR_SETTLE_STEPS = 3#
BASE_SHAPES = {0: 'cube', 1: 'sphere', 2: 't_shape', 3: 'cross', 4: 'torus'}#
COLOR_PALETTE = {0: array([1., 0., 0., 1.]), 1: array([0., 1., 0., 1.]), 2: array([0., 0., 1., 1.])}#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = "Observe the object's shape and color, wait, then touch the object of the same shape and color."#
MANIP_MIN_DISTANCE = 0.09#
MANIP_WIDTH_AXIS = 1#
MANIP_WIDTH_CLAMP = 0.5#
MANIP_WIDTH_SCALE = 2#
QVEL_L2_COEF = 0.0#
SHAPES = 6#
SHAPE_SCALE = 0.02#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Find-Imposter Environments#

Find-the-imposter-color tasks for the VLA memory benchmark.

class FindImposterColor3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterColorVLABaseEnv

COLORS = 3#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
class FindImposterColor5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterColorVLABaseEnv

COLORS = 5#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
class FindImposterColor9VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterColorVLABaseEnv

COLORS = 9#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
class FindImposterColorVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Find the cube whose color was NOT present in the first phase.

Episode flow:

  • Phase 1 (cue): COLORS-1 cubes are shown at spread positions. One color from the pool is deliberately hidden.

  • Phase 2 (empty): All cubes disappear.

  • Phase 3 (manip): All COLORS cubes appear at spread positions. Touch the cube whose color was absent in the cue.

Success: TCP within GOAL_THRESH of the imposter cube in the manipulation phase.
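The cue and success rule above can be sketched in a few lines. This is a minimal, simulator-free illustration, not the environment's code: the function names are hypothetical, the imposter is assumed to be drawn uniformly from the color pool, and "within GOAL_THRESH" is assumed to mean a Euclidean distance less than or equal to the threshold.

```python
import math
import random
from typing import List, Sequence, Tuple

GOAL_THRESH = 0.05  # documented default


def sample_imposter_episode(num_colors: int, rng: random.Random) -> Tuple[List[int], int]:
    """Return (color ids shown in the cue, hidden imposter color id)."""
    pool = list(range(num_colors))
    imposter = rng.choice(pool)  # assumption: uniform choice
    cue = [color for color in pool if color != imposter]
    return cue, imposter


def is_success(
    tcp_pos: Sequence[float],
    imposter_pos: Sequence[float],
    in_manip_phase: bool,
) -> bool:
    """Success only counts during the manipulation phase."""
    return in_manip_phase and math.dist(tcp_pos, imposter_pos) <= GOAL_THRESH


cue_colors, imposter_color = sample_imposter_episode(3, random.Random(0))
print(cue_colors, imposter_color)
```

With COLORS = 3 the cue shows two cubes and the manipulation phase shows three, so the policy has to identify the one color it has not seen.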

ACTION_DELTA_L2_COEF = 0.05#
ACTION_L2_COEF = 0.02#
COLORS = 3#
COLOR_MAPPING = {0: ('Red', [255, 0, 0, 255]), 1: ('Lime', [0, 255, 0, 255]), 2: ('Blue', [0, 0, 255, 255]), 3: ('Yellow', [255, 255, 0, 255]), 4: ('Magenta', [255, 0, 255, 255]), 5: ('Cyan', [0, 255, 255, 255]), 6: ('Maroon', [128, 0, 0, 255]), 7: ('Olive', [255, 128, 0, 255]), 8: ('Teal', [0, 128, 128, 255])}#
CUBE_HALFSIZE = 0.02#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = 'Observe the cubes shown, wait, then touch the cube whose color was not present before.'#
MANIP_MIN_CUBE_DISTANCE = 0.09#
MANIP_WIDTH_AXIS = 1#
MANIP_WIDTH_CLAMP = 0.5#
MANIP_WIDTH_SCALE = 2#
QVEL_L2_COEF = 0.01#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Find-the-imposter-shape tasks for the VLA memory benchmark.

class FindImposterShape3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 3#
class FindImposterShape5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 5#
class FindImposterShape9VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterShapeVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 9#
class FindImposterShapeVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Find the shape whose geometry was NOT present in the first phase.

All shapes share the same blue color; only geometry distinguishes them.

Episode flow:

  • Phase 1 (cue): SHAPES-1 shapes are shown at spread positions. One geometry from the pool is deliberately hidden.

  • Phase 2 (empty): All shapes disappear.

  • Phase 3 (manip): All SHAPES objects appear at spread positions. Touch the shape whose geometry was absent in the cue.

Success: TCP within GOAL_THRESH of the imposter shape in the manipulation phase.

ACTION_DELTA_L2_COEF = 0.05#
ACTION_L2_COEF = 0.02#
COLOR = [0, 0, 255, 255]#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = 'Observe the shapes shown, wait, then touch the object whose shape was not present before.'#
MANIP_MIN_SHAPE_DISTANCE = 0.09#
MANIP_WIDTH_AXIS = 1#
MANIP_WIDTH_CLAMP = 0.5#
MANIP_WIDTH_SCALE = 2#
QVEL_L2_COEF = 0.01#
SHAPES = 3#
SHAPE_MAPPING = {0: 'cube', 1: 'sphere', 2: 'cylinder', 3: 'cross', 4: 'torus', 5: 'star', 6: 'pyramide', 7: 't_shape', 8: 'crescent'}#
SHAPE_SCALE = 0.02#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Find-the-imposter-shape-and-color tasks for the VLA memory benchmark.

class FindImposterShapeAndColor3x2VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 6#
class FindImposterShapeAndColor3x3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 9#
class FindImposterShapeAndColor5x3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: FindImposterShapeAndColorVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SHAPES = 15#
class FindImposterShapeAndColorVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Find the object whose (shape, color) combination was NOT in the first phase.

Episode flow:

  • Phase 1 (cue): SHAPES-1 objects are shown at spread positions. One (shape, color) combo from the pool is deliberately hidden.

  • Phase 2 (empty): All objects disappear.

  • Phase 3 (manip): All SHAPES objects appear at spread positions. Touch the object whose combo was absent in the cue.

Success: TCP within GOAL_THRESH of the imposter object in the manipulation phase.
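The SHAPES values of the 3x2, 3x3, and 5x3 variants (6, 9, and 15) match a count of shape-color combinations. The sketch below shows one way such a pool could be enumerated. It assumes that "AxB" reads as A shapes times B colors and that the first entries of BASE_SHAPES and COLOR_PALETTE are used; neither assumption is confirmed by this reference. build_combos and the color labels are hypothetical names chosen for the RGBA values listed in COLOR_PALETTE.

```python
from itertools import product
from typing import List, Tuple

BASE_SHAPES = {0: "cube", 1: "sphere", 2: "t_shape", 3: "cross", 4: "torus"}
# Illustrative labels for the three COLOR_PALETTE entries (pure red, green, blue).
COLOR_NAMES = {0: "red", 1: "green", 2: "blue"}


def build_combos(num_shapes: int, num_colors: int) -> List[Tuple[str, str]]:
    """Enumerate (shape, color) pairs from the first entries of each vocabulary."""
    shapes = [BASE_SHAPES[i] for i in range(num_shapes)]
    colors = [COLOR_NAMES[i] for i in range(num_colors)]
    return list(product(shapes, colors))


variant_sizes = {"3x2": (3, 2), "3x3": (3, 3), "5x3": (5, 3)}
combo_counts = {
    name: len(build_combos(n_shapes, n_colors))
    for name, (n_shapes, n_colors) in variant_sizes.items()
}
print(combo_counts)
```

Under these assumptions the cue phase would show all but one of the enumerated pairs, and the hidden pair is the imposter.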

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
BASE_SHAPES = {0: 'cube', 1: 'sphere', 2: 't_shape', 3: 'cross', 4: 'torus'}#
COLOR_PALETTE = {0: array([1., 0., 0., 1.]), 1: array([0., 1., 0., 1.]), 2: array([0., 0., 1., 1.])}#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = 'Observe the objects shown, wait, then touch the object whose shape and color combination was not present before.'#
MANIP_MIN_DISTANCE = 0.09#
MANIP_WIDTH_AXIS = 1#
MANIP_WIDTH_CLAMP = 0.5#
MANIP_WIDTH_SCALE = 2#
QVEL_L2_COEF = 0.0#
SHAPES = 6#
SHAPE_SCALE = 0.02#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Colors Sequence Environments#

Sequential color-memory tasks for the VLA benchmark.

class SeqOfColors3LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: SeqOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 3#
class SeqOfColors3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: SeqOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 3#
class SeqOfColors5LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: SeqOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 5#
class SeqOfColors5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: SeqOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 5#
class SeqOfColors7LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: SeqOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 7#
class SeqOfColors7VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: SeqOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 7#
class SeqOfColorsVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Remember a color sequence, but only recover the set of its items.

The cue is presented as an ordered sequence, but during manipulation the robot is allowed to touch the target colors in any order. This makes the task easier than ChainOfColors, because the agent must remember membership but not replay the temporal order.

Episode flow:

  • A sequence of cue colors is shown one by one.

  • All cubes disappear during the memory phase.

  • All cubes reappear and the robot selects the target subset.

Success (success=True): The robot must touch every color that appeared in the cue sequence and avoid touching colors that were not part of that sequence.

How to customize:

  • SEQUENCE_LENGTH changes how many cue items become targets.

  • COLORS changes how many total candidate colors exist.

  • CUE_STEP_DURATION changes the visibility duration of each cue item.

  • EMPTY_PHASE_STEPS changes the length of the memory delay.

  • GOAL_THRESH changes how close the robot must get for a touch to count.

  • CUBE_HALFSIZE changes cube size and spacing.
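The order-free success rule can be sketched as a set comparison. This is a minimal illustration with hypothetical function names. It models only color membership and ignores the button press mentioned in LANGUAGE_INSTRUCTION as well as any displacement limits.

```python
from typing import Iterable


def has_wrong_touch(cue_colors: Iterable[int], touched_colors: Iterable[int]) -> bool:
    """True if any touched color was not part of the cue."""
    return bool(set(touched_colors) - set(cue_colors))


def unordered_success(cue_colors: Iterable[int], touched_colors: Iterable[int]) -> bool:
    """True when the touched colors equal the cue colors as a set; order is ignored."""
    return set(touched_colors) == set(cue_colors)


cue = [2, 5, 7]  # color ids shown one by one during the cue
print(unordered_success(cue, [7, 2, 5]))
print(has_wrong_touch(cue, [2, 1]))
```

Because only membership matters, any permutation of the cue colors passes, which is the difference from the ordered ChainOfColors tasks.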

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
BUTTON_BASE_HALF_SIZE = array([0.065, 0.065, 0.015], dtype=float32)#
BUTTON_CAP_HALF_HEIGHT = 0.014#
BUTTON_CAP_RADIUS = 0.03#
BUTTON_CAP_TRAVEL = 0.014#
BUTTON_HIDDEN_Z = 1000.0#
BUTTON_PRESS_EVENT_RATIO = 0.35#
BUTTON_PRESS_XY_RADIUS = 0.065#
BUTTON_PRESS_Z_MARGIN = 0.03#
BUTTON_RELEASE_READY_RATIO = 0.2#
BUTTON_X_SHIFT_TOWARD_ROBOT = -0.04#
COLORS = 9#
COLOR_MAPPING = {0: ('Red', [255, 0, 0, 255]), 1: ('Lime', [0, 255, 0, 255]), 2: ('Blue', [0, 0, 255, 255]), 3: ('Yellow', [255, 255, 0, 255]), 4: ('Magenta', [255, 0, 255, 255]), 5: ('Cyan', [0, 255, 255, 255]), 6: ('Maroon', [128, 0, 0, 255]), 7: ('Olive', [255, 128, 0, 255]), 8: ('Teal', [0, 128, 128, 255])}#
CUBE_DISPLACEMENT_PENALTY_COEF = 20.0#
CUBE_HALFSIZE = 0.02#
CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = 'Observe which colored cubes appear during the cue, wait, then touch all of them in any order and press the center button.'#
LIFT_CONFIRM_TOL = 0.015#
MAX_ALLOWED_CUBE_DISPLACEMENT = 0.06#
QVEL_L2_COEF = 0.0#
REQUIRED_LIFT_HEIGHT = 0.1#
SEQUENCE_LENGTH = 5#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Ordered color-chain memory tasks for the VLA benchmark.

class ChainOfColors3LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: ChainOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 3#
class ChainOfColors3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: ChainOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 3#
class ChainOfColors5LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: ChainOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 5#
class ChainOfColors5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: ChainOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 5#
class ChainOfColors7LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: ChainOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 7#
class ChainOfColors7VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: ChainOfColorsVLABaseEnv

CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 7#
class ChainOfColorsVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Remember an ordered color sequence and replay it exactly.

Colors are shown one at a time during the cue. After the delay, all cubes reappear and the robot must touch them in the same order in which they were presented. This makes the task stricter than unordered color recall because the agent must preserve both identity and temporal order.

Episode flow:

  • The environment presents a sequence of cue colors one by one.

  • All cubes disappear for the memory phase.

  • All cubes return and the robot starts replaying the sequence.

Success (success=True): Every target color must be touched in the correct order, with no wrong or out-of-order selections.

How to customize:

  • SEQUENCE_LENGTH changes how many ordered items must be remembered.

  • COLORS changes the total number of available color choices.

  • CUE_STEP_DURATION changes how long each individual cue item is shown.

  • EMPTY_PHASE_STEPS changes the memory delay before manipulation starts.

  • GOAL_THRESH controls how close the robot must get for a touch to count.

  • CUBE_HALFSIZE changes the cube geometry and indirectly the scene layout.
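The ordered rule can be sketched as a prefix comparison: every touch so far must match the cue at the same position. This is a minimal illustration with a hypothetical function name and status labels. It ignores the button press mentioned in LANGUAGE_INSTRUCTION and is not taken from the environment's implementation.

```python
from typing import Sequence


def ordered_status(cue: Sequence[int], touched: Sequence[int]) -> str:
    """Classify a touch history against the ordered cue sequence."""
    if len(touched) > len(cue):
        return "fail"  # more touches than targets
    if list(touched) != list(cue[: len(touched)]):
        return "fail"  # wrong or out-of-order selection
    return "success" if len(touched) == len(cue) else "in_progress"


cue = [2, 5, 7]  # color ids in presentation order
print(ordered_status(cue, [2, 5]))
print(ordered_status(cue, [5, 2, 7]))
```

A history that contains the right colors in the wrong order fails here, whereas the same history would pass an unordered set check.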

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
BUTTON_BASE_HALF_SIZE = array([0.065, 0.065, 0.015], dtype=float32)#
BUTTON_CAP_HALF_HEIGHT = 0.014#
BUTTON_CAP_RADIUS = 0.03#
BUTTON_CAP_TRAVEL = 0.014#
BUTTON_HIDDEN_Z = 1000.0#
BUTTON_PRESS_EVENT_RATIO = 0.35#
BUTTON_PRESS_XY_RADIUS = 0.065#
BUTTON_PRESS_Z_MARGIN = 0.03#
BUTTON_RELEASE_READY_RATIO = 0.2#
BUTTON_X_SHIFT_TOWARD_ROBOT = -0.04#
COLORS = 9#
COLOR_MAPPING = {0: ('Red', [255, 0, 0, 255]), 1: ('Lime', [0, 255, 0, 255]), 2: ('Blue', [0, 0, 255, 255]), 3: ('Yellow', [255, 255, 0, 255]), 4: ('Magenta', [255, 0, 255, 255]), 5: ('Cyan', [0, 255, 255, 255]), 6: ('Maroon', [128, 0, 0, 255]), 7: ('Olive', [255, 128, 0, 255]), 8: ('Teal', [0, 128, 128, 255])}#
CUBE_DISPLACEMENT_PENALTY_COEF = 20.0#
CUBE_HALFSIZE = 0.02#
CUE_STEP_DURATION: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = 'Observe which colored cubes appear during the cue, wait, then touch all of them in the same order as the cubes were shown and press the center button.'#
LIFT_CONFIRM_TOL = 0.015#
MAX_ALLOWED_CUBE_DISPLACEMENT = 0.06#
QVEL_L2_COEF = 0.0#
REQUIRED_LIFT_HEIGHT = 0.1#
SEQUENCE_LENGTH = 5#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Bunch-of-colors memory tasks for the VLA benchmark.

class BunchOfColors3LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BunchOfColorsVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 3#
class BunchOfColors3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BunchOfColorsVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 3#
class BunchOfColors5LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BunchOfColorsVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 5#
class BunchOfColors5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BunchOfColorsVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 5#
class BunchOfColors7LongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BunchOfColorsVLABaseEnv

CUE_PHASE_STEPS: List[int] = [10, 100]#
EMPTY_PHASE_STEPS: List[int] = [50, 400]#
SEQUENCE_LENGTH = 7#
class BunchOfColors7VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BunchOfColorsVLABaseEnv

CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
SEQUENCE_LENGTH = 7#
class BunchOfColorsVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Remember a set of colors shown at once, then recover that set later.

During the cue, several target colors are shown simultaneously. After a delay, all cubes reappear in randomized positions and the robot must touch every cube that belonged to the original set while avoiding distractors. Unlike sequence tasks, order does not matter here: only set membership does.

Episode flow:

  • A target subset of colors is displayed together.

  • All cubes disappear for a short memory delay.

  • All cubes return and the robot starts selecting targets.

Success (success=True): The robot must touch all target colors and avoid wrong selections.

How to customize:

  • SEQUENCE_LENGTH controls how many target colors the agent must remember.

  • COLORS controls how many total color choices exist in the scene.

  • CUE_PHASE_STEPS controls how long the target set is visible.

  • EMPTY_PHASE_STEPS controls the length of the memory gap before action.

  • GOAL_THRESH controls how close the tool center point must get for a cube touch to count.

  • CUBE_HALFSIZE changes cube size, which also affects spacing and contact geometry.
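CUE_PHASE_STEPS and EMPTY_PHASE_STEPS are two-element lists. The sketch below assumes they are inclusive [min, max] bounds from which a duration is drawn once per episode; this reference does not state that, so treat it as a guess about the semantics. The function names are hypothetical, and the phase labels follow the cue, empty, and manip naming used elsewhere on this page.

```python
import random
from typing import List, Tuple

# Values listed for the Long variants in this reference.
CUE_PHASE_STEPS: List[int] = [10, 100]
EMPTY_PHASE_STEPS: List[int] = [50, 400]


def sample_schedule(rng: random.Random) -> Tuple[int, int]:
    """Draw (cue_steps, empty_steps), assuming inclusive [min, max] bounds."""
    cue_steps = rng.randint(*CUE_PHASE_STEPS)
    empty_steps = rng.randint(*EMPTY_PHASE_STEPS)
    return cue_steps, empty_steps


def phase_at(step: int, cue_steps: int, empty_steps: int) -> str:
    """Map an elapsed step count to the active episode phase."""
    if step < cue_steps:
        return "cue"
    if step < cue_steps + empty_steps:
        return "empty"
    return "manip"


cue_steps, empty_steps = sample_schedule(random.Random(0))
print(phase_at(0, cue_steps, empty_steps))
print(phase_at(cue_steps + empty_steps, cue_steps, empty_steps))
```

Under this reading, a Long variant can keep the policy waiting for up to 500 steps before the manipulation phase begins, compared with at most 10 for the default [1, 5] ranges.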

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
BUTTON_BASE_HALF_SIZE = array([0.065, 0.065, 0.015], dtype=float32)#
BUTTON_CAP_HALF_HEIGHT = 0.014#
BUTTON_CAP_RADIUS = 0.03#
BUTTON_CAP_TRAVEL = 0.014#
BUTTON_HIDDEN_Z = 1000.0#
BUTTON_PRESS_EVENT_RATIO = 0.35#
BUTTON_PRESS_XY_RADIUS = 0.065#
BUTTON_PRESS_Z_MARGIN = 0.03#
BUTTON_RELEASE_READY_RATIO = 0.2#
BUTTON_X_SHIFT_TOWARD_ROBOT = -0.04#
COLORS = 9#
COLOR_MAPPING = {0: ('Red', [255, 0, 0, 255]), 1: ('Lime', [0, 255, 0, 255]), 2: ('Blue', [0, 0, 255, 255]), 3: ('Yellow', [255, 255, 0, 255]), 4: ('Magenta', [255, 0, 255, 255]), 5: ('Cyan', [0, 255, 255, 255]), 6: ('Maroon', [128, 0, 0, 255]), 7: ('Olive', [255, 128, 0, 255]), 8: ('Teal', [0, 128, 128, 255])}#
CUBE_DISPLACEMENT_PENALTY_COEF = 20.0#
CUBE_HALFSIZE = 0.02#
CUE_PHASE_STEPS: List[int] = [1, 5]#
EMPTY_PHASE_STEPS: List[int] = [1, 5]#
GOAL_THRESH = 0.05#
LANGUAGE_INSTRUCTION = 'Observe which colored cubes appear during the cue, wait, then touch all of them in any order and press the center button.'#
LIFT_CONFIRM_TOL = 0.015#
MAX_ALLOWED_CUBE_DISPLACEMENT = 0.06#
QVEL_L2_COEF = 0.0#
REQUIRED_LIFT_HEIGHT = 0.1#
SEQUENCE_LENGTH = 5#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Intercept Environments#

Interception-and-push tasks for the VLA benchmark.

class InterceptFastVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: InterceptVLABaseEnv

VELOCITY_RANGE = (0.75, 1.0)#
class InterceptMediumVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: InterceptVLABaseEnv

VELOCITY_RANGE = (0.5, 0.75)#
class InterceptSlowVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: InterceptVLABaseEnv

VELOCITY_RANGE = (0.25, 0.5)#
class InterceptVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Intercept a moving ball and push it into a target region.

A ball starts with random motion, and the robot must make contact at the right time and direction so that the ball rolls into the goal area. Unlike the grasping variant, the task ends with successful redirection rather than object pickup.

Episode flow:

  • The ball is launched with sampled initial velocity.

  • The robot moves to an effective hitting position behind the ball.

  • The ball is redirected toward the goal region.

Success (success=True): The ball center must end up inside the goal radius.

How to customize:

  • VELOCITY_RANGE changes how difficult interception timing becomes.

  • BALL_RADIUS changes contact geometry and how easy the ball is to push.

  • GOAL_RADIUS changes how forgiving the final placement criterion is.

  • ACTION_L2_COEF, ACTION_DELTA_L2_COEF, and QVEL_L2_COEF can be used to penalize unstable motions if needed.
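The roles of VELOCITY_RANGE and GOAL_RADIUS can be sketched without the simulator. The example assumes the launch speed is drawn uniformly from VELOCITY_RANGE with a uniformly random planar heading, and that success is a planar distance test against GOAL_RADIUS; both are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

VELOCITY_RANGE = (0.5, 0.75)  # value listed for InterceptMediumVLAEnv
GOAL_RADIUS = 0.1             # documented default


def sample_ball_velocity(rng: random.Random) -> Tuple[float, float]:
    """Draw a planar velocity whose speed lies in VELOCITY_RANGE."""
    speed = rng.uniform(*VELOCITY_RANGE)
    heading = rng.uniform(0.0, 2.0 * math.pi)  # assumption: any direction
    return speed * math.cos(heading), speed * math.sin(heading)


def ball_in_goal(ball_xy: Sequence[float], goal_xy: Sequence[float]) -> bool:
    """True when the ball center lies inside the goal radius."""
    return math.dist(ball_xy, goal_xy) <= GOAL_RADIUS


vx, vy = sample_ball_velocity(random.Random(0))
print(math.hypot(vx, vy))
print(ball_in_goal((0.05, 0.05), (0.0, 0.0)))
```

The base class lists VELOCITY_RANGE = (0.0, 0.0), so the Slow, Medium, and Fast subclasses are what give the ball a non-zero launch speed.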

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
BALL_RADIUS: float = 0.02#
GOAL_RADIUS: float = 0.1#
LANGUAGE_INSTRUCTION = 'Intercept the rolling ball by moving to its path and deflecting it toward the target.'#
QVEL_L2_COEF = 0.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

VELOCITY_RANGE = (0.0, 0.0)#
agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Interception-and-grasp tasks for the VLA benchmark.

class InterceptGrabFastVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: InterceptGrabVLABaseEnv

VELOCITY_RANGE = (0.75, 1.0)#
class InterceptGrabMediumVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: InterceptGrabVLABaseEnv

VELOCITY_RANGE = (0.5, 0.75)#
class InterceptGrabSlowVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: InterceptGrabVLABaseEnv

VELOCITY_RANGE = (0.25, 0.5)#
class InterceptGrabVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Intercept a moving ball and finish with a stable grasp.

A ball is launched across the table with random velocity. The robot has no separate observation phase: it must react immediately, intercept the ball, close the gripper around it, and settle into a stable final state.

Episode flow:

  • The ball is spawned with a sampled initial velocity.

  • The robot moves to intercept its trajectory.

  • The robot grasps the ball and stabilizes.

Success (success=True): The ball must be grasped and the robot must be static at the end of the episode step.

How to customize:

  • VELOCITY_RANGE changes how fast the ball moves and therefore how much anticipation the policy needs.

  • BALL_RADIUS changes grasp geometry and contact difficulty.

  • ACTION_L2_COEF, ACTION_DELTA_L2_COEF, and QVEL_L2_COEF can be used to regularize aggressive or jerky behavior.
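This reference lists the three regularization coefficients but not the formula they enter. The sketch below shows one common form, a weighted sum of squared L2 norms, purely as an assumption about how such coefficients are typically applied. motion_penalty is a hypothetical helper, and its default coefficients are the non-zero values listed for FindImposterColorVLABaseEnv, used here only for illustration.

```python
from typing import Sequence


def squared_l2(values: Sequence[float]) -> float:
    """Sum of squares of a vector."""
    return sum(v * v for v in values)


def motion_penalty(
    action: Sequence[float],
    prev_action: Sequence[float],
    qvel: Sequence[float],
    action_l2_coef: float = 0.02,
    action_delta_l2_coef: float = 0.05,
    qvel_l2_coef: float = 0.01,
) -> float:
    """Penalty for large actions, abrupt action changes, and fast joints."""
    delta = [a - b for a, b in zip(action, prev_action)]
    return (
        action_l2_coef * squared_l2(action)
        + action_delta_l2_coef * squared_l2(delta)
        + qvel_l2_coef * squared_l2(qvel)
    )


print(motion_penalty([1.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
print(motion_penalty([1.0, 0.0], [0.0, 0.0], [0.5, 0.5], 0.0, 0.0, 0.0))
```

With all three coefficients at 0.0, as listed for this class, the penalty vanishes regardless of the motion.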

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
BALL_RADIUS: float = 0.02#
LANGUAGE_INSTRUCTION = 'Intercept the rolling ball and grasp it to stop it.'#
QVEL_L2_COEF = 0.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots used together, are supported in the task. During env creation, setting robot_uids automatically loads the requested robots into the scene, but not every task is designed to support every robot setup.

VELOCITY_RANGE = (0.0, 0.0)#
agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g., whether the robot is grasping some object) and that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.

Rotate Environments#

Lenient rotation-control tasks for the VLA benchmark.

class RotateLenientPosNegVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, angle_threshold=0.1, **kwargs)[source]#

Bases: RotateLenientVLABaseEnv

MODE = 'pos_neg_angle'#
class RotateLenientPosVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, angle_threshold=0.1, **kwargs)[source]#

Bases: RotateLenientVLABaseEnv

MODE = 'pos_angle'#
class RotateLenientVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, angle_threshold=0.1, **kwargs)[source]#

Bases: BaseEnv

Rotate a peg to a target angle when only orientation matters.

A peg is placed on the table with random initial orientation, and the task provides a target rotation. The robot must rotate the peg until the angle is correct. Unlike the strict variant, translational drift is tolerated here.

Episode flow:

  • The environment samples an initial peg orientation and a target angle.

  • The robot contacts the peg and rotates it.

  • The episode checks the final angular error.

Success (success=True):

  • The peg angle must be within angle_threshold of the target and the robot must be static.

How to customize:

  • MODE controls how target angles are sampled.

  • angle_threshold changes how precise the rotation must be.

  • PEG_HALF_WIDTH and PEG_HALF_LENGTH change the peg geometry and how it behaves under contact.
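The lenient success check can be sketched as follows. This is a minimal illustration, assuming angles are expressed in radians and compared modulo a full turn. The helper names are hypothetical, and the real environment may measure the angle differently.

```python
import math


def angular_error(current, target):
    """Smallest absolute difference between two angles, in radians."""
    diff = (current - target + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)


def lenient_rotation_success(current, target, robot_static, angle_threshold=0.1):
    """Only orientation and robot stillness matter; peg translation is ignored."""
    return angular_error(current, target) <= angle_threshold and robot_static
```

Wrapping the difference keeps angles on either side of ±π from being treated as far apart.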

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
LANGUAGE_INSTRUCTION_TEMPLATE = 'Rotate the peg by {angle_deg} degrees to match the target angle.'#
MODE = 'pos_angle'#
PEG_HALF_LENGTH = 0.12#
PEG_HALF_WIDTH = 0.025#
QVEL_L2_COEF = 0.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Strict rotation-control tasks for the VLA benchmark.

class RotateStrictPosNegVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, angle_threshold=0.1, **kwargs)[source]#

Bases: RotateStrictVLABaseEnv

MODE = 'pos_neg_angle'#
class RotateStrictPosVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, angle_threshold=0.1, **kwargs)[source]#

Bases: RotateStrictVLABaseEnv

MODE = 'pos_angle'#
class RotateStrictVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, angle_threshold=0.1, **kwargs)[source]#

Bases: BaseEnv

Rotate a peg to a target angle while keeping it in place.

This is the stricter rotation variant. The robot must not only match the target angle but also avoid pushing the peg far from its original position. The task therefore tests controlled in-place rotation rather than just any successful reorientation.

Episode flow:

  • The environment samples an initial peg orientation and a target angle.

  • The robot contacts the peg and rotates it.

  • The final state is checked for both angle accuracy and position drift.

Success (success=True):

  • The peg angle must be within angle_threshold, the peg must stay close to its initial XY position, and the robot must be static.

How to customize:

  • MODE controls whether only positive or both positive and negative target angles can be sampled.

  • angle_threshold changes how precise the final rotation must be.

  • PEG_HALF_WIDTH and PEG_HALF_LENGTH change the peg geometry and the leverage available during contact.
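The strict criterion adds a position-drift condition to the angle check. The sketch below assumes the drift is measured as planar Euclidean distance from the initial peg position. The xy_threshold parameter and its value are hypothetical, since the actual tolerance is not listed here.

```python
import math


def strict_rotation_success(angle_error, peg_xy, initial_xy, robot_static,
                            angle_threshold=0.1, xy_threshold=0.05):
    """All three conditions must hold: angle, XY drift, and a static robot.

    xy_threshold is an assumed placeholder, not a documented attribute.
    """
    drift = math.hypot(peg_xy[0] - initial_xy[0], peg_xy[1] - initial_xy[1])
    return (
        angle_error <= angle_threshold
        and drift <= xy_threshold
        and robot_static
    )
```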

ACTION_DELTA_L2_COEF = 0.0#
ACTION_L2_COEF = 0.0#
LANGUAGE_INSTRUCTION_TEMPLATE = 'Rotate the peg by {angle_deg} degrees to match the target angle while keeping the center of the peg in place.'#
MODE = 'pos_angle'#
PEG_HALF_LENGTH = 0.12#
PEG_HALF_WIDTH = 0.025#
QVEL_L2_COEF = 0.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Trace-Shape Environments#

Trace-shape procedural memory task for the VLA benchmark.

class TraceShapeEasyVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TraceShapeVLABaseEnv

Circle only.

AVAILABLE_SHAPES: List[int] = [0]#
class TraceShapeHardVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TraceShapeVLABaseEnv

Circle, square, or triangle.

AVAILABLE_SHAPES: List[int] = [0, 1, 2]#
class TraceShapeMediumVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TraceShapeVLABaseEnv

Circle or square.

AVAILABLE_SHAPES: List[int] = [0, 1]#
class TraceShapeVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Watch a red cube trace a shape, then reproduce it with a green cube.

The robot observes a demonstration in which a red cube traces a geometric contour (circle, square, or triangle) on the table while a lamp glows red. Once the demonstration ends the lamp turns green, the red cube disappears, and the robot must pick up the nearby green cube and replicate the same contour.

Episode flow:

  • Pre-demo: white lamp, both cubes visible on the table, nothing moves.

  • Demo: lamp turns red, the red cube traces the target shape.

  • Action: lamp turns green, red cube hidden, robot traces with green cube.

Success (success=True):

  • The green cube must visit every checkpoint along the demonstrated path.

  • After that, the contour must be explicitly closed by returning near the first checkpoint (start point) within CHECKPOINT_THRESH.

How to customize:

  • AVAILABLE_SHAPES controls which shapes can appear (difficulty).

  • NUM_WAYPOINTS controls the path resolution of the demonstration.

  • NUM_CHECKPOINTS controls how many points are checked for success.

  • CHECKPOINT_THRESH controls the required tracing accuracy.

  • SHAPE_RADIUS_RANGE controls shape size randomisation.
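The checkpoint-and-closure rule can be illustrated with a small standalone sketch for the circle case. It assumes checkpoints are evenly spaced on the contour and may be visited in any order, which the text above does not confirm. Both function names are hypothetical.

```python
import numpy as np

NUM_CHECKPOINTS = 12       # value listed for this class
CHECKPOINT_THRESH = 0.035  # value listed for this class


def circle_checkpoints(center, radius, n=NUM_CHECKPOINTS):
    """Evenly spaced checkpoints on a circle (assumed layout)."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    xs = center[0] + radius * np.cos(angles)
    ys = center[1] + radius * np.sin(angles)
    return np.stack([xs, ys], axis=1)


def trace_success(path_xy, checkpoints, thresh=CHECKPOINT_THRESH):
    """True once every checkpoint was visited and the path returns near checkpoint 0."""
    visited = np.zeros(len(checkpoints), dtype=bool)
    for point in np.asarray(path_xy, dtype=np.float64):
        dists = np.linalg.norm(checkpoints - point, axis=1)
        # Closure only counts after all checkpoints have already been visited.
        if visited.all() and dists[0] <= thresh:
            return True
        visited |= dists <= thresh
    return False
```

A path that touches all twelve checkpoints but stops before returning to the start is rejected, which matches the explicit closure requirement.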

ACTION_DELTA_L2_COEF = 0.03#
ACTION_L2_COEF = 0.01#
AVAILABLE_SHAPES: List[int] = [0]#
CHECKPOINT_THRESH = 0.035#
CUBE_HALFSIZE = 0.02#
GREEN_CUBE_OFFSET_X = -0.16#
HEIGHT_OFFSET = 1000.0#
LAMP_BASE_HALF_HEIGHT = 0.008#
LAMP_BASE_RADIUS = 0.018#
LAMP_BULB_RADIUS = 0.012#
LAMP_OFFSET_X = 0.25#
LAMP_STEM_HALF_HEIGHT = 0.02#
LAMP_STEM_RADIUS = 0.004#
LANGUAGE_INSTRUCTION = 'Watch the red cube trace a shape on the table. When the lamp turns green, pick up the green cube and trace exactly the same shape.'#
NUM_CHECKPOINTS = 12#
NUM_WAYPOINTS = 64#
PRE_DEMO_STEPS: List[int] = [3, 8]#
QVEL_L2_COEF = 0.01#
SHAPE_CENTER_X_RANGE = [-0.15, -0.05]#
SHAPE_CENTER_Y_RANGE = [-0.1, 0.1]#
SHAPE_CIRCLE = 0#
SHAPE_RADIUS_RANGE = [0.078, 0.13]#
SHAPE_SQUARE = 1#
SHAPE_TRIANGLE = 2#
STEPS_PER_WAYPOINT = 1#
SUCCESS_BONUS = 30.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Trace-shape-sequence procedural memory tasks for the VLA benchmark.

class TraceShapeSeqEasyVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TraceShapeSeqVLABaseEnv

Sequence with circles only.

AVAILABLE_SHAPES: List[int] = [0]#
class TraceShapeSeqHardVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TraceShapeSeqVLABaseEnv

Sequence with circles, squares, and triangles.

AVAILABLE_SHAPES: List[int] = [0, 1, 2]#
class TraceShapeSeqMediumVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TraceShapeSeqVLABaseEnv

Sequence with circles and squares.

AVAILABLE_SHAPES: List[int] = [0, 1]#
class TraceShapeSeqVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Watch a sequence of red traces, then reproduce all traces in order.

The robot observes multiple demonstrations in sequence. For each element, the red cube traces one shape (circle / square / triangle depending on the difficulty variant). During the action phase, the robot must reproduce the same sequence with the green cube. After finishing all traces, the robot must press the submit button.

Success (success=True):

  • Every sequence element must be completed in order.

  • A sequence element is complete only when all its checkpoints are visited and the contour is closed (return near checkpoint[0]).

  • After all elements are complete, the robot must press the button.
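The ordering and submit rules can be sketched as a small progress tracker. This is an illustration only: the class is hypothetical, and it assumes that an early button press simply does not count as success, which the description above does not specify.

```python
class SequenceProgress:
    """Tracks how many sequence elements are complete, in order."""

    def __init__(self, sequence_length):
        self.sequence_length = sequence_length
        self.completed = 0
        self.success = False

    def close_contour(self):
        """Call when the current element has all checkpoints visited and is closed."""
        if self.completed < self.sequence_length:
            self.completed += 1

    def press_button(self):
        """Submit; succeeds only if every element is already complete."""
        self.success = self.completed == self.sequence_length
        return self.success
```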

ACTION_DELTA_L2_COEF = 0.03#
ACTION_L2_COEF = 0.01#
AVAILABLE_SHAPES: List[int] = [0]#
BUTTON_BASE_HALF_SIZE = array([0.065, 0.065, 0.015], dtype=float32)#
BUTTON_CAP_HALF_HEIGHT = 0.014#
BUTTON_CAP_RADIUS = 0.03#
BUTTON_CAP_TRAVEL = 0.014#
BUTTON_OFFSET_FROM_LAMP_X = 0.02#
BUTTON_OFFSET_FROM_LAMP_Y = 0.16#
BUTTON_PRESS_EVENT_RATIO = 0.35#
BUTTON_PRESS_XY_RADIUS = 0.065#
BUTTON_PRESS_Z_MARGIN = 0.03#
BUTTON_RELEASE_READY_RATIO = 0.2#
CHECKPOINT_THRESH = 0.035#
CUBE_HALFSIZE = 0.02#
GREEN_CUBE_OFFSET_X = -0.16#
HEIGHT_OFFSET = 1000.0#
LAMP_BASE_HALF_HEIGHT = 0.008#
LAMP_BASE_RADIUS = 0.018#
LAMP_BULB_RADIUS = 0.012#
LAMP_OFFSET_X = 0.25#
LAMP_STEM_HALF_HEIGHT = 0.02#
LAMP_STEM_RADIUS = 0.004#
LANGUAGE_INSTRUCTION = 'Watch the red cube trace a sequence of shapes. When the lamp turns green, pick up the green cube and trace the same sequence in order. After finishing all shapes, press the button to submit your answer.'#
MAX_SEQUENCE_LENGTH = 5#
MIN_SEQUENCE_LENGTH = 2#
NUM_CHECKPOINTS = 12#
NUM_WAYPOINTS = 64#
PRE_DEMO_STEPS: List[int] = [3, 8]#
QVEL_L2_COEF = 0.01#
SHAPE_CENTER_X_RANGE = [-0.15, -0.05]#
SHAPE_CENTER_Y_RANGE = [-0.1, 0.1]#
SHAPE_CIRCLE = 0#
SHAPE_RADIUS_RANGE = [0.078, 0.13]#
SHAPE_SQUARE = 1#
SHAPE_TRIANGLE = 2#
STEPS_PER_WAYPOINT = 1#
SUCCESS_BONUS = 40.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Batteries-Checker Environments#

Easy Batteries Checker variants for the VLA benchmark.

class BatteriesCheckerEasy12VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerEasyVLABaseEnv

ACTIVE_BATTERY_COUNT = 12#
WORKING_BATTERY_COUNT = 7#
class BatteriesCheckerEasy15VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerEasyVLABaseEnv

ACTIVE_BATTERY_COUNT = 15#
WORKING_BATTERY_COUNT = 9#
class BatteriesCheckerEasy3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerEasyVLABaseEnv

ACTIVE_BATTERY_COUNT = 3#
WORKING_BATTERY_COUNT = 1#
class BatteriesCheckerEasy6VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerEasyVLABaseEnv

ACTIVE_BATTERY_COUNT = 6#
WORKING_BATTERY_COUNT = 3#
class BatteriesCheckerEasy9VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerEasyVLABaseEnv

ACTIVE_BATTERY_COUNT = 9#
WORKING_BATTERY_COUNT = 5#
class BatteriesCheckerEasyVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerVLABaseEnv

Simplified version of Batteries Checker.

In this task, the robot checks batteries one by one using the socket and the lamp. A tested battery does not need to be placed back manually: once the check is complete, the environment returns it to its original tray slot automatically. This keeps the memory component but removes part of the manipulation burden.

Episode flow:

  • The robot picks one battery and inserts it into the socket.

  • The lamp reveals whether that battery is working.

  • The environment snaps the battery back to its home slot.

  • The robot presses the button to confirm that this battery was checked.

Success (success=True):

  • Same criterion as the hard variant: all working batteries must be found through completed check-confirm cycles.

How to customize:

  • ACTIVE_BATTERY_COUNT changes how many batteries are present and therefore how much search the agent must do.

  • WORKING_BATTERY_COUNT changes how many positive findings exist in the tray.

  • The hard-variant socket, tray, and button thresholds still control how strict insertion and confirmation are.
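The numbered variants differ only in their class attributes, so a custom difficulty level can follow the same pattern. The sketch below uses a stand-in base class rather than the real environment so that it runs without the simulator. The check_counts helper is hypothetical, and the tray-capacity constraint is an assumption based on the listed TRAY_ROWS and TRAY_COLS values.

```python
class BatteriesCheckerStandIn:
    """Stand-in for the real base class; only the class attributes matter here."""

    TRAY_ROWS = 5
    TRAY_COLS = 3
    ACTIVE_BATTERY_COUNT = 15
    WORKING_BATTERY_COUNT = 3

    @classmethod
    def check_counts(cls):
        """Hypothetical sanity check, not part of the documented API."""
        capacity = cls.TRAY_ROWS * cls.TRAY_COLS
        return 0 < cls.WORKING_BATTERY_COUNT <= cls.ACTIVE_BATTERY_COUNT <= capacity


class FourBatteryVariant(BatteriesCheckerStandIn):
    # Overriding class attributes mirrors how the numbered variants differ.
    ACTIVE_BATTERY_COUNT = 4
    WORKING_BATTERY_COUNT = 2
```

A real variant would subclass the actual base environment and be registered under its own ID. That step is omitted here.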

LANGUAGE_INSTRUCTION = 'Find all working batteries by inserting each one into the socket, observing the lamp result, and then pressing the button to confirm.'#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

Batteries Checker tasks for the VLA memory benchmark.

class BatteriesChecker12VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerVLABaseEnv

ACTIVE_BATTERY_COUNT = 12#
WORKING_BATTERY_COUNT = 7#
class BatteriesChecker15VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerVLABaseEnv

ACTIVE_BATTERY_COUNT = 15#
WORKING_BATTERY_COUNT = 9#
class BatteriesChecker3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerVLABaseEnv

ACTIVE_BATTERY_COUNT = 3#
WORKING_BATTERY_COUNT = 1#
class BatteriesChecker6VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerVLABaseEnv

ACTIVE_BATTERY_COUNT = 6#
WORKING_BATTERY_COUNT = 3#
class BatteriesChecker9VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BatteriesCheckerVLABaseEnv

ACTIVE_BATTERY_COUNT = 9#
WORKING_BATTERY_COUNT = 5#
class BatteriesCheckerVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Battery-testing task with memory over repeated check cycles.

The scene contains a tray of batteries, a socket with a lamp, and a button. The robot must test batteries one at a time, remember which ones worked, and only then move on to the next candidate. Because each battery must be returned to its original slot before the next confirmation, the task mixes memory with careful sequential manipulation.

Episode flow:

  • Pick one battery from the tray and insert it into the socket.

  • Read the lamp outcome: lit means the battery is working.

  • Remove the battery and return it to its original tray slot.

  • Press the button to mark that this battery has been checked.

Success (success=True):

  • Every working battery must be discovered through the full insert-return-confirm procedure. Partial progress does not count as success.

How to customize:

  • ACTIVE_BATTERY_COUNT controls how many batteries are present in the episode and therefore how long the search can become.

  • WORKING_BATTERY_COUNT controls how many of those batteries are true positives that the agent must eventually identify.

  • SOCKET_INSERT_XY_TOL and SOCKET_INSERT_Z_TOL control how precisely a battery must be placed before the environment counts it as inserted.

  • SLOT_RETURN_XY_TOL and SLOT_RETURN_Z_TOL control how accurately the battery must be put back into its home slot.

  • BUTTON_* parameters control the size, travel, and press thresholds of the confirmation button.

  • LAMP_AFTERGLOW_STEPS controls how long the lamp remains visibly on after a successful working-battery test.
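The insert-return-confirm cycle can be read as a three-stage state machine, using the STAGE_INSERT, STAGE_RETURN, and STAGE_CONFIRM constants listed for this class. The sketch below is a simplified model with no geometry or tolerances. The class and its method names are hypothetical, and it assumes that out-of-order events are ignored.

```python
STAGE_INSERT, STAGE_RETURN, STAGE_CONFIRM = 0, 1, 2  # values listed for this class


class BatteryCheckCycle:
    """Simplified model of the per-battery check cycle (an illustration only)."""

    def __init__(self, working):
        self.working = set(working)  # indices of working batteries
        self.found = set()           # working batteries confirmed so far
        self.stage = STAGE_INSERT
        self.current = None

    def insert(self, idx):
        """Insert a battery; returns whether the lamp lights, or None if out of order."""
        if self.stage != STAGE_INSERT:
            return None
        self.current = idx
        self.stage = STAGE_RETURN
        return idx in self.working

    def return_to_slot(self):
        if self.stage == STAGE_RETURN:
            self.stage = STAGE_CONFIRM

    def confirm(self):
        """Button press; only counts once the battery is back in its slot."""
        if self.stage != STAGE_CONFIRM:
            return
        if self.current in self.working:
            self.found.add(self.current)
        self.current = None
        self.stage = STAGE_INSERT

    @property
    def success(self):
        return self.found == self.working
```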

ACTION_DELTA_L2_COEF = 0.02#
ACTION_L2_COEF = 0.01#
ACTIVE_BATTERY_COUNT = 15#
BATTERY_COLOR = array([0.27450982, 0.74509805, 0.3529412 , 1.        ], dtype=float32)#
BATTERY_DYNAMIC_FRICTION = 2.0#
BATTERY_HALF_HEIGHT = 0.03#
BATTERY_RADIUS = 0.01#
BATTERY_RESTITUTION = 0.0#
BATTERY_STATIC_FRICTION = 2.0#
BUTTON_BASE_HALF_SIZE = array([0.075, 0.075, 0.015], dtype=float32)#
BUTTON_CAP_HALF_HEIGHT = 0.014#
BUTTON_CAP_RADIUS = 0.033#
BUTTON_CAP_TRAVEL = 0.014#
BUTTON_PRESS_EVENT_RATIO = 0.35#
BUTTON_PRESS_XY_RADIUS = 0.075#
BUTTON_PRESS_Z_MARGIN = 0.03#
BUTTON_RELEASE_READY_RATIO = 0.2#
BUTTON_X_OFFSET_FROM_TRAY = 0.0#
BUTTON_Y_OFFSET_FROM_TRAY = 0.24#
HEIGHT_OFFSET = 1000.0#
LAMP_AFTERGLOW_STEPS = 7#
LAMP_BASE_HALF_HEIGHT = 0.008#
LAMP_BASE_RADIUS = 0.018#
LAMP_BULB_RADIUS = 0.012#
LAMP_HEIGHT = 0.0#
LAMP_OFF_COLOR = array([1., 1., 1., 1.], dtype=float32)#
LAMP_ON_COLOR = array([1.        , 0.9254902 , 0.43137255, 1.        ], dtype=float32)#
LAMP_STEM_HALF_HEIGHT = 0.02#
LAMP_STEM_RADIUS = 0.004#
LAMP_X_OFFSET_FROM_SOCKET = 0.08#
LAMP_Y_OFFSET_FROM_SOCKET = 0.0#
LANGUAGE_INSTRUCTION = 'Find all working batteries by inserting each one into the socket, observing the lamp result, returning it from the socket to its initial slot, and then pressing the button to confirm.'#
NUM_BATTERIES = 15#
QVEL_L2_COEF = 0.01#
SLOT_RETURN_XY_TOL = 0.023#
SLOT_RETURN_Z_TOL = 0.024#
SLOT_SPACING_X = 0.052#
SLOT_SPACING_Y = 0.05#
SLOT_VISUAL_COLOR = array([0.16470589, 0.1882353 , 0.24313726, 1.        ], dtype=float32)#
SOCKET_COLOR = array([0.34509805, 0.36078432, 0.38431373, 1.        ], dtype=float32)#
SOCKET_HALF_SIZE = array([0.048, 0.04 , 0.015], dtype=float32)#
SOCKET_INSERT_XY_TOL = 0.01#
SOCKET_INSERT_Z_TOL = 0.025#
SOCKET_SLOT_COLOR = array([0.1254902 , 0.1254902 , 0.14117648, 1.        ], dtype=float32)#
SOCKET_SLOT_RADIUS = 0.016#
SOCKET_X_OFFSET_FROM_TRAY = 0.2#
STAGE_CONFIRM = 2#
STAGE_INSERT = 0#
STAGE_RETURN = 1#
SUCCESS_BONUS = 40.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

TRAY_COLOR = array([0.28235295, 0.3372549 , 0.42352942, 1.        ], dtype=float32)#
TRAY_COLS = 3#
TRAY_HALF_HEIGHT = 0.015#
TRAY_PADDING_X = 0.028#
TRAY_PADDING_Y = 0.026#
TRAY_ROWS = 5#
WORKING_BATTERY_COUNT = 3#
agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Other Environments#

Take-it-back manipulation tasks for the VLA benchmark.

class TakeItBackVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Push an object away, then bring it back in the same episode.

The cube starts on an initial target region. The robot must first move it to a separate goal region. Once that happens, the task switches and the robot must return the same cube back to where it started. The challenge is not only reaching both targets, but also reacting correctly to the stage change.

Episode flow:

  • The cube starts on the initial region.

  • The robot pushes the cube to the goal region.

  • After the stage switch, the robot brings the cube back to the initial region.

Success (success=True):

  • The cube must first reach the goal region and then end up back inside the initial region within the same episode.

How to customize:

  • GOAL_RADIUS changes how large both target regions are and therefore how forgiving both placement checks become.

  • CUBE_HALFSIZE changes the cube geometry and contact behavior.

  • The stage switch is implicit: if you change goal logic in the task, you are also changing when the return stage begins.
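The implicit stage switch can be sketched as a tracker that latches once the cube has reached the goal region. This is a minimal illustration, assuming both regions are discs of radius GOAL_RADIUS and that membership is tested on the cube's XY position. The class is hypothetical.

```python
import math

GOAL_RADIUS = 0.08  # value listed for this class


class TakeItBackTracker:
    """Two-stage success check: reach the goal first, then return to the start."""

    def __init__(self, initial_xy, goal_xy, radius=GOAL_RADIUS):
        self.initial_xy = initial_xy
        self.goal_xy = goal_xy
        self.radius = radius
        self.reached_goal = False  # flips once; this is the stage switch

    def _inside(self, xy, center):
        return math.hypot(xy[0] - center[0], xy[1] - center[1]) <= self.radius

    def update(self, cube_xy):
        """Returns True only when the cube is back home after visiting the goal."""
        if not self.reached_goal:
            self.reached_goal = self._inside(cube_xy, self.goal_xy)
            return False
        return self._inside(cube_xy, self.initial_xy)
```

Because the cube starts inside the initial region, the latch is what prevents the episode from succeeding at the first step.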

ACTION_DELTA_L2_COEF = 0.08#
ACTION_L2_COEF = 0.01#
CUBE_HALFSIZE: float = 0.02#
GOAL_RADIUS: float = 0.08#
LANGUAGE_INSTRUCTION = 'Push the cube onto the red target, and when the target changes color, return the cube to its original position.'#
QVEL_L2_COEF = 0.01#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

class TakeItBackVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TakeItBackVLABaseEnv

Six task variants derive from BlinkCountButtonPressVLABaseEnv, which is described below.

Bases: BaseEnv

Count a visual cue and reproduce it with discrete button presses.

The robot first observes a lamp blinking a sampled number of times. After the cue ends, it must press the button exactly that many times. This task is simple to understand but sensitive to temporal memory and to clean press cycles, because repeated partial contacts should not be mistaken for new presses.

Episode flow:

  • The lamp waits briefly, then blinks N times.

  • After the cue phase, the robot starts pressing the red button.

  • Each press must be followed by a release and lift before the next one.

  • When done counting, the robot presses the black button to submit.

Success (success=True):

  • Success is produced only when the black submit button is pressed.

  • At submit time, the counted number of valid red-button presses must exactly match the target blink count.

How to customize:

  • BLINK_COUNT_RANGE changes the memory difficulty by changing how many blinks the agent may need to remember.

  • PRE_BLINK_OFF_STEPS changes how long the task waits before cue onset.

  • BLINK_ON_STEPS and BLINK_OFF_STEPS change the timing pattern of each blink and therefore how easy the cue is to parse visually.

  • BUTTON_* parameters change the physical button geometry and the press detection thresholds.

  • REQUIRED_LIFT_HEIGHT changes how much the end effector must lift after a press before the next press can be counted reliably.
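One way to keep repeated partial contacts from being counted as new presses is hysteresis: a press registers when the cap travels past one threshold, and the next press is accepted only after the cap has come back above a lower one. The sketch below assumes the cap depth is available as a fraction of full travel. The default thresholds reuse the BUTTON_PRESS_EVENT_RATIO and BUTTON_RELEASE_READY_RATIO values listed for other button tasks in this reference and are an assumption for this class. The lift-height condition is left out.

```python
def count_presses(cap_depth_ratios, press_ratio=0.35, release_ratio=0.2):
    """Count distinct presses in a sequence of cap depths (0 = up, 1 = fully down)."""
    count = 0
    armed = True  # a new press can only register while armed
    for ratio in cap_depth_ratios:
        if armed and ratio >= press_ratio:
            count += 1
            armed = False
        elif not armed and ratio <= release_ratio:
            armed = True  # the button was released far enough to re-arm
    return count
```

For example, a depth trace that dips to 0.3 and pushes again without releasing below 0.2 still counts as a single press.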

SUPPORTED_ROBOTS#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key or a failure state via a “fail” key

This function may also return additional data that has been computed (e.g. is the robot grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent

Gather-and-recall VLA task: move cubes to a disc and remember a lamp flash color.

class GatherAndRecall1VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: GatherAndRecallVLABaseEnv

N_CUBES: int = 1#
class GatherAndRecall3VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: GatherAndRecallVLABaseEnv

N_CUBES: int = 3#
class GatherAndRecall5VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: GatherAndRecallVLABaseEnv

N_CUBES: int = 5#
class GatherAndRecall7VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: GatherAndRecallVLABaseEnv

N_CUBES: int = 7#
class GatherAndRecall9VLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: GatherAndRecallVLABaseEnv

N_CUBES: int = 9#
class GatherAndRecallVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Move cubes onto a target disc while remembering a brief lamp color flash.

Cubes start in a cluster on one side of the table, with a target disc on the other side. The agent picks up cubes one by one and places them on the disc. While the agent is moving cubes (after the first cube lands on the disc but before the last), a signal lamp briefly flashes one of three colors: red, green, or blue. After all cubes are placed on the disc, the agent must press the button whose color matches the flash.

Episode flow:

  1. MOVE phase: pick and place cubes onto the disc.

  2. During moving, the lamp flashes a random color once (briefly).

  3. PRESS phase: once all cubes are on the disc, press the matching button.

Success (success=True):

  • All cubes detected on the disc AND the correct color button is pressed.

Failure (failed=True):

  • A wrong color button is pressed after all cubes are placed.

How to customize:

  • N_CUBES controls difficulty (more cubes = longer distraction, harder memory).

  • FLASH_DURATION_STEPS controls how long the lamp stays on ([min, max]).

  • DISC_RADIUS controls the target disc size.

  • DISC_ON_THRESH controls the XY distance threshold for cube-on-disc detection.
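The success and failure rules can be sketched as a single outcome function. It assumes cube-on-disc detection is a pure XY distance test against DISC_ON_THRESH and that colors are compared as plain labels. Both helpers are hypothetical, and the real detection may also consider cube velocity or height.

```python
import numpy as np

DISC_ON_THRESH = 0.1  # value listed for this class


def cubes_on_disc(cube_xy, disc_xy, thresh=DISC_ON_THRESH):
    """True when every cube centre lies within thresh of the disc centre (XY only)."""
    offsets = np.asarray(cube_xy, dtype=np.float64) - np.asarray(disc_xy, dtype=np.float64)
    return bool(np.all(np.linalg.norm(offsets, axis=1) <= thresh))


def episode_outcome(cube_xy, disc_xy, flash_color, pressed_color):
    """Buttons only decide the outcome once all cubes are on the disc."""
    if pressed_color is None or not cubes_on_disc(cube_xy, disc_xy):
        return {"success": False, "failed": False}
    correct = pressed_color == flash_color
    return {"success": correct, "failed": not correct}
```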

ACTION_DELTA_L2_COEF = 0.03#
ACTION_L2_COEF = 0.01#
BUTTON_BASE_HALF_SIZE = array([0.04 , 0.04 , 0.015], dtype=float32)#
BUTTON_CAP_HALF_HEIGHT = 0.014#
BUTTON_CAP_RADIUS = 0.025#
BUTTON_CAP_TRAVEL = 0.014#
BUTTON_PRESS_EVENT_RATIO = 0.35#
BUTTON_PRESS_XY_RADIUS = 0.04#
BUTTON_PRESS_Z_MARGIN = 0.03#
BUTTON_SPACING = 0.14#
BUTTON_X_OFFSET_FROM_DISC = -0.22#
CUBE_CLUSTER_CENTER_Y = -0.26#
CUBE_CLUSTER_SPACING_SCALE = 4.5#
CUBE_CLUSTER_X_OFFSET = -0.08#
CUBE_COLORS = [array([1., 0., 0., 1.], dtype=float32), array([0., 1., 0., 1.], dtype=float32), array([0., 0., 1., 1.], dtype=float32)]#
CUBE_HALF_SIZE: float = 0.02#
CUBE_VEL_THRESH = 0.15#
DISC_HALF_HEIGHT: float = 0.003#
DISC_ON_THRESH: float = 0.1#
DISC_RADIUS: float = 0.12#
DISC_X_MAX = 0.08#
DISC_X_MIN = 0.0#
DISC_Y_MAX = 0.18#
DISC_Y_MIN = 0.1#
FAILURE_PENALTY = 25.0#
FLASH_COLORS = [array([1., 0., 0., 1.], dtype=float32), array([0., 1., 0., 1.], dtype=float32), array([0., 0., 1., 1.], dtype=float32)]#
FLASH_DURATION_STEPS: List[int] = [8, 14]#
GRASP_THRESH = 0.05#
HEIGHT_OFFSET: float = 1000.0#
LAMP_BASE_HALF_HEIGHT = 0.008#
LAMP_BASE_RADIUS = 0.018#
LAMP_BULB_RADIUS = 0.012#
LAMP_STEM_HALF_HEIGHT = 0.02#
LAMP_STEM_RADIUS = 0.004#
LAMP_X_OFFSET_FROM_DISC = 0.24#
LAMP_Y_OFFSET_FROM_DISC = 0.06#
LANGUAGE_INSTRUCTION = 'Move all cubes onto the disc. A lamp will briefly flash while you work. After all cubes are placed, press the button matching the flash color.'#
N_CUBES: int = 5#
QVEL_L2_COEF = 0.01#
SUCCESS_BONUS = 50.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots or tuples of robots together are supported in the task. During env creation, setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups

agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.
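To make the return contract concrete, here is a minimal sketch of the dictionary shape for this task. Everything in it is an assumption for illustration: the function name `evaluate_sketch`, its arguments, the extra `all_placed` key, and the rule that pressing a wrong button counts as failure are hypothetical, and the real method most likely returns tensors batched over num_envs rather than plain booleans.

```python
def evaluate_sketch(cubes_on_disc, n_cubes, pressed_color, flash_color):
    """Hypothetical stand-in showing the shape of an evaluate() result."""
    all_placed = cubes_on_disc == n_cubes
    pressed = pressed_color is not None
    # Success: every cube is on the disc and the pressed button matches the flash.
    success = all_placed and pressed and pressed_color == flash_color
    # Assumed failure rule: a button was pressed but the episode is not a success.
    fail = pressed and not success
    return {
        "success": success,
        "fail": fail,
        "all_placed": all_placed,  # extra data reusable for observations/rewards
    }


print(evaluate_sketch(5, 5, "red", "red"))
print(evaluate_sketch(5, 5, "blue", "red"))
```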

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.
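The clipping behaviour can be illustrated with a short element-wise sketch. The bounds of -1.0 and 1.0 and the helper name `clip_action` are assumptions for illustration; the environment clips against its own action-space bounds and typically operates on tensors or arrays rather than Python lists.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Hypothetical helper: clamp each action component into [low, high]."""
    return [min(max(a, low), high) for a in action]


print(clip_action([1.5, -2.0, 0.3]))  # out-of-range components are clamped to the bounds
```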

Timed-transfer tasks for the VLA memory benchmark.

The agent must wait a precise number of steps after a visual signal, then transfer a cube from one disc to another.

class TimedTransferEasyLongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TimedTransferVLABaseEnv

Delay = 300 steps, tolerance +/-5 %.

DELAY_STEPS: int = 300#
class TimedTransferEasyVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TimedTransferVLABaseEnv

Delay = 100 steps, tolerance +/-5 %.

DELAY_STEPS: int = 100#
class TimedTransferHardLongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TimedTransferVLABaseEnv

Delay = 1000 steps, tolerance +/-5 %.

DELAY_STEPS: int = 1000#
class TimedTransferHardVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TimedTransferVLABaseEnv

Delay = 200 steps, tolerance +/-5 %.

DELAY_STEPS: int = 200#
class TimedTransferMediumLongVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TimedTransferVLABaseEnv

Delay = 500 steps, tolerance +/-5 %.

DELAY_STEPS: int = 500#
class TimedTransferMediumVLAEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: TimedTransferVLABaseEnv

Delay = 150 steps, tolerance +/-5 %.

DELAY_STEPS: int = 150#
class TimedTransferVLABaseEnv(*args, robot_uids='panda_wristcam', robot_init_qpos_noise=0.02, **kwargs)[source]#

Bases: BaseEnv

Wait a precise number of steps after a signal, then move a cube.

Scene: two flat discs (green and red) on the table, a blue cube resting on the green disc, and a white lamp.

Episode flow:

1. The lamp is off. The blue cube sits on the green disc.
2. After a brief random delay the lamp turns green (the signal).
3. The agent must internally count DELAY_STEPS from the signal.
4. The agent picks the blue cube and places it on the red disc.
5. The cube must be on the red disc within +/-TOLERANCE_FRAC of the target step. Placing too early or too late is a failure.

Customize difficulty by changing DELAY_STEPS.
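The relationship between DELAY_STEPS, TOLERANCE_FRAC, and the accepted placement window can be sketched as follows. This is an illustration under stated assumptions, not the environment's code: the stand-in classes and the `placement_window` helper are hypothetical, and the window is assumed to be DELAY_STEPS * (1 +/- TOLERANCE_FRAC) counted in steps since the signal. The actual environment may round or compare differently.

```python
class TimedTransferSketch:
    """Stand-in for the base class; only the two timing attributes are modelled."""

    DELAY_STEPS: int = 100         # base-class default from the attribute listing
    TOLERANCE_FRAC: float = 0.05   # base-class default from the attribute listing

    @classmethod
    def placement_window(cls):
        """Return (earliest, latest) step counts after the signal, under the stated assumption."""
        margin = cls.DELAY_STEPS * cls.TOLERANCE_FRAC
        return cls.DELAY_STEPS - margin, cls.DELAY_STEPS + margin


class CustomDelaySketch(TimedTransferSketch):
    """Hypothetical variant: overriding DELAY_STEPS is the only change needed."""

    DELAY_STEPS = 400


print(TimedTransferSketch.placement_window())
print(CustomDelaySketch.placement_window())
```

Because the tolerance is a fraction of the delay, the window widens in absolute steps as DELAY_STEPS grows, which is why the longer variants are not proportionally stricter on timing.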

ACTION_DELTA_L2_COEF = 0.03#
ACTION_L2_COEF = 0.01#
CUBE_HALF_SIZE = 0.02#
DELAY_STEPS: int = 100#
DISC_HALF_HEIGHT = 0.003#
DISC_RADIUS = 0.07#
DISC_SEPARATION = 0.22#
FAILURE_PENALTY = 25.0#
GOAL_THRESH = 0.05#
HEIGHT_OFFSET = 1000.0#
LAMP_BASE_HALF_HEIGHT = 0.008#
LAMP_BASE_RADIUS = 0.018#
LAMP_BULB_RADIUS = 0.012#
LAMP_FORWARD_OFFSET = 0.22#
LAMP_STEM_HALF_HEIGHT = 0.02#
LAMP_STEM_RADIUS = 0.004#
PRE_SIGNAL_STEPS: List[int] = [1, 3]#
QVEL_L2_COEF = 0.01#
SUCCESS_BONUS = 30.0#
SUPPORTED_ROBOTS: List[str | Tuple[str]] = ['panda', 'panda_wristcam']#

Override this to enforce which robots, or tuples of robots together, are supported in the task. During env creation, setting robot_uids automatically loads all requested robots into the scene, but not every task is designed to support every robot setup.

TOLERANCE_FRAC: float = 0.05#
agent: Panda | PandaWristCam#
compute_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
compute_normalized_dense_reward(obs: Any, action: Tensor | ndarray | Sequence, info: Dict)[source]#
evaluate()[source]#

Evaluate whether the environment is currently in a success state by returning a dictionary with a “success” key, or in a failure state via a “fail” key.

This function may also return additional data that has been computed (e.g. whether the robot is grasping some object) that may be reused when generating observations and rewards.

By default, if not overridden, this function returns an empty dictionary.

step(action)[source]#

Take a step through the environment with an action. Actions are automatically clipped to the action space.

If action is None, the environment will proceed forward in time without sending any actions/control signals to the agent.