mimicgen.scripts package


mimicgen.scripts.annotate_subtasks module

A script to play back demonstrations (using visual observations and the pygame renderer) so that a user can annotate portions of the demonstrations. This is useful for annotating the end of each object-centric subtask in each source demonstration used by MimicGen, as an alternative to implementing subtask termination signals directly in the simulation environment.

Example usage:

# specify the sequence of signals that should be annotated and the dataset images to render on-screen
python annotate_subtasks.py --dataset /path/to/demo.hdf5 --signals grasp_1 insert_1 grasp_2 --render_image_names agentview_image robot0_eye_in_hand_image

# limit annotation to first 2 demos
python annotate_subtasks.py --dataset /path/to/demo.hdf5 --signals grasp_1 insert_1 grasp_2 --render_image_names agentview_image robot0_eye_in_hand_image --n 2

# limit annotation to demos 2 and 3
python annotate_subtasks.py --dataset /path/to/demo.hdf5 --signals grasp_1 insert_1 grasp_2 --render_image_names agentview_image robot0_eye_in_hand_image --n 2 --start 1

# scale up dataset images by a factor of 10 when rendering to screen
python annotate_subtasks.py --dataset /path/to/demo.hdf5 --signals grasp_1 insert_1 grasp_2 --render_image_names agentview_image robot0_eye_in_hand_image --image_scale 10

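One plausible way to turn the annotated frame indices into subtask termination signals is a per-timestep binary array for each signal that switches from 0 to 1 at the annotated frame. The sketch below assumes that representation; `annotations_to_signals` is a hypothetical helper, not part of mimicgen.

```python
import numpy as np


def annotations_to_signals(subtask_end_inds, subtask_signals, traj_len):
    """
    Hypothetical helper: convert annotated subtask end indices into
    per-timestep binary arrays (0 before the annotated index, 1 from it on).
    """
    if len(subtask_end_inds) != len(subtask_signals):
        raise ValueError("need exactly one annotated end index per signal")
    signals = {}
    prev_end = -1
    for name, end_ind in zip(subtask_signals, subtask_end_inds):
        # subtasks happen in sequence, so end indices must strictly increase
        if not (prev_end < end_ind < traj_len):
            raise ValueError(f"invalid end index {end_ind} for signal {name}")
        arr = np.zeros(traj_len, dtype=np.int64)
        arr[end_ind:] = 1
        signals[name] = arr
        prev_end = end_ind
    return signals


sigs = annotations_to_signals([3, 6], ["grasp_1", "insert_1"], traj_len=8)
print(sigs["grasp_1"])  # [0 0 0 1 1 1 1 1]
```

Under this representation, the subtask boundary is simply the first timestep at which a signal becomes 1.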
mimicgen.scripts.annotate_subtasks.annotate_subtasks_in_trajectory(ep, traj_grp, subtask_signals, screen, video_skip, image_names, playback_rate_grid)

Plays back the image observations of the dataset trajectory in the pygame window so that the user can annotate the end of each subtask in @subtask_signals.

  • ep (str) – name of hdf5 group for this demo

  • traj_grp (hdf5 file group) – hdf5 group which corresponds to the dataset trajectory to annotate

  • subtask_signals (list) – list of subtask termination signals that will be annotated

  • screen – pygame screen

  • video_skip (int) – determines rate at which environment frames are written to video

  • image_names (list) – determines which image observations are used for rendering. Pass more than one to output a video with multiple image observations concatenated horizontally.

  • playback_rate_grid (Grid) – grid object to easily toggle between different playback rates
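The `image_names` parameter lets several image observations be shown side by side. Below is a minimal sketch of that concatenation, assuming observations are stored as (T, H, W, 3) arrays keyed by image name and that `video_skip` keeps one frame every `video_skip` steps; `frames_for_playback` is a hypothetical name.

```python
import numpy as np


def frames_for_playback(obs, image_names, video_skip):
    """
    Sketch: yield (timestep, frame) every @video_skip steps, where the frame
    is the images named in @image_names concatenated along the width axis.

    @obs maps image name -> array of shape (T, H, W, 3); all names are
    assumed to share the same T and H.
    """
    traj_len = obs[image_names[0]].shape[0]
    for t in range(0, traj_len, video_skip):
        yield t, np.concatenate([obs[name][t] for name in image_names], axis=1)
```

Concatenating along axis 1 (width) is what makes multiple views appear next to each other rather than stacked vertically.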

mimicgen.scripts.annotate_subtasks.handle_pygame_events(frame_ind, subtask_signals, subtask_ind, rate_obj, need_repeat, annotation_done, playback_rate_grid)

Reads events from pygame window in order to provide the following keyboard annotation functionality:

  • up / down – increase / decrease playback speed

  • left / right – seek left / right by N frames

  • spacebar – press and release to annotate the end of a subtask

  • f – next demo and save annotations

  • r – repeat demo and clear annotations

  • frame_ind (int) – index of current frame in demonstration

  • subtask_signals (list) – list of subtask termination signals that we will annotate

  • subtask_ind (int) – index of current subtask (state variable)

  • rate_obj (Rate) – rate object to maintain playback rate

  • need_repeat (bool) – whether the demo should be repeated (state variable)

  • annotation_done (bool) – whether user is done annotating this demo (state variable)

  • playback_rate_grid (Grid) – grid object to easily toggle between different playback rates

Returns

  • subtask_end_ind (int or None) – end index for the current subtask, annotated by the human, or None if there is no annotation

  • subtask_ind (int) – possibly updated subtask index

  • need_repeat (bool) – possibly updated value

  • annotation_done (bool) – possibly updated value

  • seek (int) – how much to seek forward or backward in the demonstration (value read from user command)

Return type

tuple of (subtask_end_ind, subtask_ind, need_repeat, annotation_done, seek)

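The keyboard bindings listed above amount to a small state machine. The sketch below models them without pygame: `RateGrid`, `AnnotationState`, and `handle_key` are hypothetical names, keys are plain strings rather than pygame key codes, and a spacebar press-and-release is collapsed into a single "space" event.

```python
from dataclasses import dataclass


class RateGrid:
    """Hypothetical stand-in for the Grid object: a clamped index into a list of rates."""

    def __init__(self, rates, start_ind=0):
        self.rates = list(rates)
        self.ind = start_ind

    def up(self):
        self.ind = min(self.ind + 1, len(self.rates) - 1)
        return self.rates[self.ind]

    def down(self):
        self.ind = max(self.ind - 1, 0)
        return self.rates[self.ind]


@dataclass
class AnnotationState:
    subtask_ind: int = 0
    need_repeat: bool = False
    annotation_done: bool = False


def handle_key(key, frame_ind, state, grid, num_signals, seek_frames=10):
    """
    Sketch of the documented bindings. Mutates @state and @grid in place and
    returns (subtask_end_ind, seek).
    """
    subtask_end_ind, seek = None, 0
    if key == "up":
        grid.up()
    elif key == "down":
        grid.down()
    elif key == "left":
        seek = -seek_frames
    elif key == "right":
        seek = seek_frames
    elif key == "space":
        # only annotate while there are still signals left to annotate
        if state.subtask_ind < num_signals:
            subtask_end_ind = frame_ind
            state.subtask_ind += 1
    elif key == "f":
        state.annotation_done = True
    elif key == "r":
        # repeat the demo and start annotating from the first signal again
        state.need_repeat = True
        state.subtask_ind = 0
    return subtask_end_ind, seek
```

Keeping the rate list in a clamped grid means repeated up or down presses saturate at the fastest or slowest rate instead of wrapping around.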
mimicgen.scripts.annotate_subtasks.make_pygame_screen(traj_grp, image_names, image_scale)

Makes pygame screen.

  • traj_grp (h5py.Group) – group for a demonstration trajectory

  • image_names (list) – list of image names that will be used for rendering

  • image_scale (int) – scaling factor for the image to display in the window


pygame screen object

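A minimal sketch of how a window size could be derived for `make_pygame_screen`, assuming images have shape (H, W, C), share the same height, are placed side by side, and are each scaled by `image_scale`; `screen_size` is a hypothetical helper.

```python
def screen_size(image_shapes, image_scale):
    """
    Sketch: (width, height) of a window that shows images of shape (H, W, C)
    side by side, each scaled up by @image_scale.
    """
    heights = {shape[0] for shape in image_shapes}
    if len(heights) != 1:
        raise ValueError("images must share the same height to be concatenated horizontally")
    height = heights.pop()
    width = sum(shape[1] for shape in image_shapes)
    return width * image_scale, height * image_scale
```

The resulting (width, height) tuple is the kind of value `pygame.display.set_mode` accepts as its size argument.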
Helper function to print keyboard annotation commands.

mimicgen.scripts.demo_random_action module

Script that offers an easy way to test random actions in a MimicGen environment. Similar to the demo_random_action.py script from robosuite.
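robosuite environments expose their action bounds through `env.action_spec` as a (low, high) pair, and a random-action test samples uniformly between them. A minimal sketch of that sampling step; `sample_random_action` is a hypothetical helper, not a function from this script.

```python
import numpy as np


def sample_random_action(low, high, rng=None):
    """
    Sketch: draw one action uniformly between per-dimension bounds
    @low and @high (e.g. the pair returned by env.action_spec).
    """
    rng = np.random.default_rng() if rng is None else rng
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return rng.uniform(low, high)
```

With an environment already created, usage would look like `low, high = env.action_spec` followed by `env.step(sample_random_action(low, high))`.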


Prints out environment options, and returns the selected env_name choice


Chosen environment name

Return type


mimicgen.scripts.download_datasets module

Script to download datasets packaged with the repository.

mimicgen.scripts.generate_config_templates module

Helpful script to generate example config files, one per config class. These should be re-generated when new config options are added, or when default settings in the config classes are modified.


mimicgen.scripts.generate_core_configs module

We utilize robomimic’s config generator class to easily generate data generation configs for our core set of tasks in the paper. It can be modified easily to generate other configs.

The global variables at the top of the file should be configured manually.

See https://robomimic.github.io/docs/tutorials/hyperparam_scan.html for more info.

mimicgen.scripts.generate_core_configs.make_generator(config_file, settings)

Implement this function to set up your own hyperparameter scan. Each config generator is created using a base config file (@config_file) and a @settings dictionary that can be used to modify which parameters are set.

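In the linked robomimic tutorial, scanned parameters are assigned to groups: parameters in the same group are varied together, and different groups are combined as a Cartesian product. The sketch below is a stand-alone illustration of that grouping idea only, not robomimic's ConfigGenerator API; `expand_scan` and the parameter keys are hypothetical.

```python
import itertools


def expand_scan(params):
    """
    Sketch of a grouped scan. @params is a list of (key, group, values).
    Params sharing a group are zipped together; groups are combined as a
    Cartesian product. Returns one dict of settings per generated config.
    """
    groups = {}
    for key, group, values in params:
        groups.setdefault(group, []).append((key, list(values)))
    per_group = []
    for group in sorted(groups):
        entries = groups[group]
        lengths = {len(values) for _, values in entries}
        if len(lengths) != 1:
            raise ValueError(f"all params in group {group} need the same number of values")
        (n,) = lengths
        per_group.append([{key: values[i] for key, values in entries} for i in range(n)])
    configs = []
    for combo in itertools.product(*per_group):
        merged = {}
        for part in combo:
            merged.update(part)
        configs.append(merged)
    return configs
```

Grouping is what keeps coupled settings (for example, a task name and its matching dataset path) from being crossed with each other.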

An easy way to make multiple config generators by using different settings for each.

mimicgen.scripts.generate_core_training_configs module

We utilize robomimic’s config generator class to easily generate policy training configs for the core set of experiments in the paper. It can be modified easily to generate other training configs.

See https://robomimic.github.io/docs/tutorials/hyperparam_scan.html for more info.

mimicgen.scripts.generate_core_training_configs.make_gen(base_config, settings, output_dir, mod)

Specify training configs to generate here.

mimicgen.scripts.generate_core_training_configs.make_generators(base_config, dataset_dir, output_dir)

An easy way to make multiple config generators by using different settings for each.

mimicgen.scripts.generate_dataset module

Main data generation script.

Example usage:

# run normal data generation
python generate_dataset.py --config /path/to/config.json

# render all data generation attempts on-screen
python generate_dataset.py --config /path/to/config.json --render

# render all data generation attempts to a video
python generate_dataset.py --config /path/to/config.json --video_path /path/to/video.mp4

# run a quick debug run
python generate_dataset.py --config /path/to/config.json --debug

# pause after every subtask to debug data generation
python generate_dataset.py --config /path/to/config.json --render --pause_subtask

mimicgen.scripts.generate_dataset.generate_dataset(mg_config, auto_remove_exp=False, render=False, video_path=None, video_skip=5, render_image_names=None, pause_subtask=False)

Main function to collect a new dataset with MimicGen.

  • mg_config (MG_Config instance) – MimicGen config object

  • auto_remove_exp (bool) – if True, will remove generation folder if it exists, else user will be prompted to decide whether to keep existing folder or not

  • render (bool) – if True, render each data generation attempt on-screen

  • video_path (str or None) – if provided, render the data generation attempts to the provided video path

  • video_skip (int) – write only every nth frame to the video

  • render_image_names (list of str or None) – if provided, specify camera names to use during on-screen / off-screen rendering to override defaults

  • pause_subtask (bool) – if True, pause after every subtask during generation, for debugging.

mimicgen.scripts.generate_dataset.get_important_stats(new_dataset_folder_path, num_success, num_failures, num_attempts, num_problematic, start_time=None, ep_length_stats=None)

Return a summary of important stats to write to json.

  • new_dataset_folder_path (str) – path to folder that will contain generated dataset

  • num_success (int) – number of successful trajectories generated

  • num_failures (int) – number of failed trajectories

  • num_attempts (int) – number of total attempts

  • num_problematic (int) – number of problematic trajectories that failed due to a specific exception that was caught

  • start_time (float or None) – starting time for this run from time.time()

  • ep_length_stats (dict or None) – if provided, should have entries that summarize the episode length statistics over the successfully generated trajectories


dictionary with useful summary of statistics

Return type

important_stats (dict)
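A minimal sketch of the kind of summary `get_important_stats` returns, assuming a success rate over all attempts and simple episode-length statistics. `summarize_stats` and every field name in it are hypothetical and not taken from the MimicGen source; it also takes raw episode lengths rather than a precomputed @ep_length_stats dict.

```python
import json
import time


def summarize_stats(num_success, num_failures, num_attempts, num_problematic,
                    start_time=None, ep_lengths=None):
    """Sketch: JSON-serializable summary of a data generation run (hypothetical field names)."""
    stats = {
        "num_success": num_success,
        "num_failures": num_failures,
        "num_attempts": num_attempts,
        "num_problematic": num_problematic,
        # percentage of attempts that produced a successful trajectory
        "success_rate": 100.0 * num_success / num_attempts if num_attempts > 0 else 0.0,
    }
    if start_time is not None:
        stats["time_elapsed_sec"] = time.time() - start_time
    if ep_lengths:
        stats["ep_length_mean"] = sum(ep_lengths) / len(ep_lengths)
        stats["ep_length_min"] = min(ep_lengths)
        stats["ep_length_max"] = max(ep_lengths)
    return stats


print(json.dumps(summarize_stats(8, 2, 10, 0, ep_lengths=[100, 120]), indent=4))
```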


mimicgen.scripts.generate_robot_transfer_configs module

We utilize robomimic’s config generator class to easily generate data generation configs for the robot transfer set of experiments in the paper, where we use source data collected on the Panda arm to generate demonstrations for other robot arms. It can be modified easily to generate other configs.

The global variables at the top of the file should be configured manually.

See https://robomimic.github.io/docs/tutorials/hyperparam_scan.html for more info.

mimicgen.scripts.generate_robot_transfer_configs.make_generator(config_file, settings)

Implement this function to set up your own hyperparameter scan. Each config generator is created using a base config file (@config_file) and a @settings dictionary that can be used to modify which parameters are set.


An easy way to make multiple config generators by using different settings for each.

mimicgen.scripts.generate_robot_transfer_training_configs module

We utilize robomimic’s config generator class to easily generate policy training configs for the robot transfer set of experiments in the paper, where we use source data collected on the Panda arm to generate demonstrations for other robot arms. It can be modified easily to generate other training configs.

See https://robomimic.github.io/docs/tutorials/hyperparam_scan.html for more info.

mimicgen.scripts.generate_robot_transfer_training_configs.make_gen(base_config, settings, output_dir, mod)

Specify training configs to generate here.

mimicgen.scripts.generate_robot_transfer_training_configs.make_generators(base_config, dataset_dir, output_dir)

An easy way to make multiple config generators by using different settings for each.

mimicgen.scripts.generate_training_configs_for_public_datasets module

Script to generate json configs for use with robomimic to reproduce the policy learning results in the MimicGen paper.

mimicgen.scripts.generate_training_configs_for_public_datasets.generate_all_configs(base_config_dir, base_dataset_dir, base_output_dir)

Helper function to generate all configs.

  • base_config_dir (str) – base directory to place generated configs

  • base_dataset_dir (str) – path to directory where datasets are on disk. Directory structure is expected to be consistent with the output of @make_dataset_dirs in the download_datasets.py script.

  • base_output_dir (str) – directory to save training results to. If None, will use the directory from the default algorithm configs.

  • algo_to_config_modifier (dict) – dictionary that maps algo name to a function that modifies configs to add algo hyperparameter settings, given the task, dataset, and hdf5 types.

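Generating "all configs" amounts to calling a per-experiment helper once for every combination of dataset type, task, and observation modality. A minimal sketch of that enumeration; `enumerate_experiments`, the example values, and the config directory layout are hypothetical, not the layout this script actually writes.

```python
import itertools
import os


def enumerate_experiments(base_config_dir, dataset_types, task_names, obs_modalities):
    """
    Sketch: one (dataset_type, task_name, obs_modality, config_path) entry per
    combination. The directory layout used for config_path is hypothetical.
    """
    experiments = []
    for dataset_type, task_name, obs_modality in itertools.product(
        dataset_types, task_names, obs_modalities
    ):
        config_path = os.path.join(
            base_config_dir, dataset_type, task_name, obs_modality, "config.json"
        )
        experiments.append((dataset_type, task_name, obs_modality, config_path))
    return experiments
```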
mimicgen.scripts.generate_training_configs_for_public_datasets.generate_experiment_config(base_exp_name, base_config_dir, base_dataset_dir, base_output_dir, dataset_type, task_name, obs_modality)

Helper function to generate a config for a particular experiment.

  • base_exp_name (str) – name that identifies this set of experiments

  • base_config_dir (str) – base directory to place generated configs

  • base_dataset_dir (str) – path to directory where datasets are on disk. Directory structure is expected to be consistent with the output of @make_dataset_dirs in the download_datasets.py script.

  • base_output_dir (str) – directory to save training results to. If None, will use the directory from the default algorithm configs.

  • dataset_type (str) – identifies the type of dataset (e.g. source human data, core experiment data, object transfer data)

  • task_name (str) – identifies the task that the dataset was collected on

  • obs_modality (str) – observation modality (either low-dim or image)

mimicgen.scripts.generate_training_configs_for_public_datasets.modify_config_for_dataset(config, dataset_type, task_name, obs_modality, base_dataset_dir)

Modifies a Config object with experiment, training, and observation settings to correspond to experiment settings for the dataset of type @dataset_type collected on @task_name. This mostly just sets the rollout horizon.

  • config (Config instance) – config to modify

  • dataset_type (str) – identifies the type of dataset (e.g. source human data, core experiment data, object transfer data)

  • task_name (str) – identifies the task that the dataset was collected on

  • obs_modality (str) – observation modality (either low-dim or image)

  • base_dataset_dir (str) – path to directory where datasets are on disk. Directory structure is expected to be consistent with the output of @make_dataset_dirs in the download_datasets.py script.

mimicgen.scripts.generate_training_configs_for_public_datasets.set_obs_config(config, obs_modality)

Sets specific config settings related to running low-dim or image experiments.

  • config (BCConfig instance) – config to modify

  • obs_modality (str) – observation modality (either low-dim or image)


Sets RNN settings in config.

  • config (BCConfig instance) – config to modify

  • obs_modality (str) – observation modality (either low-dim or image)

mimicgen.scripts.get_reset_videos module

Helper script to get task reset distribution videos.

mimicgen.scripts.get_reset_videos.make_reset_video(env_name, robot_name, camera_name, video_path, num_frames, gripper_name=None)

mimicgen.scripts.get_source_info module

Helper script to report source dataset information. It verifies that the dataset has a “datagen_info” field for the first episode and prints its structure.

mimicgen.scripts.merge_hdf5 module

Script to merge all generated hdf5 files if scripts/generate_dataset.py does not run to completion and never reaches the line that merges them.

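A core step in merging several hdf5 files is giving the demos contiguous names in the merged file. The sketch below assumes robomimic-style `demo_<int>` keys and orders demos by file, then numerically (so `demo_10` follows `demo_2`); `merged_demo_names` is a hypothetical helper.

```python
import re


def merged_demo_names(file_demo_keys):
    """
    Sketch: map (file_index, old_demo_key) -> new key so that demos from
    several files are renamed demo_0, demo_1, ... in the merged file.
    """
    def demo_index(key):
        match = re.fullmatch(r"demo_(\d+)", key)
        if match is None:
            raise ValueError(f"unexpected demo key {key}")
        return int(match.group(1))

    mapping = {}
    new_ind = 0
    for file_ind, keys in enumerate(file_demo_keys):
        # sort numerically, not lexicographically
        for key in sorted(keys, key=demo_index):
            mapping[(file_ind, key)] = f"demo_{new_ind}"
            new_ind += 1
    return mapping
```

Each source group could then be copied into the merged file under its new name, for example with h5py's `Group.copy(source, dest, name=...)`.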

Main function to collect a new dataset using trajectory transforms from an existing dataset.

mimicgen.scripts.prepare_src_dataset module

Script to extract the information needed for data generation from the low-dimensional simulation states in a source dataset and add it to the source dataset. Essentially a stripped-down version of the dataset_states_to_obs.py script in the robomimic codebase, with a handful of modifications.

Example usage:

# prepare a source dataset collected on robosuite Stack task
python prepare_src_dataset.py --dataset /path/to/stack.hdf5 --env_interface MG_Stack --env_interface_type robosuite

# prepare a source dataset collected on robosuite Square task, but only use first 10 demos, and write output to new hdf5
python prepare_src_dataset.py --dataset /path/to/square.hdf5 --env_interface MG_Square --env_interface_type robosuite --n 10 --output /tmp/square_new.hdf5

mimicgen.scripts.prepare_src_dataset.extract_datagen_info_from_trajectory(env, env_interface, initial_state, states, actions)

Helper function to extract datagen info along a trajectory using the simulator environment.

  • env (instance of robomimic EnvBase) – environment

  • env_interface (MG_EnvInterface instance) – environment interface for some data generation operations

  • initial_state (dict) – initial simulation state to load

  • states (np.array) – array of simulation states to load to extract information

  • actions (np.array) – array of actions


the datagen info objects across all timesteps, represented as a dictionary of numpy arrays for easy writes to an hdf5

Return type

datagen_infos (dict)
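The return value is described as a dictionary of numpy arrays built from per-timestep datagen info. A minimal sketch of that conversion, assuming each timestep yields a dict with the same keys (possibly with nested dicts); `stack_timestep_dicts` is a hypothetical helper, not mimicgen's API.

```python
import numpy as np


def stack_timestep_dicts(per_step):
    """
    Sketch: turn a list of per-timestep dicts (same keys every step) into a
    dict of numpy arrays whose first axis is time. Nested dicts are stacked
    recursively, so each leaf array can be written as one hdf5 dataset.
    """
    if not per_step:
        return {}
    out = {}
    for key in per_step[0]:
        values = [step[key] for step in per_step]
        if isinstance(values[0], dict):
            out[key] = stack_timestep_dicts(values)
        else:
            out[key] = np.stack([np.asarray(v) for v in values], axis=0)
    return out
```

Stacking along a leading time axis lets each field be stored as a single array per episode instead of one entry per timestep.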

mimicgen.scripts.prepare_src_dataset.prepare_src_dataset(dataset_path, env_interface_name, env_interface_type, filter_key=None, n=None, output_path=None)

Adds a DatagenInfo object instance for each timestep in each source demonstration trajectory and stores it under the “datagen_info” key for each episode. Also stores the @env_interface_name and @env_interface_type used in the attributes of each key. This information is used during MimicGen data generation.

  • dataset_path (str) – path to input hdf5 dataset, which will be modified in-place unless @output_path is provided

  • env_interface_name (str) – name of environment interface class to use for this source dataset

  • env_interface_type (str) – type of environment interface to use for this source dataset

  • filter_key (str or None) – name of filter key

  • n (int or None) – if provided, stop after n trajectories are processed

  • output_path (str or None) – if provided, write a new hdf5 here instead of modifying the original dataset in-place

mimicgen.scripts.visualize_subtasks module

A script to visualize each subtask in a source demonstration. This is a useful way to debug the subtask termination signals in a set of source demonstrations, as well as the choice of maximum subtask termination offsets.
Example usage:

# render on-screen
python visualize_subtasks.py --dataset /path/to/demo.hdf5 --config /path/to/config.json --render

# render to video
python visualize_subtasks.py --dataset /path/to/demo.hdf5 --config /path/to/config.json --video_path /path/to/video.mp4

# specify subtask information manually instead of using a config
python visualize_subtasks.py --dataset /path/to/demo.hdf5 --signals grasp_1 insert_1 grasp_2 --offsets 10 10 10 --render

mimicgen.scripts.visualize_subtasks.visualize_subtasks_with_env(env, initial_state, states, subtask_end_indices, render=False, video_writer=None, video_skip=5, camera_names=None)

Helper function to visualize each subtask in a trajectory using the simulator environment. If using on-screen rendering, the script will pause for input at the end of each subtask. If writing to a video, each subtask will toggle between having a red border around each frame and no border in the video.

  • env (instance of EnvBase) – environment

  • initial_state (dict) – initial simulation state to load

  • states (list) – list of simulation states to load

  • subtask_end_indices (list) – list containing the end index for each subtask

  • render (bool) – if True, render on-screen

  • video_writer (imageio writer) – video writer

  • video_skip (int) – determines rate at which environment frames are written to video

  • camera_names (list) – determines which camera(s) are used for rendering. Pass more than one to output a video with multiple camera views concatenated horizontally.

mimicgen.scripts.visualize_subtasks.visualize_subtasks_with_obs(traj_grp, subtask_end_indices, video_writer, video_skip=5, image_names=None)

Helper function to visualize each subtask in a trajectory by writing image observations to a video. Each subtask will toggle between having a red border around each frame and no border in the video.

  • traj_grp (hdf5 file group) – hdf5 group which corresponds to the dataset trajectory to playback

  • subtask_end_indices (list) – list containing the end index for each subtask

  • video_writer (imageio writer) – video writer

  • video_skip (int) – determines rate at which environment frames are written to video

  • image_names (list) – determines which image observations are used for rendering. Pass more than one to output a video with multiple image observations concatenated horizontally.
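The red-border toggling described above can be sketched with numpy. This sketch assumes frames are (H, W, 3) uint8 arrays, treats each end index as the first frame of the next subtask, and puts the border on even-numbered subtasks; `subtask_index`, `add_border`, and `annotate_frame` are hypothetical helpers.

```python
import bisect

import numpy as np


def subtask_index(t, subtask_end_indices):
    """Index of the subtask containing timestep t (each end index starts the next subtask)."""
    return bisect.bisect_right(subtask_end_indices, t)


def add_border(frame, width=5, color=(255, 0, 0)):
    """Return a copy of an (H, W, 3) uint8 frame with a colored border."""
    out = frame.copy()
    out[:width, :] = color
    out[-width:, :] = color
    out[:, :width] = color
    out[:, -width:] = color
    return out


def annotate_frame(frame, t, subtask_end_indices):
    """Alternate between a red border and no border from one subtask to the next."""
    if subtask_index(t, subtask_end_indices) % 2 == 0:
        return add_border(frame)
    return frame
```

Because consecutive subtasks alternate between bordered and unbordered frames, each subtask boundary shows up in the video as the point where the border appears or disappears.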

