src.runners package

Submodules

src.runners.ModelFreeRunner module

Generalized runner for single-process RL. Takes in encoded observations, applies them to a buffer, and trains.

class src.runners.ModelFreeRunner.ModelFreeRunner(agent_config_path: str, buffer_config_path: str, encoder_config_path: str, model_save_dir: str, experiment_name: str, experiment_state_path: str, num_test_episodes: int, num_run_episodes: int, save_every_nth_episode: int, update_model_after: int, update_model_every: int, eval_every: int, max_episode_length: int, resume_training: bool = False, use_container: bool = True)

Bases: BaseRunner

Main configurable runner.

Initialize ModelFreeRunner.

Parameters
  • agent_config_path (str) – Path to agent configuration YAML.

  • buffer_config_path (str) – Path to replay buffer configuration YAML.

  • encoder_config_path (str) – Path to encoder configuration YAML.

  • model_save_dir (str) – Directory in which to save the model.

  • experiment_name (str) – Experiment name in WandB

  • experiment_state_path (str) – Path to save experiment state for resuming.

  • num_test_episodes (int) – Number of test episodes

  • num_run_episodes (int) – Number of training episodes

  • save_every_nth_episode (int) – Save frequency during training, in episodes.

  • update_model_after (int) – Number of training steps to wait before the first model update.

  • update_model_every (int) – Update the model every update_model_every training steps, for update_model_every training steps.

  • eval_every (int) – Evaluate every eval_every episodes.

  • max_episode_length (int) – Maximum episode length. Flagged as a bad, buggy parameter; use with caution.

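  • resume_training (bool, optional) – Whether to resume training from the saved experiment state. Defaults to False.

  • use_container (bool, optional) – Whether to use the provided wrapper (HIGHLY ENCOURAGED). Defaults to True.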
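
How update_model_after and update_model_every interact can be sketched as follows. This is a minimal illustration, assuming the common off-policy convention of a warm-up period followed by bursts of updates at a fixed interval. The helper name updates_due is hypothetical and not part of src.runners, and the runner itself may schedule updates differently.

```python
def updates_due(step: int, update_model_after: int, update_model_every: int) -> int:
    """Return how many model updates to run at training step `step`.

    Assumed convention (not confirmed by the runner's source): no updates
    until `update_model_after` steps have passed, then every
    `update_model_every` steps run `update_model_every` updates.
    """
    if step < update_model_after:
        return 0  # still in the warm-up period
    if step % update_model_every != 0:
        return 0  # between update rounds
    return update_model_every


if __name__ == "__main__":
    # Print the number of updates scheduled at each of the first 8 steps.
    print([updates_due(t, update_model_after=4, update_model_every=2) for t in range(8)])
```

Under this assumption the ratio of training steps to model updates stays at one after the warm-up, regardless of the interval chosen.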

checkpoint_model(ep_ret, ep_number)

Conditionally save a checkpoint of the model if return is larger than best, or if self.save_every_nth_episode episodes have passed.

Parameters
  • ep_ret (float) – Current return

  • ep_number (int) – Current episode number
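
The checkpoint condition described above can be sketched as a small predicate. This is an illustration only: the function name should_checkpoint and the best_ret argument are hypothetical, and whether the real method counts episodes from 0 or 1 is an assumption.

```python
def should_checkpoint(ep_ret: float, ep_number: int, best_ret: float,
                      save_every_nth_episode: int) -> bool:
    """Return True if a checkpoint should be saved for this episode."""
    new_best = ep_ret > best_ret  # return is larger than the best so far
    periodic = ep_number % save_every_nth_episode == 0  # N episodes have passed
    return new_best or periodic
```

A caller would update its stored best return whenever the first condition fires, so that later episodes are compared against the new best.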

eval(env)

Evaluate model on the evaluation environment, using a deterministic agent if possible.

Parameters

env (gym.Env) – A Gym-compliant environment.

Returns

The max reward for each test session.

Return type

float

classmethod instantiate_from_config(config_file_location)

Initialize class from config file

Parameters

config_file_location (path) – Path to config file

Raises

ValueError – Error loading file

Returns

Object from class.

Return type

cls

classmethod instantiate_from_config_dict(config)

Initialize class from config dictionary

Parameters

config (dict) – Configuration dictionary from which to create the class instance.

Returns

Object from class and config.

Return type

cls
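
The dictionary-based constructor pattern can be sketched as below. ExampleRunner is a hypothetical stand-in, not the real ModelFreeRunner, and the assumption that config keys map one-to-one onto constructor arguments is not confirmed by this documentation.

```python
class ExampleRunner:
    """Hypothetical stand-in used only to illustrate the classmethod pattern."""

    def __init__(self, eval_every: int, use_container: bool = True):
        self.eval_every = eval_every
        self.use_container = use_container

    @classmethod
    def instantiate_from_config_dict(cls, config: dict) -> "ExampleRunner":
        # Assumed behaviour: each key is passed as a keyword argument.
        return cls(**config)


runner = ExampleRunner.instantiate_from_config_dict({"eval_every": 5})
```

Optional keys that are left out of the dictionary fall back to the constructor defaults.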

run(env, api_key: str = '')

Train an agent, with our given parameters, on the environment in question.

Parameters
  • env (gym.Env) – A Gym-compliant environment, preferably wrapped using a wrapper.

  • api_key (str, optional) – Wandb API key for logging. Defaults to ''.

save_experiment_state(ep_number)

Save running variables for experiment state resuming.

Parameters

ep_number (int) – Current episode number

Raises
  • Exception – Must specify json file name with experiment state path.

  • Exception – Must specify experiment state path
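
The two error conditions listed above suggest path validation before writing. A minimal sketch using only the standard library follows. The function name save_state and the shape of the state dictionary are hypothetical, and the real method may store different fields.

```python
import json
from pathlib import Path


def save_state(state: dict, experiment_state_path: str) -> None:
    """Write `state` to a JSON file after validating the target path."""
    if not experiment_state_path:
        raise Exception("Must specify experiment state path")
    path = Path(experiment_state_path)
    if path.suffix != ".json":
        raise Exception("Must specify json file name with experiment state path.")
    # Create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(state, f)
```

Raising a bare Exception mirrors the documented behaviour. A narrower type such as ValueError would be a reasonable alternative in new code.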

schema = Map({'agent_config_path': Str(), 'buffer_config_path': Str(), 'encoder_config_path': Str(), 'model_save_dir': Str(), 'experiment_name': Str(), 'experiment_state_path': Str(), 'num_test_episodes': Int(), 'num_run_episodes': Int(), 'save_every_nth_episode': Int(), 'update_model_after': Int(), 'update_model_every': Int(), 'eval_every': Int(), 'max_episode_length': Int(), Optional("resume_training"): Bool(), Optional("use_container"): Bool()})
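
To show what a configuration satisfying this schema might look like, the sketch below mirrors the required and optional keys with a standard-library check. It does not use the schema library itself. The validate_runner_config helper and all paths and values in example_config are hypothetical.

```python
REQUIRED = {
    "agent_config_path": str, "buffer_config_path": str,
    "encoder_config_path": str, "model_save_dir": str,
    "experiment_name": str, "experiment_state_path": str,
    "num_test_episodes": int, "num_run_episodes": int,
    "save_every_nth_episode": int, "update_model_after": int,
    "update_model_every": int, "eval_every": int, "max_episode_length": int,
}
OPTIONAL = {"resume_training": bool, "use_container": bool}


def validate_runner_config(config: dict) -> dict:
    """Check keys and value types against the documented schema."""
    missing = [k for k in REQUIRED if k not in config]
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    unknown = [k for k in config if k not in REQUIRED and k not in OPTIONAL]
    if unknown:
        raise ValueError(f"Unknown keys: {unknown}")
    for key, expected in {**REQUIRED, **OPTIONAL}.items():
        # Exact type match, so that a bool is not accepted where an int is expected.
        if key in config and type(config[key]) is not expected:
            raise ValueError(f"{key} must be of type {expected.__name__}")
    return config


# Hypothetical values, for illustration only.
example_config = {
    "agent_config_path": "configs/agent.yaml",
    "buffer_config_path": "configs/buffer.yaml",
    "encoder_config_path": "configs/encoder.yaml",
    "model_save_dir": "models/",
    "experiment_name": "example-experiment",
    "experiment_state_path": "state/experiment.json",
    "num_test_episodes": 3, "num_run_episodes": 100,
    "save_every_nth_episode": 10, "update_model_after": 1000,
    "update_model_every": 50, "eval_every": 5, "max_episode_length": 500,
    "use_container": True,
}
validated = validate_runner_config(example_config)
```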

src.runners.base module

Base Runner. Inherit from here, and respect the protocol.

class src.runners.base.BaseRunner

Bases: ABC

ABC for BaseRunner.

Initialize Base Runner

evaluation(env)

Eval Loop

training(env)

Training Loop
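
Respecting the protocol means providing both loops in a subclass. The sketch below uses a local stand-in for BaseRunner so that it runs on its own. Whether the real base class marks training and evaluation as abstract methods is an assumption, and EchoRunner is a hypothetical example.

```python
from abc import ABC, abstractmethod


class BaseRunner(ABC):
    """Stand-in for src.runners.base.BaseRunner (abstract methods assumed)."""

    @abstractmethod
    def training(self, env):
        """Training loop."""

    @abstractmethod
    def evaluation(self, env):
        """Eval loop."""


class EchoRunner(BaseRunner):
    """Hypothetical runner that only records which loop was called."""

    def __init__(self):
        self.log = []

    def training(self, env):
        self.log.append(("train", env))

    def evaluation(self, env):
        self.log.append(("eval", env))
```

With this stand-in, instantiating BaseRunner directly raises TypeError, while a subclass that defines both methods can be created and used.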

Module contents

Runners for different training systems.