src.runners package

Submodules

src.runners.ModelFreeRunner module

Generalized runner for single-process reinforcement learning (RL). It takes encoded observations, stores them in a replay buffer, and trains the agent.

class src.runners.ModelFreeRunner.ModelFreeRunner(agent_config_path: str, buffer_config_path: str, encoder_config_path: str, model_save_dir: str, experiment_name: str, experiment_state_path: str, num_test_episodes: int, num_run_episodes: int, save_every_nth_episode: int, update_model_after: int, update_model_every: int, eval_every: int, max_episode_length: int, resume_training: bool = False, use_container: bool = True)

Bases: BaseRunner

Main configurable runner.

Initialize ModelFreeRunner.

Parameters

agent_config_path (str) – Path to agent configuration YAML.
buffer_config_path (str) – Path to replay buffer configuration YAML.
encoder_config_path (str) – Path to encoder configuration YAML.
model_save_dir (str) – Directory in which to save the model.
experiment_name (str) – Experiment name in WandB.
experiment_state_path (str) – Path to save experiment state for resuming.
num_test_episodes (int) – Number of test episodes.
num_run_episodes (int) – Number of training episodes.
save_every_nth_episode (int) – Checkpoint save frequency, in training episodes.
update_model_after (int) – Number of training steps to complete before the first model update.
update_model_every (int) – Update the model every N training steps, for N update steps each time.
eval_every (int) – Evaluate every N episodes.
max_episode_length (int) – Maximum episode length. Flagged by the authors as a bad, buggy parameter.
resume_training (bool, optional) – Whether to resume training from a previously saved experiment state. Defaults to False.
use_container (bool, optional) – Whether to use the provided wrapper (highly encouraged). Defaults to True.
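
The interaction between update_model_after and update_model_every can be sketched as follows. This is a minimal illustration, not the runner's actual loop: it assumes updates begin once the step counter reaches update_model_after and then fire whenever the step counter is a multiple of update_model_every. The helper name update_steps is hypothetical and not part of the package.

```python
def update_steps(total_steps, update_model_after, update_model_every):
    """Return the training steps at which a model update would be triggered.

    Assumption (not confirmed by the package): updates start once
    `update_model_after` steps have elapsed and then fire whenever the
    step index is a multiple of `update_model_every`.
    """
    return [
        t
        for t in range(total_steps)
        if t >= update_model_after and t % update_model_every == 0
    ]


print(update_steps(total_steps=30, update_model_after=10, update_model_every=5))
# [10, 15, 20, 25]
```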

checkpoint_model(ep_ret, ep_number)

Save a checkpoint of the model if the return exceeds the best return so far, or if self.save_every_nth_episode episodes have passed.

Parameters

ep_ret (float) – Current return
ep_number (int) – Current episode number
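
The two checkpoint conditions described above can be sketched as a small predicate. This is an illustration under assumptions: best_ret is a hypothetical stand-in for the best return seen so far, and "episodes have passed" is read here as the episode number being a multiple of save_every_nth_episode. The function should_checkpoint is not part of the package.

```python
def should_checkpoint(ep_ret, ep_number, best_ret, save_every_nth_episode):
    """Return True if either documented checkpoint condition holds.

    `best_ret` is a hypothetical stand-in for the best return so far.
    """
    improved = ep_ret > best_ret  # new best return
    periodic = ep_number % save_every_nth_episode == 0  # assumed periodic rule
    return improved or periodic


print(should_checkpoint(ep_ret=5.0, ep_number=3, best_ret=4.0, save_every_nth_episode=10))
# True
print(should_checkpoint(ep_ret=1.0, ep_number=3, best_ret=4.0, save_every_nth_episode=10))
# False
```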

eval(env)

Evaluate model on the evaluation environment, using a deterministic agent if possible.

Parameters: env (gym.Env) – Some gym-compliant environment.
Returns: The max reward for each test session.
Return type: float

classmethod instantiate_from_config(config_file_location)

Initialize the class from a config file.

Parameters: config_file_location (path) – Path to config file
Raises: ValueError – Error loading file
Returns: Instance of the class.
Return type: cls

classmethod instantiate_from_config_dict(config)

Initialize the class from a config dictionary.

Parameters: config (dict) – Configuration dictionary from which to create the instance.
Returns: Instance of the class built from config.
Return type: cls
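
A common way to implement this kind of alternate constructor is a classmethod that unpacks the dictionary into the constructor. The sketch below shows that pattern on a hypothetical stand-in class, ExampleRunner, with only three of the documented parameters. It assumes the dictionary keys map one-to-one onto constructor arguments, which the source does not confirm.

```python
class ExampleRunner:
    """Hypothetical stand-in; not the package's ModelFreeRunner."""

    def __init__(self, experiment_name, num_run_episodes, resume_training=False):
        self.experiment_name = experiment_name
        self.num_run_episodes = num_run_episodes
        self.resume_training = resume_training

    @classmethod
    def instantiate_from_config_dict(cls, config):
        # Assumption: keys in `config` match the constructor's argument names.
        return cls(**config)


runner = ExampleRunner.instantiate_from_config_dict(
    {"experiment_name": "demo", "num_run_episodes": 100}
)
print(runner.experiment_name, runner.num_run_episodes, runner.resume_training)
# demo 100 False
```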

run(env, api_key: str = '')

Train an agent on the given environment, using the parameters the runner was configured with.

Parameters

env (gym.Env) – Some gym-compliant environment, preferably wrapped using the provided wrapper.
api_key (str, optional) – WandB API key for logging. Defaults to ''.

save_experiment_state(ep_number)

Save running variables for experiment state resuming.

Parameters

ep_number (int) – Current episode number

Raises

Exception – Must specify a JSON file name in the experiment state path.
Exception – Must specify an experiment state path.
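
The two failure cases listed above suggest path validation before the state is written. The sketch below mirrors that behaviour under assumptions: the helper save_state, the exact checks (empty path, missing .json suffix), and the contents of the state dictionary are all hypothetical. Only the error messages are taken from the documentation.

```python
import json
import os
import tempfile


def save_state(state, experiment_state_path):
    """Write `state` as JSON after validating the target path.

    The validation rules here are assumptions inferred from the
    documented error messages, not the package's actual checks.
    """
    if not experiment_state_path:
        raise Exception("Must specify experiment state path")
    if not experiment_state_path.endswith(".json"):
        raise Exception("Must specify json file name with experiment state path.")
    with open(experiment_state_path, "w") as f:
        json.dump(state, f)


# Write a hypothetical state to a temporary directory and read it back.
path = os.path.join(tempfile.mkdtemp(), "experiment_state.json")
save_state({"ep_number": 12}, path)
with open(path) as f:
    print(json.load(f))
# {'ep_number': 12}
```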

schema = Map({'agent_config_path': Str(), 'buffer_config_path': Str(), 'encoder_config_path': Str(), 'model_save_dir': Str(), 'experiment_name': Str(), 'experiment_state_path': Str(), 'num_test_episodes': Int(), 'num_run_episodes': Int(), 'save_every_nth_episode': Int(), 'update_model_after': Int(), 'update_model_every': Int(), 'eval_every': Int(), 'max_episode_length': Int(), Optional("resume_training"): Bool(), Optional("use_container"): Bool()})
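
To make the schema concrete, the sketch below lists the same keys and types in plain Python and checks a configuration dictionary against them. It does not use the schema library that the runner itself relies on. The helper check_config and all values in example_config (paths, names, numbers) are placeholders invented for illustration.

```python
REQUIRED_KEYS = {
    "agent_config_path": str,
    "buffer_config_path": str,
    "encoder_config_path": str,
    "model_save_dir": str,
    "experiment_name": str,
    "experiment_state_path": str,
    "num_test_episodes": int,
    "num_run_episodes": int,
    "save_every_nth_episode": int,
    "update_model_after": int,
    "update_model_every": int,
    "eval_every": int,
    "max_episode_length": int,
}
OPTIONAL_KEYS = {"resume_training": bool, "use_container": bool}


def check_config(config):
    """Return a list of problems found in `config` (empty if none)."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif type(config[key]) is not expected:
            problems.append(f"wrong type for {key}")
    for key, value in config.items():
        if key in OPTIONAL_KEYS:
            if type(value) is not OPTIONAL_KEYS[key]:
                problems.append(f"wrong type for {key}")
        elif key not in REQUIRED_KEYS:
            problems.append(f"unexpected key: {key}")
    return problems


# Placeholder values only; none of these paths or numbers come from the package.
example_config = {
    "agent_config_path": "configs/agent.yaml",
    "buffer_config_path": "configs/buffer.yaml",
    "encoder_config_path": "configs/encoder.yaml",
    "model_save_dir": "checkpoints",
    "experiment_name": "demo",
    "experiment_state_path": "state/experiment_state.json",
    "num_test_episodes": 1,
    "num_run_episodes": 100,
    "save_every_nth_episode": 10,
    "update_model_after": 1000,
    "update_model_every": 50,
    "eval_every": 5,
    "max_episode_length": 5000,
    "use_container": True,
}
print(check_config(example_config))
# []
```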

src.runners.base module

Base Runner. Inherit from here, and respect the protocol.

class src.runners.base.BaseRunner

Bases: ABC

Abstract base class (ABC) for runners.

Initialize the base runner.

evaluation(env): Eval Loop

training(env): Training Loop
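
A new runner would implement both loops. The sketch below uses a hypothetical stand-in, BaseRunnerSketch, because the real BaseRunner is not imported here. It also assumes both methods are declared abstract, which the documentation implies but does not state.

```python
from abc import ABC, abstractmethod


class BaseRunnerSketch(ABC):
    """Hypothetical stand-in for src.runners.base.BaseRunner."""

    @abstractmethod
    def training(self, env):
        """Training loop."""

    @abstractmethod
    def evaluation(self, env):
        """Evaluation loop."""


class CountingRunner(BaseRunnerSketch):
    """Toy subclass that only records which loop was called."""

    def __init__(self):
        self.calls = []

    def training(self, env):
        self.calls.append(("training", env))

    def evaluation(self, env):
        self.calls.append(("evaluation", env))
        return 0.0  # placeholder score


runner = CountingRunner()
runner.training("env-placeholder")
runner.evaluation("env-placeholder")
print(runner.calls)
# [('training', 'env-placeholder'), ('evaluation', 'env-placeholder')]
```

Instantiating BaseRunnerSketch directly raises TypeError, which is how the ABC enforces that subclasses provide both methods.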

Module contents

Runners for different training systems.