src.buffers package

Submodules

src.buffers.PPOBuffer module

Buffer for PPO.

class src.buffers.PPOBuffer.PPOBuffer(obs_dim: int, act_dim: int, size: int, batch_size: int, gamma: float = 0.99, lam: float = 0.95, eps: float = 0.001)

Bases: object

A buffer for storing trajectories experienced by a PPO agent interacting with the environment, and for computing the advantages of state-action pairs with Generalized Advantage Estimation (GAE-Lambda).

Initialize PPOBuffer

Parameters

obs_dim (int) – Observation Dimension
act_dim (int) – Action Dimension
size (int) – Size of Replay Buffer
batch_size (int) – Batch Size
gamma (float, optional) – Gamma. Defaults to 0.99.
lam (float, optional) – Lambda. Defaults to 0.95.
eps (float, optional) – Epsilon. Defaults to 1e-3.

finish_path(action_obj=None): Call this at the end of a trajectory, or when one gets cut off by an epoch ending. This looks back in the buffer to where the trajectory started, and uses rewards and value estimates from the whole trajectory to compute advantage estimates with GAE-Lambda, as well as compute the rewards-to-go for each state, to use as the targets for the value function. The “last_val” argument should be 0 if the trajectory ended because the agent reached a terminal state (died), and otherwise should be V(s_T), the value function estimated for the last state. This allows us to bootstrap the reward-to-go calculation to account for timesteps beyond the arbitrary episode horizon (or epoch cutoff).
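The GAE-Lambda and rewards-to-go computation described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name gae_and_returns and its argument layout are hypothetical, and last_val is passed explicitly here as the bootstrap value named in the description.

```python
import numpy as np


def gae_and_returns(rewards, values, last_val, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE-Lambda advantages and rewards-to-go for one trajectory.

    rewards[t] and values[t] are the reward and value estimate at step t.
    last_val is 0 for a terminal state, otherwise V(s_T) for bootstrapping.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Append the bootstrap value so values[t + 1] exists for the final step.
    values = np.append(np.asarray(values, dtype=np.float64), last_val)

    n_steps = len(rewards)
    advantages = np.zeros(n_steps)
    returns = np.zeros(n_steps)

    running_adv = 0.0
    running_ret = float(last_val)
    for t in reversed(range(n_steps)):
        # One-step TD residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Advantage is the (gamma * lam)-discounted sum of future residuals.
        running_adv = delta + gamma * lam * running_adv
        # Reward-to-go is the gamma-discounted sum of future rewards,
        # seeded with the bootstrap value.
        running_ret = rewards[t] + gamma * running_ret
        advantages[t] = running_adv
        returns[t] = running_ret
    return advantages, returns


adv, ret = gae_and_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], last_val=0.0)
```

With last_val set to 0 the final reward-to-go is just the final reward. A non-zero last_val adds the discounted value estimate of the state where the trajectory was cut off.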

classmethod instantiate_from_config(config_file_location)

Initialize class from config file

Parameters: config_file_location (path) – Path to config file
Raises: ValueError – Error loading file
Returns: object from class.
Return type: cls

classmethod instantiate_from_config_dict(config)

Initialize class from config dictionary

Parameters: config (dictionary) – Create instance of class from dictionary.
Returns: Object from class and config.
Return type: cls

sample_batch(): Call this at the end of an epoch to get all of the data from the buffer, with advantages appropriately normalized (shifted to have mean zero and std one). Also, resets some pointers in the buffer.

schema = Map({'obs_dim': Int(), 'act_dim': Int(), 'size': Int(), 'batch_size': Int(), Optional("gamma"): Float(), Optional("lam"): Float(), Optional("eps"): Float()})
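A configuration dictionary matching the schema above might look like the following sketch. The numeric values are illustrative only, and the commented-out call assumes that instantiate_from_config_dict accepts a plain dictionary with these keys, as its description suggests.

```python
# Required keys from the schema, with illustrative values.
config = {
    "obs_dim": 8,
    "act_dim": 2,
    "size": 4000,
    "batch_size": 64,
    # Optional keys; the constructor defaults are used when these are omitted.
    "gamma": 0.99,
    "lam": 0.95,
    "eps": 1e-3,
}

# Simple check that every required key is present before building the buffer.
required = {"obs_dim", "act_dim", "size", "batch_size"}
missing = required - set(config)

# Assumed usage (not run here, since it needs the src.buffers package):
# buffer = PPOBuffer.instantiate_from_config_dict(config)
```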

store(buffer_dict): Append one timestep of agent-environment interaction to the buffer.

src.buffers.PPOBuffer.discount_cumsum(x, discount)

Wrapper around discounted cumulative summation.

Parameters

x (np.array) – Array to sum over
discount (float) – Discount factor.

Returns

Discounted cumulative sum of x.

Return type

float
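A discounted cumulative sum can be sketched in plain NumPy as below. This is an illustration under assumptions, not the module's code: the function is named discount_cumsum_sketch to keep it distinct from the documented one, and it returns an array with one entry per input element.

```python
import numpy as np


def discount_cumsum_sketch(x, discount):
    """Hypothetical re-implementation: out[t] = sum over k of discount**k * x[t + k]."""
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    running = 0.0
    # Walk backwards so each entry reuses the sum already built for later steps.
    for t in reversed(range(len(x))):
        running = x[t] + discount * running
        out[t] = running
    return out


result = discount_cumsum_sketch([1.0, 1.0, 1.0], 0.5)
# result is [1.75, 1.5, 1.0]
```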

src.buffers.SimpleReplayBuffer module

Default Replay Buffer.

class src.buffers.SimpleReplayBuffer.SimpleReplayBuffer(obs_dim: int, act_dim: int, size: int, batch_size: int)

Bases: object

A simple FIFO experience replay buffer for SAC agents.

Initialize simple replay buffer

Parameters

obs_dim (int) – Observation dimension
act_dim (int) – Action dimension
size (int) – Buffer size
batch_size (int) – Batch size

finish_path(action_obj=None): Call this at the end of a trajectory, or when one gets cut off by an epoch ending. This looks back in the buffer to where the trajectory started, and uses rewards and value estimates from the whole trajectory to compute advantage estimates with GAE-Lambda, as well as compute the rewards-to-go for each state, to use as the targets for the value function. The “last_val” argument should be 0 if the trajectory ended because the agent reached a terminal state (died), and otherwise should be V(s_T), the value function estimated for the last state. This allows us to bootstrap the reward-to-go calculation to account for timesteps beyond the arbitrary episode horizon (or epoch cutoff).

classmethod instantiate_from_config(config_file_location)

Initialize class from config file

Parameters: config_file_location (path) – Path to config file
Raises: ValueError – Error loading file
Returns: object from class.
Return type: cls

classmethod instantiate_from_config_dict(config)

Initialize class from config dictionary

Parameters: config (dictionary) – Create instance of class from dictionary.
Returns: Object from class and config.
Return type: cls

sample_batch()

Sample batch from self.

Returns: Dictionary of batched information.
Return type: dict

schema = Map({'obs_dim': Int(), 'act_dim': Int(), 'size': Int(), 'batch_size': Int()})

store(buffer_dict)

Store data from buffer_dict

Parameters: buffer_dict (dict) – Dictionary holding the data to store.
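The FIFO behaviour described for this class can be sketched as a ring buffer that overwrites its oldest entry once full. The class below is a hypothetical illustration, not SimpleReplayBuffer itself: the keys "obs", "act" and "rew" in buffer_dict, the seed argument, and uniform sampling with replacement are all assumptions.

```python
import numpy as np


class FifoBufferSketch:
    """Hypothetical ring buffer illustrating FIFO storage and batch sampling."""

    def __init__(self, obs_dim, act_dim, size, batch_size, seed=0):
        self.obs = np.zeros((size, obs_dim), dtype=np.float32)
        self.act = np.zeros((size, act_dim), dtype=np.float32)
        self.rew = np.zeros(size, dtype=np.float32)
        self.max_size = size
        self.batch_size = batch_size
        self.ptr = 0    # next slot to write
        self.count = 0  # number of valid entries
        self.rng = np.random.default_rng(seed)

    def store(self, buffer_dict):
        # Write one timestep at the current slot (assumed key names).
        self.obs[self.ptr] = buffer_dict["obs"]
        self.act[self.ptr] = buffer_dict["act"]
        self.rew[self.ptr] = buffer_dict["rew"]
        # Advance and wrap, so the oldest entry is overwritten first.
        self.ptr = (self.ptr + 1) % self.max_size
        self.count = min(self.count + 1, self.max_size)

    def sample_batch(self):
        # Uniform sampling with replacement over the valid entries.
        idx = self.rng.integers(0, self.count, size=self.batch_size)
        return {"obs": self.obs[idx], "act": self.act[idx], "rew": self.rew[idx]}


buf = FifoBufferSketch(obs_dim=2, act_dim=1, size=3, batch_size=4)
for step in range(4):
    buf.store({"obs": [step, step], "act": [0.0], "rew": float(step)})
batch = buf.sample_batch()
```

After four stores into a buffer of size three, the first entry has been overwritten by the fourth.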

Module contents

Replay buffer definitions, for storing past data for learning.