src.buffers package
Submodules
src.buffers.PPOBuffer module
Buffer for PPO.
- class src.buffers.PPOBuffer.PPOBuffer(obs_dim: int, act_dim: int, size: int, batch_size: int, gamma: float = 0.99, lam: float = 0.95, eps: float = 0.001)
Bases: object
A buffer for storing trajectories experienced by a PPO agent interacting with the environment, and using Generalized Advantage Estimation (GAE-Lambda) for calculating the advantages of state-action pairs.
Initialize PPOBuffer
- Parameters
obs_dim (int) – Observation Dimension
act_dim (int) – Action Dimension
size (int) – Size of the buffer
batch_size (int) – Batch size
gamma (float, optional) – Discount factor gamma. Defaults to 0.99.
lam (float, optional) – GAE-Lambda parameter lambda. Defaults to 0.95.
eps (float, optional) – Epsilon. Defaults to 1e-3.
- finish_path(action_obj=None)
Call this at the end of a trajectory, or when a trajectory is cut off by the end of an epoch. It looks back in the buffer to where the trajectory started and uses the rewards and value estimates from the whole trajectory to compute advantage estimates with GAE-Lambda, as well as the rewards-to-go for each state, which serve as targets for the value function. The “last_val” value should be 0 if the trajectory ended because the agent reached a terminal state (died), and otherwise should be V(s_T), the value function's estimate for the last state. This allows the reward-to-go calculation to be bootstrapped to account for timesteps beyond the arbitrary episode horizon (or epoch cutoff).
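As an illustration of the computation described above, here is a minimal, stdlib-only sketch for a single trajectory. It is not this package's finish_path: the helper name gae_and_returns is hypothetical, and it assumes rewards and value estimates arrive as plain lists with last_val passed explicitly.

```python
from typing import List, Sequence, Tuple


def gae_and_returns(
    rewards: Sequence[float],
    values: Sequence[float],
    last_val: float = 0.0,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: GAE-Lambda advantages and rewards-to-go for one trajectory."""
    n = len(rewards)
    # Append the bootstrap value so every step t has a "next" value V(s_{t+1}).
    vals = list(values) + [last_val]
    advantages = [0.0] * n
    returns = [0.0] * n
    running_adv = 0.0
    running_ret = last_val  # R_T = last_val (0 for a terminal state)
    for t in reversed(range(n)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * vals[t + 1] - vals[t]
        # GAE recursion: A_t = delta_t + gamma * lam * A_{t+1}
        running_adv = delta + gamma * lam * running_adv
        advantages[t] = running_adv
        # Rewards-to-go (value targets): R_t = r_t + gamma * R_{t+1}
        running_ret = rewards[t] + gamma * running_ret
        returns[t] = running_ret
    return advantages, returns
```

Passing last_val=0.0 treats the final state as terminal; passing a value estimate bootstraps a truncated trajectory.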
- classmethod instantiate_from_config(config_file_location)
Initialize an instance of the class from a config file.
- Parameters
config_file_location (path) – Path to config file
- Raises
ValueError – If the file cannot be loaded.
- Returns
An instance of the class.
- Return type
cls
- classmethod instantiate_from_config_dict(config)
Initialize an instance of the class from a config dictionary.
- Parameters
config (dict) – Configuration dictionary to create the instance from.
- Returns
An instance of the class built from config.
- Return type
cls
- sample_batch()
Call this at the end of an epoch to get all of the data from the buffer, with advantages normalized (shifted to mean zero and scaled to standard deviation one). It also resets some of the buffer's internal pointers.
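The advantage normalization mentioned above can be sketched as follows. This is a stdlib-only illustration, not the package's code; in particular, adding the constructor's eps to the standard deviation is an assumption about what eps is for, and normalize_advantages is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def normalize_advantages(adv: Sequence[float], eps: float = 1e-3) -> List[float]:
    """Hypothetical helper: shift advantages to mean zero, scale to std one."""
    # Population mean and standard deviation over all advantages in the epoch.
    mean = statistics.fmean(adv)
    std = statistics.pstdev(adv)
    # Assumed role of eps: keep the division finite when all advantages are equal.
    return [(a - mean) / (std + eps) for a in adv]
```

Normalizing over the whole epoch, rather than per trajectory, keeps the scale of the policy-gradient step roughly constant across epochs.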
- schema = Map({'obs_dim': Int(), 'act_dim': Int(), 'size': Int(), 'batch_size': Int(), Optional("gamma"): Float(), Optional("lam"): Float(), Optional("eps"): Float()})
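The schema appears to use strictyaml-style validators (Map, Int, Float, Optional). The sketch below uses only the standard library to mimic what such a schema enforces, namely four required integer keys and three optional float keys with the documented defaults, and to show how a config dictionary could then be turned into an object. BufferConfig, from_dict, REQUIRED, and OPTIONAL are hypothetical names. Rejecting unknown keys and converting optional values to float are assumptions about the validation behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Required keys and their types, mirroring the schema above.
REQUIRED = {"obs_dim": int, "act_dim": int, "size": int, "batch_size": int}
# Optional keys with the defaults documented for PPOBuffer.
OPTIONAL = {"gamma": 0.99, "lam": 0.95, "eps": 1e-3}


@dataclass
class BufferConfig:  # hypothetical stand-in for PPOBuffer's constructor arguments
    obs_dim: int
    act_dim: int
    size: int
    batch_size: int
    gamma: float = 0.99
    lam: float = 0.95
    eps: float = 1e-3

    @classmethod
    def from_dict(cls, config: Dict[str, Any]) -> "BufferConfig":
        # Assumed strict-map behavior: unexpected keys are an error.
        unknown = set(config) - set(REQUIRED) - set(OPTIONAL)
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        for key, typ in REQUIRED.items():
            if key not in config:
                raise ValueError(f"missing required key: {key}")
            if not isinstance(config[key], typ):
                raise ValueError(f"{key} must be of type {typ.__name__}")
        # Optional keys fall back to the documented defaults.
        merged = {**OPTIONAL, **config}
        for key in OPTIONAL:
            merged[key] = float(merged[key])  # Float() fields
        return cls(**merged)
```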
- store(buffer_dict)
Append one timestep of agent-environment interaction to the buffer.
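A minimal sketch of per-timestep storage with a write pointer and a trajectory start index, in the style commonly used for on-policy buffers. TrajectoryStore, current_path, and the keys obs, act, rew, and val are hypothetical. Raising an error when the buffer is full, instead of overwriting, is an assumption based on on-policy data having to be consumed once per epoch.

```python
from typing import Any, Dict, List


class TrajectoryStore:
    """Hypothetical, simplified stand-in for PPOBuffer's per-timestep storage."""

    KEYS = ("obs", "act", "rew", "val")  # assumed keys of buffer_dict

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.ptr = 0             # index of the next free slot
        self.path_start_idx = 0  # index where the current trajectory began
        self.data: Dict[str, List[Any]] = {k: [] for k in self.KEYS}

    def store(self, buffer_dict: Dict[str, Any]) -> None:
        # Unlike a FIFO replay buffer, an on-policy buffer should not overwrite
        # data before the epoch's batch has been consumed.
        if self.ptr >= self.max_size:
            raise RuntimeError("buffer is full; sample the batch first")
        for key in self.KEYS:
            self.data[key].append(buffer_dict[key])
        self.ptr += 1

    def current_path(self, key: str) -> List[Any]:
        # The slice a finish_path-style method would process for this trajectory.
        return self.data[key][self.path_start_idx:self.ptr]
```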
- src.buffers.PPOBuffer.discount_cumsum(x, discount)
Wrapper around discounted cumulative summation.
- Parameters
x (np.array) – Array to sum over
discount (float) – Discount factor.
- Returns
Discounted cumulative sums of x, one per element of x.
- Return type
np.array
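Discounted cumulative summation maps x to y with y[t] = x[t] + discount * x[t+1] + discount^2 * x[t+2] + .... Below is a plain-Python sketch of that definition; it is not necessarily how this module implements it. A common vectorized alternative is scipy.signal.lfilter([1], [1, -discount], x[::-1])[::-1].

```python
from typing import List, Sequence


def discount_cumsum(x: Sequence[float], discount: float) -> List[float]:
    """Sketch: y[t] = sum_k discount**k * x[t + k]."""
    out = [0.0] * len(x)
    running = 0.0
    # Walk backwards: y[t] = x[t] + discount * y[t + 1], with y past the end = 0.
    for t in reversed(range(len(x))):
        running = x[t] + discount * running
        out[t] = running
    return out


print(discount_cumsum([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
```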
src.buffers.SimpleReplayBuffer module
Default Replay Buffer.
- class src.buffers.SimpleReplayBuffer.SimpleReplayBuffer(obs_dim: int, act_dim: int, size: int, batch_size: int)
Bases: object
A simple FIFO experience replay buffer for SAC agents.
Initialize simple replay buffer
- Parameters
obs_dim (int) – Observation dimension
act_dim (int) – Action dimension
size (int) – Buffer size
batch_size (int) – Batch size
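A FIFO replay buffer of fixed size is typically a ring buffer. Once it is full, each new transition overwrites the oldest one. The sketch below illustrates that mechanism only. RingBuffer and its attribute names are hypothetical, and storing transitions as dictionaries is an assumption.

```python
from typing import Any, Dict, List


class RingBuffer:
    """Hypothetical sketch of FIFO storage, not the package's SimpleReplayBuffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.storage: List[Dict[str, Any]] = [{} for _ in range(capacity)]
        self.ptr = 0    # next slot to write; equals the oldest entry once full
        self.count = 0  # number of valid entries

    def store(self, transition: Dict[str, Any]) -> None:
        self.storage[self.ptr] = transition
        # Wrap around so the oldest transition is overwritten first (FIFO).
        self.ptr = (self.ptr + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)
```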
- finish_path(action_obj=None)
Call this at the end of a trajectory, or when a trajectory is cut off by the end of an epoch. It looks back in the buffer to where the trajectory started and uses the rewards and value estimates from the whole trajectory to compute advantage estimates with GAE-Lambda, as well as the rewards-to-go for each state, which serve as targets for the value function. The “last_val” value should be 0 if the trajectory ended because the agent reached a terminal state (died), and otherwise should be V(s_T), the value function's estimate for the last state. This allows the reward-to-go calculation to be bootstrapped to account for timesteps beyond the arbitrary episode horizon (or epoch cutoff).
- classmethod instantiate_from_config(config_file_location)
Initialize an instance of the class from a config file.
- Parameters
config_file_location (path) – Path to config file
- Raises
ValueError – If the file cannot be loaded.
- Returns
An instance of the class.
- Return type
cls
- classmethod instantiate_from_config_dict(config)
Initialize an instance of the class from a config dictionary.
- Parameters
config (dict) – Configuration dictionary to create the instance from.
- Returns
An instance of the class built from config.
- Return type
cls
- sample_batch()
Sample a batch from the buffer.
- Returns
Dictionary of batched data.
- Return type
dict
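Replay buffers for off-policy agents such as SAC usually sample transitions uniformly at random and return one array per field. The stdlib sketch below shows that pattern. Uniform sampling with replacement, the function name sample_batch_sketch, and the per-field dictionary layout are assumptions, not a description of this class's exact behavior.

```python
import random
from typing import Any, Dict, List, Optional


def sample_batch_sketch(
    transitions: List[Dict[str, Any]],
    batch_size: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical helper: uniform random batch, regrouped per field."""
    rng = rng or random.Random()
    # Uniform sampling with replacement over the stored transitions.
    idxs = [rng.randrange(len(transitions)) for _ in range(batch_size)]
    # Regroup row-wise transitions into one list per field (column-wise batch).
    keys = transitions[0].keys()
    return {k: [transitions[i][k] for i in idxs] for k in keys}
```

Sampling with replacement keeps each draw independent of the others, which is fine when the buffer is much larger than batch_size.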
- schema = Map({'obs_dim': Int(), 'act_dim': Int(), 'size': Int(), 'batch_size': Int()})
- store(buffer_dict)
Store data from buffer_dict.
- Parameters
buffer_dict (dict) – Dictionary of data to store.
Module contents
Replay buffer definitions, for storing past data for learning.