metarl.tf.policies.policy module

Base class for policies in TensorFlow.

class Policy(name, env_spec)

Bases: metarl.tf.models.module.Module

Base class for policies in TensorFlow.

Parameters:
    name (str) – Policy name, also the variable scope.
    env_spec (metarl.envs.env_spec.EnvSpec) – Environment specification.

action_space

Action space.

Returns:	The action space of the environment.
Return type:	akro.Space

env_spec

Policy environment specification.

Returns:	Environment specification.
Return type:	metarl.EnvSpec

get_action(observation)

Get action sampled from the policy.

Parameters:	observation (np.ndarray) – Observation from the environment.
Returns:	Action sampled from the policy.
Return type:	np.ndarray

get_actions(observations)

Get actions sampled from the policy, one per observation.

Parameters:	observations (list[np.ndarray]) – Observations from the environment.
Returns:	Actions sampled from the policy.
Return type:	np.ndarray
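
The relationship between get_action() and get_actions() can be illustrated with a minimal sketch. This is not the metarl implementation: UniformRandomPolicy is a hypothetical stand-in that copies only the two method names documented above, ignores the observation, and uses plain Python lists where a real policy would return np.ndarray. It needs neither TensorFlow nor metarl.

```python
import random


class UniformRandomPolicy:
    """Hypothetical stand-in; mirrors only the documented method names."""

    def __init__(self, action_dim, seed=0):
        # action_dim and seed are illustrative parameters, not part of Policy.
        self.action_dim = action_dim
        self._rng = random.Random(seed)

    def get_action(self, observation):
        # One observation in, one action out (the observation is ignored here).
        return [self._rng.uniform(-1.0, 1.0) for _ in range(self.action_dim)]

    def get_actions(self, observations):
        # A list of observations in, one action per observation out.
        return [self.get_action(obs) for obs in observations]


policy = UniformRandomPolicy(action_dim=2)
single = policy.get_action([0.1, 0.2, 0.3])
batch = policy.get_actions([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(len(single), len(batch))
```

In this sketch the batched method simply loops over the single-observation method. An actual TensorFlow policy would more likely evaluate the whole batch in one forward pass, which is the reason a separate get_actions() is useful.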

log_diagnostics(paths)

Log extra information per iteration based on the collected paths.

Parameters:	paths (dict[numpy.ndarray]) – Sample paths.

observation_space

Observation space.

Returns:	The observation space of the environment.
Return type:	akro.Space

vectorized

Whether the policy is vectorized.

Returns:	True if the policy is vectorized. A vectorized policy should implement get_actions() and support resetting with multiple simultaneous states.
Return type:	bool
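
As a rough illustration of how calling code might use this flag, the sketch below branches on vectorized to choose between the batched and the single-observation method. The classes ScalarOnlyPolicy and BatchedPolicy and the helper select_actions are hypothetical names invented for this example; they are not part of metarl, and plain lists stand in for np.ndarray.

```python
class ScalarOnlyPolicy:
    """Hypothetical non-vectorized policy: only get_action() is provided."""

    vectorized = False

    def get_action(self, observation):
        return [0.0]


class BatchedPolicy:
    """Hypothetical vectorized policy: also provides get_actions()."""

    vectorized = True

    def get_action(self, observation):
        return [1.0]

    def get_actions(self, observations):
        return [[1.0] for _ in observations]


def select_actions(policy, observations):
    # Use the batched method when the policy advertises support for it,
    # otherwise fall back to one call per observation.
    if policy.vectorized:
        return policy.get_actions(observations)
    return [policy.get_action(obs) for obs in observations]


observations = [[0.1, 0.2], [0.3, 0.4]]
scalar_result = select_actions(ScalarOnlyPolicy(), observations)
batched_result = select_actions(BatchedPolicy(), observations)
print(scalar_result, batched_result)
```

Both calls return one action per observation, so the caller does not need to know which kind of policy it was given.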

class StochasticPolicy(name, env_spec)

Stochastic Policy.