class ReinforceAgent(GraphAgent):
Constructor: ReinforceAgent(environment, policy_network, optimizer, candidates_count, ...)
This class encapsulates a reinforcement learning agent for graph theory applications based on the
REINFORCE method and implemented in PyTorch. The agent operates on a configurable environment
given as a GraphEnvironment object. In each iteration of the learning process, the agent generates
a predetermined number of graphs by playing the graph building game defined by the environment,
running the corresponding episodes in parallel, and computes the graph invariant values and the
discounted returns for each episode. When computing a discounted return, the reward at each step
is taken to be the increase between two consecutive graph invariant values. The agent uses a
torch.nn.Module model to compute the probability of selecting each action at each step of every
episode. Afterwards, the log probabilities and discounted returns of a subset of top-performing
episodes are used to train the model according to the REINFORCE algorithm, which completes one
iteration of the learning process. The user provides the model, configures the optimizer, sets the
discount factor, decides whether to apply a baseline to reduce variance, and optionally provides a
random action mechanism. When a random action occurs, it is selected uniformly among all actions
available in the current state.
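For orientation, the policy-gradient update described above might look roughly like the following
minimal sketch. The function name, the tensor shapes, and the exact way the baseline is subtracted
are assumptions made for this example, not the library's actual implementation.

```python
import torch

def reinforce_update(optimizer, log_probs, returns, apply_baseline=True):
    """Hedged sketch of one REINFORCE update on the selected episodes.

    log_probs: tensor of shape (episode_length, episode_count) with the log
        probabilities of the actions taken, still attached to the policy graph.
    returns:   tensor of the same shape with the discounted returns.
    Both the names and the shapes are assumptions made for illustration.
    """
    if apply_baseline:
        # Baseline: mean return at each step across the selected episodes,
        # subtracted to reduce the variance of the gradient estimate.
        returns = returns - returns.mean(dim=1, keepdim=True)
    # REINFORCE loss: negative return-weighted sum of log probabilities.
    loss = -(log_probs * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```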
| Method | __init__ | This constructor initializes an instance of the ReinforceAgent class. |
| Method | reset | This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch. |
| Method | step | This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment... |
| Property | best | This abstract property must be implemented by any concrete subclass. It must return a graph attaining the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the result must be returned as a ... |
| Property | best | This abstract property must be implemented by any concrete subclass. It must return the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the value is returned as a ... |
| Property | step | This abstract property must be implemented by any concrete subclass. It must return the number of learning iterations executed so far. If the agent has been initialized, the returned value must be a nonnegative ... |
| Instance Variable | _apply | A bool indicating whether a baseline should be applied to reduce variance. If True, the baseline is the mean return over all episodes, computed independently for each step. |
| Instance Variable | _best | A Graph object representing a graph attaining the best achieved value for the graph invariant, or None if the agent has not been initialized or no iterations have been executed. |
| Instance Variable | _best | A float representing the best achieved value for the graph invariant, or None if the agent has not been initialized. |
| Instance Variable | _candidates | A positive int specifying the number of graphs constructed per iteration, i.e., the number of episodes run in parallel. |
| Instance Variable | _device | A torch.device object indicating the device where the model resides. |
| Instance Variable | _discount | A float from the interval [0, 1] representing the discount factor to be used while computing the discounted returns. |
| Instance Variable | _elite | A positive int specifying the number of top-performing episodes used to train the model in each iteration, or None if all the episodes should be used. |
| Instance Variable | _environment | A GraphEnvironment object defining the extremal problem and providing the graph building game used to construct all the graphs. |
| Instance Variable | _optimizer | A torch.optim.Optimizer object that updates the model parameters. |
| Instance Variable | _policy | A torch.nn.Module object predicting the action probabilities for each step in each episode. |
| Instance Variable | _population | Either None if uninitialized, or a numpy.ndarray of type numpy.float32 storing the discounted returns for all executed episodes. Its shape is (episode_length, candidates_count), where episode_length is the episode length of the RL environment and ... |
| Instance Variable | _random | A RandomActionMechanism object that determines the probability of executing a random action. When a random action is selected, it is sampled uniformly among all available actions in the current state. |
| Instance Variable | _random | A numpy.random.Generator object used for all probabilistic decisions. |
| Instance Variable | _step | A nonnegative int representing the number of executed iterations, or None if the agent has not been initialized. |
def __init__(self, environment: GraphEnvironment, policy_network: nn.Module, optimizer: torch.optim.Optimizer, candidates_count: int = 200, elite_count: int | None = None, discount_factor: float = 0.99, apply_baseline: bool = True, random_action_mechanism: RandomActionMechanism = NoRandomActionMechanism(), random_generator: np.random.Generator | None = None):
This constructor initializes an instance of the ReinforceAgent class.
| Parameters | |
| environment: GraphEnvironment | The RL environment defining the extremal problem and providing the graph building game, given as a GraphEnvironment object. |
| policy_network: nn.Module | The policy network used to compute the probability of each action in each episode and step, given as a torch.nn.Module object. |
| optimizer: torch.optim.Optimizer | The optimizer responsible for updating the model parameters, given as a torch.optim.Optimizer object. The parameters of policy_network must be passed to it. |
| candidates_count: int | A positive int specifying how many graphs are generated in each iteration by running the corresponding number of episodes in parallel. The default value is 200. |
| elite_count: int | None | A positive int specifying how many episodes with the greatest graph invariant value are used to train the policy network in each iteration of the learning process, or None to indicate that all executed episodes should be used. The default value is None. |
| discount_factor: float | A float from the interval [0, 1] representing the discount factor to be used while computing the returns. The default value is 0.99. |
| apply_baseline: bool | A bool indicating whether a baseline should be applied to reduce variance. If True, the baseline is the mean return over all elite episodes, computed independently for each step. The default value is True. |
| random_action_mechanism: RandomActionMechanism | A RandomActionMechanism object that governs the probability of executing a random action in each step of the graph building game. When a random action is triggered, the agent ignores the action predicted by the policy network and instead selects an action uniformly at random among all available actions. By default, this is NoRandomActionMechanism(), meaning that no random actions are ever executed. |
| random_generator: np.random.Generator | None | Either None, or a numpy.random.Generator used for probabilistic decisions. If None, a default generator will be used. The default value is None. |
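As a usage sketch, constructing and training an agent might look roughly as follows. The
GraphEnvironment instance, the policy-network architecture, the observation and action sizes, and
the omitted import of ReinforceAgent are placeholders invented for this example; only the
constructor arguments and the reset/step calls follow the documentation above.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical sizes; in practice they are dictated by the chosen GraphEnvironment.
observation_size = 36
action_count = 2

# Placeholder policy network; whether the library expects probabilities or raw
# logits is not specified here, so the final softmax is an assumption.
policy_network = nn.Sequential(
    nn.Linear(observation_size, 128),
    nn.ReLU(),
    nn.Linear(128, action_count),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy_network.parameters(), lr=1e-3)

environment = ...  # a GraphEnvironment defining the extremal problem (placeholder)

agent = ReinforceAgent(
    environment=environment,
    policy_network=policy_network,
    optimizer=optimizer,
    candidates_count=200,
    elite_count=20,               # train only on the 20 best episodes per iteration
    discount_factor=0.99,
    apply_baseline=True,
    random_generator=np.random.default_rng(0),
)

agent.reset()                     # initialize the agent before learning starts
for _ in range(1000):
    agent.step()                  # one iteration: run episodes, update the policy
```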
rlgt.agents.graph_agent.GraphAgent.reset: This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch.
rlgt.agents.graph_agent.GraphAgent.step: This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment. This iteration should update the agent's internal state and improve its policy or decision-making strategy based on the observed outcomes.
This abstract property must be implemented by any concrete subclass. It must return a graph
attaining the best value of the target graph invariant achieved so far. If at least one
learning iteration has been executed, the result must be returned as a Graph object.
Otherwise, if no iterations have been executed or the agent has not been initialized, the
value None must be returned.
This abstract property must be implemented by any concrete subclass. It must return the
best value of the target graph invariant achieved so far. If at least one learning
iteration has been executed, the value is returned as a float. If the agent has been
initialized but no iterations have yet been executed, the value −∞ must be returned. If the
agent has not been initialized, the value None must be returned.
A positive int specifying the number of graphs constructed per
iteration, i.e., the number of episodes run in parallel.
A float from the interval [0, 1] representing the discount factor to
be used while computing the discounted returns.
A GraphEnvironment object defining the extremal problem and providing the
graph building game used to construct all the graphs.
Either None if uninitialized, or a numpy.ndarray of type
numpy.float32 storing the discounted returns for all executed episodes. Its shape is
(episode_length, candidates_count), where episode_length is the episode length of
the RL environment and candidates_count is the number of episodes executed in parallel.
The first dimension corresponds to the timestamps (actions) within an episode, and the
second corresponds to the executed episodes.
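A minimal sketch of how discounted returns with exactly this shape could be computed is given
below; the invariant_values array (including the invariant of the initial graph) and the function
name are assumptions made for illustration.

```python
import numpy as np

def discounted_returns(invariant_values: np.ndarray, discount_factor: float = 0.99) -> np.ndarray:
    """Hedged sketch: discounted returns of shape (episode_length, candidates_count).

    invariant_values is assumed to have shape (episode_length + 1, candidates_count),
    holding the graph invariant before the first action and after every action.
    """
    # Rewards are the increases between consecutive graph invariant values.
    rewards = np.diff(invariant_values, axis=0).astype(np.float32)
    returns = np.zeros_like(rewards)
    running = np.zeros(rewards.shape[1], dtype=np.float32)
    # Accumulate backwards in time: G_t = r_t + discount_factor * G_{t+1}.
    for t in range(rewards.shape[0] - 1, -1, -1):
        running = rewards[t] + discount_factor * running
        returns[t] = running
    return returns
```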
A RandomActionMechanism object that determines the
probability of executing a random action. When a random action is selected, it is sampled
uniformly among all available actions in the current state.
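The uniform selection among the currently available actions could be realized roughly as in the
sketch below; the boolean availability mask and the function name are assumptions made for
illustration and are not part of the RandomActionMechanism interface documented here.

```python
import numpy as np

def sample_uniform_action(available_mask: np.ndarray, rng: np.random.Generator) -> int:
    """Hedged sketch: pick one of the currently available actions uniformly at random."""
    available_actions = np.flatnonzero(available_mask)  # indices where the mask is True
    return int(rng.choice(available_actions))
```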