class DeepCrossEntropyAgent(GraphAgent):
Constructor: DeepCrossEntropyAgent(environment, policy_network, optimizer, candidates_count, ...)
This class encapsulates a reinforcement learning agent for graph theory applications using the
PyTorch-based Deep Cross-Entropy method. The agent operates on a configurable environment
given as a GraphEnvironment object. In each iteration of the learning process, the agent
generates a predetermined number of graphs by playing, in parallel episodes, the graph
building game defined by the environment, and computes the graph invariant value of the
final graph produced by each episode. The agent uses a torch.nn.Module model to compute the probability of
selecting each action at each step of every episode. A configured number of episodes with the
highest graph invariant value are used to train the model, while another configured number of
top episodes are carried over to the next generation. This completes one iteration of the
learning process. The user provides the model, the optimizer for training the model with
cross-entropy loss, and a random action mechanism to guide exploration. When a random action
occurs, it is selected uniformly among all actions available in the current state.
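The class encapsulates this loop itself; the following is only a minimal sketch of the core Deep Cross-Entropy update it describes, assuming a policy network that maps fixed-length state vectors to one logit per action and pre-collected episode arrays. All names below are illustrative and not part of this class's API.

```python
import numpy as np
import torch
import torch.nn.functional as F

def cross_entropy_update(policy, optimizer, states, actions, values, elite_count):
    """One illustrative Deep Cross-Entropy update.

    states  -- float32 array of shape (episode_length, population, state_length)
    actions -- int32 array of shape (episode_length, population)
    values  -- float32 array of shape (population,), one graph invariant value per episode
    """
    # Keep the elite_count episodes with the highest graph invariant value.
    elite = np.argsort(values)[-elite_count:]
    elite_states = torch.as_tensor(states[:, elite]).float()
    elite_actions = torch.as_tensor(actions[:, elite]).long()

    # Train the policy to imitate the actions taken in the elite episodes:
    # cross-entropy between the predicted action logits and the executed actions.
    logits = policy(elite_states.reshape(-1, elite_states.shape[-1]))
    loss = F.cross_entropy(logits, elite_actions.reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```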
| Method | __init__ |
This constructor initializes an instance of the DeepCrossEntropyAgent class. |
| Method | reset |
This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch. |
| Method | step |
This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment... |
| Property | best |
This abstract property must be implemented by any concrete subclass. It must return a graph attaining the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the result must be returned as a ... |
| Property | best |
This abstract property must be implemented by any concrete subclass. It must return the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the value is returned as a ... |
| Property | step |
This abstract property must be implemented by any concrete subclass. It must return the number of learning iterations executed so far. If the agent has been initialized, the returned value must be a nonnegative ... |
| Instance Variable | _best |
A float representing the best achieved value for the graph invariant, or None if the agent has not been initialized. |
| Instance Variable | _candidates |
A positive int specifying the number of graphs constructed per iteration, i.e., the number of episodes run in parallel. |
| Instance Variable | _device |
A torch.device object indicating the device where the model resides. |
| Instance Variable | _elite |
A positive int specifying the number of top-performing episodes used to train the model in each iteration. |
| Instance Variable | _environment |
A GraphEnvironment object defining the extremal problem and providing the graph building game used to construct all the graphs. |
| Instance Variable | _loss |
A function implementing the cross-entropy loss used for training. |
| Instance Variable | _optimizer |
A torch.optim.Optimizer object that updates the model parameters. |
| Instance Variable | _policy |
A torch.nn.Module object predicting the action probabilities for each step in each episode (see the sketch after this table).
| Instance Variable | _population |
Either None if uninitialized, or a numpy.ndarray of type numpy.int32 storing all actions during each episode trajectory. Its shape is (episode_length, total_population), where the first dimension corresponds to the action trajectory within an episode and the second to the executed episodes... |
| Instance Variable | _population |
Either None if uninitialized, or a numpy.ndarray vector of type numpy.float32 storing the graph invariant value for each episode. Its length is total_population and the episode order matches _population_states... |
| Instance Variable | _population_states |
Either None if uninitialized, or a numpy.ndarray storing all states during each episode trajectory. Its shape is (episode_length + 1,
total_population, state_length), where episode_length is the episode length of the RL environment, ... |
| Instance Variable | _random |
A RandomActionMechanism object that determines the probability of executing a random action. When a random action is selected, it is sampled uniformly among all available actions in the current state. |
| Instance Variable | _random |
A numpy.random.Generator object used for all probabilistic decisions. |
| Instance Variable | _step |
A nonnegative int representing the number of executed iterations, or None if the agent has not been initialized. |
| Instance Variable | _survivors |
A positive int specifying the number of top-performing episodes carried over to the next generation. |
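The `_policy` entry above can be any torch.nn.Module that maps a state vector to one logit per action. A minimal sketch of such a network follows; the layer sizes, and the assumption of fixed-length state vectors with a fixed action count, are illustrative only.

```python
import torch
import torch.nn as nn

class SimplePolicy(nn.Module):
    """Hypothetical policy network: one logit per action for a given state vector."""

    def __init__(self, state_length: int, action_count: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_length, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_count),  # raw logits; softmax is applied by the cross-entropy loss
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (..., state_length) -> logits: (..., action_count)
        return self.net(state)
```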
| Method | __init__ |
__init__(environment: GraphEnvironment, policy_network: nn.Module, optimizer: torch.optim.Optimizer, candidates_count: int = 200, elite_count: int = 30, survivors_count: int = 50, random_action_mechanism: RandomActionMechanism = NoRandomActionMechanism(), random_generator: np.random.Generator | None = None)
This constructor initializes an instance of the DeepCrossEntropyAgent class.
| Parameters | |
environment: GraphEnvironment | The RL environment defining the extremal problem and providing the
graph building game, given as a GraphEnvironment object. |
policy_network: nn.Module | The policy network used to compute the probability of each action in
each episode and step, given as a torch.nn.Module object. |
optimizer: torch.optim.Optimizer | The optimizer responsible for updating the model parameters, given as a
torch.optim.Optimizer object. The parameters of policy_network must be passed to
it. |
candidates_count: int | A positive int specifying how many graphs are generated in each
iteration by running the corresponding number of episodes in parallel. The default
value is 200. |
elite_count: int | A positive int specifying how many episodes with the greatest graph
invariant value are used to train the model in each iteration. The default value is 30. |
survivors_count: int | A positive int specifying how many episodes with the greatest
graph invariant value are carried over to the next generation in each iteration. The
default value is 50. |
random_action_mechanism: RandomActionMechanism | A RandomActionMechanism object that governs the
probability of executing a random action in each step of the graph building game. When
a random action is triggered, the agent ignores the action predicted by the policy
network and instead selects an action uniformly at random among all available actions.
By default, this is NoRandomActionMechanism(), meaning that no random actions are
ever executed. |
random_generator: np.random.Generator | None | Either None, or a numpy.random.Generator used for
probabilistic decisions. If None, a default generator will be used. The default value
is None. |
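Putting the constructor parameters together, a hedged construction-and-training sketch might look as follows; `MyGraphEnvironment` stands in for whatever concrete GraphEnvironment subclass defines the extremal problem, and the state length and action count of the policy network are placeholders.

```python
import numpy as np
import torch
import torch.nn as nn

# MyGraphEnvironment is a placeholder for a concrete GraphEnvironment subclass.
environment = MyGraphEnvironment()

# Any torch.nn.Module mapping state vectors to action logits will do; the sizes are placeholders.
policy = nn.Sequential(nn.Linear(91, 128), nn.ReLU(), nn.Linear(128, 2))

# The optimizer must be constructed over the policy network's parameters.
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

agent = DeepCrossEntropyAgent(
    environment,
    policy,
    optimizer,
    candidates_count=200,    # episodes generated per iteration
    elite_count=30,          # episodes used for the cross-entropy update
    survivors_count=50,      # episodes carried over to the next generation
    random_generator=np.random.default_rng(0),
)

agent.reset()                # initialize internal state before learning
for _ in range(100):
    agent.step()             # one generate / evaluate / train iteration
```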
| Method | rlgt.agents.graph_agent.GraphAgent.reset |
This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch.
| Method | rlgt.agents.graph_agent.GraphAgent.step |
This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment. This iteration should update the agent's internal state and improve its policy or decision-making strategy based on the observed outcomes.
| Property | best |
This abstract property must be implemented by any concrete subclass. It must return a graph
attaining the best value of the target graph invariant achieved so far. If at least one
learning iteration has been executed, the result must be returned as a Graph object.
Otherwise, if no iterations have been executed or the agent has not been initialized, the
value None must be returned.
| Property | best |
This abstract property must be implemented by any concrete subclass. It must return the
best value of the target graph invariant achieved so far. If at least one learning
iteration has been executed, the value is returned as a float. If the agent has been
initialized but no iterations have yet been executed, the value −∞ must be returned. If the
agent has not been initialized, the value None must be returned.
| Instance Variable | _candidates |
A positive int specifying the number of graphs constructed per
iteration, i.e., the number of episodes run in parallel.
| Instance Variable | _elite |
A positive int specifying the number of top-performing episodes used to
train the model in each iteration.
| Instance Variable | _environment |
A GraphEnvironment object defining the extremal problem and providing the
graph building game used to construct all the graphs.
| Instance Variable | _population |
Either None if uninitialized, or a numpy.ndarray of type
numpy.int32 storing all actions during each episode trajectory. Its shape is
(episode_length, total_population), where the first dimension corresponds to the action
trajectory within an episode and the second to the executed episodes. The episode order
matches _population_states.
| Instance Variable | _population |
Either None if uninitialized, or a numpy.ndarray vector of type
numpy.float32 storing the graph invariant value for each episode. Its length is
total_population and the episode order matches _population_states.
| Instance Variable | _population_states |
Either None if uninitialized, or a numpy.ndarray storing all
states during each episode trajectory. Its shape is (episode_length + 1,
total_population, state_length), where episode_length is the episode length of the RL
environment, state_length is the length of the state vectors, and total_population
is the total number of episodes stored. The stored episodes include both newly generated
episodes and those carried over from the previous generation. The first dimension
corresponds to the state trajectory within an episode, the second to the executed episodes,
and the third to the state vector entries. The states from carried-over episodes appear
before the newly generated ones.
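To make this layout concrete, the following sketch allocates buffers with the shapes described above. The sizes are hypothetical, the array names and the float32 dtype of the state buffer are chosen for illustration, and the assumption that total_population equals survivors_count plus candidates_count is an inference from this description.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
episode_length, state_length = 45, 91
survivors_count, candidates_count = 50, 200
total_population = survivors_count + candidates_count  # assumed composition

# State trajectories: one extra slot for the initial state of every episode.
states = np.zeros((episode_length + 1, total_population, state_length), dtype=np.float32)
# Actions taken at each step of each episode.
actions = np.zeros((episode_length, total_population), dtype=np.int32)
# One graph invariant value per episode, in the same episode order as the arrays above.
values = np.zeros(total_population, dtype=np.float32)

# Carried-over episodes occupy the leading columns, newly generated ones the rest.
carried_over = states[:, :survivors_count]
newly_generated = states[:, survivors_count:]
```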
| Instance Variable | _random |
A RandomActionMechanism object that determines the
probability of executing a random action. When a random action is selected, it is sampled
uniformly among all available actions in the current state.
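This selection rule can be illustrated as follows; the function below is not the RandomActionMechanism interface itself, only a sketch of how a uniformly random choice among the available actions might replace the policy's choice when the mechanism triggers.

```python
import numpy as np
import torch

def select_action(logits: torch.Tensor,
                  available: np.ndarray,
                  random_probability: float,
                  rng: np.random.Generator) -> int:
    """Illustrative action selection for one state.

    logits    -- raw policy outputs, one entry per action
    available -- boolean mask of the actions that are legal in the current state
    """
    candidates = np.flatnonzero(available)
    if rng.random() < random_probability:
        # Random action: ignore the policy and sample uniformly over the available actions.
        return int(rng.choice(candidates))
    # Otherwise follow the policy, restricted to the available actions.
    probs = torch.softmax(logits, dim=-1).detach().cpu().numpy()[candidates]
    probs = probs / probs.sum()
    return int(rng.choice(candidates, p=probs))
```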