DeepCrossEntropyAgent class documentation

This class encapsulates a reinforcement learning agent for graph theory applications using the PyTorch-based Deep Cross-Entropy method. The agent operates on a configurable environment given as a GraphEnvironment object. In each iteration of the learning process, the agent generates a predetermined number of graphs by running, in parallel, the corresponding number of episodes of the graph building game defined by the environment, and computes the graph invariant value of the final graph constructed in each episode. The agent uses a torch.nn.Module model to compute the probability of selecting each action at each step of every episode. A configured number of episodes with the highest graph invariant value are used to train the model, while another configured number of top episodes are carried over to the next generation; this completes one iteration of the learning process. The user provides the model, the optimizer for training the model with cross-entropy loss, and a random action mechanism to guide exploration. When a random action occurs, it is selected uniformly among all actions available in the current state.
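
The iteration just described can be sketched in a few lines of PyTorch and NumPy. This is a minimal conceptual sketch, not the class implementation: the run_episodes helper, the assumption that the policy network returns raw logits, and the concrete array shapes are illustrative placeholders, while the elite selection, survivor carry-over, and cross-entropy update follow the scheme above.

import numpy as np
import torch
import torch.nn as nn

def cross_entropy_iteration(policy_network, optimizer, run_episodes,
                            candidates_count=200, elite_count=30, survivors_count=50):
    # run_episodes is a hypothetical helper standing in for the graph building game;
    # it returns states of shape (episode_length + 1, candidates_count, state_length),
    # actions of shape (episode_length, candidates_count), and scores of shape
    # (candidates_count,) holding the graph invariant value of each final graph.
    states, actions, scores = run_episodes(policy_network, candidates_count)

    # Rank episodes by graph invariant value, best first.
    order = np.argsort(scores)[::-1]
    elite = order[:elite_count]          # episodes used to train the model
    survivors = order[:survivors_count]  # episodes carried over to the next generation

    # Train the policy network on the elite state/action pairs with cross-entropy loss.
    elite_states = torch.as_tensor(states[:-1, elite].reshape(-1, states.shape[-1]),
                                   dtype=torch.float32)
    elite_actions = torch.as_tensor(actions[:, elite].reshape(-1), dtype=torch.long)
    loss = nn.functional.cross_entropy(policy_network(elite_states), elite_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Survivors seed the population of the next iteration.
    return states[:, survivors], actions[:, survivors], scores[survivors]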

Method __init__ This constructor initializes an instance of the DeepCrossEntropyAgent class.
Method reset This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch.
Method step This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment.
Property best_graph This abstract property must be implemented by any concrete subclass. It must return a graph attaining the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the result must be returned as a Graph object; otherwise, None is returned.
Property best_score This abstract property must be implemented by any concrete subclass. It must return the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the value is returned as a float; after initialization but before any iteration it is −∞, and it is None if the agent has not been initialized.
Property step_count This abstract property must be implemented by any concrete subclass. It must return the number of learning iterations executed so far. If the agent has been initialized, the returned value must be a nonnegative int; otherwise, None is returned.
Instance Variable _best_score A float representing the best achieved value for the graph invariant, or None if the agent has not been initialized.
Instance Variable _candidates_count A positive int specifying the number of graphs constructed per iteration, i.e., the number of episodes run in parallel.
Instance Variable _device A torch.device object indicating the device where the model resides.
Instance Variable _elite_count A positive int specifying the number of top-performing episodes used to train the model in each iteration.
Instance Variable _environment A GraphEnvironment object defining the extremal problem and providing the graph building game used to construct all the graphs.
Instance Variable _loss_function A function implementing the cross-entropy loss used for training.
Instance Variable _optimizer A torch.optim.Optimizer object that updates the model parameters.
Instance Variable _policy_network A torch.nn.Module object predicting the action probabilities for each step in each episode.
Instance Variable _population_actions Either None if uninitialized, or a numpy.ndarray of type numpy.int32 storing all actions during each episode trajectory. Its shape is (episode_length, total_population), where the first dimension corresponds to the action trajectory within an episode and the second to the executed episodes. The episode order matches _population_states.
Instance Variable _population_scores Either None if uninitialized, or a numpy.ndarray vector of type numpy.float32 storing the graph invariant value for each episode. Its length is total_population and the episode order matches _population_states.
Instance Variable _population_states Either None if uninitialized, or a numpy.ndarray storing all states during each episode trajectory. Its shape is (episode_length + 1, total_population, state_length), where episode_length is the episode length of the RL environment, state_length is the length of the state vectors, and total_population is the total number of episodes stored.
Instance Variable _random_action_mechanism A RandomActionMechanism object that determines the probability of executing a random action. When a random action is selected, it is sampled uniformly among all available actions in the current state.
Instance Variable _random_generator A numpy.random.Generator object used for all probabilistic decisions.
Instance Variable _step_count A nonnegative int representing the number of executed iterations, or None if the agent has not been initialized.
Instance Variable _survivors_count A positive int specifying the number of top-performing episodes carried over to the next generation.
def __init__(self, environment: GraphEnvironment, policy_network: nn.Module, optimizer: torch.optim.Optimizer, candidates_count: int = 200, elite_count: int = 30, survivors_count: int = 50, random_action_mechanism: RandomActionMechanism = NoRandomActionMechanism(), random_generator: np.random.Generator | None = None):

This constructor initializes an instance of the DeepCrossEntropyAgent class.

Parameters
environment: GraphEnvironment
    The RL environment defining the extremal problem and providing the graph building game, given as a GraphEnvironment object.
policy_network: nn.Module
    The policy network used to compute the probability of each action in each episode and step, given as a torch.nn.Module object.
optimizer: torch.optim.Optimizer
    The optimizer responsible for updating the model parameters, given as a torch.optim.Optimizer object. The parameters of policy_network must be passed to it.
candidates_count: int
    A positive int specifying how many graphs are generated in each iteration by running the corresponding number of episodes in parallel. The default value is 200.
elite_count: int
    A positive int specifying how many episodes with the greatest graph invariant value are used to train the model in each iteration. The default value is 30.
survivors_count: int
    A positive int specifying how many episodes with the greatest graph invariant value are carried over to the next generation in each iteration. The default value is 50.
random_action_mechanism: RandomActionMechanism
    A RandomActionMechanism object that governs the probability of executing a random action in each step of the graph building game. When a random action is triggered, the agent ignores the action predicted by the policy network and instead selects an action uniformly at random among all available actions. By default, this is NoRandomActionMechanism(), meaning that no random actions are ever executed.
random_generator: np.random.Generator | None
    Either None, or a numpy.random.Generator used for probabilistic decisions. If None, a default generator will be used. The default value is None.
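
As a usage illustration, the sketch below builds a small policy network and optimizer and passes them to the constructor via the parameters above. The MLP architecture, the state and action sizes, and the names ConcreteAgent and environment are assumptions made for the example; in practice a concrete subclass of DeepCrossEntropyAgent and a concrete GraphEnvironment would be supplied, since several members of this class are abstract.

import numpy as np
import torch
import torch.nn as nn

state_length = 45   # assumed size of the environment's state vectors
action_count = 2    # assumed number of actions in the graph building game

# Any torch.nn.Module mapping a state vector to one score per action will do.
policy_network = nn.Sequential(
    nn.Linear(state_length, 128),
    nn.ReLU(),
    nn.Linear(128, action_count),
)
optimizer = torch.optim.Adam(policy_network.parameters(), lr=1e-3)

# ConcreteAgent and environment are hypothetical placeholders for a concrete
# subclass of DeepCrossEntropyAgent and a concrete GraphEnvironment instance.
agent = ConcreteAgent(
    environment=environment,
    policy_network=policy_network,
    optimizer=optimizer,
    candidates_count=200,
    elite_count=30,
    survivors_count=50,
    random_generator=np.random.default_rng(0),
)

agent.reset()
for _ in range(100):
    agent.step()
print(agent.best_score, agent.best_graph)
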
def reset(self):

This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch.

def step(self):

This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment. This iteration should update the agent's internal state and improve its policy or decision-making strategy based on the observed outcomes.

best_graph: Graph | None =

This abstract property must be implemented by any concrete subclass. It must return a graph attaining the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the result must be returned as a Graph object. Otherwise, if no iterations have been executed or the agent has not been initialized, the value None must be returned.

best_score: float | None =

This abstract property must be implemented by any concrete subclass. It must return the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the value is returned as a float. If the agent has been initialized but no iterations have yet been executed, the value −∞ must be returned. If the agent has not been initialized, the value None must be returned.

step_count: int | None =

This abstract property must be implemented by any concrete subclass. It must return the number of learning iterations executed so far. If the agent has been initialized, the returned value must be a nonnegative int. If the agent has not yet been initialized, the value None must be returned.
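
Putting the abstract members together, a skeleton of a concrete subclass might look as follows. The bodies are placeholders; only the member names and the return conventions (a Graph or None, a float, −∞, or None, a nonnegative int or None) come from the descriptions above.

class MyAgent(DeepCrossEntropyAgent):

    def reset(self):
        # Discard any previous state so that learning restarts from scratch.
        self._step_count = 0
        self._best_score = float("-inf")
        ...

    def step(self):
        # One learning iteration: run episodes, select elites and survivors,
        # train the policy network, and update the best graph found so far.
        ...
        self._step_count += 1

    @property
    def best_graph(self):
        # The graph attaining the best invariant value, or None before any iteration.
        ...

    @property
    def best_score(self):
        # None before initialization, -inf after reset(), best float value afterwards.
        return self._best_score

    @property
    def step_count(self):
        # None before initialization, otherwise the number of executed iterations.
        return self._step_count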

_best_score: float | None =

A float representing the best achieved value for the graph invariant, or None if the agent has not been initialized.

_candidates_count: int =

A positive int specifying the number of graphs constructed per iteration, i.e., the number of episodes run in parallel.

_device: torch.device =

A torch.device object indicating the device where the model resides.
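
A common PyTorch idiom for obtaining this device, assumed here only as an illustration of how the attribute could be populated, is to inspect the model's parameters.

import torch
import torch.nn as nn

policy_network = nn.Linear(45, 2)                    # stand-in model
device = next(policy_network.parameters()).device    # e.g. torch.device('cpu')
states = torch.zeros(8, 45, device=device)           # keep inputs on the same device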

_elite_count: int =

A positive int specifying the number of top-performing episodes used to train the model in each iteration.

_environment: GraphEnvironment =

A GraphEnvironment object defining the extremal problem and providing the graph building game used to construct all the graphs.

_loss_function: Callable =

A function implementing the cross-entropy loss used for training.
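
The concrete callable is internal to the class; a standard choice consistent with this description, shown purely as an assumption, is torch.nn.CrossEntropyLoss, which expects raw logits of shape (N, action_count) and int64 action indices of shape (N,).

import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss()       # assumed concrete choice
logits = torch.randn(4, 2)                  # (N, action_count) raw scores from the model
actions = torch.tensor([0, 1, 1, 0])        # (N,) actions taken in the elite episodes
loss = loss_function(logits, actions)       # scalar training loss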

_optimizer: torch.optim.Optimizer =

A torch.optim.Optimizer object that updates the model parameters.

_policy_network: nn.Module =

A torch.nn.Module object predicting the action probabilities for each step in each episode.

_population_actions: np.ndarray | None =

Either None if uninitialized, or a numpy.ndarray of type numpy.int32 storing all actions during each episode trajectory. Its shape is (episode_length, total_population), where the first dimension corresponds to the action trajectory within an episode and the second to the executed episodes. The episode order matches _population_states.

_population_scores: np.ndarray | None =

Either None if uninitialized, or a numpy.ndarray vector of type numpy.float32 storing the graph invariant value for each episode. Its length is total_population and the episode order matches _population_states.

_population_states: np.ndarray | None =

Either None if uninitialized, or a numpy.ndarray storing all states during each episode trajectory. Its shape is (episode_length + 1, total_population, state_length), where episode_length is the episode length of the RL environment, state_length is the length of the state vectors, and total_population is the total number of episodes stored. The stored episodes include both newly generated episodes and those carried over from the previous generation. The first dimension corresponds to the state trajectory within an episode, the second to the executed episodes, and the third to the state vector entries. The states from carried-over episodes appear before the newly generated ones.
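
The layout can be made concrete with a small illustrative allocation. The numbers below are arbitrary, and both the float32 dtype of the state array and the assumption that total_population equals survivors_count plus candidates_count are guesses made only for the example; the shapes and the ordering (carried-over episodes first, newly generated ones after) follow the description above.

import numpy as np

episode_length = 10        # illustrative values, not fixed by the class
state_length = 25
candidates_count = 200
survivors_count = 50
total_population = survivors_count + candidates_count   # assumed composition

population_states = np.zeros((episode_length + 1, total_population, state_length),
                             dtype=np.float32)          # dtype assumed
population_actions = np.zeros((episode_length, total_population), dtype=np.int32)
population_scores = np.zeros(total_population, dtype=np.float32)

# Carried-over episodes occupy the leading slots of the population axis,
# newly generated episodes follow them.
carried_over_states = population_states[:, :survivors_count, :]
new_states = population_states[:, survivors_count:, :]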

_random_action_mechanism: RandomActionMechanism =

A RandomActionMechanism object that determines the probability of executing a random action. When a random action is selected, it is sampled uniformly among all available actions in the current state.
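
The fallback described here can be sketched as below, under the assumption that the available actions of the current state are given as a boolean mask and that the mechanism yields a probability of acting randomly; the function and parameter names are hypothetical.

import numpy as np

def choose_action(predicted_action, available_mask, random_probability, rng):
    # With the given probability, ignore the policy network's prediction and
    # pick uniformly among all actions available in the current state.
    if rng.random() < random_probability:
        available_actions = np.flatnonzero(available_mask)
        return int(rng.choice(available_actions))
    return predicted_action

rng = np.random.default_rng(0)
action = choose_action(predicted_action=1,
                       available_mask=np.array([True, True, False]),
                       random_probability=0.1,
                       rng=rng)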

_random_generator: np.random.Generator =

A numpy.random.Generator object used for all probabilistic decisions.

_step_count: int | None =

A nonnegative int representing the number of executed iterations, or None if the agent has not been initialized.

_survivors_count: int =

A positive int specifying the number of top-performing episodes carried over to the next generation.