DeepCrossEntropyAgent class documentation

This class encapsulates a reinforcement learning agent for graph theory applications using the PyTorch-based Deep Cross-Entropy method. The agent operates on a configurable environment given as a GraphEnvironment object. In each iteration of the learning process, the agent generates a predetermined number of graphs by running, in parallel, the corresponding number of episodes of the graph building game defined by the environment, and computes the graph invariant value of the final graph constructed in each episode. The agent uses a torch.nn.Module model to compute the probability of selecting each action at each step of every episode. A configured number of episodes with the highest graph invariant value are used to train the model, while another configured number of top episodes are carried over to the next generation; this completes one iteration of the learning process. The user provides the model, the optimizer for training the model with cross-entropy loss, and a random action mechanism to guide exploration. When a random action occurs, it is selected uniformly among all actions available in the current state.
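
The iteration just described can be sketched in a few lines of PyTorch and NumPy. This is a minimal conceptual sketch, not the class implementation: the run_episodes helper, the assumption that the policy network returns raw logits, and the concrete array shapes are illustrative placeholders, while the elite selection, survivor carry-over, and cross-entropy update follow the scheme above.

import numpy as np
import torch
import torch.nn as nn

def cross_entropy_iteration(policy_network, optimizer, run_episodes,
                            candidates_count=200, elite_count=30, survivors_count=50):
    # run_episodes is a hypothetical helper standing in for the graph building game;
    # it returns states of shape (episode_length + 1, candidates_count, state_length),
    # actions of shape (episode_length, candidates_count), and scores of shape
    # (candidates_count,) holding the graph invariant value of each final graph.
    states, actions, scores = run_episodes(policy_network, candidates_count)

    # Rank episodes by graph invariant value, best first.
    order = np.argsort(scores)[::-1]
    elite = order[:elite_count]          # episodes used to train the model
    survivors = order[:survivors_count]  # episodes carried over to the next generation

    # Train the policy network on the elite state/action pairs with cross-entropy loss.
    elite_states = torch.as_tensor(states[:-1, elite].reshape(-1, states.shape[-1]),
                                   dtype=torch.float32)
    elite_actions = torch.as_tensor(actions[:, elite].reshape(-1), dtype=torch.long)
    loss = nn.functional.cross_entropy(policy_network(elite_states), elite_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Survivors seed the population of the next iteration.
    return states[:, survivors], actions[:, survivors], scores[survivors]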

Method __init__ This constructor initializes an instance of the DeepCrossEntropyAgent class.
Method reset This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch.
Method step This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment.
Property best_graph This abstract property must be implemented by any concrete subclass. It must return a graph attaining the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the result must be returned as a Graph object; otherwise, None is returned.
Property best_score This abstract property must be implemented by any concrete subclass. It must return the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the value is returned as a float; after initialization but before any iteration it is −∞, and it is None if the agent has not been initialized.
Property step_count This abstract property must be implemented by any concrete subclass. It must return the number of learning iterations executed so far. If the agent has been initialized, the returned value must be a nonnegative int; otherwise, None is returned.
Instance Variable _best_score A float representing the best achieved value for the graph invariant, or None if the agent has not been initialized.
Instance Variable _candidates_count A positive int specifying the number of graphs constructed per iteration, i.e., the number of episodes run in parallel.
Instance Variable _device A torch.device object indicating the device where the model resides.
Instance Variable _elite_count A positive int specifying the number of top-performing episodes used to train the model in each iteration.
Instance Variable _environment A GraphEnvironment object defining the extremal problem and providing the graph building game used to construct all the graphs.
Instance Variable _loss_function A function implementing the cross-entropy loss used for training.
Instance Variable _optimizer A torch.optim.Optimizer object that updates the model parameters.
Instance Variable _policy_network A torch.nn.Module object predicting the action probabilities for each step in each episode.
Instance Variable _population_actions Either None if uninitialized, or a numpy.ndarray of type numpy.int32 storing all actions during each episode trajectory. Its shape is (episode_length, total_population), where the first dimension corresponds to the action trajectory within an episode and the second to the executed episodes. The episode order matches _population_states.
Instance Variable _population_scores Either None if uninitialized, or a numpy.ndarray vector of type numpy.float32 storing the graph invariant value for each episode. Its length is total_population and the episode order matches _population_states.
Instance Variable _population_states Either None if uninitialized, or a numpy.ndarray storing all states during each episode trajectory. Its shape is (episode_length + 1, total_population, state_length), where episode_length is the episode length of the RL environment, state_length is the length of the state vectors, and total_population is the total number of episodes stored.
Instance Variable _random_action_mechanism A RandomActionMechanism object that determines the probability of executing a random action. When a random action is selected, it is sampled uniformly among all available actions in the current state.
Instance Variable _random_generator A numpy.random.Generator object used for all probabilistic decisions.
Instance Variable _step_count A nonnegative int representing the number of executed iterations, or None if the agent has not been initialized.
Instance Variable _survivors_count A positive int specifying the number of top-performing episodes carried over to the next generation.
def __init__(self, environment: GraphEnvironment, policy_network: nn.Module, optimizer: torch.optim.Optimizer, candidates_count: int = 200, elite_count: int = 30, survivors_count: int = 50, random_action_mechanism: RandomActionMechanism = NoRandomActionMechanism(), random_generator: np.random.Generator | None = None):

This constructor initializes an instance of the DeepCrossEntropyAgent class.

Parameters
environment: GraphEnvironment
    The RL environment defining the extremal problem and providing the graph building game, given as a GraphEnvironment object.
policy_network: nn.Module
    The policy network used to compute the probability of each action in each episode and step, given as a torch.nn.Module object.
optimizer: torch.optim.Optimizer
    The optimizer responsible for updating the model parameters, given as a torch.optim.Optimizer object. The parameters of policy_network must be passed to it.
candidates_count: int
    A positive int specifying how many graphs are generated in each iteration by running the corresponding number of episodes in parallel. The default value is 200.
elite_count: int
    A positive int specifying how many episodes with the greatest graph invariant value are used to train the model in each iteration. The default value is 30.
survivors_count: int
    A positive int specifying how many episodes with the greatest graph invariant value are carried over to the next generation in each iteration. The default value is 50.
random_action_mechanism: RandomActionMechanism
    A RandomActionMechanism object that governs the probability of executing a random action in each step of the graph building game. When a random action is triggered, the agent ignores the action predicted by the policy network and instead selects an action uniformly at random among all available actions. By default, this is NoRandomActionMechanism(), meaning that no random actions are ever executed.
random_generator: np.random.Generator | None
    Either None, or a numpy.random.Generator used for probabilistic decisions. If None, a default generator will be used. The default value is None.
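
As a usage illustration, the sketch below builds a small policy network and optimizer and passes them to the constructor via the parameters above. The MLP architecture, the state and action sizes, and the names ConcreteAgent and environment are assumptions made for the example; in practice a concrete subclass of DeepCrossEntropyAgent and a concrete GraphEnvironment would be supplied, since several members of this class are abstract.

import numpy as np
import torch
import torch.nn as nn

state_length = 45   # assumed size of the environment's state vectors
action_count = 2    # assumed number of actions in the graph building game

# Any torch.nn.Module mapping a state vector to one score per action will do.
policy_network = nn.Sequential(
    nn.Linear(state_length, 128),
    nn.ReLU(),
    nn.Linear(128, action_count),
)
optimizer = torch.optim.Adam(policy_network.parameters(), lr=1e-3)

# ConcreteAgent and environment are hypothetical placeholders for a concrete
# subclass of DeepCrossEntropyAgent and a concrete GraphEnvironment instance.
agent = ConcreteAgent(
    environment=environment,
    policy_network=policy_network,
    optimizer=optimizer,
    candidates_count=200,
    elite_count=30,
    survivors_count=50,
    random_generator=np.random.default_rng(0),
)

agent.reset()
for _ in range(100):
    agent.step()
print(agent.best_score, agent.best_graph)
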
def reset(self):

This abstract method must be implemented by any concrete subclass. It must initialize the agent and prepare it to begin the learning process. If the agent has been used previously, invoking this method must reset all internal state so that the learning restarts from scratch.

def step(self):

This abstract method must be implemented by any concrete subclass. It must perform a single iteration of the learning process, which may involve one or more interactions between the agent and the environment. This iteration should update the agent's internal state and improve its policy or decision-making strategy based on the observed outcomes.

best_graph: Graph | None =

This abstract property must be implemented by any concrete subclass. It must return a graph attaining the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the result must be returned as a Graph object. Otherwise, if no iterations have been executed or the agent has not been initialized, the value None must be returned.

best_score: float | None =

This abstract property must be implemented by any concrete subclass. It must return the best value of the target graph invariant achieved so far. If at least one learning iteration has been executed, the value is returned as a float. If the agent has been initialized but no iterations have yet been executed, the value −∞ must be returned. If the agent has not been initialized, the value None must be returned.

step_count: int | None =

This abstract property must be implemented by any concrete subclass. It must return the number of learning iterations executed so far. If the agent has been initialized, the returned value must be a nonnegative int. If the agent has not yet been initialized, the value None must be returned.
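
Putting the abstract members together, a skeleton of a concrete subclass might look as follows. The bodies are placeholders; only the member names and the return conventions (a Graph or None, a float, −∞, or None, a nonnegative int or None) come from the descriptions above.

class MyAgent(DeepCrossEntropyAgent):

    def reset(self):
        # Discard any previous state so that learning restarts from scratch.
        self._step_count = 0
        self._best_score = float("-inf")
        ...

    def step(self):
        # One learning iteration: run episodes, select elites and survivors,
        # train the policy network, and update the best graph found so far.
        ...
        self._step_count += 1

    @property
    def best_graph(self):
        # The graph attaining the best invariant value, or None before any iteration.
        ...

    @property
    def best_score(self):
        # None before initialization, -inf after reset(), best float value afterwards.
        return self._best_score

    @property
    def step_count(self):
        # None before initialization, otherwise the number of executed iterations.
        return self._step_count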

_best_score: float | None =

A float representing the best achieved value for the graph invariant, or None if the agent has not been initialized.

_candidates_count: int =

A positive int specifying the number of graphs constructed per iteration, i.e., the number of episodes run in parallel.

_device: torch.device =

A torch.device object indicating the device where the model resides.
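
A common PyTorch idiom for obtaining this device, assumed here only as an illustration of how the attribute could be populated, is to inspect the model's parameters.

import torch
import torch.nn as nn

policy_network = nn.Linear(45, 2)                    # stand-in model
device = next(policy_network.parameters()).device    # e.g. torch.device('cpu')
states = torch.zeros(8, 45, device=device)           # keep inputs on the same device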

_elite_count: int =

A positive int specifying the number of top-performing episodes used to train the model in each iteration.

_environment: GraphEnvironment =

A GraphEnvironment object defining the extremal problem and providing the graph building game used to construct all the graphs.

_loss_function: Callable =

A function implementing the cross-entropy loss used for training.
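
The concrete callable is internal to the class; a standard choice consistent with this description, shown purely as an assumption, is torch.nn.CrossEntropyLoss, which expects raw logits of shape (N, action_count) and int64 action indices of shape (N,).

import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss()       # assumed concrete choice
logits = torch.randn(4, 2)                  # (N, action_count) raw scores from the model
actions = torch.tensor([0, 1, 1, 0])        # (N,) actions taken in the elite episodes
loss = loss_function(logits, actions)       # scalar training loss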

_optimizer: torch.optim.Optimizer =

A torch.optim.Optimizer object that updates the model parameters.

_policy_network: nn.Module =

A torch.nn.Module object predicting the action probabilities for each step in each episode.

_population_actions: np.ndarray | None =

Either None if uninitialized, or a numpy.ndarray of type numpy.int32 storing all actions during each episode trajectory. Its shape is (episode_length, total_population), where the first dimension corresponds to the action trajectory within an episode and the second to the executed episodes. The episode order matches _population_states.

_population_scores: np.ndarray | None =

Either None if uninitialized, or a numpy.ndarray vector of type numpy.float32 storing the graph invariant value for each episode. Its length is total_population and the episode order matches _population_states.

_population_states: np.ndarray | None =

Either None if uninitialized, or a numpy.ndarray storing all states during each episode trajectory. Its shape is (episode_length + 1, total_population, state_length), where episode_length is the episode length of the RL environment, state_length is the length of the state vectors, and total_population is the total number of episodes stored. The stored episodes include both newly generated episodes and those carried over from the previous generation. The first dimension corresponds to the state trajectory within an episode, the second to the executed episodes, and the third to the state vector entries. The states from carried-over episodes appear before the newly generated ones.
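
The layout can be made concrete with a small illustrative allocation. The numbers below are arbitrary, and both the float32 dtype of the state array and the assumption that total_population equals survivors_count plus candidates_count are guesses made only for the example; the shapes and the ordering (carried-over episodes first, newly generated ones after) follow the description above.

import numpy as np

episode_length = 10        # illustrative values, not fixed by the class
state_length = 25
candidates_count = 200
survivors_count = 50
total_population = survivors_count + candidates_count   # assumed composition

population_states = np.zeros((episode_length + 1, total_population, state_length),
                             dtype=np.float32)          # dtype assumed
population_actions = np.zeros((episode_length, total_population), dtype=np.int32)
population_scores = np.zeros(total_population, dtype=np.float32)

# Carried-over episodes occupy the leading slots of the population axis,
# newly generated episodes follow them.
carried_over_states = population_states[:, :survivors_count, :]
new_states = population_states[:, survivors_count:, :]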

_random_action_mechanism: RandomActionMechanism =

A RandomActionMechanism object that determines the probability of executing a random action. When a random action is selected, it is sampled uniformly among all available actions in the current state.
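
The fallback described here can be sketched as below, under the assumption that the available actions of the current state are given as a boolean mask and that the mechanism yields a probability of acting randomly; the function and parameter names are hypothetical.

import numpy as np

def choose_action(predicted_action, available_mask, random_probability, rng):
    # With the given probability, ignore the policy network's prediction and
    # pick uniformly among all actions available in the current state.
    if rng.random() < random_probability:
        available_actions = np.flatnonzero(available_mask)
        return int(rng.choice(available_actions))
    return predicted_action

rng = np.random.default_rng(0)
action = choose_action(predicted_action=1,
                       available_mask=np.array([True, True, False]),
                       random_probability=0.1,
                       rng=rng)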

_random_generator: np.random.Generator =

A numpy.random.Generator object used for all probabilistic decisions.

_step_count: int | None =

A nonnegative int representing the number of executed iterations, or None if the agent has not been initialized.

_survivors_count: int =

A positive int specifying the number of top-performing episodes carried over to the next generation.