class GraphEnvironment(ABC):
Known subclasses: rlgt.environments.global_environments.GlobalFlipEnvironment, rlgt.environments.global_environments.GlobalSetEnvironment, rlgt.environments.linear_environments.LinearBuildEnvironment, rlgt.environments.linear_environments.LinearFlipEnvironment, rlgt.environments.linear_environments.LinearSetEnvironment, rlgt.environments.local_environments.LocalFlipEnvironment, rlgt.environments.local_environments.LocalSetEnvironment
Constructor: GraphEnvironment(graph_invariant, graph_invariant_diff, sparse_setting)
This abstract class encapsulates the concept of a reinforcement learning (RL) environment for graph theory applications. Such an environment is designed to address extremal problems in which a specified graph invariant is to be maximized over a finite family of fully colored k-edge-colored looped complete graphs.
States are represented as fixed-length numpy.ndarray vectors. Actions are represented as
numpy.int32 integers taking values in the range from 0 to action_number - 1, where
action_number denotes the total (finite) number of available actions. At each step, some
actions may be unavailable, with the constraint that at least one action is always available in
any non-terminal state.
For efficiency, the environment supports running multiple episodes in parallel. All episodes
are guaranteed to terminate after a predetermined number of steps, regardless of whether the
underlying task is episodic or continuing. Accordingly, batches of states are represented as
two-dimensional numpy.ndarray matrices whose rows correspond to individual states, while
batches of actions are represented as one-dimensional numpy.ndarray vectors of type
numpy.int32.
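The batch conventions above can be illustrated with plain NumPy; the shapes and values here are arbitrary, not taken from rlgt:

```python
import numpy as np

# A batch of 4 parallel episodes over state vectors of length 6
# (batch size and state length are illustrative).
batch_size, state_length = 4, 6
state_batch = np.zeros((batch_size, state_length), dtype=np.int8)  # one state per row
action_batch = np.array([0, 2, 1, 2], dtype=np.int32)              # one action per episode

assert state_batch.ndim == 2          # batches of states are 2-D matrices
assert action_batch.ndim == 1         # batches of actions are 1-D vectors
assert action_batch.dtype == np.int32
```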
Instead of returning rewards in the conventional RL sense, the environment returns values of a
selected graph invariant associated with the underlying graphs corresponding to the newly
reached states. The graph invariant is specified via a GraphInvariant function. If the sparse
setting is enabled, graph invariant values are computed only for the final batch of actions,
and None is returned at all preceding steps. Otherwise, the invariant values are computed
after every batch of actions. In the latter case, the computation may be optimized by supplying
a GraphInvariantDiff function, which specifies how graph invariant values change when the
environment transitions from one batch of underlying graphs to another.
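The diff-based optimization can be sketched with a toy invariant, assuming graphs in a batch are encoded as flat 0/1 edge vectors and taking the number of edges as the invariant; the function names below are illustrative, not the rlgt API:

```python
import numpy as np

def edge_count_invariant(graph_batch: np.ndarray) -> np.ndarray:
    """Direct computation: one invariant value per graph (row)."""
    return graph_batch.sum(axis=1).astype(np.float32)

def edge_count_diff(old_batch: np.ndarray, new_batch: np.ndarray) -> np.ndarray:
    """Incremental computation: element-wise change of the invariant
    when transitioning from one batch of graphs to another."""
    return (new_batch - old_batch).sum(axis=1).astype(np.float32)

old = np.array([[1, 0, 1], [0, 0, 1]])  # previous batch of graphs
new = np.array([[1, 1, 1], [0, 0, 0]])  # batch after a transition

# The two computation routes agree: direct recomputation vs.
# previous values plus the element-wise differences.
direct = edge_count_invariant(new)
incremental = edge_count_invariant(old) + edge_count_diff(old, new)
assert np.array_equal(direct, incremental)
```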
Conceptually, environments are divided into two types according to the nature of the underlying RL task: the continuing environments and the episodic environments. In continuing environments, the task has no terminal states in the usual RL sense, and the underlying graph associated with each state is guaranteed to be fully colored. In episodic environments, the underlying graph associated with a terminal state is guaranteed to be fully colored, but no such guarantee is made for the underlying graphs associated with non-terminal states.
Concrete subclasses must implement the following abstract properties:
state_length, which returns the length of the state vectors;
state_dtype, which returns the data type of the state vectors;
action_number, which returns the total number of available actions;
action_mask, which specifies which actions are currently available;
episode_length, which returns the predetermined length of each episode; and
is_continuing, which determines whether the environment is continuing or episodic.
Concrete subclasses must also implement the following abstract methods:
_initialize_batch, which initializes a batch of episodes;
_transition_batch, which applies a batch of actions to the current batch of states; and
state_batch_to_graph_batch, which extracts the underlying batch of graphs from a provided batch of states.
| Method | __init__ |
This constructor initializes a GraphEnvironment with a specified graph invariant and, optionally, a function for computing differences of that invariant between successive batches of graphs. |
| Method | reset |
This method initializes a batch of episodes of a specified size and returns the resulting batch of states, the corresponding values of the selected graph invariant (if computed), and the status of the batch of episodes.
| Method | state_batch_to_graph_batch |
This abstract method must be implemented by any concrete subclass. It extracts the batch of underlying graphs corresponding to a provided batch of states. Implementations must return a Graph object containing the graphs corresponding to each row in state_batch, preserving the row order.
| Method | state_to_graph |
This method extracts the underlying graph corresponding to a single state. |
| Method | step |
This method applies a batch of actions to the current batch of episodes and returns the resulting batch of states, the corresponding values of the selected graph invariant (if computed), and the updated status of the batch of episodes.
| Instance Variable | sparse |
A bool indicating whether the graph invariant values should be computed only for the final batch of actions. |
| Property | action_mask |
This abstract property must be implemented by any concrete subclass. It must return None if no episodes are currently being run in parallel, or if every action is available in every current state. Otherwise, it must return a two-dimensional numpy.ndarray matrix a of type bool whose entry a[i, j] is True if and only if action j is available in the current state of the i-th episode.
| Property | action_number |
This abstract property must be implemented by any concrete subclass. It must return the total number of distinct actions that can be executed in the environment, as a positive int. |
| Property | episode_length |
This abstract property must be implemented by any concrete subclass. It must return the predetermined common length of all episodes run in parallel, i.e., the total number of actions executed in each episode, as a positive int.
| Property | is_continuing |
This abstract property must be implemented by any concrete subclass. It must return a bool indicating whether the environment is continuing (True) or episodic (False). |
| Property | state_dtype |
This abstract property must be implemented by any concrete subclass. It must return the data type of the one-dimensional numpy.ndarray vectors that represent states, as a numpy.dtype. |
| Property | state_length |
This abstract property must be implemented by any concrete subclass. It must return the number of entries in each state vector, i.e., the length of the one-dimensional numpy.ndarray vectors that represent states, as a positive int.
| Method | _initialize_batch |
This abstract method must be implemented by any concrete subclass. It must initialize a batch of episodes of the specified size and update the _state_batch and _status attributes so that they represent the newly initialized batch. |
| Method | _transition_batch |
This abstract method must be implemented by any concrete subclass. It must apply a batch of actions to the current batch of states and update the _state_batch and _status attributes to reflect the resulting states and the updated batch status... |
| Instance Variable | __graph_batch |
Either None or a Graph object representing the current batch of underlying graphs. This attribute is updated only when required by the sparse setting. |
| Instance Variable | __graph_invariant |
A GraphInvariant function specifying the graph invariant to be maximized. |
| Instance Variable | __graph_invariant_batch |
Either None or a one-dimensional numpy.ndarray of type numpy.float32 containing the current batch of graph invariant values. As with __graph_batch, this attribute is updated only when required by the sparse setting. |
| Instance Variable | __graph_invariant_diff |
Either None, indicating that graph invariant values are always computed directly using __graph_invariant, or a GraphInvariantDiff function used to incrementally update invariant values after state transitions. |
| Instance Variable | _state_batch |
Either None or a two-dimensional numpy.ndarray representing the current batch of states. |
| Instance Variable | _status |
Either None or an EpisodeStatus value describing the current status of the batch of episodes. |
def __init__(self, graph_invariant: GraphInvariant, graph_invariant_diff: GraphInvariantDiff | None = None, sparse_setting: bool = False):
This constructor initializes a GraphEnvironment with a specified graph invariant and,
optionally, a function for computing differences of that invariant between successive
batches of graphs.
| Parameters | |
graph_invariant: GraphInvariant | A GraphInvariant function that computes the graph invariant
values associated with a batch of underlying graphs. These values are the quantities to
be maximized by the environment. |
graph_invariant_diff: GraphInvariantDiff | None | Either None, indicating that graph invariant values are
always computed directly using graph_invariant, or a GraphInvariantDiff function
that computes element-wise differences of the graph invariant values when the
environment transitions from one batch of underlying graphs to another. The default
value is None. |
sparse_setting: bool | A bool indicating whether the sparse setting is enabled. If set to
True, the graph invariant values are computed only for the final batch of actions.
Otherwise, the graph invariant values are computed after every batch of actions. The
default value is False. |
This method initializes a batch of episodes of a specified size and returns the resulting
batch of states, the corresponding values of the selected graph invariant (if computed),
and the status of the batch of episodes. The order of the returned graph invariant values
matches the order of the states in the initialized batch. If the sparse setting is enabled,
the graph invariant values are not computed at initialization and None is returned in
their place. Otherwise, the graph invariant values are computed immediately after
initialization.
| Parameters | |
batch_size: int | The number of episodes to initialize in the batch, given as a positive
int. |
| Returns | |
tuple[np.ndarray, np.ndarray | None, EpisodeStatus] | A tuple (initial_state_batch, graph_invariant_batch, status), where initial_state_batch is the initialized batch of states, graph_invariant_batch contains the corresponding graph invariant values (or is None), and status is the EpisodeStatus of the batch of episodes. |
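The effect of the sparse setting on the returned tuple can be sketched with stand-in values; none of the helper names or values below come from rlgt:

```python
import numpy as np

def fake_reset(sparse_setting: bool):
    """Hypothetical stand-in mimicking the documented return shape of reset."""
    initial_state_batch = np.zeros((2, 3), dtype=np.int8)
    # With the sparse setting enabled, no invariant values are computed
    # at initialization and None is returned in their place.
    graph_invariant_batch = None if sparse_setting else np.zeros(2, dtype=np.float32)
    status = "RUNNING"  # stand-in for an EpisodeStatus value
    return initial_state_batch, graph_invariant_batch, status

_, inv_sparse, _ = fake_reset(sparse_setting=True)
_, inv_dense, _ = fake_reset(sparse_setting=False)
assert inv_sparse is None and inv_dense is not None
```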
This abstract method must be implemented by any concrete subclass. It extracts the batch of
underlying graphs corresponding to a provided batch of states. Implementations must return
a Graph object containing the graphs corresponding to each row in state_batch,
preserving the row order. This method must be pure and must not modify any attributes of
the class instance.
| Parameters | |
state_batch: np.ndarray | A two-dimensional numpy.ndarray whose rows represent individual
states from which the underlying graphs are to be extracted. |
| Returns | |
Graph | A Graph object representing the extracted batch of graphs. |
This method extracts the underlying graph corresponding to a single state.
| Parameters | |
state: np.ndarray | A one-dimensional numpy.ndarray representing a single state. |
| Returns | |
Graph | The underlying graph corresponding to state, returned as a Graph object. |
| Note | |
This method is pure and does not modify any attributes of the class instance. It
internally calls state_batch_to_graph_batch with a singleton batch. |
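The singleton-batch pattern described in the note can be illustrated with plain NumPy: a single 1-D state is promoted to a 2-D batch with one row before the batch method is called, and the single result is taken back out:

```python
import numpy as np

state = np.array([1, 0, 1, 1], dtype=np.int8)  # a single 1-D state
state_batch = state[np.newaxis, :]             # shape (1, 4): a batch of one state

assert state_batch.shape == (1, 4)
assert np.array_equal(state_batch[0], state)   # row 0 recovers the original state
```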
def step(self, action_batch: np.ndarray) -> tuple[np.ndarray, np.ndarray | None, EpisodeStatus]:
This method applies a batch of actions to the current batch of episodes and returns the resulting batch of states, the corresponding values of the selected graph invariant (if computed), and the updated status of the batch. The i-th provided action is applied to the i-th state in _state_batch. The order of the returned states and graph invariant values matches the order of the applied actions and the original states. If the sparse setting is enabled, the graph invariant values are computed only when a final state is reached. Otherwise, the graph invariant values are computed after every batch of actions, either directly or via the graph invariant differences function if one is provided.
| Parameters | |
action_batch: np.ndarray | A one-dimensional numpy.ndarray of type numpy.int32 containing the
actions to be applied. The length of action_batch must match the number of states in
_state_batch. |
| Returns | |
tuple[np.ndarray, np.ndarray | None, EpisodeStatus] | A tuple (new_state_batch, graph_invariant_batch, status), where new_state_batch is the resulting batch of states, graph_invariant_batch contains the corresponding graph invariant values (or is None), and status is the updated EpisodeStatus of the batch of episodes. |
A bool indicating whether the graph invariant values should be computed
only for the final batch of actions.
This abstract property must be implemented by any concrete subclass. It must return None
if no episodes are currently being run in parallel, or if every action is available in
every current state. Otherwise, it must return a two-dimensional numpy.ndarray matrix
a of type bool whose entry a[i, j] is True if and only if action j is
available in the current state of the i-th episode.
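A short NumPy illustration of the documented mask layout (the mask and scores below are made up): a common use is to invalidate unavailable actions before an argmax-style choice:

```python
import numpy as np

# mask[i, j] is True iff action j is available in the current state of episode i;
# per the documented constraint, each row has at least one True entry.
mask = np.array([[True, False, True],
                 [False, True, True]])

# Hypothetical per-action scores for each episode.
scores = np.array([[0.2, 0.9, 0.5],
                   [0.8, 0.1, 0.3]], dtype=np.float32)

# Set unavailable actions to -inf so they can never be selected.
masked = np.where(mask, scores, -np.inf)
best = masked.argmax(axis=1)
assert best.tolist() == [2, 2]
```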
This abstract property must be implemented by any concrete subclass. It must return the
total number of distinct actions that can be executed in the environment, as a positive
int.
This abstract property must be implemented by any concrete subclass. It must return the
predetermined common length of all episodes run in parallel, i.e., the total number of
actions executed in each episode, as a positive int.
This abstract property must be implemented by any concrete subclass. It must return the
data type of the one-dimensional numpy.ndarray vectors that represent states, as a
numpy.dtype.
This abstract property must be implemented by any concrete subclass. It must return the
number of entries in each state vector, i.e., the length of the one-dimensional
numpy.ndarray vectors that represent states, as a positive int.
This abstract method must be implemented by any concrete subclass. It must initialize a
batch of episodes of the specified size and update the _state_batch and _status
attributes so that they represent the newly initialized batch.
| Parameters | |
batch_size: int | The number of episodes to initialize in the batch, given as a positive
int. |
This abstract method must be implemented by any concrete subclass. It must apply a batch of
actions to the current batch of states and update the _state_batch and _status
attributes to reflect the resulting states and the updated batch status. Implementations
may also update additional subclass-specific attributes as required.
| Parameters | |
action_batch: np.ndarray | A one-dimensional numpy.ndarray of type numpy.int32 containing the
actions to be applied. The length of action_batch must match the number of states
in _state_batch. |
Either None or a one-dimensional numpy.ndarray of type
numpy.float32 containing the current batch of graph invariant values. As with
__graph_batch, this attribute is updated only when required by the sparse setting.
Either None, indicating that graph invariant values are always
computed directly using __graph_invariant, or a GraphInvariantDiff function used to
incrementally update invariant values after state transitions.
Either None or a two-dimensional numpy.ndarray representing the current
batch of states.
Either None or an EpisodeStatus value describing the current status of the
batch of episodes.