class ExponentialRandomActionMechanism(RandomActionMechanism):
This class inherits from the RandomActionMechanism class and represents a random action
mechanism with an exponential-style adaptation rule. An initial random action probability is
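specified at construction time. If the best score does not improve for a prescribed number of
consecutive iterations, the random action probability is multiplied by a fixed factor greater than 1,
up to a fixed maximum threshold. After k such increases with no intervening improvement, the
probability therefore equals the initial probability times the factor raised to the power k, capped
at the maximum, which is what makes the rule exponential. Whenever a strict improvement in the best
score is observed, the random action probability is reset to its initial value and the adaptation
process restarts.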
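To make the adaptation rule concrete, here is a minimal Python sketch of the class. It is not the library's implementation. The stand-in base class, the first constructor parameter's name (initial_probability, since the real name is elided), calling reset() from the constructor, and reading "strict improvement" as current > previous (a higher best score is better) are all assumptions.

```python
class RandomActionMechanism:
    """Hypothetical stand-in for the real abstract base class (not shown here)."""

    def reset(self) -> None:
        raise NotImplementedError

    def step(self, previous: float, current: float) -> None:
        raise NotImplementedError

    @property
    def random(self) -> float:
        raise NotImplementedError


class ExponentialRandomActionMechanism(RandomActionMechanism):
    """Sketch of the exponential adaptation rule (assumed behavior)."""

    def __init__(self, initial_probability: float, waiting_period: int,
                 multiplicative_factor: float,
                 maximum_random_action_probability: float):
        # initial_probability is a hypothetical name; the real one is elided.
        self.__initial = initial_probability
        self.__waiting = waiting_period
        self.__multiplicative = multiplicative_factor
        self.__maximum = maximum_random_action_probability
        # Assumption: state is also initialized here, not only by the agent.
        self.reset()

    def reset(self) -> None:
        # Restore the initial probability and restart the no-improvement count.
        self.__random = self.__initial
        self.__counter = 0

    def step(self, previous: float, current: float) -> None:
        # Assumption: a strict improvement means current > previous.
        if current > previous:
            self.reset()
            return
        self.__counter += 1
        if self.__counter >= self.__waiting:
            # Multiply the probability, cap it, and start a new waiting period.
            self.__random = min(self.__random * self.__multiplicative,
                                self.__maximum)
            self.__counter = 0

    @property
    def random(self) -> float:
        return self.__random
```

With an initial probability of 0.05, a waiting period of 2, a factor of 2.0 and a maximum of 0.3, two non-improving steps raise the probability to 0.1, two more to 0.2, and two more to the cap of 0.3 (not 0.4); a single strict improvement then drops it back to 0.05.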
| Kind | Name | Description |
| --- | --- | --- |
| Method | __init__ | Initializes the random action mechanism with the parameters governing its exponential adaptation behavior. |
| Method | reset | Implements the abstract method of RandomActionMechanism. It is invoked by an RL agent during the initialization process and initializes or resets all internal state maintained by the random action mechanism. |
| Method | step | Implements the abstract method of RandomActionMechanism. It is invoked by an RL agent at the end of each iteration of the learning process and updates the internal state of the random action mechanism based on the previous best score and the current best score. |
| Property | random | Implements the abstract property of RandomActionMechanism. It returns the current random action probability as a float from the interval [0, 1]. |
| Instance Variable | __counter | A nonnegative int that counts the iterations since the last improvement in the best score or the last update of the random action probability, whichever is more recent. |
| Instance Variable | __initial | A float from the interval [0, 1] that represents the initial random action probability. |
| Instance Variable | __maximum | A float from the interval [0, 1] that represents the maximum allowable value of the random action probability. |
| Instance Variable | __multiplicative | A float greater than 1 that specifies the factor by which the random action probability is multiplied when an increase is triggered. |
| Instance Variable | __random | A float from the interval [0, 1] that represents the current random action probability. |
| Instance Variable | __waiting | A positive int that specifies how many consecutive iterations without an improvement in the best score are required before the random action probability is increased. |
def __init__(self, …: float, waiting_period: int, multiplicative_factor: float, maximum_random_action_probability: float):
This constructor initializes the random action mechanism with the parameters governing its exponential adaptation behavior.
| Parameters | |
| --- | --- |
| initial… (float) | The initial random action probability, given as a float from the interval [0, 1]. |
| waiting_period (int) | The number of consecutive iterations without an improvement in the best score that are required before the random action probability is increased, given as a positive int. |
| multiplicative_factor (float) | The multiplicative factor applied to the random action probability when an increase is triggered, given as a float greater than 1. |
| maximum_random_action_probability (float) | The maximum allowable value of the random action probability, given as a float from the interval [0, 1]. |
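The parameter domains above can be checked, and the capped geometric sequence of probabilities they produce can be enumerated, with two small helpers. Both function names (validate_parameters, increase_schedule) and the first parameter's name (initial_probability) are hypothetical; the sketch only restates the documented domains and the multiply-then-cap rule and does not claim to mirror the library's own checks.

```python
from typing import List


def validate_parameters(initial_probability: float, waiting_period: int,
                        multiplicative_factor: float,
                        maximum_random_action_probability: float) -> None:
    """Raise ValueError if a parameter lies outside its documented domain."""
    if not 0.0 <= initial_probability <= 1.0:
        raise ValueError("initial probability must lie in [0, 1]")
    if not isinstance(waiting_period, int) or waiting_period < 1:
        raise ValueError("waiting_period must be a positive int")
    if not multiplicative_factor > 1.0:
        raise ValueError("multiplicative_factor must be greater than 1")
    if not 0.0 <= maximum_random_action_probability <= 1.0:
        raise ValueError("maximum_random_action_probability must lie in [0, 1]")


def increase_schedule(initial_probability: float, multiplicative_factor: float,
                      maximum_random_action_probability: float) -> List[float]:
    """Probabilities after 0, 1, 2, ... increases with no improvement in between.

    Each value is the previous one times the factor, capped at the maximum.
    The list stops once the cap is reached, or immediately if the initial
    probability is 0 (multiplying 0 never increases it).
    """
    values = [initial_probability]
    while 0.0 < values[-1] < maximum_random_action_probability:
        values.append(min(values[-1] * multiplicative_factor,
                          maximum_random_action_probability))
    return values


validate_parameters(0.05, 10, 2.0, 0.3)
schedule = increase_schedule(0.05, 2.0, 0.3)
# Each increase needs waiting_period (here 10) non-improving iterations,
# because the counter restarts after every increase.
iterations_to_cap = 10 * (len(schedule) - 1)
```

With these example values the schedule is 0.05, 0.1, 0.2, 0.3, so the maximum is reached after 30 consecutive iterations without an improvement.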
reset: Implements the abstract method of RandomActionMechanism. It is invoked by an RL agent during the initialization process and initializes or resets all internal state maintained by the random action mechanism.
step: Implements the abstract method of RandomActionMechanism. It is invoked by an RL agent at the end of each iteration of the learning process and updates the internal state of the random action mechanism based on the previous best score and the current best score.
| Parameters | |
| --- | --- |
| previous (float) | The value of the best score before the current iteration, given as a float. |
| current (float) | The value of the best score after the current iteration, given as a float. |
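Because step receives only the previous and current best scores, its effect over a run can be traced with a pure function of the state (probability, counter). The sketch below is hypothetical: step_update and trace are illustrative names, and treating current > previous as a strict improvement assumes that a higher best score is better.

```python
from typing import List, Tuple


def step_update(random_prob: float, counter: int, previous: float,
                current: float, initial: float, waiting_period: int,
                multiplicative_factor: float,
                maximum: float) -> Tuple[float, int]:
    """Return the new (probability, counter) after one iteration."""
    if current > previous:  # assumed meaning of "strict improvement"
        return initial, 0
    counter += 1
    if counter >= waiting_period:
        # Increase multiplicatively, cap at the maximum, restart the count.
        return min(random_prob * multiplicative_factor, maximum), 0
    return random_prob, counter


def trace(best_scores: List[float], initial: float, waiting_period: int,
          multiplicative_factor: float, maximum: float) -> List[float]:
    """Probability after each consecutive (previous, current) pair."""
    random_prob, counter = initial, 0
    history = []
    for previous, current in zip(best_scores, best_scores[1:]):
        random_prob, counter = step_update(random_prob, counter, previous,
                                           current, initial, waiting_period,
                                           multiplicative_factor, maximum)
        history.append(random_prob)
    return history


history = trace([1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0], 0.1, 2, 2.0, 0.5)
```

For these best scores, with an initial probability of 0.1, a waiting period of 2, a factor of 2.0 and a maximum of 0.5, the probabilities after each step are [0.1, 0.2, 0.2, 0.4, 0.1, 0.1]: two stalls double the probability, and the jump from 1.0 to 2.0 resets it.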
random: Implements the abstract property of RandomActionMechanism. It returns the current random action probability as a float from the interval [0, 1].
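The document does not say how the agent consumes this probability. One plausible reading is epsilon-greedy action selection: with probability equal to random the agent picks a uniformly random action, otherwise the greedy one. The function select_action and its q_values argument are hypothetical and only illustrate that assumed usage.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float], random_action_probability: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy choice (assumed usage) over len(q_values) actions."""
    # rng.random() is uniform on [0, 1), so a probability of 0 never explores
    # and a probability of 1 always explores.
    if rng.random() < random_action_probability:
        return rng.randrange(len(q_values))
    # Otherwise exploit: the index of the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
greedy = select_action([0.1, 0.7, 0.3], 0.0, rng)
explored = select_action([0.1, 0.7, 0.3], 1.0, rng)
```

Passing a seeded random.Random instance keeps the exploration reproducible across runs.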