class documentation

This class inherits from the RandomActionMechanism class and represents a random action mechanism with an exponential-style adaptation rule. An initial random action probability is specified at construction time. If the best score does not improve for a prescribed number of consecutive iterations, then the random action probability is increased multiplicatively, up to a fixed maximum threshold. Whenever a strict improvement in the best score is observed, the random action probability is reset to its initial value and the adaptation process restarts.
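
The adaptation rule described above can be sketched as follows. This is a minimal illustration, not the actual source of the class: the name ExponentialMechanismSketch and its single-underscore attributes are hypothetical, the sketch does not inherit from RandomActionMechanism, and it assumes that a larger score is better, so that a strict improvement means current_best_score > previous_best_score.

```python
class ExponentialMechanismSketch:
    """Hypothetical stand-in for the documented class (illustration only)."""

    def __init__(self, initial_random_action_probability, waiting_period,
                 multiplicative_factor, maximum_random_action_probability):
        self._initial = initial_random_action_probability
        self._waiting_period = waiting_period
        self._factor = multiplicative_factor
        self._maximum = maximum_random_action_probability
        self.reset()

    def reset(self):
        # Restart the adaptation process from the initial probability.
        self._counter = 0
        self._probability = self._initial

    def step(self, previous_best_score, current_best_score):
        # Assumption: larger scores are better.
        if current_best_score > previous_best_score:
            # Strict improvement: go back to the initial probability.
            self.reset()
            return
        self._counter += 1
        if self._counter >= self._waiting_period:
            # Waiting period elapsed without improvement: multiply the
            # probability, cap it at the maximum, and start counting again.
            self._probability = min(self._probability * self._factor,
                                    self._maximum)
            self._counter = 0

    @property
    def random_action_probability(self):
        return self._probability
```

With an initial probability of 0.1, a waiting period of 2, a factor of 2.0 and a maximum of 0.5, this sketch moves through 0.2 and 0.4 before being capped at 0.5, and it returns to 0.1 as soon as the best score strictly improves.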

Method __init__: This constructor initializes the random action mechanism with the parameters governing its exponential adaptation behavior.
Method reset: This abstract method must be implemented by any concrete subclass. It is invoked by an RL agent during the initialization process and it must initialize or reset all internal state maintained by the random action mechanism.
Method step: This abstract method must be implemented by any concrete subclass. It is invoked by an RL agent at the end of each iteration of the learning process and it must update the internal state of the random action mechanism based on the previous best score and the current best score.
Property random_action_probability: This abstract property must be implemented by any concrete subclass. It must return the current random action probability as a float value from the interval [0, 1].
Instance Variable __counter: A nonnegative int that counts the number of iterations since the last improvement in the best score or the last update of the random action probability.
Instance Variable __initial_random_action_probability: A float from the interval [0, 1] that represents the initial random action probability.
Instance Variable __maximum_random_action_probability: A float from the interval [0, 1] that represents the maximum allowable value of the random action probability.
Instance Variable __multiplicative_factor: A float greater than 1 that specifies the factor by which the random action probability is multiplied when an increase is triggered.
Instance Variable __random_action_probability: A float from the interval [0, 1] that represents the current random action probability.
Instance Variable __waiting_period: A positive int that specifies how many consecutive iterations without an improvement in the best score are required before the random action probability is increased.
def __init__(self, initial_random_action_probability: float, waiting_period: int, multiplicative_factor: float, maximum_random_action_probability: float):

This constructor initializes the random action mechanism with the parameters governing its exponential adaptation behavior.

Parameters
initial_random_action_probability: float - The initial random action probability, given as a float from the interval [0, 1].
waiting_period: int - The number of consecutive iterations without an improvement in the best score that are required before the random action probability is increased, given as a positive int.
multiplicative_factor: float - The multiplicative factor applied to the random action probability when an increase is triggered, given as a float greater than 1.
maximum_random_action_probability: float - The maximum allowable value of the random action probability, given as a float from the interval [0, 1].
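
To see how these four parameters interact, the helper below counts how many consecutive iterations without improvement pass before the probability reaches its maximum. It is a hypothetical utility, not part of the class, and it relies on the documented behavior that the counter restarts after every update of the probability. Note that an initial probability of 0 can never grow under a multiplicative rule, so the sketch rejects it.

```python
def iterations_until_maximum(initial_random_action_probability: float,
                             waiting_period: int,
                             multiplicative_factor: float,
                             maximum_random_action_probability: float) -> int:
    """Hypothetical helper: stagnant iterations needed to reach the maximum."""
    if initial_random_action_probability <= 0.0:
        raise ValueError("a zero probability never grows multiplicatively")
    if multiplicative_factor <= 1.0:
        raise ValueError("multiplicative_factor must be greater than 1")
    if waiting_period < 1:
        raise ValueError("waiting_period must be a positive int")

    probability = initial_random_action_probability
    increases = 0
    while probability < maximum_random_action_probability:
        # One increase per elapsed waiting period, capped at the maximum.
        probability = min(probability * multiplicative_factor,
                          maximum_random_action_probability)
        increases += 1
    return increases * waiting_period
```

For example, starting from 0.1 with a factor of 2.0 and a maximum of 0.5, three increases are needed (0.2, 0.4, then the cap at 0.5), so a waiting period of 5 gives 15 iterations.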
def reset(self):

This abstract method must be implemented by any concrete subclass. It is invoked by an RL agent during the initialization process and it must initialize or reset all internal state maintained by the random action mechanism.

def step(self, previous_best_score: float, current_best_score: float):

This abstract method must be implemented by any concrete subclass. It is invoked by an RL agent at the end of each iteration of the learning process and it must update the internal state of the random action mechanism based on the previous best score and the current best score.

Parameters
previous_best_score: float - The value of the best score before the current iteration, given as a float.
current_best_score: float - The value of the best score after the current iteration, given as a float.
random_action_probability: float

This abstract property must be implemented by any concrete subclass. It must return the current random action probability as a float value from the interval [0, 1].
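
The source does not specify how an RL agent consumes this value. One plausible use, shown here purely as an assumption, is an epsilon-greedy style choice in which the agent takes a uniformly random action with the returned probability and its preferred action otherwise. The function select_action and its parameters are hypothetical.

```python
import random


def select_action(random_action_probability, greedy_action, actions, rng):
    """Hypothetical helper: random action with the given probability."""
    # rng.random() lies in [0.0, 1.0), so a probability of 0 never
    # triggers a random action and a probability of 1 always does.
    if rng.random() < random_action_probability:
        return rng.choice(actions)
    return greedy_action


rng = random.Random(0)  # seeded only to make the sketch reproducible
action = select_action(0.25, "greedy", ["left", "right"], rng)
```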

__counter: int

A nonnegative int that counts the number of iterations since the last improvement in the best score or the last update of the random action probability.

__initial_random_action_probability: float

A float from the interval [0, 1] that represents the initial random action probability.

__maximum_random_action_probability: float

A float from the interval [0, 1] that represents the maximum allowable value of the random action probability.

__multiplicative_factor: float

A float greater than 1 that specifies the factor by which the random action probability is multiplied when an increase is triggered.

__random_action_probability: float

A float from the interval [0, 1] that represents the current random action probability.

__waiting_period: int

A positive int that specifies how many consecutive iterations without an improvement in the best score are required before the random action probability is increased.