
MonitorLogger

Log reinforcement learning training data to monitor window

Since R2022b

    Description

    Use a MonitorLogger object to log data to a monitor window, either within the train function or inside a custom training loop. To log data when using the train function, specify appropriate callback functions in MonitorLogger, as shown in the examples. These callbacks are executed at different stages of training; for example, EpisodeFinishedFcn is executed after the completion of an episode. The output of a callback function is a structure containing the data to log at that stage of training.

    Note

    Using a MonitorLogger object to log data when using the train function does not affect (and is not affected by) any option to save agents during training specified within a training options object.

    Note

    MonitorLogger is a handle object. If you assign an existing MonitorLogger object to a new variable, both variables refer to the same underlying object in memory. To preserve the original object parameters for later use, save the object to a MAT-file. For more information about handle objects, see Handle Object Behavior.

    Creation

    Create a MonitorLogger object using rlDataLogger, specifying a trainingProgressMonitor object as the input argument.

    Properties


    LoggingOptions — Logging options

    Object containing a set of logging options, returned as a MonitorLoggingOptions object. A MonitorLoggingOptions object has the following properties, which you can access using dot notation after creating the MonitorLogger object.

    DataWriteFrequency — Monitor data write period

    Monitor data write period, specified as a positive integer. This value is the number of episodes after which data is transmitted to the trainingProgressMonitor object. For example, if DataWriteFrequency is 5, then the data from episodes 1 to 5 is cached in memory and transmitted to the monitor object at the end of the fifth episode. Caching data in this way improves performance in some cases. The default is 1.

    Example: DataWriteFrequency=10

    MaxEpisodes — Maximum number of episodes

    Maximum number of episodes, specified as a positive integer. When using train, this value is automatically initialized. Set this value when using the logger object in a custom training loop. The default is 500.

    Example: MaxEpisodes=1000
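
    For example, assuming you have created a MonitorLogger object named logger (as shown in the Examples section), the following sketch sets both options using dot notation. The values are arbitrary.

    logger.LoggingOptions.DataWriteFrequency = 5;
    logger.LoggingOptions.MaxEpisodes = 1000;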

    EpisodeFinishedFcn — Callback to log data after episode completion

    Callback to log data after episode completion, specified as a function handle. The specified function is automatically called by the training loop at the end of each episode, and must return a structure containing the data to log, such as experiences, simulation information, or initial observations.

    Your function must have the following signature.

    function dataToLog = myEpisodeFinishedFcn(data)

    Here, data is a structure that contains the following fields:

    • EpisodeCount — current episode number

    • Environment — environment object

    • Agent — agent object

    • Experience — structure array containing the experiences. Each element of this array corresponds to a step and is a structure containing the fields NextObservation, Observation, Action, Reward and IsDone.

    • EpisodeInfo — structure containing the fields CumulativeReward, StepsTaken and InitialObservation.

    • SimulationInfo — contains simulation information from the episode. For MATLAB environments this is a structure with the field SimulationError, and for Simulink® environments it is a Simulink.SimulationOutput object.

    The function output dataToLog is the structure containing the data to be logged to the monitor window.

    Example: EpisodeFinishedFcn=@myEpLoggingFcn
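
    For instance, the following is a minimal sketch of the myEpLoggingFcn callback referenced above. It logs the cumulative episode reward, which is a scalar, as MonitorLogger requires; the output field name is arbitrary.

    function dataToLog = myEpLoggingFcn(data)
        % Log the total reward accumulated during the episode.
        dataToLog.CumulativeReward = data.EpisodeInfo.CumulativeReward;
    end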

    AgentStepFinishedFcn — Callback to log data after training step completion

    Callback to log data after training step completion within an episode, specified as a function handle. The specified function is automatically called by the training loop at the end of each training step, and must return a structure containing the data to log, such as the state of the agent's exploration policy.

    Your function must have the following signature.

    function dataToLog = myAgentStepFinishedFcn(data)

    Here, data is a structure that contains the following fields:

    • EpisodeCount — current episode number

    • AgentStepCount — cumulative number of steps taken by the agent

    • SimulationTime — current simulation time in the environment

    • Agent — agent object

    The function output dataToLog is the structure containing the data to be logged to the monitor window.

    For multiagent training, AgentStepFinishedFcn can be a cell array of function handles with as many elements as the number of agent groups.

    Note

    Logging data using the AgentStepFinishedFcn callback is not supported when training agents in parallel with the train function.

    Example: AgentStepFinishedFcn=@myAgtStepLoggingFcn
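
    For instance, the following is a minimal sketch of the myAgtStepLoggingFcn callback referenced above. It logs the simulation time once every 10 agent steps; the sampling period is arbitrary.

    function dataToLog = myAgtStepLoggingFcn(data)
        % Log the current simulation time every 10 agent steps.
        if mod(data.AgentStepCount, 10) == 0
            dataToLog.SimulationTime = data.SimulationTime;
        else
            % Return an empty array to skip logging for this step.
            dataToLog = [];
        end
    end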

    AgentLearnFinishedFcn — Callback to log data after learn subroutine completion

    Callback to log data after completion of the learn subroutine, specified as a function handle. The specified function is automatically called by the training loop at the end of each learning subroutine, and must return a structure containing the data to log, such as the training losses of the actor and critic networks, or, for a model-based agent, the environment model training losses.

    Your function must have the following signature.

    function dataToLog = myAgentLearnFinishedFcn(data)

    Here, data is a structure that contains the following fields:

    • EpisodeCount — current episode number

    • AgentStepCount — cumulative number of steps taken by the agent

    • AgentLearnCount — cumulative number of learning steps taken by the agent

    • EnvModelTrainingInfo — structure related to model-based agents, containing the fields TransitionFcnLoss, RewardFcnLoss, and IsDoneFcnLoss.

    • Agent — agent object

    • ActorLoss — value of the actor loss

    • CriticLoss — value of the critic loss

    • ActorGradientStepCount — cumulative number of gradient calculations for the actor (if the agent has an actor)

    • CriticGradientStepCount — cumulative number of gradient calculations for the critic (if the agent has a critic)

    Depending on the agent, the data structure also contains the following fields:

    • TDTarget — temporal-difference target value (for DQN, DDPG, TD3, SAC, PPO, and TRPO agents)

    • TDError — temporal-difference error (for DQN, DDPG, TD3, SAC, PPO, and TRPO agents)

    • SampleIndex — indices of the minibatch experiences sampled for the current gradient step (for DQN, DDPG, TD3, and SAC agents)

    • MaskIndex — sequence padding mask (for DQN, DDPG, TD3, and SAC agents that use RNNs)

    • Advantage — advantage value (for PPO and TRPO agents)

    • PolicyRatio — policy ratio value (for PPO agents)

    • AdvantageLoss — advantage loss value (for PPO agents)

    • EntropyLoss — entropy loss value (for PPO agents)

    The function output dataToLog is the structure containing the data to be logged to the monitor window.

    For multiagent training, AgentLearnFinishedFcn can be a cell array of function handles with as many elements as the number of agent groups, as shown in the sketch after the example below.

    Example: AgentLearnFinishedFcn=@myAgtLearnLoggingFcn
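
    For instance, the following sketch assigns one callback per agent group, assuming two agent groups and two hypothetical logging functions, myGroupOneLearnFcn and myGroupTwoLearnFcn.

    logger.AgentLearnFinishedFcn = {@myGroupOneLearnFcn, @myGroupTwoLearnFcn};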

    Object Functions

    setup — Set up reinforcement learning environment or initialize data logger object
    cleanup — Clean up reinforcement learning environment or data logger object

    Examples


    This example shows how to log and visualize training data in the window of a trainingProgressMonitor object when using train.

    Create a trainingProgressMonitor object. Creating the object also opens a window associated with the object.

    monitor = trainingProgressMonitor();

    Create a MonitorLogger object using rlDataLogger.

    logger = rlDataLogger(monitor);

    Create callback functions to log the data (for this example, see the Example Logging Functions section), and specify the appropriate callback functions in the logger object. For the specific function signatures and more information on the function input structure, see the corresponding properties of MonitorLogger.

    logger.AgentLearnFinishedFcn = @myAgentLearnFinishedFcn;
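
    The train command in the next step assumes that an agent, an environment, and a training options object already exist in the workspace. As a minimal sketch, you can create them using, for example, the predefined cart-pole environment and a default DQN agent (both arbitrary choices for illustration).

    env = rlPredefinedEnv("CartPole-Discrete");
    agent = rlDQNAgent(getObservationInfo(env), getActionInfo(env));
    trainOpts = rlTrainingOptions(MaxEpisodes=100);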

    To train the agent, you can now call train, passing logger as an argument, as in the following command.

    trainResult = train(agent, env, trainOpts, Logger=logger);
    

    As training progresses, data is logged to the training monitor object and visualized in the associated window.

    Note that only scalar data can be logged with a monitor logger object.
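
    If a quantity that you want to log is not scalar, reduce it to one or more scalars inside your callback. For example, the following sketch, assuming data.TDError is an array, logs its mean.

    dataToLog.MeanTDError = mean(data.TDError, "all");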

    Example Logging Functions

    Define a logging function that logs data periodically at the completion of the learning subroutine. This function is automatically called by the training loop at the end of each learning subroutine, and must return a structure containing the learning-related data to log, such as the training losses of the actor and critic networks, or, for a model-based agent, the environment model training losses.

    function dataToLog = myAgentLearnFinishedFcn(data)
    
        % Log the actor and critic losses every other learning step.
        if mod(data.AgentLearnCount, 2) == 0
            dataToLog.ActorLoss  = data.ActorLoss;
            dataToLog.CriticLoss = data.CriticLoss;
        else
            % Return an empty array to skip logging for this step.
            dataToLog = [];
        end
        
    end

    Limitations

    • Only scalar data is supported when logging data with a MonitorLogger object. The structure returned by the callback functions must contain fields with scalar data.

    • Resuming training from a previous training result is not supported when logging data with a MonitorLogger object.

    Version History

    Introduced in R2022b