It looks as if the MATLAB support can read my mind - one day later I got the notification about the R2024b Update 5, which solves exactly this problem. During parallel computing the action limits were not transferred, which resulted in exorbitantly low rewards and numerical errors (NaN)!
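Since the bug was about the action limits not reaching the workers, making them explicit on the action specification is where the fix bites. A minimal sketch of that part of the setup (the dimensions, limit values and observation size are placeholders, not my exact model):
% Action limits made explicit on the action spec - this is the data that
% was not transferred to the parallel workers before R2024b Update 5.
actInfo = rlNumericSpec([2 1], ...
    "LowerLimit", [-1; -1], ...   % placeholder steering limits
    "UpperLimit", [ 1;  1]);
obsInfo = rlNumericSpec([6 1]);   % placeholder observation size
env = rlSimulinkEnv("robomod2D_plot", "robomod2D_plot/rl_agent", obsInfo, actInfo);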
Why does parallel RL training not work for a SAC agent but does for DDPG agents?
I need to train a SAC agent for robotic steering and cannot make it train on parallel workers.
The model is a Simulink model with differential equations, and everything is connected to the agent block (name: rl_agent).
Training the normal (serial) way works totally fine, but if I set "UseParallel" to true, the training stops with the following error text:
Error using rl.internal.train.OffPolicyTrainer/run_internal_ (line 388)
The environment simulation completed with an error.
Error in rl.internal.train.OffPolicyTrainer/run_ (line 39)
result = run_internal_(this);
Error in rl.internal.train.Trainer/run (line 8)
result = run_(this);
Error in rl.internal.trainmgr.OnlineTrainingManager/run_ (line 123)
trainResult = run(trainer);
Error in rl.internal.trainmgr.TrainingManager/run (line 4)
result = run_(this);
Error in rl.agent.AbstractAgent/train (line 86)
trainingResult = run(tm);
Caused by:
Error using rl.env.internal.reportSimulinkSimError>localReframeSimError (line 65)
An error occurred while running the simulation for model 'robomod2D_plot' with the following RL agent blocks:
robomod2D_plot/rl_agent
Error using rl.env.internal.reportSimulinkSimError (line 29)
Block 'robomod2D_plot/rl_agent/Evaluate Policy/Execute Policy/Enabled Policy Evaluator/Policy Evaluator/Policy Process Experience Internal' outputs 'NaN' for element 1 of output port 1 at major time step 0
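For completeness, this is roughly how I switch on the parallel training (a sketch; the option values shown here are placeholders, not my exact settings):
% Serial training works; the error above only appears with UseParallel = true.
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes", 2000, ...                        % placeholder
    "StopTrainingCriteria", "AverageReward", ...
    "UseParallel", true);
trainingStats = train(agent, env, trainOpts);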
The abort always occurs at the training episode equal to twice the number of workers (e.g. 4 workers -> error at episode 8). I changed the number of workers and the sync/async mode, with no solution. If the setting is async, the error shown above appears; if it is sync, the lower "Caused by" part is shown 4 times.
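Roughly what I varied (a sketch with assumed values; the "Processes" pool and the Mode property are the standard knobs, not necessarily my exact calls):
parpool("Processes", 4);                          % with 4 workers the error hits at episode 8
trainOpts.ParallelizationOptions.Mode = "async";  % also tried "sync"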
I tried different reward functions as well and ran into another problem, which seems to have disappeared now: the reward in a plain simulation is ~-500, but during training it takes a value of ~-1e73, and I could not find a reason for that.
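As an illustration of what I mean by trying different reward functions, a hypothetical guard like the one below inside the reward MATLAB Function block would rule out a blow-up in the model itself (the function name and shaping terms are made up):
function r = reward_fcn(trackingError, steeringEffort)
% Hypothetical reward shaping with a clamp, to catch any -1e73-style spikes
r = -(trackingError.^2 + 0.1*steeringEffort.^2);
r = max(r, -1e3);   % a plain simulation gives roughly -500, so -1e3 is a safe floor
end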
When I changed the agent and agent options from SAC to DDPG, training works perfectly well. So where is the difference in the SAC agent that causes the trouble? Is there an issue with the entropy term and its transfer to/from the workers?
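For reference, the SAC-specific part of the agent setup looks roughly like this (a sketch; the sample time, batch size and target entropy are placeholders). The EntropyWeightOptions are the piece that the DDPG options do not have:
agentOpts = rlSACAgentOptions( ...
    "SampleTime", 0.01, ...                          % placeholder
    "MiniBatchSize", 128);
agentOpts.EntropyWeightOptions.TargetEntropy = -2;   % e.g. minus the number of actions
agent = rlSACAgent(obsInfo, actInfo, agentOpts);     % default networks from the specs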
It is so strange, because this time the issue does not seem to be in the Simulink model but somewhere else. The Simulink model uses only standard blocks: agent, step delay, MATLAB Function blocks, scope, a rate transition for the plot (deleting it makes no difference), transpose, matrix multiply, add, product, integrator, and an initial condition block (deleting it leads to problems in the model).
I am using a ThinkPad with a 155H processor.