It looks as if the MATLAB support can read my mind - one day later I got the notification about the R2024b Update 5, which solves exactly this problem. During parallel computing the action limits were not transferred, which resulted in exorbitantly low rewards and numerical errors (NaN)!
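Since the bug was about the action limits not reaching the workers, making them explicit on the action specification is where the fix bites. A minimal sketch of that part of the setup (the dimensions, limit values and observation size are placeholders, not my exact model):
% Action limits made explicit on the action spec - this is the data that
% was not transferred to the parallel workers before R2024b Update 5.
actInfo = rlNumericSpec([2 1], ...
    "LowerLimit", [-1; -1], ...   % placeholder steering limits
    "UpperLimit", [ 1;  1]);
obsInfo = rlNumericSpec([6 1]);   % placeholder observation size
env = rlSimulinkEnv("robomod2D_plot", "robomod2D_plot/rl_agent", obsInfo, actInfo);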
Why does parallel RL training not work for a SAC agent but does for DDPG agents?
I need to train a SAC agent for robotic steering and cannot make it train on parallel workers.
The model is a Simulink model with differential equations, and everything is connected to the agent block (name: rl_agent).
Training the normal (serial) way works totally fine, but if I set "UseParallel" to true, the training stops with the following error text:
Error using rl.internal.train.OffPolicyTrainer/run_internal_ (line 388)
The environment simulation completed with an error.
Error in rl.internal.train.OffPolicyTrainer/run_ (line 39)
result = run_internal_(this);
Error in rl.internal.train.Trainer/run (line 8)
result = run_(this);
Error in rl.internal.trainmgr.OnlineTrainingManager/run_ (line 123)
trainResult = run(trainer);
Error in rl.internal.trainmgr.TrainingManager/run (line 4)
result = run_(this);
Error in rl.agent.AbstractAgent/train (line 86)
trainingResult = run(tm);
Caused by:
Error using rl.env.internal.reportSimulinkSimError>localReframeSimError (line 65)
An error occurred while running the simulation for model 'robomod2D_plot' with the following RL agent blocks:
robomod2D_plot/rl_agent
Error using rl.env.internal.reportSimulinkSimError (line 29)
Block 'robomod2D_plot/rl_agent/Evaluate Policy/Execute Policy/Enabled Policy Evaluator/Policy Evaluator/Policy Process Experience Internal' outputs 'NaN' for element 1 of output port 1 at major time step 0
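For completeness, this is roughly how I switch on the parallel training (a sketch; the option values shown here are placeholders, not my exact settings):
% Serial training works; the error above only appears with UseParallel = true.
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes", 2000, ...                        % placeholder
    "StopTrainingCriteria", "AverageReward", ...
    "UseParallel", true);
trainingStats = train(agent, env, trainOpts);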
The abort always occurs at the training episode equal to twice the number of workers (e.g. 4 workers -> error at episode 8). I changed the number of workers and the sync/async mode, with no solution. If the setting is async, the error shown above appears; if it is sync, the lower "Caused by" part is shown 4 times.
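Roughly what I varied (a sketch with assumed values; the "Processes" pool and the Mode property are the standard knobs, not necessarily my exact calls):
parpool("Processes", 4);                          % with 4 workers the error hits at episode 8
trainOpts.ParallelizationOptions.Mode = "async";  % also tried "sync"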
I tried different reward functions as well and ran into another problem, which seems to have disappeared now: the reward in a plain simulation is ~-500, but during training it takes a value of ~-1e73, and I could not find a reason for that.
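As an illustration of what I mean by trying different reward functions, a hypothetical guard like the one below inside the reward MATLAB Function block would rule out a blow-up in the model itself (the function name and shaping terms are made up):
function r = reward_fcn(trackingError, steeringEffort)
% Hypothetical reward shaping with a clamp, to catch any -1e73-style spikes
r = -(trackingError.^2 + 0.1*steeringEffort.^2);
r = max(r, -1e3);   % a plain simulation gives roughly -500, so -1e3 is a safe floor
end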
When I changed the agent and agent options from SAC to DDPG, training works perfectly well. So where is the difference in the SAC agent that causes the trouble? Is there an issue with the entropy term and its transfer to/from the workers?
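For reference, the SAC-specific part of the agent setup looks roughly like this (a sketch; the sample time, batch size and target entropy are placeholders). The EntropyWeightOptions are the piece that the DDPG options do not have:
agentOpts = rlSACAgentOptions( ...
    "SampleTime", 0.01, ...                          % placeholder
    "MiniBatchSize", 128);
agentOpts.EntropyWeightOptions.TargetEntropy = -2;   % e.g. minus the number of actions
agent = rlSACAgent(obsInfo, actInfo, agentOpts);     % default networks from the specs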
It is so strange, because this time the issue does not seem to be in the Simulink model but somewhere else. The Simulink model uses only standard blocks: agent, step delay, MATLAB Function blocks, scope, a rate transition for the plot (deleting it makes no difference), transpose, matrix multiply, add, product, integrator, and an initial condition block (deleting it leads to problems in the model).
I am using a ThinkPad with a 155H processor.