Getting negative reward in two agents while the other 2 are getting trained

3 views (last 30 days)
Kartikeya on 3 June 2023
Answered: TARUN on 11 June 2025
I am able to train Agent5 and Agent8, but I am getting a constant negative reward for Agent6 and Agent7. I am trying to control a quadrotor.

Answers (1)

TARUN on 11 June 2025
Based on the training plot and code you shared, Agent6 and Agent7 are consistently receiving negative rewards, while Agent5 and Agent8 are learning effectively.
This typically points to either an environment setup issue or improper agent configuration.
You can follow these steps to resolve the issue:
1. Incorrect Block Path for Agent7: There seems to be an extra space in the block path:
'rl_backstep_Multi/ Agent7'
It should be:
'rl_backstep_Multi/Agent7'
A malformed block path will prevent proper environment-agent linking.
2. Shared Observation/Action Specifications: All agents are using the same ainfo and oinfo objects. Because these specs are handle objects, the agents end up sharing them, which can cause unexpected behavior. Instead, define separate specs for each agent, for example:
oinfo1 = rlNumericSpec([2 1]); oinfo1.Name = 'obs1';
ainfo1 = rlNumericSpec([1 1]); ainfo1.Name = 'act1';
% Repeat for each agent (oinfo2/ainfo2, and so on)
3. Reward Signal Issue: Agent6 and Agent7 might be receiving either invalid or overly penalizing rewards. Check the reward block logic inside your Simulink model and make sure it produces finite, meaningful values throughout training. You can log or scope the reward signals to debug this.
4. Hyperparameter Tuning: PPO agents can sometimes diverge depending on the learning rate or entropy weight. You could slightly reduce ActorOptimizerOptions.LearnRate and EntropyLossWeight for Agent6 and Agent7 to stabilize their learning.
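As a sketch of steps 1 and 2, assuming the model is named rl_backstep_Multi and the agent blocks are Agent5 through Agent8 (names taken from your question; adjust block paths and spec dimensions to your actual model):

```matlab
mdl = "rl_backstep_Multi";

% Step 2: separate observation/action specs per agent,
% so no two agents share the same handle objects.
oinfo1 = rlNumericSpec([2 1]); oinfo1.Name = 'obs1';
ainfo1 = rlNumericSpec([1 1]); ainfo1.Name = 'act1';
oinfo2 = rlNumericSpec([2 1]); oinfo2.Name = 'obs2';
ainfo2 = rlNumericSpec([1 1]); ainfo2.Name = 'act2';
oinfo3 = rlNumericSpec([2 1]); oinfo3.Name = 'obs3';
ainfo3 = rlNumericSpec([1 1]); ainfo3.Name = 'act3';
oinfo4 = rlNumericSpec([2 1]); oinfo4.Name = 'obs4';
ainfo4 = rlNumericSpec([1 1]); ainfo4.Name = 'act4';

% Step 1: block paths with no stray spaces after the slash
agentBlks = [mdl + "/Agent5", mdl + "/Agent6", ...
             mdl + "/Agent7", mdl + "/Agent8"];

% Multi-agent Simulink environment: one spec per agent block
env = rlSimulinkEnv(mdl, agentBlks, ...
    {oinfo1, oinfo2, oinfo3, oinfo4}, ...
    {ainfo1, ainfo2, ainfo3, ainfo4});
```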
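For step 4, here is a hedged example of lowering the PPO learning rate and entropy weight; the numeric values are illustrative starting points, not taken from your code:

```matlab
% More conservative PPO options for the struggling agents
agentOpts = rlPPOAgentOptions;
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-4;  % e.g. reduced from 1e-3
agentOpts.CriticOptimizerOptions.LearnRate = 1e-4;
agentOpts.EntropyLossWeight = 0.005;                % e.g. reduced from 0.01

% Recreate Agent6 and Agent7 with these options, e.g.:
% agent6 = rlPPOAgent(obsInfo, actInfo, agentOpts);
```

Tune these gradually and compare the episode-reward curves of Agent6/Agent7 against Agent5/Agent8 after each change.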
Feel free to go through the Reinforcement Learning Toolbox documentation to understand agent-environment integration and reward design.
