Getting negative reward in two agents while the other 2 are getting trained

3 views (last 30 days)
Kartikeya on 3 June 2023
Answered: TARUN on 11 June 2025
I am able to train Agent5 and Agent8, but I am getting a constant negative reward for Agent6 and Agent7. I am trying to control a quadrotor.

Answers (1)

TARUN on 11 June 2025
Based on the training plot and code you shared, Agent6 and Agent7 are consistently receiving negative rewards, while Agent5 and Agent8 are learning effectively.
This typically points to either an environment setup issue or improper agent configuration.
You can follow these steps to resolve the issue:
1. Incorrect Block Path for Agent7: There seems to be an extra space in the block path:
'rl_backstep_Multi/ Agent7'
It should be:
'rl_backstep_Multi/Agent7'
A malformed block path will prevent proper environment-agent linking.
2. Shared Observation/Action Specifications: All agents are using the same ainfo and oinfo objects. Because these specs are handle objects, the agents end up sharing them, which can cause unexpected behavior. Instead, define separate specs for each agent, for example:
oinfo1 = rlNumericSpec([2 1]); oinfo1.Name = 'obs1';
ainfo1 = rlNumericSpec([1 1]); ainfo1.Name = 'act1';
% Repeat for each agent (oinfo2/ainfo2, and so on)
3. Reward Signal Issue: Agent6 and Agent7 might be receiving either invalid or overly penalizing rewards. Check the reward block logic inside your Simulink model and make sure it produces finite, meaningful values throughout training. You can log or scope the reward signals to debug this.
4. Hyperparameter Tuning: PPO agents can sometimes diverge depending on the learning rate or entropy weight. You could slightly reduce ActorOptimizerOptions.LearnRate and EntropyLossWeight for Agent6 and Agent7 to stabilize their learning.
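As a sketch of steps 1 and 2, assuming the model is named rl_backstep_Multi and the agent blocks are Agent5 through Agent8 (names taken from your question; adjust block paths and spec dimensions to your actual model):

```matlab
mdl = "rl_backstep_Multi";

% Step 2: separate observation/action specs per agent,
% so no two agents share the same handle objects.
oinfo1 = rlNumericSpec([2 1]); oinfo1.Name = 'obs1';
ainfo1 = rlNumericSpec([1 1]); ainfo1.Name = 'act1';
oinfo2 = rlNumericSpec([2 1]); oinfo2.Name = 'obs2';
ainfo2 = rlNumericSpec([1 1]); ainfo2.Name = 'act2';
oinfo3 = rlNumericSpec([2 1]); oinfo3.Name = 'obs3';
ainfo3 = rlNumericSpec([1 1]); ainfo3.Name = 'act3';
oinfo4 = rlNumericSpec([2 1]); oinfo4.Name = 'obs4';
ainfo4 = rlNumericSpec([1 1]); ainfo4.Name = 'act4';

% Step 1: block paths with no stray spaces after the slash
agentBlks = [mdl + "/Agent5", mdl + "/Agent6", ...
             mdl + "/Agent7", mdl + "/Agent8"];

% Multi-agent Simulink environment: one spec per agent block
env = rlSimulinkEnv(mdl, agentBlks, ...
    {oinfo1, oinfo2, oinfo3, oinfo4}, ...
    {ainfo1, ainfo2, ainfo3, ainfo4});
```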
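For step 4, here is a hedged example of lowering the PPO learning rate and entropy weight; the numeric values are illustrative starting points, not taken from your code:

```matlab
% More conservative PPO options for the struggling agents
agentOpts = rlPPOAgentOptions;
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-4;  % e.g. reduced from 1e-3
agentOpts.CriticOptimizerOptions.LearnRate = 1e-4;
agentOpts.EntropyLossWeight = 0.005;                % e.g. reduced from 0.01

% Recreate Agent6 and Agent7 with these options, e.g.:
% agent6 = rlPPOAgent(obsInfo, actInfo, agentOpts);
```

Tune these gradually and compare the episode-reward curves of Agent6/Agent7 against Agent5/Agent8 after each change.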
Feel free to go through the Reinforcement Learning Toolbox documentation to understand agent-environment integration and reward design.
