After training with MAPPO, validation shows that every saved Agent.mat file gives the same episode reward, and the performance is very poor

Question

郭欣, 15 August 2023


https://kr.mathworks.com/matlabcentral/answers/2008857-mappo-agent-mat-reward

Commented: 郭欣, 21 August 2023

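I trained for 500 episodes using MAPPO reinforcement learning. Judging by the reward curve, the training converged. The training results are as follows:

The environment loaded for training is slightly more complex than the validation environment. During validation, however, I found that the episode reward produced by every saved agent is very poor, and the agents saved at every episode all give the same result. It looks as if the network was either not trained to convergence or not saved correctly. My script is written as follows: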

trainOpts = rlMultiAgentTrainingOptions(...
    "AgentGroups","auto",...                 % alternative: {[1,2]}
    "LearningStrategy","decentralized",...
    "MaxEpisodes",500,...
    "MaxStepsPerEpisode",Tf/Ts,...
    "ScoreAveragingWindowLength",10,...
    "StopTrainingCriteria","AverageReward",...
    "StopTrainingValue",99999990,...
    "Verbose",true,...                       % print training progress to the command line
    "SaveAgentCriteria","AverageReward",...
    "SaveAgentValue",-inf);


Answer 1

Sarthak, 21 August 2023


https://kr.mathworks.com/matlabcentral/answers/2008857-mappo-agent-mat-reward#answer_1290362

Hi 郭欣,

As per my understanding, there could be a few potential reasons for the poor performance you’re observing:

You mentioned that the training environment is slightly more complex than the validation environment. Training and validation should have similar complexity so that the trained agents can generalize.
To improve the effectiveness of MAPPO, you may want to increase the amount of training data so that it covers a wide range of scenarios.
If you have not already done so, you may also need to normalize the values to stabilize value learning.
Analyzing and refining the reward function might also be necessary.
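
One reading of "normalize the values" is to rescale observations (or rewards) with running statistics before they reach the networks. The following is only a minimal sketch of that idea in plain MATLAB, not something taken from the asker's code. The sample numbers and all variable names (rawObs, runMean, runM2, normObs) are hypothetical, and a real setup would apply the same update inside the environment or a preprocessing step.

```matlab
% Hypothetical stream of raw scalar observations (made-up numbers).
rawObs = [120 80 100 140 60];

count   = 0;   % number of samples seen so far
runMean = 0;   % running mean
runM2   = 0;   % running sum of squared deviations from the mean

normObs = zeros(size(rawObs));
for k = 1:numel(rawObs)
    x = rawObs(k);

    % Welford update of the running mean and squared deviations.
    count   = count + 1;
    delta   = x - runMean;
    runMean = runMean + delta / count;
    runM2   = runM2 + delta * (x - runMean);

    % Use the sample standard deviation once two samples exist.
    if count > 1
        runStd = sqrt(runM2 / (count - 1));
    else
        runStd = 1;
    end

    % Normalize with the statistics available so far; guard against zero std.
    normObs(k) = (x - runMean) / max(runStd, 1e-8);
end
```

Running statistics are used here because the full data distribution is usually not known before training starts. If fixed bounds for the signals are known, dividing by those bounds is a simpler alternative.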

However, it is difficult to pinpoint the exact reason for the poor results without looking at your implementation. A thorough debugging process that tracks the training progress can help identify potential issues.

You can also refer to the following documentation for a better understanding of how to train your reinforcement learning agents:
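
As one possible debugging step, it may help to check whether the saved agents actually differ in their learned parameters. The sketch below is plain MATLAB and works on synthetic cell arrays, so the numbers are made up. With Reinforcement Learning Toolbox, comparable cell arrays could be obtained from getLearnableParameters(getActor(agent)) after loading each saved file. The names paramsA, paramsB, paramsC and maxAbsDiff are hypothetical.

```matlab
% Synthetic stand-ins for the actor parameters of three saved agents.
% Each cell holds one weight or bias array (values are made up).
paramsA = {single([0.1 0.2; 0.3 0.4]), single([0.5; 0.6])};
paramsB = {single([0.1 0.2; 0.3 0.4]), single([0.5; 0.6])};  % identical copy of A
paramsC = {single([0.1 0.2; 0.3 0.9]), single([0.5; 0.6])};  % one weight changed

% Largest absolute element-wise difference across all parameter arrays.
% Assumes both cell arrays have the same layout and array sizes.
maxAbsDiff = @(p, q) max(cellfun( ...
    @(a, b) max(abs(double(a(:)) - double(b(:)))), p, q));

diffAB = maxAbsDiff(paramsA, paramsB);
diffAC = maxAbsDiff(paramsA, paramsC);

fprintf("A vs B: max abs difference = %g\n", diffAB);
fprintf("A vs C: max abs difference = %g\n", diffAC);
```

If the difference is zero for agents saved at distant episodes, the files may contain the same network, or the same variable may be loaded each time. If the parameters differ but the validation rewards are still identical, the evaluation setup is the more likely place to look.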

https://www.mathworks.com/help/reinforcement-learning/ug/train-reinforcement-learning-agents.html


郭欣, 21 August 2023

Thank you very much for your answer. What confuses me most is that the agents I saved over the 500 training episodes all produce exactly the same result when I validate them in the new validation environment, even when I validate with the training data. What is the reason for this?
