SAC RL agent does not explore properly (rlSACAgent)
Hi,
I'm trying to create a SAC RL agent. The agent sets 8 separate continuous actions, all with the same lower and upper bounds (-10 and 10).
During training I observe that the chosen actions are almost always one of the two bounds, so they fluctuate between the minimum and the maximum. Only sporadically is another value chosen for one of the actions.
I've found a similar question HERE, but the answer given did not solve the issue: the range of the action space is already the same for all actions, and changing EntropyWeight made no difference. I've also tried scaling the reward, as suggested in this article.
Are there any other methods for solving this problem? Or could it be that I must have some patience and train the agent for more episodes, so that the problem resolves itself?
Thanks in advance for any reply.
Kind regards,
Comments: 3
Emmanouil Tzorakoleftherakis
24 June 2021
Can you share the actor architecture? This most likely has to do with that.
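For context on why the actor matters here: in the usual SAC formulation the actor outputs a mean and a standard deviation, an unbounded value is sampled from that Gaussian, and the sample is passed through tanh and rescaled to the action bounds. If the mean or the standard deviation coming out of the network is large, almost every sample lands on the flat part of tanh and the action sits at a bound. The sketch below is plain Python, not MATLAB and not the toolbox's implementation; the function names, the 5% tolerance, and the example means and standard deviations are all illustrative assumptions.

```python
import math
import random


def squash_to_bounds(u, lower=-10.0, upper=10.0):
    """Map an unbounded sample u into [lower, upper] with tanh, then rescale."""
    half_range = (upper - lower) / 2.0
    midpoint = (upper + lower) / 2.0
    return midpoint + half_range * math.tanh(u)


def fraction_near_bounds(mean, std, n=10_000, tol=0.05, seed=0,
                         lower=-10.0, upper=10.0):
    """Estimate how often a squashed Gaussian sample lands near either bound.

    "Near" means within tol * (half the action range) of a bound.
    The tolerance is an arbitrary choice for this illustration.
    """
    rng = random.Random(seed)
    margin = tol * (upper - lower) / 2.0
    hits = 0
    for _ in range(n):
        action = squash_to_bounds(rng.gauss(mean, std), lower, upper)
        if action <= lower + margin or action >= upper - margin:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Hypothetical (mean, std) pairs for the pre-tanh Gaussian.
    for mean, std in [(0.0, 1.0), (5.0, 1.0), (0.0, 10.0)]:
        share = fraction_near_bounds(mean, std)
        print(f"mean={mean:5.1f} std={std:5.1f} -> share near a bound: {share:.2f}")
```

A moderate mean and standard deviation spread the actions across the range, while a large mean or a large standard deviation pushes most of them to -10 or 10. That is why the last layers of the mean and standard deviation paths, and the scale of their inputs, are worth inspecting.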
Willemijn Remmerswaal
24 June 2021
Touleen Ibrahim
2 April 2024
Hi, I see the question was posted a long time ago, but I faced the same problem and found the root cause, and I would like to share it, hoping it will help others.
If the input consists of two or more types of data, the components should be normalized. Otherwise, the output of the actor neural network will be biased towards the components with larger values.
BR
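The normalization suggested in the comment above can be sketched as follows. This is an illustration in plain Python rather than MATLAB: the two observation components, their ranges, the helper name `scale_to_unit_range`, and the toy weights are all assumptions made for the example, not values from the original poster's model.

```python
import math


def scale_to_unit_range(obs, lows, highs):
    """Linearly map each observation component from [low, high] to [-1, 1]."""
    return [2.0 * (x - lo) / (hi - lo) - 1.0
            for x, lo, hi in zip(obs, lows, highs)]


def toy_actor_mean(obs, weights, action_limit=10.0):
    """One linear unit followed by tanh and scaling to the action bound.

    This stands in for a real actor network only to show the effect of
    input scale; it is not how any particular toolbox builds its actor.
    """
    pre_activation = sum(w * x for w, x in zip(weights, obs))
    return action_limit * math.tanh(pre_activation)


if __name__ == "__main__":
    # Hypothetical observation: a position in [-1, 1] and a speed in [0, 5000].
    raw_obs = [0.3, 4200.0]
    lows = [-1.0, 0.0]
    highs = [1.0, 5000.0]
    weights = [0.5, 0.5]  # identical weights, so only the input scale differs

    scaled_obs = scale_to_unit_range(raw_obs, lows, highs)

    print("raw input    ->", toy_actor_mean(raw_obs, weights))
    print("scaled input ->", toy_actor_mean(scaled_obs, weights))
```

With the raw input, the large speed value dominates the weighted sum and tanh saturates, so the action sits at the upper bound. After scaling, both components contribute on a comparable scale and the action falls inside the range.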
Answers (0)


