Transient value problem of the variable in reward function of reinforcement learning
이전 댓글 표시
Hello, I encounted a problem when designing the reward function. In the simulink environment, I want to incorporate some variables in the reward function. During the training of RL agent, the varibles will converge after about 0.06s, while the agent is trained from 0s. The enable block doesn't help by putting the RL block in a subsystem.
From my understanding, it will influence the value reward function, which may result in poor trained agent. Does anyone have any suggestions regarding this questions?
Thank you very much.
채택된 답변
추가 답변 (0개)
카테고리
도움말 센터 및 File Exchange에서 Reinforcement Learning Toolbox에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!