Reinforcement learning: unable to duplicate the best reward I had during training
I use the MATLAB Reinforcement Learning Toolbox to train a model, and I set the following rlTrainingOptions:
op = rlTrainingOptions('StopTrainingCriteria','EpisodeReward','StopTrainingValue',100);
Training stops once the episode reward exceeds 100. However, when I simulate with the trained agent, the episode reward is much lower than 100. Does anybody know why? All other conditions are exactly the same.
Answers (1)
Emmanouil Tzorakoleftherakis
26 January 2023
Just because the reward of a single episode meets the desired performance does not mean that the agent will show exactly the same behavior once you stop training. The agent may have been helped by factors such as exploration or environment noise to reach that result.
Before stopping training, you should be able to see consistently good behavior across multiple episodes in a row (or a high average episode reward). In that case, after stopping training, the agent's behavior should be close to what you saw in training.
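As a rough sketch of that advice, one option is to stop on the moving average of episode rewards instead of a single episode, and then judge the trained agent over several simulations rather than one. The variables env and agent are assumed to already exist in your workspace (they are not shown in this thread), and the window length of 20 and the 10 simulations are arbitrary illustrative values, not recommendations.

```matlab
% Sketch only: env and agent are assumed to already exist in the workspace
% (they are not shown in this thread).

% Stop on the moving average of episode rewards rather than on one episode.
% The window length of 20 is an arbitrary illustrative value.
op = rlTrainingOptions( ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',100, ...
    'ScoreAveragingWindowLength',20);

trainingStats = train(agent,env,op);

% Evaluate the trained agent over several simulations, not just one.
% NumSimulations = 10 is arbitrary. MaxSteps is left at its default here;
% it may need to match the MaxStepsPerEpisode used during training.
simOpts = rlSimulationOptions('NumSimulations',10);
experiences = sim(env,agent,simOpts);

% Total reward of each simulated episode, then the mean across episodes.
totalReward = arrayfun(@(e) sum(e.Reward.Data), experiences);
fprintf('Mean simulated episode reward: %.2f\n', mean(totalReward));
```

If the mean simulated reward is still well below the training average, it may be worth checking whether the simulation step limit and the environment's reset conditions really match those used in training.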