Transient values of a variable in the reward function during reinforcement learning training

March 22, 2021

1 Answer

Accepted Answer

Updated: March 23, 2021


Hello, I encountered a problem when designing the reward function. In my Simulink environment, I want to include some variables in the reward function. During training of the RL agent, these variables only converge after about 0.06 s, but the agent is trained from 0 s. Putting the RL Agent block inside an enabled subsystem (using an Enable block) did not help.

From my understanding, this start-up transient will distort the reward values, which may result in a poorly trained agent. Does anyone have any suggestions for this problem?

Thank you very much.


Accepted Answer

Emmanouil Tzorakoleftherakis on March 22, 2021


You can put the agent block inside a triggered subsystem and set it up so that the agent starts running, and therefore training, only after 0.06 seconds.
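A minimal Python sketch of why delaying the agent helps, under loudly stated assumptions: the plant variable is modeled as a hypothetical first-order response `x(t) = 1 - exp(-t/TAU)` that settles by roughly 0.06 s, the reward is a hypothetical penalty on the deviation from steady state, and the agent steps at a fixed hypothetical sample time `TS`. `episode_return` simply skips every step before `T_START`, which mimics what a triggered subsystem does to the agent's experience; it is an illustration of the idea, not the Simulink or Reinforcement Learning Toolbox implementation.

```python
import math

TAU = 0.012      # hypothetical time constant: x(t) settles in roughly 5*TAU = 0.06 s
T_FINAL = 0.2    # hypothetical episode length in seconds
TS = 0.001       # hypothetical agent sample time in seconds
T_START = 0.06   # delay before the agent runs (the value suggested in the answer)


def plant_variable(t: float) -> float:
    """Hypothetical variable that converges to 1.0 after a start-up transient."""
    return 1.0 - math.exp(-t / TAU)


def reward(t: float) -> float:
    """Hypothetical reward: penalize deviation of the variable from its steady value."""
    return -abs(plant_variable(t) - 1.0)


def episode_return(t_start: float) -> float:
    """Sum rewards only over agent steps taken at or after t_start."""
    n = int(round(T_FINAL / TS))
    total = 0.0
    for k in range(n + 1):
        t = k * TS
        # Small tolerance guards against floating-point rounding of k * TS.
        if t + 1e-12 < t_start:
            continue  # subsystem not yet triggered: no action, no reward
        total += reward(t)
    return total


ungated = episode_return(0.0)
gated = episode_return(T_START)
print(f"ungated return: {ungated:.4f}")
print(f"gated return:   {gated:.4f}")
```

In the ungated run the return is dominated by large start-up penalties that no action could have avoided; with the delay, the agent only sees rewards that reflect the settled behaviour of the system.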

5 comments (3 older comments not shown)

Emmanouil Tzorakoleftherakis on March 23, 2021

Yes, I believe it should be 40. There is a counter implemented internally that keeps track of how many times the RL Agent block runs.
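The kind of step counting described here can be illustrated with a small Python sketch. Everything in it is hypothetical: `GatedAgentCounter` and `expected_steps` are not toolbox names, the times are made-up values (not the ones discussed in the hidden comments), and the sketch assumes a fixed-step agent that executes at every sample `k*ts` from the trigger time up to and including the final time. It checks a simulated count against a closed-form count.

```python
import math
from dataclasses import dataclass


@dataclass
class GatedAgentCounter:
    """Hypothetical stand-in for an internal counter of a delayed agent block."""
    t_start: float
    count: int = 0

    def maybe_step(self, t: float) -> bool:
        # Execute (and count) only at or after the trigger time.
        if t + 1e-12 >= self.t_start:
            self.count += 1
            return True
        return False


def expected_steps(t_final: float, t_start: float, ts: float) -> int:
    """Closed form: number of samples k*ts with t_start <= k*ts <= t_final."""
    first = math.ceil(t_start / ts - 1e-9)
    last = math.floor(t_final / ts + 1e-9)
    return max(0, last - first + 1)


t_final, t_start, ts = 0.1, 0.06, 0.001  # hypothetical values
counter = GatedAgentCounter(t_start=t_start)
for k in range(int(round(t_final / ts)) + 1):
    counter.maybe_step(k * ts)
print(counter.count, expected_steps(t_final, t_start, ts))
```

The inclusive-endpoint convention is an assumption of this sketch; if either endpoint is excluded the count changes by one.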

Yihao Wan on March 23, 2021

Thank you very much for your help.


More Answers (0)


Categories

Find more on Reinforcement Learning Toolbox in Help Center and File Exchange

Products

Simulink

Release

R2021a

