Transient value problem of a variable in the reward function of reinforcement learning

Hello, I encountered a problem when designing the reward function. In the Simulink environment, I want to incorporate some variables in the reward function. During training of the RL agent, these variables only converge after about 0.06 s, while the agent is trained from 0 s. Putting the RL Agent block in a subsystem with an enable block didn't help.
From my understanding, this will affect the value of the reward function, which may result in a poorly trained agent. Does anyone have any suggestions regarding this question?
Thank you very much.

Accepted Answer

You can put the agent block under a triggered subsystem and set it to begin training after 0.06 seconds.

5 Comments

Thank you for your reply. I tried this before, but it didn't work. The error is shown below:
Subsystem/RL Agent/AgentWrapper has sample time 2e-05. Only constant (inf) or inherited (-1) sample times are allowed in triggered subsystem.
It seems the sample time cannot be changed inside the block. Could you please provide an example model or give me some suggestions? Thank you.
Can you try an enabled subsystem instead of a triggered one? That may take care of the sample time error.
Thank you very much. It works. I have one more question: in the configuration of rlTrainingOptions, the maximum number of steps per training episode is ceil(T/Ts) (where T is the simulation time and Ts is the sample time). When the enable block turns on at 0.06 s, e.g. with a simulation time T of 0.1 s and Ts of 0.001 s, should the max-steps calculation be based on the actual training time of 0.04 s or the full simulation time of 0.1 s? That is, should the value be 40 or 100? From my understanding it should be 40, right?
I believe it should be 40, yes. There is a counter implemented internally that keeps track of how many times the RL Agent block runs.
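For reference, the setup discussed above could look like the following minimal sketch. The variable names (T, Ts, Tenable) are illustrative, and the MaxEpisodes value is an arbitrary placeholder; rlTrainingOptions and MaxStepsPerEpisode are the Reinforcement Learning Toolbox names:

```matlab
T = 0.1;        % total simulation time (s)
Ts = 0.001;     % agent sample time (s)
Tenable = 0.06; % time at which the enabled subsystem turns on (s)

% The step counter tracks RL Agent block executions, so base the limit
% on the active window, not the full simulation time:
% ceil((0.1 - 0.06)/0.001) = 40
trainOpts = rlTrainingOptions( ...
    'MaxStepsPerEpisode', ceil((T - Tenable)/Ts), ...
    'MaxEpisodes', 500);   % illustrative value
```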
Thank you very much for your help.


More Answers (0)


Release

R2021a
