Reinforcement Learning Sample Time

Views: 22 (last 30 days)
Braydon Westmoreland on 27 Jun 2020
Commented: Kai Tybussek on 15 Jul 2020
Sorry if this is a dumb question, but I am not sure how to configure the sample time on my reinforcement learning agent so that it interacts properly with the Simscape Electrical environment I've created. My goal is for the RL agent to output an action every 1 second; that action is then used to update the MOSFET gate voltages in the environment. The environment uses the new gate voltages to perform a 100 microsecond pulse, during which the MOSFETs' drain-source currents are measured midway through the pulse. The measured currents determine the agent's reward and also determine when an episode is over: an episode ends when the agent has (mostly) balanced the 4 measured currents to within a defined threshold.
My confusion comes when trying to set up the timing in the environment, particularly the timing of the outputs that go to the RL agent. The agent requires the environment to output every Ts (1 sec), but I need an additional delay of roughly 100 microseconds for the pulse and the subsequent current measurements to take place.
I believe I have a fundamental misunderstanding of how sample time works here. Any help is greatly appreciated. Thank you.
Additional note: I am also seeing a bug where the agent outputs the same sequence of actions every episode, regardless of the previous observation.

Answers (1)

Emmanouil Tzorakoleftherakis on 2 Jul 2020
Hi Braydon,
The agent sample time effectively determines how often the agent outputs a decision/action; think of it as the equivalent of your controller's sample time. If you need new actions every 100 us, that should be your agent sample time. If new actions every 1 second are enough, then the environment can consume the same action for 10,000 consecutive time steps (assuming a 100 us sample time for the environment) until a new action is available 1 second later.
If you want to add a delay in the observation inputs, you can always use a delay block.
This may not be exactly the same application, but this video, which shows how to use RL for motor control by setting PWM references, may be helpful.
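To make the sample-time relationship concrete, here is a minimal sketch of how the agent's sample time can be set independently of the environment's. The model name, agent block path, and signal dimensions below are hypothetical placeholders, not the poster's actual model; the key point is that `SampleTime` in the agent options controls how often the agent acts, while the Simscape solver/environment can run at a much faster rate in between.

```matlab
% Sketch (assumed model/block names): agent decides every Ts_agent seconds,
% while the Simscape environment runs at Ts_env in between decisions.
Ts_env   = 100e-6;  % environment step: 100 us pulse resolution
Ts_agent = 1;       % agent outputs a new action every 1 s

% Four measured drain-source currents in, four gate voltages out (assumed).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([4 1]);

% 'mosfetBalance' and its RL Agent block path are placeholder names.
env = rlSimulinkEnv('mosfetBalance', 'mosfetBalance/RL Agent', ...
                    obsInfo, actInfo);

% The agent's SampleTime option is what sets the decision rate.
agentOpts = rlDDPGAgentOptions('SampleTime', Ts_agent);
```

Between agent decisions, the RL Agent block simply holds its last action, so the environment sees a constant gate-voltage command for the full 1 s interval while the 100 us pulse and measurement logic run at the faster rate.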
Comments: 1
Kai Tybussek on 15 Jul 2020
What do I have to do if I want the agent to perform one action, check whether isDone = 1, and if not, reset to the initial observation and take another action? Does that mean my sample time in this case is 1 and my steps per episode need to be 1 as well?
