Reinforcement Learning Sample Time

Views: 22 (last 30 days)
Braydon Westmoreland on 27 Jun 2020
Commented: Kai Tybussek on 15 Jul 2020
Sorry if this is a dumb question, but I am not sure how to configure the sample time on my reinforcement learning agent so that it interacts properly with the Simscape Electrical environment I've created. My goal is for the RL agent to output an action every 1 second; that action is then used to update the MOSFET gate voltages in the environment. The environment uses the new gate voltages to perform a 100 microsecond pulse, during which the MOSFETs' drain-source currents are measured midway through the pulse. The measured currents determine the agent's reward and also determine when an episode is over: an episode ends when the agent has (mostly) balanced the 4 measured currents to within a defined threshold.
My confusion comes when trying to set up the timing in the environment, particularly the timing of the outputs that go to the RL agent. The agent requires the environment to output every Ts (1 sec), but I need an additional delay of roughly 100 microseconds for the pulse and the subsequent current measurements to take place.
I believe I have a fundamental misunderstanding of how sample time works here. Any help is greatly appreciated. Thank you.
Additional note: I am also seeing a bug where the agent outputs the same sequence of actions every episode, regardless of the previous observation.

Answers (1)

Emmanouil Tzorakoleftherakis on 2 Jul 2020
Hi Braydon,
The agent sample time effectively determines how often the agent outputs a decision/action; think of it as the equivalent of your controller's sample time. If you need new actions every 100 us, that should be your agent sample time. If new actions every 1 second are enough, then the environment can consume the same action for 10,000 consecutive time steps (assuming a 100 us sample time for the environment) until a new action is available 1 second later.
If you want to add a delay in the observation inputs, you can always use a delay block.
This may not be exactly the same application, but this video, which shows how to use RL for motor control by setting PWM references, may be helpful.
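To make the sample-time relationship concrete, here is a minimal sketch of how the agent's sample time can be set independently of the environment's. The model name, agent block path, and signal dimensions below are hypothetical placeholders, not the poster's actual model; the key point is that `SampleTime` in the agent options controls how often the agent acts, while the Simscape solver/environment can run at a much faster rate in between.

```matlab
% Sketch (assumed model/block names): agent decides every Ts_agent seconds,
% while the Simscape environment runs at Ts_env in between decisions.
Ts_env   = 100e-6;  % environment step: 100 us pulse resolution
Ts_agent = 1;       % agent outputs a new action every 1 s

% Four measured drain-source currents in, four gate voltages out (assumed).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([4 1]);

% 'mosfetBalance' and its RL Agent block path are placeholder names.
env = rlSimulinkEnv('mosfetBalance', 'mosfetBalance/RL Agent', ...
                    obsInfo, actInfo);

% The agent's SampleTime option is what sets the decision rate.
agentOpts = rlDDPGAgentOptions('SampleTime', Ts_agent);
```

Between agent decisions, the RL Agent block simply holds its last action, so the environment sees a constant gate-voltage command for the full 1 s interval while the 100 us pulse and measurement logic run at the faster rate.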
Comments: 1
Kai Tybussek on 15 Jul 2020
What do I have to do if I want the agent to perform one action, check whether isDone = 1, and if not, reset to the initial observation and take another action? Does that mean my sample time in this case is 1 and my steps per episode need to be 1 as well?
