Reinforcement Learning on Simscape
I am having an issue with RL in Simscape. I added a Unit Delay block to break an algebraic loop, but the Unit Delay's initial condition sets the value I want to change to a constant equal to the block's initial condition. Do you by any chance know what might be causing this?
I will add a screenshot of the training.
Accepted Answer
Emmanouil Tzorakoleftherakis
28 June 2024
One option is to look at introducing the delay on the observation, not the action. Please take a look at this page for more details.
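For readers following along, here is a minimal sketch of what moving the delay onto the observation path could look like if done programmatically; the model and block names are hypothetical, and the same change can of course be made by dragging blocks in the Simulink editor:

```matlab
% Hypothetical sketch: place the Unit Delay on the observation signal
% rather than the action signal. 'myRLModel' and 'ObsDelay' are
% assumptions for illustration only.
mdl = 'myRLModel';
open_system(mdl);

% Add a Unit Delay on the observation path. Its InitialCondition now
% only defines the first observation the agent sees; it no longer
% clamps the action applied to the plant.
add_block('simulink/Discrete/Unit Delay', [mdl '/ObsDelay']);
set_param([mdl '/ObsDelay'], ...
    'InitialCondition', '0', ...
    'SampleTime', '-1');   % -1: inherit the sample time of the signal

% Rewire manually (or with add_line/delete_line) so the signal feeding
% the RL Agent block's observation port passes through ObsDelay.
```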
14 Comments
@Emmanouil Tzorakoleftherakis I have another question that I am not able to answer: my actions are constant within each episode.
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
Hi. In general, if you have a question unrelated to the original one, it's a good idea to start a separate thread for visibility. I am not sure which agent you are using, but make sure your exploration options make sense. Also, let the agent run for a few episodes first, as the behavior you are describing is common in the initial episodes.
@Emmanouil Tzorakoleftherakis My apologies. I am using a PPO agent with a 0.001 learning rate for both the actor and the critic. I trained over 50 episodes but the action is still constant (though it changes from one episode to another). I am very new to RL and am mostly proceeding by trial and error. Thank you for your previous responses.
No problem. Which release are you using? How long are your episodes? What is the agent sample time? What is your reward? Also, I would let training continue for a few hundred episodes and check again whether the issue persists.
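For reference, the hyperparameters discussed here map to agent options roughly as follows. This is only a sketch under stated assumptions (R2023a-era API; the sample time and entropy weight values are illustrative), not the poster's actual configuration:

```matlab
% Sketch of a PPO agent configuration with a 1e-3 learning rate for
% both actor and critic, as described above. EntropyLossWeight is the
% main PPO exploration knob; its value here is an assumption.
actorOpts  = rlOptimizerOptions('LearnRate', 1e-3);
criticOpts = rlOptimizerOptions('LearnRate', 1e-3);
agentOpts  = rlPPOAgentOptions( ...
    'SampleTime', 3600, ...                  % illustrative value
    'ActorOptimizerOptions', actorOpts, ...
    'CriticOptimizerOptions', criticOpts, ...
    'EntropyLossWeight', 0.01);              % larger => more exploration
```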
Karim Darwich
1 July 2024
Edited: Karim Darwich, 1 July 2024
@Emmanouil Tzorakoleftherakis Thank you for your response.
I am using a PPO agent with the hyperparameters shown in the screenshot attached to this message, on R2023a. The reward is also attached. Thank you in advance; I hope this clears up the situation. I am doing RL-based control of a district heating network model I built myself.
PS: I will later create another question and link it to this one, so that it is more visible.
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
Thanks. It seems your agent sample time is the same as the episode duration. Is that expected? How often do you expect your agent to take actions? Regardless, that explains what you are seeing: the agent takes only one action per episode, so 50 actions in total over 50 episodes. That is really not sufficient training time.
@Emmanouil Tzorakoleftherakis Yes, I did not notice that I had the same sample time and experience horizon. Thank you very much.
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
I was actually referring to the isdone signal in the reward function. It is set to true at t = 86400, which is the same as the agent sample time.
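A quick worked check of this point, using the numbers from the thread:

```matlab
% With the episode terminating (isdone = true) at t = 86400 and the
% agent sample time also set to 86400, the agent acts exactly once
% per episode.
Ts = 86400;                 % agent sample time (s)
Tf = 86400;                 % time at which isdone becomes true (s)
stepsPerEpisode = Tf/Ts     % = 1, i.e., one action per episode
```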
@Emmanouil Tzorakoleftherakis Oh, sorry, I see! And what would be a good option in that case?
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
That depends on your problem and how frequently your agent needs to take actions (which is determined by the agent sample time).
@Emmanouil Tzorakoleftherakis So, for example, if I want my agent to change the mass flow every hour for one day, I should set the sample time to 3600 (the number of seconds in an hour) with the isdone condition at t = 86400 (the number of seconds in a day)?
Correct. Alternatively, you could use the MaxStepsPerEpisode training option and leave the isdone flag false at all times. The IsDone flag can be used for cases where you want to terminate an episode early (e.g., when some constraint is being violated).
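As a concrete illustration of this suggestion, here is a sketch of the hourly-action setup discussed above; the option names come from the Reinforcement Learning Toolbox, while the episode count is an assumption:

```matlab
% One action per hour over a one-day episode, with IsDone left false
% and the episode length enforced by MaxStepsPerEpisode instead.
Ts = 3600;                              % agent sample time (s)
Tf = 86400;                             % one simulated day (s)
trainOpts = rlTrainingOptions( ...
    'MaxStepsPerEpisode', Tf/Ts, ...    % 24 agent steps per episode
    'MaxEpisodes', 500);                % assumption: a few hundred episodes
```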
@Emmanouil Tzorakoleftherakis Perfect. Thank you very much, sir!
More Answers (0)