Reinforcement Learning on Simscape
I am having an issue with RL in Simscape. I added a Unit Delay block to break an algebraic loop, but the Unit Delay's initial condition sets the value I want to change to a constant equal to the block's initial condition. Do you by any chance know what might be causing this?
I will add a screenshot of the training.
Accepted Answer
Emmanouil Tzorakoleftherakis
28 June 2024
One option is to look at introducing the delay on the observation, not the action. Please take a look at this page for more details.
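For readers following along, here is a minimal sketch of what moving the delay onto the observation path could look like if done programmatically; the model and block names are hypothetical, and the same change can of course be made by dragging blocks in the Simulink editor:

```matlab
% Hypothetical sketch: place the Unit Delay on the observation signal
% rather than the action signal. 'myRLModel' and 'ObsDelay' are
% assumptions for illustration only.
mdl = 'myRLModel';
open_system(mdl);

% Add a Unit Delay on the observation path. Its InitialCondition now
% only defines the first observation the agent sees; it no longer
% clamps the action applied to the plant.
add_block('simulink/Discrete/Unit Delay', [mdl '/ObsDelay']);
set_param([mdl '/ObsDelay'], ...
    'InitialCondition', '0', ...
    'SampleTime', '-1');   % -1: inherit the sample time of the signal

% Rewire manually (or with add_line/delete_line) so the signal feeding
% the RL Agent block's observation port passes through ObsDelay.
```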
14 Comments
@Emmanouil Tzorakoleftherakis I have another question that I am not able to answer: my actions are constant within each episode.
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
Hi. In general, if you have a question unrelated to the original one, it's a good idea to start a separate thread for visibility. I am not sure which agent you are using, but make sure your exploration options make sense. Also, let the agent run for a few episodes first, as the behavior you are describing is common in the initial episodes.
@Emmanouil Tzorakoleftherakis My apologies. I am using a PPO agent with a 0.001 learning rate for both the actor and the critic. I trained over 50 episodes but the action is still constant (though it changes from one episode to another). I am very new to RL and am mostly proceeding by trial and error. Thank you for your previous responses.
No problem. Which release are you using? How long are your episodes? What is the agent sample time? What is your reward? Also, I would let training continue for a few hundred episodes and check again whether the issue persists.
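For reference, the hyperparameters discussed here map to agent options roughly as follows. This is only a sketch under stated assumptions (R2023a-era API; the sample time and entropy weight values are illustrative), not the poster's actual configuration:

```matlab
% Sketch of a PPO agent configuration with a 1e-3 learning rate for
% both actor and critic, as described above. EntropyLossWeight is the
% main PPO exploration knob; its value here is an assumption.
actorOpts  = rlOptimizerOptions('LearnRate', 1e-3);
criticOpts = rlOptimizerOptions('LearnRate', 1e-3);
agentOpts  = rlPPOAgentOptions( ...
    'SampleTime', 3600, ...                  % illustrative value
    'ActorOptimizerOptions', actorOpts, ...
    'CriticOptimizerOptions', criticOpts, ...
    'EntropyLossWeight', 0.01);              % larger => more exploration
```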
Karim Darwich
1 July 2024
Edited: Karim Darwich, 1 July 2024
@Emmanouil Tzorakoleftherakis Thank you for your response.
I am using a PPO agent with the hyperparameters shown in the screenshot attached to this message, on R2023a. The reward is also attached. Thank you in advance; I hope this clears up the situation. I am doing RL-based control of a district heating network model I built myself.
PS: I will later create another question and link it to this one, so that it is more visible.
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
Thanks. It seems your agent sample time is the same as the episode duration. Is that expected? How often do you expect your agent to take actions? Regardless, that explains what you are seeing: the agent takes only one action per episode, so 50 actions in total over 50 episodes. That is really not sufficient training time.
@Emmanouil Tzorakoleftherakis Yes, I did not notice that I had the same sample time and experience horizon. Thank you very much.
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
I was actually referring to the isdone signal in the reward function. It is set to true at t = 86400, which is the same as the agent sample time.
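A quick worked check of this point, using the numbers from the thread:

```matlab
% With the episode terminating (isdone = true) at t = 86400 and the
% agent sample time also set to 86400, the agent acts exactly once
% per episode.
Ts = 86400;                 % agent sample time (s)
Tf = 86400;                 % time at which isdone becomes true (s)
stepsPerEpisode = Tf/Ts     % = 1, i.e., one action per episode
```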
@Emmanouil Tzorakoleftherakis Oh, sorry, I see! And what would be a good option in that case?
Emmanouil Tzorakoleftherakis
1 July 2024
Edited: Emmanouil Tzorakoleftherakis, 1 July 2024
That depends on your problem and how frequently your agent needs to take actions (which is determined by the agent sample time).
@Emmanouil Tzorakoleftherakis So, for example, if I want my agent to change the mass flow every hour for one day, I should set the sample time to 3600 (the number of seconds in an hour) with the isdone condition at t = 86400 (the number of seconds in a day)?
Correct. Alternatively, you could use the MaxStepsPerEpisode training option and leave the isdone flag false at all times. The IsDone flag can be used for cases where you want to terminate an episode early (e.g., when some constraint is being violated).
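As a concrete illustration of this suggestion, here is a sketch of the hourly-action setup discussed above; the option names come from the Reinforcement Learning Toolbox, while the episode count is an assumption:

```matlab
% One action per hour over a one-day episode, with IsDone left false
% and the episode length enforced by MaxStepsPerEpisode instead.
Ts = 3600;                              % agent sample time (s)
Tf = 86400;                             % one simulated day (s)
trainOpts = rlTrainingOptions( ...
    'MaxStepsPerEpisode', Tf/Ts, ...    % 24 agent steps per episode
    'MaxEpisodes', 500);                % assumption: a few hundred episodes
```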
@Emmanouil Tzorakoleftherakis Perfect. Thank you very much, sir!
More Answers (0)