Agent repeats same sequence of actions each episode

Question

Braydon Westmoreland, July 1, 2020

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/557872-agent-repeats-same-sequence-of-actions-each-episode

Edited: Emmanouil Tzorakoleftherakis, July 2, 2020

Can someone please help me understand why my RL agent outputs the same sequence of actions in every episode, regardless of the observations it receives from the environment? Here is an example of what I mean (two episodes, five steps each):

prev_state = 11.20 11.90 11.30 11.50

action = 0.00 0.00 0.00 0.00

new_state = 11.20 11.90 11.30 11.50

prev_state = 11.20 11.90 11.30 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 11.30 12.00 11.20 11.50

prev_state = 11.30 12.00 11.20 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 11.40 12.00 11.10 11.50

prev_state = 11.40 12.00 11.10 11.50

action = -0.10 -0.10 0.10 0.00

new_state = 11.30 11.90 11.20 11.50

prev_state = 11.30 11.90 11.20 11.50

action = 0.00 0.00 0.10 0.10

new_state = 11.30 11.90 11.30 11.60

prev_state = 12.00 11.20 11.70 11.50

action = 0.00 0.00 0.00 0.00

new_state = 12.00 11.20 11.70 11.50

prev_state = 12.00 11.20 11.70 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 12.00 11.30 11.60 11.50

prev_state = 12.00 11.30 11.60 11.50

action = 0.10 0.10 -0.10 0.00

new_state = 12.00 11.40 11.50 11.50

prev_state = 12.00 11.40 11.50 11.50

action = -0.10 -0.10 0.10 0.00

new_state = 11.90 11.30 11.60 11.50

prev_state = 11.90 11.30 11.60 11.50

action = 0.00 0.00 0.10 0.10

new_state = 11.90 11.30 11.70 11.60

Let me know if you have any questions about the simulation.

More info on the simulation & my other issues: https://www.mathworks.com/matlabcentral/answers/555799-reinforcement-learning-sample-time

Answer 1

Emmanouil Tzorakoleftherakis, July 2, 2020

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/557872-agent-repeats-same-sequence-of-actions-each-episode#answer_460096

Edited: Emmanouil Tzorakoleftherakis, July 2, 2020

Hi Braydon,

I am not really sure why you are only looking at the first two episodes. RL can take thousands of episodes to converge, so the first few do not give you enough information. In fact, I ran your models for 20 episodes, and the action sequence changed after a few episodes. If nothing else, I would check the reward formulation, since it drives how the neural network weights change and thus how actions are selected (in addition to exploration).
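To illustrate the point about action selection, here is a minimal Python sketch (not taken from the models in this thread). It assumes an epsilon-greedy rule over a discrete set of actions, which may or may not match the agent used here. The function name `epsilon_greedy` and the numbers in `q_values` are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action index with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        # Exploration: ignore the value estimates entirely.
        return rng.randrange(len(q_values))
    # Exploitation: the choice depends only on the current value estimates,
    # which in turn depend on the network weights.
    return max(range(len(q_values)), key=lambda i: q_values[i])


rng = random.Random(0)          # seeded only so the sketch is repeatable
q_values = [0.2, 0.5, 0.1]      # hypothetical value estimates for three actions

# With no exploration, the same estimates always give the same action.
greedy = [epsilon_greedy(q_values, 0.0, rng) for _ in range(5)]
print(greedy)  # [1, 1, 1, 1, 1]
```

Under this assumption, the greedy part of the choice can only change once training has moved the value estimates, and the reward signal is what moves them. That is why the reward formulation is worth checking.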

1.0000e-04

prev_state = 11.90 11.90 12.00 11.20

action = 0.00 0.00 0.00 0.00

new_state = 11.90 11.90 12.00 11.20

prev_state = 11.90 11.90 12.00 11.20

action = 0.10 0.10 -0.10 0.00

new_state = 12.00 12.00 11.90 11.20

prev_state = 12.00 12.00 11.90 11.20

action = -0.10 0.00 -0.10 0.10

new_state = 11.90 12.00 11.80 11.30

prev_state = 11.90 12.00 11.80 11.30

action = -0.10 0.10 0.00 -0.10

new_state = 11.80 12.00 11.80 11.20

prev_state = 11.80 12.00 11.80 11.20

action = 0.10 0.00 -0.10 0.00

new_state = 11.90 12.00 11.70 11.20

1.0000e-04

prev_state = 11.70 11.90 11.50 11.60

action = 0.00 0.00 0.00 0.00

new_state = 11.70 11.90 11.50 11.60

prev_state = 11.70 11.90 11.50 11.60

action = 0.10 0.10 -0.10 0.00

new_state = 11.80 12.00 11.40 11.60

prev_state = 11.80 12.00 11.40 11.60

action = -0.10 0.00 -0.10 0.10

new_state = 11.70 12.00 11.30 11.70

prev_state = 11.70 12.00 11.30 11.70

action = -0.10 0.10 0.00 -0.10

new_state = 11.60 12.00 11.30 11.60

prev_state = 11.60 12.00 11.30 11.60

action = 0.10 0.00 -0.10 0.00

new_state = 11.70 12.00 11.20 11.60
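If you want to compare logged episodes without reading them by eye, a small Python sketch along these lines may help. It is not part of the original models. The helper name `first_divergence` is made up for illustration, and the two lists are the `action` rows copied from the log in the question and from the log above.

```python
def first_divergence(seq_a, seq_b, tol=1e-9):
    """Return the index of the first step where two action sequences differ, or None."""
    for step, (a, b) in enumerate(zip(seq_a, seq_b)):
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            return step
    if len(seq_a) != len(seq_b):
        # One sequence is a strict prefix of the other.
        return min(len(seq_a), len(seq_b))
    return None


# Action rows from the question (both of its episodes show this sequence).
question_actions = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.1, -0.1, 0.0],
    [0.1, 0.1, -0.1, 0.0],
    [-0.1, -0.1, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.1],
]

# Action rows from the log in this answer.
answer_actions = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.1, -0.1, 0.0],
    [-0.1, 0.0, -0.1, 0.1],
    [-0.1, 0.1, 0.0, -0.1],
    [0.1, 0.0, -0.1, 0.0],
]

print(first_divergence(question_actions, question_actions))  # None
print(first_divergence(question_actions, answer_actions))    # 2
```

Running the same check across all episodes of a longer training run would show at which episode the sequence first changes.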
