Could I learn from past data INCLUDING actions? Could I make a vector of actions to be used in a certain order?

Question

Cecilia S. on 16 Jun 2021

Direct link to this question: https://kr.mathworks.com/matlabcentral/answers/858170-could-i-learn-from-past-data-including-actions-could-i-make-vector-with-actions-to-be-used-in-a-cer

Commented: Cecilia S. on 22 Jun 2021

If I have a complete set of past data (observations) and a list of the actions taken by some agent (or human), could I update my policy using that instead of running my simulated environment dynamics?

I have a DQN agent that was initially trained using simulated data. As usual, my agent chose actions following some policy and some action selection method (in my case, epsilon-greedy selection). Now I would like to update my DNN with real-world past data. How could that be done?

I don't seem to be able to modify the action as an input to the step function (I could modify it afterwards, but if I do that, the agent would be evaluating the wrong action). Is there a way to "force" the action value (at the input of the step function) so that the system evaluates that action instead of the one selected by my current exploration/exploitation method?


Answer 1

Emmanouil Tzorakoleftherakis on 22 Jun 2021

Direct link to this answer: https://kr.mathworks.com/matlabcentral/answers/858170-could-i-learn-from-past-data-including-actions-could-i-make-vector-with-actions-to-be-used-in-a-cer#answer_730740

Hello,

If the historical observations do not depend on the actions taken (think of stock values or historical power demand), you could set up your environment so that the agent uses this data for observations. The agent will still be taking its own actions, though.
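For illustration only, here is a minimal sketch of that first case, written in plain Python rather than with any toolbox API. The class name `HistoricalReplayEnv`, the two-action scheme, and the reward rule are all assumptions made up for this example; the only point is that the observations are read from a fixed series while the agent keeps choosing its own actions.

```python
from typing import List, Tuple


class HistoricalReplayEnv:
    """Hypothetical environment: observations come from a fixed series
    that does not depend on the agent's actions."""

    def __init__(self, series: List[float]) -> None:
        if len(series) < 2:
            raise ValueError("need at least two data points")
        self.series = series
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return self.series[self.t]

    def step(self, action: int) -> Tuple[float, float, bool]:
        """action: 0 = stay out, 1 = hold one unit (illustrative rule only)."""
        if self.t >= len(self.series) - 1:
            raise RuntimeError("episode finished; call reset()")
        change = self.series[self.t + 1] - self.series[self.t]
        # The reward depends on the action; the next observation does not.
        reward = change if action == 1 else 0.0
        self.t += 1
        done = self.t == len(self.series) - 1
        return self.series[self.t], reward, done


env = HistoricalReplayEnv([10.0, 11.0, 9.5, 12.0])
obs = env.reset()
total = 0.0
done = False
while not done:
    action = 1  # stand-in for whatever the agent would choose
    obs, reward, done = env.step(action)
    total += reward
```

This only works because the recorded data would have looked the same whatever the agent did. If the recorded observations were a consequence of the recorded actions, replaying them under different actions would be inconsistent.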

If the above is not the case, what you are referring to is often called offline RL. This is something we are looking at but we do not have functionality that supports this right now.
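To make the idea concrete outside of any toolbox, below is a rough sketch of a Q-learning update driven purely by logged transitions, where the action in each update is the one recorded in the data and not one picked by an exploration policy. It is a tabular stand-in written in plain Python, not DQN and not MATLAB functionality; the function name `offline_q_learning`, the state labels, and the hyperparameters are assumptions for this example. A DQN variant would compute the same target and use it as the regression target for the network.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# One logged step: (state, action_taken, reward, next_state, done)
Transition = Tuple[Hashable, int, float, Hashable, bool]


def offline_q_learning(
    dataset: List[Transition],
    n_actions: int,
    alpha: float = 0.5,
    gamma: float = 0.9,
    epochs: int = 50,
) -> Dict[Hashable, List[float]]:
    """Sweep repeatedly over a fixed dataset; no environment is stepped."""
    q: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(epochs):
        for s, a, r, s_next, done in dataset:
            # The update uses the logged action `a`, not a freshly sampled one.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return q


# Made-up log: from s0, action 1 leads to s1; from s1, action 0 earns reward 1.
logged = [
    ("s0", 1, 0.0, "s1", False),
    ("s1", 0, 1.0, "s2", True),
    ("s0", 0, 0.0, "s0", False),
]
q = offline_q_learning(logged, n_actions=2)
greedy_action_s0 = max(range(2), key=lambda a: q["s0"][a])
```

One caveat with this naive approach: the `max` over next-state values can rely on actions that never appear in the log, so their estimates are never corrected by data. Dedicated offline RL methods exist largely to address that problem.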

Hope this helps

Comments: 1

Cecilia S. on 22 Jun 2021

It was the second option, yes. Thank you very much! I hope it rolls out soon!

I'll mark it as answered.
