Time-varying policy function

Question

Matheus Silva, 24 May 2023

https://kr.mathworks.com/matlabcentral/answers/1972534-time-varying-policy-function

Commented: Emmanouil Tzorakoleftherakis, 30 May 2023

Hi,

I am wondering whether it is possible to have time-varying (non-stationary) policy functions in the Reinforcement Learning Toolbox.

For example, say my episode lasts three periods (t=1,2,3), then I would have the set [equation missing] where [symbol missing] is some neural network structure indexed by a general vector of parameters ϑ, which will ultimately depend on the time period.

Is that possible to do with the toolbox?

Thank you so much!

Answer 1

Emmanouil Tzorakoleftherakis, 25 May 2023

https://kr.mathworks.com/matlabcentral/answers/1972534-time-varying-policy-function#answer_1244654

Why don't you just train 3 separate policies and pick and choose as needed?

Matheus Silva, 28 May 2023

Edited: Matheus Silva, 28 May 2023

My problem is that my periods can be related in some arbitrary way. For example, I am thinking of a model where the state [symbol missing] can vary according to [equation missing], where [symbol missing] is a stochastic term and [symbol missing] is some transition function. However, I may want to allow some relation between the stochastic terms in periods 1 and 3. Solving the problem period by period would eliminate that dependence, no?

Emmanouil Tzorakoleftherakis, 30 May 2023

Honestly, I think your best bet would be to use the same policy throughout, but maybe use an input signal to the neural net to indicate which period you are in based on your state.
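As a rough illustration of the single-policy idea, the sketch below appends a one-hot period indicator to the state before it is fed to the network. It is plain Python rather than toolbox code, and the function name `augment_observation` and the one-hot encoding are assumptions; a scalar period index or a normalized time value would be other reasonable choices.

```python
def augment_observation(state, period, num_periods=3):
    """Append a one-hot period indicator to the raw state vector.

    `period` is 1-based (t = 1, 2, 3), matching the notation in the question.
    """
    if not 1 <= period <= num_periods:
        raise ValueError(f"period must be in 1..{num_periods}, got {period}")
    one_hot = [0.0] * num_periods
    one_hot[period - 1] = 1.0
    return list(state) + one_hot

# Example: a 2-dimensional state observed in period 2 of 3.
obs = augment_observation([0.5, -1.2], period=2)
print(obs)  # [0.5, -1.2, 0.0, 1.0, 0.0]
```

With this input, one set of network weights can still produce different behavior in each period, because the period indicator is part of what the network conditions on.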

Another option, which is similar to what I mentioned earlier, is to train 3 different policies. To work around the period dependencies, you can place the RL policy block inside a triggered subsystem and only enable the subsystem for training when the system is in the appropriate period. Do that for each policy and then you can switch between the 3 as needed. See here
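The switching logic for the three-policy option can be sketched as follows. This is a plain Python stand-in, not the Simulink triggered-subsystem setup described above: `make_linear_policy`, the `policies` table, and `act` are hypothetical names, and the toy linear policies only take the place of separately trained networks.

```python
def make_linear_policy(gain):
    """Return a toy policy: action = gain * sum(state). Stand-in for a trained network."""
    def policy(state):
        return gain * sum(state)
    return policy

# One hypothetical policy per period; in practice each would be trained separately,
# with the other two held fixed while the episode is still simulated in full.
policies = {
    1: make_linear_policy(1.0),
    2: make_linear_policy(2.0),
    3: make_linear_policy(-1.0),
}

def act(state, period):
    """Select the policy for the current period and evaluate it on the state."""
    return policies[period](state)

actions = [act([1.0, 2.0], t) for t in (1, 2, 3)]
print(actions)  # [3.0, 6.0, -3.0]
```

Compared with the single time-conditioned policy, this keeps the per-period parameters fully separate, at the cost of training and maintaining three networks.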
