Create a custom policy function for an RL DQN.

Hi Community,
I am working on a project that requires a small modification of the DQN policy. The learned function is still Q, but instead of taking the argmax of Q(s,a), I add a few more conditions (most likely some if-statements acting as hard constraints). Is it possible to make this change? If so, where should I work on it?
Best regards,
Yiyang

1 Answer

Anh Tran on 27 Mar 2020


Currently I do not see any way to modify the DQN policy directly with the built-in rlDQNAgent. A possible workaround is to reimplement the DQN agent yourself using rlQValueRepresentation, introduced in MATLAB R2020a.
You can refer to the RL custom training loop example, where we implement vanilla policy gradient with Reinforcement Learning Toolbox.
For discrete action spaces, I would recommend the multi-output Q-value representation Q(o) (it gives better performance than the single-output Q(o,a)).
% Create a multi-output Q(o) critic, assuming you have already defined
% NeuralNet, ObservationInfo, ActionInfo, and ObsLayerName (the name of
% the network's observation input layer)
Critic = rlQValueRepresentation(NeuralNet,ObservationInfo,ActionInfo,'Observation',ObsLayerName);
% Get the state-action values for an observation RandomObservation
% (one Q value per discrete action)
Q = getValue(Critic,RandomObservation)
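Once you have the critic, the constrained policy you describe can live in your custom training loop's action-selection step: evaluate Q(o), mask out the actions forbidden by your hard constraints, then take the argmax over what remains. A minimal sketch, assuming the Critic and ActionInfo from above; isActionAllowed is a hypothetical placeholder for your own if-statement constraint logic, and the indexing assumes ActionInfo.Elements holds the discrete action set:

```matlab
% Sketch: greedy action selection with hard constraints, replacing plain argmax Q.
% isActionAllowed is a hypothetical function encoding your constraints.
function action = constrainedGreedyAction(Critic, Observation, ActionInfo)
    Q = getValue(Critic, Observation);   % vector of Q values, one per action
    actions = ActionInfo.Elements;       % discrete action set of the rlFiniteSetSpec
    for i = 1:numel(Q)
        if ~isActionAllowed(actions(i), Observation)
            Q(i) = -Inf;                 % mask disallowed actions out of the argmax
        end
    end
    [~, idx] = max(Q);                   % greedy choice over allowed actions only
    action = actions(idx);
end
```

Masking with -Inf keeps the greedy step a single max over the Q vector; you would call this in place of the built-in agent's action selection inside the custom loop.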

Category

Policies and Value Functions

Release

R2019b

Asked:

20 Nov 2019

Answered:

27 Mar 2020

