Reinforcement Learning agent converges to a suboptimal policy

15 views (last 30 days)
Jeehwan Lee on 13 Nov 2022
Hello,
I am trying to solve a multi-period optimal capacity planning problem with reinforcement learning. The system has two uncertainties that are stochastic but Markovian, plus a third state, which is the installed capacity. The benchmark is a single-period planning problem, which I have already solved with MINLP optimization.
I have spent many weeks trying different agents, but so far I have not succeeded in getting the agent to learn correctly.
In the graph below (actor-critic agent) you can see that although learning appears to take place, the converged value is suboptimal (less than the single-period optimization value).
One of the uncertainties is demand. In theory, the agent should increase the capacity in response to the demand it observes as a state. However, at convergence it does not do this properly.
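For context, this is roughly how the observation and action spaces are structured; the bounds and the action grid below are placeholders for illustration, not my actual values:

% Observation: [demand; second uncertainty; installed capacity]
obsInfo = rlNumericSpec([3 1], 'LowerLimit', [0; 0; 0], 'UpperLimit', [Inf; Inf; 1]);
obsInfo.Name = 'planning states';
% Discrete capacity additions (placeholder grid)
actInfo = rlFiniteSetSpec(0:0.1:1);
actInfo.Name = 'capacity addition';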
Note that although I have defined the actions as discrete, not all actions are feasible. To compensate for this, I have clipped the actions as follows:
if TIME_P < DEPLOY_T
    % Clip actions that would exceed the headroom above the installed capacity
    Action(Action > 1 - INS_CAP) = 1 - INS_CAP;  % If OPTION TO ABANDON is added, the action set becomes [-CAP_UPPER+INS_CAP : 5 : CAP_UPPER-INS_CAP]
else
    % TIME_P >= DEPLOY_T: no further capacity changes are allowed
    Action = 0;
end
Here, DEPLOY_T is the number of years during which the capacity planning actions can be exercised. The time step continues to TERMINAL_P to account for further future cash flows.
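For reference, here is a simplified sketch of where this clipping sits if the model is wrapped as a custom environment via rlFunctionEnv; the stochastic transitions and the cash-flow reward are omitted, and the constants and LoggedSignals field names are illustrative, not my actual values:

function [NextObs, Reward, IsDone, LoggedSignals] = capPlanStep(Action, LoggedSignals)
    DEPLOY_T   = 5;     % placeholder: last year in which capacity actions can be exercised
    TERMINAL_P = 20;    % placeholder: last period with cash flows
    TIME_P  = LoggedSignals.TIME_P;
    INS_CAP = LoggedSignals.INS_CAP;
    if TIME_P < DEPLOY_T
        % clip additions to the headroom above the installed capacity
        Action(Action > 1 - INS_CAP) = 1 - INS_CAP;
    else
        % past the deployment window, no further capacity changes
        Action = 0;
    end
    LoggedSignals.INS_CAP = INS_CAP + Action;
    LoggedSignals.TIME_P  = TIME_P + 1;
    % (demand and the second uncertainty would be updated here)
    NextObs = [LoggedSignals.demand; LoggedSignals.uncty2; LoggedSignals.INS_CAP];
    Reward  = 0;                                  % placeholder for the cash-flow reward
    IsDone  = LoggedSignals.TIME_P > TERMINAL_P;
end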
I was wondering if anyone has any tips (@Emmanouil Tzorakoleftherakis's answers on this forum have been particularly helpful, but no luck for me so far) or could possibly take a look at the code for me.

Answers (1)

Emmanouil Tzorakoleftherakis on 13 Feb 2023
Hello,
In your question you mention a graph, but it does not seem to be attached.
It sounds like the agent you trained has converged to a suboptimal solution. If that's the case, you probably need to tweak your reward a bit (make sure it is equivalent to your benchmark problem) and make sure the agent keeps exploring throughout training. Starting simple with a DQN agent would help. The EpsilonDecay and EpsilonMin values are important for exploration (see here). You may also want to randomize the initial condition of your environment; that could help bypass the local solution you converged to.
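For example, a rough sketch along these lines (placeholder values, assuming you have obsInfo/actInfo spec objects and wrap your model in a custom environment):

% DQN exploration settings
opt = rlDQNAgentOptions;
opt.EpsilonGreedyExploration.Epsilon      = 1;      % start fully exploratory
opt.EpsilonGreedyExploration.EpsilonMin   = 0.05;   % keep some exploration until the end of training
opt.EpsilonGreedyExploration.EpsilonDecay = 1e-4;   % smaller decay rate keeps epsilon high for longer
agent = rlDQNAgent(obsInfo, actInfo);               % DQN agent with default networks
agent.AgentOptions = opt;

% Randomizing the initial state in the environment reset function can help the
% agent escape the local solution (field names are illustrative):
function [InitialObs, LoggedSignals] = capPlanReset()
    LoggedSignals.TIME_P  = 1;
    LoggedSignals.INS_CAP = 0.2*rand;        % random initial capacity (placeholder range)
    LoggedSignals.demand  = 1 + 0.1*randn;   % random initial demand (placeholder)
    LoggedSignals.uncty2  = rand;            % second uncertainty (placeholder)
    InitialObs = [LoggedSignals.demand; LoggedSignals.uncty2; LoggedSignals.INS_CAP];
end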
