RL Agent does not learn

Views: 20 (last 30 days)
Janika Ofterdinger on 22 Jun 2020
Answered: rbih rbih on 16 Dec 2021
Hello,
I'm getting started with reinforcement learning using the Reinforcement Learning Toolbox. After building a custom environment in Simulink, I have problems training a PG agent. The task is to control a system with diffuse irradiance, direct irradiance, and temperature as states and a mass flow rate as the action, which can be either 0 or 30. The Simulink model generates a cost signal that also serves as the reward, and the objective is to minimize the cost over one episode of 24 time steps.
The code is the following:
obsInfo = rlNumericSpec([3 1]);                 % states: diffuse irradiance, direct irradiance, temperature
obsInfo.Name = 'Observation';
actInfo = rlFiniteSetSpec([0 30]);              % action: mass flow rate, either 0 or 30
actInfo.Name = 'Action';
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
% Create deep neural network approximator for the actor
net = [ imageInputLayer([3 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(32,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(2,'Name','fc4')         % one output per discrete action
    softmaxLayer('Name','actionProb') ];
% Create actor
actorOpts = rlRepresentationOptions('LearnRate',0.01,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo,'Observation','state',actorOpts);
% Create agent (pass the options so the discount factor is actually used)
opt = rlPGAgentOptions('DiscountFactor',0.0001);
agent = rlPGAgent(actor,opt);
When I train the agent for 2000 episodes with different configurations of the neural network, it does not converge to a consistent policy at all. At some point the policy finds configurations that yield a better reward, but afterwards the agent does not improve further and does not keep following the improved policy.
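For completeness, the training call is essentially the following (a sketch; only the episode count and episode length reflect my setup, the remaining options are left at their defaults):
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...          % train for 2000 episodes as described above
    'MaxStepsPerEpisode', 24, ...     % one episode = 24 time steps
    'Plots', 'training-progress');
trainingStats = train(agent, env, trainOpts);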
It would be great if you could help me solve this. Do you think it is caused by an insufficient reward signal, or does the structure of my neural network not fit my observation and action signals? I also tried using a tanhLayer and different numbers of nodes, without any success.
Thank you very much for your help!
Best regards
Janika
1 Comment
shadi abpeikar on 16 Feb 2021
Hi Janika,
I'm just wondering whether you found a solution? I have the same problem, and I would appreciate any hints on how you solved it.
Thanks.


Accepted Answer

Emmanouil Tzorakoleftherakis on 2 Jul 2020
Edited: 2 Jul 2020
Hello,
It is really hard to say just by looking at the training plot. The first thing I would try is a different agent (maybe DQN, since you have a discrete action space). If the agent is still not learning, and assuming your network structure is roughly similar to what you see in other shipping examples, it probably has something to do with the reward or a lack of exploration. It sounds like this is a controls problem, so I would highly recommend looking at some of the shipping examples to get a feel for how the different agents behave.
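As a rough sketch of the DQN option (reusing the obsInfo, actInfo, and layer sizes from your question; the learning rate and discount factor below are only placeholders):
% Multi-output Q-network: one Q-value per discrete action (0 and 30)
criticNet = [ imageInputLayer([3 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(32,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(2,'Name','qValues') ];  % 2 = number of discrete actions
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo,'Observation',{'state'},criticOpts);
agentOpts = rlDQNAgentOptions('UseDoubleDQN',true,'DiscountFactor',0.99);
dqnAgent = rlDQNAgent(critic,agentOpts);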
2 Comments
Janika Ofterdinger on 5 Jul 2020
Thanks for answering my question!
I already tried different agents such as DQN and the training plot looks similar, but the issue may be caused by my reward signal. Due to algebraic loops I had to insert a Unit Delay, which delays the reward by one sample time.
To work around the exploration issue I already set the Entropy Loss Weight of the PG agent to 1.
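For reference, that option is set roughly like this (everything else as in my original code):
opt = rlPGAgentOptions('EntropyLossWeight',1,'DiscountFactor',0.0001);
agent = rlPGAgent(actor,opt);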
I was also wondering whether the agent is able to detect a better policy even if its reward is only slightly better than the old one. Do you think it is better to shape the reward function so that the gap between a penalty and a reward is larger?
Thanks again for your support!
Best
Janika
Emmanouil Tzorakoleftherakis on 6 Jul 2020
It all depends on the application. On your side you need to make sure that the different terms in the reward signal are comparable and that their relative relationship guides the agent to the desired behavior.
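As an illustration only (the cost and switching terms and the weights below are hypothetical, not taken from your model), a weighted reward makes that relative relationship explicit and easy to tune:
% Hypothetical reward: weighted cost term plus a small penalty for switching the mass flow
wCost   = 1.0;    % scales the cost produced by the Simulink model
wSwitch = 0.1;    % scales the penalty for changing the action between steps
reward  = -(wCost*costThisStep + wSwitch*abs(mdot - mdotPrev));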
For now, I would probably leave the default values for exploration and focus on the reward. If you start seeing some improvement but the agent is stuck on a local solution, that's a good indication to start playing with exploration parameters.
Also, keep in mind that the PG agent is Monte Carlo-based, so it will likely need more training to learn.


More Answers (1)

rbih rbih on 16 Dec 2021
Hello Janika,
I'm also new to reinforcement learning in MATLAB, but the activation function may play an important role in how the agent learns. Have you tried using tanh activation instead of relu?
Let me know.
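For example, based on the actor network posted above, just swapping the activations (a sketch, not tested on your model):
net = [ imageInputLayer([3 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(32,'Name','fc1')
    tanhLayer('Name','tanh1')
    fullyConnectedLayer(32,'Name','fc2')
    tanhLayer('Name','tanh2')
    fullyConnectedLayer(2,'Name','fc3')
    softmaxLayer('Name','actionProb') ];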
