Reinforcement Learning: Sudden very high rewards during training of RL model

Question

Sourabh, 24 May 2023


https://kr.mathworks.com/matlabcentral/answers/1972494-reinforcement-learning-sudden-very-high-rewards-during-training-of-rl-model

Commented: Sourabh, 28 May 2023

During training I get sudden, very high rewards on the order of 10e16 (shown in the attached image), and I cannot figure out what is causing them. Here is the code I am using; I am also attaching the Simulink model.

Tf = 10;    % episode duration (s)
Ts = 0.1;   % agent sample time (s)
mdl = 'rl_exam2';

%% Observation and action specifications
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, Response';
numObservations = obsInfo.Dimension(1);

actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1);
actInfo.Name = 'Control Input';
numActions = actInfo.Dimension(1);

%% To Create Environment
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);

rng(0)   % fix the random seed for reproducibility

%% To Create Critic Network
% State path: observation -> FC(50) -> ReLU -> FC(40)
statePath = [
    imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
    fullyConnectedLayer(50,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(40,'Name','CriticStateFC2')];

% Action path: action -> FC(40)
actionPath = [
    imageInputLayer([numActions 1 1],'Normalization','none','Name','Action')
    fullyConnectedLayer(40,'Name','CriticActionFC1')];

% Common path: add both paths -> ReLU -> scalar Q-value
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];

criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');

criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);

%% To Create Actor Network
actorNetwork = [
    imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
    fullyConnectedLayer(40,'Name','actorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(numActions,'Name','actorFC2')
    tanhLayer('Name','actorTanh')
    % Map the tanh output from [-1,1] to the action range [0,1]
    scalingLayer('Name','Action','Scale',0.5,'Bias',0.5)
    ];

actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);

%% To Create Agent
% 'ExperienceBufferLength' was listed twice in the original call; it is kept once here.
agentOpts = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'DiscountFactor',1,...
    'MiniBatchSize',64,...
    'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.08;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;

agent = rlDDPGAgent(actor,critic,agentOpts);

%% Training Options
maxepisodes = 3000;
maxsteps = ceil(Tf/Ts);

% Note: training stops at episode 1500, before MaxEpisodes is reached.
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',20,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeCount',...
    'StopTrainingValue',1500);

%% TO TRAIN
doTraining = true;

if doTraining
    trainingStats = train(agent,env,trainingOpts);
    % save('agent_new.mat','agent')   % uncomment to save the trained agent
else
    % Load pretrained agent for the example.
    load('agent_old.mat','agent')
end
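As a quick sanity check on the actor in the code above, the last two layers can be reproduced with plain arithmetic. This is only a sketch: it assumes the scaling layer computes Scale times input plus Bias, and the sampled input range is arbitrary. It shows that the tanh output in [-1, 1] lands inside the [0, 1] limits declared in actInfo.

```matlab
% Sample the tanh output range and apply the same scale and bias as the scalingLayer.
tanhOut = tanh(linspace(-10, 10, 201));   % values in [-1, 1]
action  = 0.5 * tanhOut + 0.5;            % Scale = 0.5, Bias = 0.5

fprintf('action range: [%.4f, %.4f]\n', min(action), max(action));
```

Because the actor output is bounded this way, the action itself cannot be the quantity that reaches 1e16; the magnitude has to come from the signals used in the reward.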


Answer 1

Emmanouil Tzorakoleftherakis, 25 May 2023


https://kr.mathworks.com/matlabcentral/answers/1972494-reinforcement-learning-sudden-very-high-rewards-during-training-of-rl-model#answer_1244649

First check the 'error' signal that feeds the reward for those episodes. The error may be growing too large, or the system may be going unstable, which would produce those large negative values.
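One way to act on this is to locate the spike episodes first and then inspect the error signal logged for exactly those episodes. The following is a minimal sketch, assuming the per-episode rewards are available as a vector (train returns them in trainingStats.EpisodeReward). The reward values and the spikeFactor threshold are made up for illustration and are not taken from the original training run.

```matlab
% Hypothetical stand-in for trainingStats.EpisodeReward (one value per episode).
episodeReward = [-120; -95; -3.2e16; -88; -1.1e16; -70];

% Flag episodes whose reward magnitude is far above the typical (median) magnitude.
spikeFactor   = 1e3;                          % assumed threshold, tune for your problem
typical       = median(abs(episodeReward));
spikeEpisodes = find(abs(episodeReward) > spikeFactor * typical);

disp(spikeEpisodes.')                         % indices of the episodes to inspect
```

With the real training results, replace the hypothetical vector with trainingStats.EpisodeReward and re-simulate or log the flagged episodes to see how the error evolves.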

Comments: 3

Emmanouil Tzorakoleftherakis, 25 May 2023

Can you send a screenshot? I do not know the specifics of the problem, but the reward I saw did not look unreasonable. You could try a smaller scaling factor than the 15 I believe you had. The spikes are probably still coming from the system going unstable, though. Consider implementing an IsDone signal to stop the simulation if the system goes unstable.
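The effect of an IsDone signal can be sketched outside Simulink. Everything here is an assumption made for illustration: the quadratic reward form, the gain, the error limit, and the diverging error trajectory are hypothetical and do not come from the original model. In the model itself, the equivalent logic would be a comparison on the error wired to the isdone input of the RL Agent block.

```matlab
% Hypothetical values: neither the limit nor the gain comes from the original model.
errLimit   = 10;   % assumed bound beyond which the response counts as unstable
rewardGain = 1;    % assumed smaller scaling factor than the 15 mentioned above

% Assumed reward form: scaled quadratic penalty on the error.
rewardFcn = @(e) -rewardGain * e.^2;
% IsDone flag: true once the error magnitude leaves the allowed band.
isDoneFcn = @(e) abs(e) > errLimit;

% Illustrative diverging error: grows by a factor of 1.5 per step for 100 steps.
e = 0.5 * 1.5.^(0:99);

stopIdx        = find(isDoneFcn(e), 1);        % first step at which IsDone fires
rewardNoStop   = sum(rewardFcn(e));            % episode reward without termination
rewardWithStop = sum(rewardFcn(e(1:stopIdx))); % episode reward when stopped early

fprintf('IsDone fires at step %d\n', stopIdx);
fprintf('Episode reward without IsDone: %.3g\n', rewardNoStop);
fprintf('Episode reward with IsDone:    %.3g\n', rewardWithStop);
```

Stopping early keeps a single unstable episode from contributing an enormous reward magnitude, which is what shows up as a spike in the training plot. A smaller gain shrinks the spike proportionally but does not remove the instability behind it.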

Sourabh, 28 May 2023

I have tried a few reward functions, but the best I get is a response that settles at 0.3, and I don't know why.

Please have a look.
