Training is not efficient for RL agent
Hello,
I am trying to use the Reinforcement Learning Toolbox for an energy optimization problem. I started with a simple DQN agent; the critic network code is shown below:
nI = 4; % number of inputs (4)
nL = 400; % number of neurons
nO = 101; % number of possible outputs (101)
dnn = [
featureInputLayer(nI,'Normalization','none','Name','state')
fullyConnectedLayer(nL,'Name','fc1')
reluLayer('Name','relu1')
fullyConnectedLayer(nL/2,'Name','fc2')
reluLayer('Name','relu2')
fullyConnectedLayer(nL/4,'Name','fc3')
reluLayer('Name','relu3')
fullyConnectedLayer(nO,'Name','fc4')];
figure(1)
plot(layerGraph(dnn))
I used the following options for the critic, agent, and training, respectively:
criticOpts = rlRepresentationOptions('LearnRate',0.1,'GradientThreshold',1,...
'UseDevice','gpu');
agentOpts = rlDQNAgentOptions(...
'UseDoubleDQN',false, ...
'ExperienceBufferLength',1e5, ...
'DiscountFactor',0.99, ...
'MiniBatchSize',256,...
'SaveExperienceBufferWithAgent',true,...
'SampleTime',1);
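The post omits how the critic and agent were constructed from `dnn` and these options. A minimal sketch of that missing step might look like the following, where `obsInfo` and `actInfo` (the environment's observation and action specifications) are assumptions, and the exploration settings shown are the kind of `rlDQNAgentOptions` fields that often matter as much as the network size:

```matlab
% Assumed construction step (not in the original post):
% obsInfo/actInfo would come from the environment, e.g. getObservationInfo(env).
critic = rlQValueRepresentation(dnn,obsInfo,actInfo, ...
    'Observation',{'state'},criticOpts);

% Epsilon-greedy exploration settings are illustrative values only.
agentOpts.EpsilonGreedyExploration.Epsilon      = 1;     % initial exploration
agentOpts.EpsilonGreedyExploration.EpsilonMin   = 0.01;  % floor
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 1e-4;  % per-step decay

agent = rlDQNAgent(critic,agentOpts);
```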
trainOpts = rlTrainingOptions(...
'MaxEpisodes',1000,...
'MaxStepsPerEpisode',3000,...
'StopTrainingCriteria',"AverageReward",...
'StopTrainingValue',0,...
'Verbose',false,...
'Plots',"training-progress",...
'SaveAgentDirectory','D:\ADVISOR_Exp\RL_Exp');
trainstats = train(agent,env,trainOpts);
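For reference, one way to test the trained agent's greedy policy (the step where the poor results were observed) is a short simulation; `MaxSteps` here simply mirrors the `MaxStepsPerEpisode` used in training:

```matlab
% Roll out the trained policy once and inspect the episode return.
simOpts = rlSimulationOptions('MaxSteps',3000);
experience = sim(env,agent,simOpts);
totalReward = sum(experience.Reward);
```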
However, I did not get good results when testing the agent: it does not improve over time, and after several hundred episodes the reward still oscillates, as shown in the figure.

I have tried different critic network architectures (with separate state and action paths) and different agents (a Q-learning agent and DDPG) with similar options, but no luck. I have also tried different rewards and tuned the reward function. What should I do to improve the training?
Answers (0)