How should I fix the error 'Too many output arguments'?

Views: 3 (last 30 days)
ryunosuke tazawa on 6 Aug 2021
I wrote the code below using the Reinforcement Learning Toolbox.
train_agent
Error: reset
There are too many output arguments.
Error: rl.env.MATLABEnvironment/simLoop (line 235)
observation = reset(env);
Error: rl.env.MATLABEnvironment/simWithPolicyImpl (line 106)
[expcell{simCount}, epinfo, siminfos{simCount}] = simLoop(env, policy, opts, simCount, usePCT);
Error: rl.env.AbstractEnv/simWithPolicy (line 83)
[experiences, varargout{1:(nargout-1)}] = simWithPolicyImpl(this, policy, opts, varargin{:});
Error: rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1}, varargout{2}] = simWithPolicy(this.Env, this.Agent, simOpts);
Error: rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error: rl.task.TaskSpec/internal_run (line 166)
[varargout{1:nargout}] = run(task);
Error: rl.task.TaskSpec/runDirect (line 170)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error: rl.task.TaskSpec/runScalarTask (line 194)
runDirect(this);
Error: rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error: rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error: rl.train.TrainingManager/train (line 424)
run(trainer);
Error: rl.train.TrainingManager/run (line 215)
train(this);
Error: rl.agent.AbstractAgent/train (line 77)
TrainingStatistics = run(trainMgr);
Error: train_agent (line 90)
trainingStats = train(agent, env, trainingOptions);
The above error occurred when training started. How should I fix it? Also, please tell me how to check which output arguments a function declares and how many there are.
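For reference, one way to check how many output arguments a class method declares is to look at its metaclass. The snippet below is only a sketch; it assumes the custom environment class is called Environment, as in the script further down.

mc = ?Environment;                             % metaclass of the custom environment class
m = findobj(mc.MethodList, 'Name', 'reset');   % meta.method entry for the reset method
disp(m.OutputNames)                            % names of the declared output arguments
disp(numel(m.OutputNames))                     % their number (simLoop expects at least 1)

The simplest check, though, is to open Environment.m and look at how the reset function line itself is declared.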
% Train the DDPG agent
% Environment setup
env = Environment;
obsInfo = env.getObservationInfo;
actInfo = env.getActionInfo;
numObs = obsInfo.Dimension(1); % 2
numAct = numel(actInfo); % 1
% CRITIC
statePath = [
    featureInputLayer(numObs, 'Normalization','none','Name','observation')
    fullyConnectedLayer(128, 'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numAct,'Normalization','none','Name','action')
    fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor', 0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'observation'},'Action',{'action'},criticOptions);
% ACTOR
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(200,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(1,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh1')
    scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))];
actorOptions = rlRepresentationOptions('LearnRate',5e-04,'GradientThreshold',1);
actor= rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'observation'},'Action',{'ActorScaling'},actorOptions);
% Agent options
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',env.Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'MiniBatchSize',128);
% Exploration noise
agentOptions.NoiseOptions.Variance = 0.4;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOptions);
% Training options
maxepisodes = 20000;
maxsteps = 1e8;
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopOnError','on',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',Inf,...
    'ScoreAveragingWindowLength',10);
% Plot the environment
%plot(env);
% Train the agent
trainingStats = train(agent,env,trainingOptions); % <-- the error happens here.
% Simulate the trained agent
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
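The stack trace shows rl.env.MATLABEnvironment/simLoop calling observation = reset(env) with one output, so the reset method of the custom Environment class presumably has to declare (at least) one output argument. Since the class itself is not posted here, the following is only a minimal sketch of the expected shape, assuming it subclasses rl.env.MATLABEnvironment; the property names, specs, and values are illustrative.

classdef Environment < rl.env.MATLABEnvironment
    properties
        Ts = 0.1             % sample time read by agentOptions above (illustrative value)
        State = zeros(2,1)   % illustrative internal state
    end
    methods
        function this = Environment()
            % Specs must match numObs = 2 and numAct = 1 in the training script
            obsInfo = rlNumericSpec([2 1]);
            actInfo = rlNumericSpec([1 1],'LowerLimit',-1,'UpperLimit',1);
            this = this@rl.env.MATLABEnvironment(obsInfo,actInfo);
        end
        function [obs,reward,isDone,loggedSignals] = step(this,action)
            % step returns the next observation, the reward, and the done flag
            obs = this.State;
            reward = 0;
            isDone = false;
            loggedSignals = [];
        end
        function initialObs = reset(this)
            % reset MUST declare an output argument; a reset written as
            % "function reset(this)" triggers "There are too many output arguments."
            % when simLoop calls "observation = reset(env)".
            this.State = zeros(2,1);
            initialObs = this.State;
        end
    end
end

If reset already declares an output, it may also be worth running which Environment -all to confirm that the class file being edited is the one MATLAB actually picks up.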

Answers (0)
