RL SAC agent structure

Praveen Verma on 6 Nov 2024 at 0:23
Commented: Praveen Verma on 6 Nov 2024 at 22:37
I’ve created an SAC agent, but I'm encountering the error below.
Error using rl.internal.validate.mapFunctionMeanStdOutput (line 10)
Deep neural network for continuous gaussian function must have 2 output layers, one for mean and one for standard deviation.
Error in rlContinuousGaussianActor (line 93)
model = rl.internal.validate.mapFunctionMeanStdOutput(model,nameValueArgs.ActionMeanOutputNames,nameValueArgs.ActionStandardDeviationOutputNames,"actor");
Error in RL_agent_1 (line 158)
actor1 = rlContinuousGaussianActor(actorNetwork1, obsInfo1, actInfo1, ...
I’ve also attached the code for my RL agent below; the relevant part is the actor network definition, which clearly shows that I already have two layers, one for the mean and one for the standard deviation.
% Create environment
env = createOpfEnv();
% Retrieve observation and action specifications
obsInfo = getObservationInfo(env); % Observation info for all agents
actInfo = getActionInfo(env); % Action info for all agents
% Separate the observation and action information for each agent
numAgents = 3; % Example with 3 agents
% Separate observation and action info
obsInfo1 = obsInfo{1}; % Observation info for agent 1
obsInfo2 = obsInfo{2}; % Observation info for agent 2
obsInfo3 = obsInfo{3}; % Observation info for agent 3
actInfo1 = actInfo{1}; % Action info for agent 1
actInfo2 = actInfo{2}; % Action info for agent 2
actInfo3 = actInfo{3}; % Action info for agent 3
%% Define actor networks for each agent
% Define the actor network for Agent 1
actorNetwork1 = [
featureInputLayer(obsInfo1.Dimension(1), 'Normalization', 'none', 'Name', 'state1')
fullyConnectedLayer(64, 'Name', 'fc1_1')
reluLayer('Name', 'relu1_1')
fullyConnectedLayer(64, 'Name', 'fc2_1')
reluLayer('Name', 'relu2_1')
fullyConnectedLayer(64, 'Name', 'fc3_1')
reluLayer('Name', 'relu3_1')
fullyConnectedLayer(1, 'Name', 'mean1') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std1') % Output for the standard deviation
];
% Define the actor network for Agent 2
actorNetwork2 = [
featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', 'Name', 'state2')
fullyConnectedLayer(64, 'Name', 'fc1_2')
reluLayer('Name', 'relu1_2')
fullyConnectedLayer(64, 'Name', 'fc2_2')
reluLayer('Name', 'relu2_2')
fullyConnectedLayer(64, 'Name', 'fc3_2')
reluLayer('Name', 'relu3_2')
fullyConnectedLayer(1, 'Name', 'mean2') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std2') % Output for the standard deviation
];
% Define the actor network for Agent 3
actorNetwork3 = [
featureInputLayer(obsInfo3.Dimension(1), 'Normalization', 'none', 'Name', 'state3')
fullyConnectedLayer(64, 'Name', 'fc1_3')
reluLayer('Name', 'relu1_3')
fullyConnectedLayer(64, 'Name', 'fc2_3')
reluLayer('Name', 'relu2_3')
fullyConnectedLayer(64, 'Name', 'fc3_3')
reluLayer('Name', 'relu3_3')
fullyConnectedLayer(1, 'Name', 'mean3') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std3') % Output for the standard deviation
];
% For each agent, we'll define a critic network that combines the state and action
statePath1 = [
featureInputLayer(obsInfo1.Dimension(1), 'Normalization', 'none', Name="state1")
fullyConnectedLayer(64, Name="state_fc1_1")
reluLayer(Name="state_relu1_1")
];
actionPath1 = [
featureInputLayer(actInfo1.Dimension(1), 'Normalization', 'none', Name="action1")
fullyConnectedLayer(64, Name="action_fc1_1")
reluLayer(Name="action_relu1_1")
];
commonPath1 = [
concatenationLayer(1, 2, Name="concat1")
fullyConnectedLayer(64, Name="common_fc1_1")
reluLayer(Name="common_relu1_1")
fullyConnectedLayer(64, Name="common_fc2_1")
reluLayer(Name="common_relu2_1")
fullyConnectedLayer(1, Name="value1")
];
statePath2 = [
featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', Name="state2")
fullyConnectedLayer(64, Name="state_fc2_2")
reluLayer(Name="state_relu2_2")
];
actionPath2 = [
featureInputLayer(actInfo2.Dimension(1), 'Normalization', 'none', Name="action2")
fullyConnectedLayer(64, Name="action_fc2_2")
reluLayer(Name="action_relu2_2")
];
commonPath2 = [
concatenationLayer(1, 2, Name="concat2")
fullyConnectedLayer(64, Name="common_fc1_2")
reluLayer(Name="common_relu1_2")
fullyConnectedLayer(64, Name="common_fc2_2")
reluLayer(Name="common_relu2_2")
fullyConnectedLayer(1, Name="value2")
];
statePath3 = [
featureInputLayer(obsInfo3.Dimension(1), 'Normalization', 'none', Name="state3")
fullyConnectedLayer(64, Name="state_fc3_3")
reluLayer(Name="state_relu3_3")
];
actionPath3 = [
featureInputLayer(actInfo3.Dimension(1), 'Normalization', 'none', Name="action3")
fullyConnectedLayer(64, Name="action_fc3_3")
reluLayer(Name="action_relu3_3")
];
commonPath3 = [
concatenationLayer(1, 2, Name="concat3")
fullyConnectedLayer(64, Name="common_fc1_3")
reluLayer(Name="common_relu1_3")
fullyConnectedLayer(64, Name="common_fc2_3")
reluLayer(Name="common_relu2_3")
fullyConnectedLayer(1, Name="value3")
];
%% Assemble critic networks for each agent
% Combine state and action paths
criticNetwork1 = layerGraph(statePath1);
criticNetwork1 = addLayers(criticNetwork1, actionPath1);
criticNetwork1 = addLayers(criticNetwork1, commonPath1);
criticNetwork1 = connectLayers(criticNetwork1, 'state_relu1_1', 'concat1/in1');
criticNetwork1 = connectLayers(criticNetwork1, 'action_relu1_1', 'concat1/in2');
criticNetwork2 = layerGraph(statePath2);
criticNetwork2 = addLayers(criticNetwork2, actionPath2);
criticNetwork2 = addLayers(criticNetwork2, commonPath2);
criticNetwork2 = connectLayers(criticNetwork2, 'state_relu2_2', 'concat2/in1');
criticNetwork2 = connectLayers(criticNetwork2, 'action_relu2_2', 'concat2/in2');
criticNetwork3 = layerGraph(statePath3);
criticNetwork3 = addLayers(criticNetwork3, actionPath3);
criticNetwork3 = addLayers(criticNetwork3, commonPath3);
criticNetwork3 = connectLayers(criticNetwork3, 'state_relu3_3', 'concat3/in1');
criticNetwork3 = connectLayers(criticNetwork3, 'action_relu3_3', 'concat3/in2');
%% Set options for the actor and critic
actorOptions = rlRepresentationOptions('Optimizer', 'adam', 'LearnRate', 1e-4, 'GradientThreshold', 1);
criticOptions = rlRepresentationOptions('Optimizer', 'adam', 'LearnRate', 1e-4, 'GradientThreshold', 1);
%% Create actor and critic representations for each agent
% Use continuous actor for each agent (as required by SAC)
actor1 = rlContinuousGaussianActor(actorNetwork1, obsInfo1, actInfo1, ...
'ActionMeanOutputNames', 'mean1', 'ActionStandardDeviationOutputNames', 'std1');
actor2 = rlContinuousGaussianActor(actorNetwork2, obsInfo2, actInfo2, ...
'ActionMeanOutputNames', 'mean2', 'ActionStandardDeviationOutputNames', 'std2');
actor3 = rlContinuousGaussianActor(actorNetwork3, obsInfo3, actInfo3, ...
'ActionMeanOutputNames', 'mean3', 'ActionStandardDeviationOutputNames', 'std3');
% Create critic representations for each agent
%% Create Q-value critics for each agent
critic1 = rlQValueRepresentation(criticNetwork1, obsInfo1, actInfo1, criticOptions);
critic2 = rlQValueRepresentation(criticNetwork2, obsInfo2, actInfo2, criticOptions);
critic3 = rlQValueRepresentation(criticNetwork3, obsInfo3, actInfo3, criticOptions);
%% Define the SAC agent for each agent
agentOptions = rlSACAgentOptions('SampleTime', 1, ...
'TargetSmoothFactor', 1e-3, ...
'TargetUpdateFrequency', 1, ...
'ExperienceBufferLength', 1e6);
agent1 = rlSACAgent(actor1, critic1, agentOptions);
agent2 = rlSACAgent(actor2, critic2, agentOptions);
agent3 = rlSACAgent(actor3, critic3, agentOptions);
%% Training options and training process
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 500, ...
'MaxStepsPerEpisode', 100, ...
'ScoreAveragingWindowLength', 100, ...
'Verbose', true, ...
'Plots', 'training-progress');
%% Train the agents
train(agent1, env, trainOpts);
train(agent2, env, trainOpts);
train(agent3, env, trainOpts);

Accepted Answer

Gayathri on 6 Nov 2024 at 4:48
To resolve the error “Deep neural network for continuous gaussian function must have 2 output layers, one for mean and one for standard deviation.”, you need to define two separate output paths, one for the mean and one for the standard deviation. Adding two output layers alone is not enough: in a plain layer array the layers are connected in series, so the 'std' layer receives the output of the 'mean' layer as its input and the network still ends in a single output layer. Please refer to the code below, which splits the network into two parallel paths.
inputLayer = featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', 'Name', 'state2');
meanPath = [
fullyConnectedLayer(64, 'Name', 'fc1_mean')
reluLayer('Name', 'relu1_mean')
fullyConnectedLayer(64, 'Name', 'fc2_mean')
reluLayer('Name', 'relu2_mean')
fullyConnectedLayer(64, 'Name', 'fc3_mean')
reluLayer('Name', 'relu3_mean')
fullyConnectedLayer(1, 'Name', 'mean2') % Output for the mean
];
% Define the standard deviation path
stdPath = [
fullyConnectedLayer(64, 'Name', 'fc1_std')
reluLayer('Name', 'relu1_std')
fullyConnectedLayer(64, 'Name', 'fc2_std')
reluLayer('Name', 'relu2_std')
fullyConnectedLayer(64, 'Name', 'fc3_std')
reluLayer('Name', 'relu3_std')
fullyConnectedLayer(1, 'Name', 'std2') % Output for the standard deviation
];
% Create the layer graph
actorNetwork2 = layerGraph(inputLayer);
actorNetwork2 = addLayers(actorNetwork2, meanPath);
actorNetwork2 = addLayers(actorNetwork2, stdPath);
% Connect the input layer (named 'state2') to both the mean and std paths
actorNetwork2 = connectLayers(actorNetwork2, 'state2', 'fc1_mean');
actorNetwork2 = connectLayers(actorNetwork2, 'state2', 'fc1_std');
I have given this code for “actorNetwork2” as an example. Please make the same changes to “actorNetwork1” and “actorNetwork3” to resolve the error for all three agents.
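For reference, here is a minimal sketch (assuming the variable names from the question) of how the rebuilt graph plugs back into the actor constructor, with a quick sanity check that the two outputs are wired correctly:
% Recreate the actor from the corrected graph (arguments as in the question)
actor2 = rlContinuousGaussianActor(actorNetwork2, obsInfo2, actInfo2, ...
'ActionMeanOutputNames', 'mean2', 'ActionStandardDeviationOutputNames', 'std2');
% Sample an action for a random observation to confirm the actor evaluates
act = getAction(actor2, {rand(obsInfo2.Dimension)});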
You can also connect the two paths for the mean and standard deviation in a different layout if required; one such variant is sketched below.
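For instance, a common variant (a sketch only; the shared-trunk layer names are illustrative, not from the original post) shares the hidden layers and branches just before the two output heads:
% Illustrative alternative: one shared hidden trunk, two small output heads
trunk = [
featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', 'Name', 'state2')
fullyConnectedLayer(64, 'Name', 'fc_shared')
reluLayer('Name', 'relu_shared')
];
altNetwork2 = layerGraph(trunk);
altNetwork2 = addLayers(altNetwork2, fullyConnectedLayer(1, 'Name', 'mean2')); % mean head
altNetwork2 = addLayers(altNetwork2, fullyConnectedLayer(1, 'Name', 'std2')); % std head
altNetwork2 = connectLayers(altNetwork2, 'relu_shared', 'mean2');
altNetwork2 = connectLayers(altNetwork2, 'relu_shared', 'std2');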
After implementing these changes, the training will begin. A small snippet of the training window is attached.
For more information, please refer to the documentation page for “rlContinuousGaussianActor”.
Hope you find this information helpful.
1 Comment
Praveen Verma on 6 Nov 2024 at 22:37
@Gayathri Thank you, it works.


More Answers (0)

Release: R2024a
