
When used in DDPG actor and critic networks, the 1×1 convolution does NOT behave like a fully connected layer (even though it should). Why is that?

Supposing (for some valid reason), my data is arranged along the third dimension. Say
x = rand(1,1,m);
I need to process this data using a deep learning network. For the sake of example, let us assume the network has just two layers. I input the data using an image input layer:
imageInputLayer([1,1,m], "Normalization", "none");
I then use a 1 by 1 convolution layer
convolution2dLayer([1,1], n, "Stride", [1,1]);
This should behave like a fully connected layer. But it doesn't! Probably because the data is in the third dimension? In fact, even using a fully connected layer in this case doesn't behave like a fully connected layer! That is, the following layer gives unexpected results.
fullyConnectedLayer(n);
Only when the data is along the first or second dimension does the fully connected layer behave properly. But in that case I can't use a convolution layer, which defeats the purpose.
Note: I need the first and second dimensions for other purposes... that's why I'm arranging my data along the third dimension in this example.
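For reference, outside the RL representations the two layers should compute the same affine map: a 1-by-1 convolution over a 1-by-1 spatial grid with m channels and n filters is exactly an m-to-n linear layer. Here is a minimal sketch (my own check, not from the original post) that builds both layers with identical weights in a dlnetwork and compares their outputs; it assumes the documented weight layouts (convolution weights are filterSize(1)-by-filterSize(2)-by-numChannels-by-numFilters, fully connected weights are outputSize-by-inputSize).
% Minimal sketch: 1-by-1 convolution vs. fully connected layer on [1,1,m] data
m = 10; n = 5;
W = randn(n, m);  b = randn(n, 1);
convNet = dlnetwork([ ...
    imageInputLayer([1 1 m], 'Normalization', 'none', 'Name', 'in')
    convolution2dLayer([1 1], n, 'Name', 'conv', ...
        'Weights', reshape(W.', 1, 1, m, n), 'Bias', reshape(b, 1, 1, n))]);
fcNet = dlnetwork([ ...
    imageInputLayer([1 1 m], 'Normalization', 'none', 'Name', 'in')
    fullyConnectedLayer(n, 'Name', 'fc', 'Weights', W, 'Bias', b)]);
x = dlarray(rand(1, 1, m, 1), 'SSCB');                   % one [1,1,m] observation
yConv = predict(convNet, x);
yFc   = predict(fcNet, x);
max(abs(extractdata(yConv(:)) - extractdata(yFc(:))))     % ~0, i.e. the same linear map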
====================
Update: Here's a minimal working example that you can run and see for yourself (just an example, not a real application).
It's a DDPG agent trying to learn to replicate the state. In other words, the agent should learn to output an action vector as close as possible to the observation vector (which is trivial). To do so, I've set the reward to be the negative norm of the difference between action and state (or observation). That is:
Reward = -norm(Action(:) - obj.State(:));
This is the complete definition of the environment
classdef env < rl.env.MATLABEnvironment
    %% Properties
    properties
        State
    end
    %% Methods
    methods
        function obj = env(dim)
            % Initialize observation settings
            ObservationInfo = rlNumericSpec(dim);
            ObservationInfo.LowerLimit = -ones(ObservationInfo.Dimension);
            ObservationInfo.UpperLimit = +ones(ObservationInfo.Dimension);
            % -------------------------------------------------------------
            % Initialize action settings
            ActionInfo = rlNumericSpec(dim);
            ActionInfo.LowerLimit = -ones(ActionInfo.Dimension);
            ActionInfo.UpperLimit = +ones(ActionInfo.Dimension);
            % -------------------------------------------------------------
            obj = obj@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo); % This line implements built-in functions of the RL env
        end
        function [Observation, Reward, IsDone, LoggedSignals] = step(obj, Action)
            Reward = -norm(Action(:) - obj.State(:));
            Observation = reset(obj);
            IsDone = false;
            LoggedSignals = [];
        end
        function InitialObservation = reset(obj) % Reset environment to initial state and output initial observation
            obsInfo = getObservationInfo(obj);
            obj.State = 2*rand(obsInfo.Dimension) - 1;
            InitialObservation = obj.State;
        end
    end
end
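If it helps reproduce the issue, the environment can be exercised in isolation before any agent is involved. A quick, hypothetical sanity check (the variable names here are just for illustration):
% Hypothetical sanity check of the environment, independent of any agent
envObj = env([1 1 10]);              % data along the third dimension
obs0 = reset(envObj);                % draws a random state in [-1, 1]
[~, r, ~, ~] = step(envObj, obs0);   % act with the observation itself
disp(r)                              % reward should be exactly 0 in this case
validateEnvironment(envObj)          % built-in RL Toolbox consistency check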
And here's a test function that defines simple actor and critic networks and trains the agent:
function test
    k = 3; % k can be set to 1, 2, or 3
    dim = ones(1,3);
    dim(k) = 10;
    %% Create Environment
    envObj = env(dim);
    obsInfo = getObservationInfo(envObj);
    actInfo = getActionInfo(envObj);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %% Create Actor
    actorNetwork = [
        imageInputLayer(dim, 'Name', 'observation', 'Normalization', 'none')
        fullyConnectedLayer(100, 'Name', 'ActorFC1'); reluLayer('Name', 'ActorRelu1')
        fullyConnectedLayer(max(dim), 'Name', 'ActorFC3'); tanhLayer('Name', 'ActorTanh')];
    actorOpts = rlRepresentationOptions('LearnRate', 1e-4, 'GradientThreshold', 1);
    actor = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, ...
        'Observation', {'observation'}, 'Action', {'ActorTanh'}, actorOpts);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %% Create Critic
    statePath = [
        imageInputLayer(dim, 'Name', 'observation', 'Normalization', 'none')
        fullyConnectedLayer(100, 'Name', 'CriticStateFC2')];
    actionPath = [
        imageInputLayer(dim, 'Name', 'action', 'Normalization', 'none')
        fullyConnectedLayer(100, 'Name', 'CriticActionFC1', 'BiasLearnRateFactor', 0)];
    commonPath = [
        additionLayer(2, 'Name', 'add'); reluLayer('Name', 'CriticCommonRelu')
        fullyConnectedLayer(1, 'Name', 'CriticOutput')];
    criticNetwork = layerGraph();
    criticNetwork = addLayers(criticNetwork, statePath);
    criticNetwork = addLayers(criticNetwork, actionPath);
    criticNetwork = addLayers(criticNetwork, commonPath);
    criticNetwork = connectLayers(criticNetwork, 'CriticStateFC2', 'add/in1');
    criticNetwork = connectLayers(criticNetwork, 'CriticActionFC1', 'add/in2');
    criticOpts = rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1);
    critic = rlQValueRepresentation(criticNetwork, obsInfo, actInfo, ...
        'Observation', {'observation'}, 'Action', {'action'}, criticOpts);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %% Create Agent
    agentOpts = rlDDPGAgentOptions();
    agentOpts.NoiseOptions.Variance = 0.1;
    agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
    agent = rlDDPGAgent(actor, critic, agentOpts);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %% Training
    trainOpts = rlTrainingOptions(...
        "MaxEpisodes", 1000, ...
        "MaxStepsPerEpisode", 100, ...
        "ScoreAveragingWindowLength", 10, ...
        "Verbose", true, ...
        "Plots", "training-progress", ...
        "StopTrainingCriteria", "AverageReward", ...
        "StopTrainingValue", -0.01, ...
        "SaveAgentCriteria", "EpisodeReward", ...
        "SaveAgentValue", -0.1);
    train(agent, envObj, trainOpts); % Train the agent.
    % -------------------------------------------------------------------------
end
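Before training, one way to look at the dimension issue directly is to evaluate the untrained actor on a sample observation and compare the size of the returned action with actInfo.Dimension. A hypothetical debugging step (reusing the actor and envObj created inside test above):
% Hypothetical check: query the untrained actor representation with one observation
obs = reset(envObj);                 % size(obs) equals dim, e.g. [1 1 10]
act = getAction(actor, {obs});       % observations are passed as a cell array
if iscell(act), act = act{1}; end    % unwrap if the action comes back in a cell
size(act)                            % compare with actInfo.Dimension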
You can choose to put the random data along dimension 1, 2, or 3 by setting k to one of the three in the first line of the test function. Since a fully connected layer is used, the results should not differ. But here's the training-progress plot for k = 1 or 2:
As can be seen, in this case the agent successfully reduces the cost (increases the reward).
Now let k = 3 and see the result:
As can be seen, in this case the agent simply meanders around the initial value.
This example uses fully connected layers. But when the data is along the third dimension (k = 3), you can replace the fully connected layers with 1-by-1 convolutional layers and you get the same results. For example,
fullyConnectedLayer(100, 'Name', 'ActorFC1');
would be replaced by
convolution2dLayer([1,1], 100, 'Name', 'ActorFC1');
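Concretely, the actor body after that substitution would look something like the sketch below (assuming the k = 3 case, so the input size is [1 1 10]):
% Sketch of the actor with 1-by-1 convolutions replacing the fully connected
% layers (assuming k = 3, i.e. the data sits along the third dimension)
actorNetwork = [
    imageInputLayer(dim, 'Name', 'observation', 'Normalization', 'none')
    convolution2dLayer([1 1], 100, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    convolution2dLayer([1 1], max(dim), 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')];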
====================
2 Comments
Matt J on 8 Aug 2021
Edited: Matt J on 8 Aug 2021
This should behave like an fully connected layer. But it doesn't!
In fact, even using a fully connected layer in this case doesn't behave like a fully connected layer!
You haven't told us what behavior you do see in either case.
Arman Ahmadian on 9 Aug 2021
Thank you for your reply. I've updated my question to include an MWE and some results.


Answers (0)
