When used in DDPG actor and critic networks, the 1×1 convolution does NOT behave like a fully connected layer (even though it should). Why is that?
Suppose that, for some valid reason, my data is arranged along the third dimension, say
x = rand(1,1,m);
I need to process this data using a deep learning network. For the sake of example, let us assume the network has just two layers. I input the data using an image input layer:
imageInputLayer([1,1,m], "Normalization", "none");
I then use a 1×1 convolution layer:
convolution2dLayer([1,1], n, "Stride", [1,1]);
This should behave like a fully connected layer. But it doesn't! Probably because the data is in the third dimension? In fact, even a fully connected layer doesn't behave like a fully connected layer in this case! That is, the following layer gives unexpected results:
fullyConnectedLayer(n);
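In principle the expectation is sound: with the data in the channel (third) dimension, each 1×1 filter forms a weighted sum over all m channels, which is exactly what a fully connected layer computes. The sketch below checks this in base MATLAB with no toolbox layers; the sizes m and n and the [1,1,m,n] weight layout are illustrative assumptions, and it says nothing about how the RL toolbox handles the data.

```matlab
% Sketch: a 1x1 convolution over a 1x1xm input equals a matrix-vector product.
% Base MATLAB only; sizes and the [1,1,m,n] weight layout are illustrative.
m = 10;                       % number of input channels
n = 4;                        % number of filters (output units)
x = rand(1,1,m);              % input arranged along the third dimension
W = randn(1,1,m,n);           % one 1x1xm filter per output channel
b = randn(n,1);               % one bias per output channel

% Explicit 1x1 convolution: for each filter, sum over the channels
yConv = zeros(n,1);
for j = 1:n
    yConv(j) = sum(x(1,1,:) .* W(1,1,:,j)) + b(j);
end

% Fully connected equivalent: flatten x, reshape W into an n-by-m matrix
Wfc = reshape(W, m, n).';     % row j holds the weights of filter j
yFC = Wfc * x(:) + b;

disp(max(abs(yConv - yFC)))   % difference is at round-off level
```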
The fully connected layer behaves properly only when the data is along the first or second dimension. But then I can't use a convolution layer, which defeats the purpose.
Note: I need the first and second dimensions for other purposes; that's why I'm arranging my data along the third dimension in this example.
====================
Update: Here's a minimal working example that you can run to see for yourself (just an example, not a real application).
It's a DDPG agent trying to learn to replicate the state. In other words, the agent should learn to output an action vector as close as possible to the observation vector (a trivial task). To encourage this, I've set the reward to be the negative norm of the difference between the action and the state (or observation). That is:
Reward	= -norm(Action(:) - obj.State(:));
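To make the objective concrete, the sketch below evaluates this reward in base MATLAB for an action equal to the state (reward 0, the best possible) and for a constant zero action. The sizes are illustrative. Because both arrays are flattened with (:), the reward itself does not depend on which dimension holds the data.

```matlab
% Sketch of the reward from the environment's step method:
% the negative Euclidean distance between action and state.
dim    = [1 1 10];                      % example arrangement (k = 3)
State  = 2*rand(dim) - 1;               % uniform in [-1, 1], as in reset
reward = @(Action) -norm(Action(:) - State(:));

rPerfect = reward(State);               % action equal to the state: best case
rZero    = reward(zeros(dim));          % constant zero action, for comparison
disp([rPerfect, rZero])
```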
This is the complete definition of the environment:
classdef env < rl.env.MATLABEnvironment
    %% Properties
    properties
		State
    end
    %% Methods
    methods
        function obj = env(dim)
			% Initialize Observation settings
			ObservationInfo				= rlNumericSpec(dim);			   %
			ObservationInfo.LowerLimit	= -ones(ObservationInfo.Dimension);% 
			ObservationInfo.UpperLimit	= +ones(ObservationInfo.Dimension);% 
			% -------------------------------------------------------------
			% Initialize Action settings
			ActionInfo              = rlNumericSpec(dim);				   %
			ActionInfo.LowerLimit	= -ones(ActionInfo.Dimension);		   % 
			ActionInfo.UpperLimit	= +ones(ActionInfo.Dimension);		   % 
			% -------------------------------------------------------------
            obj	     = obj@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);	% This line implements built-in functions of RL env
        end
        function [Observation, Reward, IsDone, LoggedSignals] = step(obj, Action)
			Reward			= -norm(Action(:) - obj.State(:));
			Observation		= reset(obj);
			IsDone			= false;
			LoggedSignals	= [];
        end
        function InitialObservation = reset(obj)						   % Reset environment to initial state and output initial observation
			obsInfo				= getObservationInfo(obj);
			obj.State			= 2*rand(obsInfo.Dimension)-1;
			InitialObservation	= obj.State;
		end
    end
end
And here's a test function that defines simple actor and critic networks and trains the agent:
function test
    k 		= 3;            % k can be set to 1, 2, or 3
    dim 	= ones(1,3);
    dim(k)	= 10;
    %%  Create Environment
    envObj	= env(dim);
    obsInfo = getObservationInfo(envObj);
    actInfo = getActionInfo(envObj);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %%  Create Actor
    actorNetwork = [
    	imageInputLayer(dim,            'Name','observation', 'Normalization','none')
    	fullyConnectedLayer(100, 		'Name', 'ActorFC1');	reluLayer('Name',	'ActorRelu1')
    	fullyConnectedLayer(max(dim),   'Name', 'ActorFC3');	tanhLayer('Name',	'ActorTanh' )];
    actorOpts = rlRepresentationOptions('LearnRate', 1e-4, 'GradientThreshold', 1);
    actor     = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, 'Observation', {'observation'}, 'Action', {'ActorTanh'}, actorOpts);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %% Create Critic
    statePath  = [
    	imageInputLayer(dim,    	'Name', 'observation', 		'Normalization', 'none')
        fullyConnectedLayer(100, 	'Name', 'CriticStateFC2')							];
    actionPath = [
    	imageInputLayer(dim,		'Name', 'action',			'Normalization',	'none')
        fullyConnectedLayer(100,	'Name', 'CriticActionFC1',	'BiasLearnRateFactor',0)];
    commonPath = [
        additionLayer(2,'Name','add'); reluLayer('Name','CriticCommonRelu')
        fullyConnectedLayer(1,'Name','CriticOutput')];
    criticNetwork = layerGraph();
    criticNetwork = addLayers(criticNetwork,statePath);
    criticNetwork = addLayers(criticNetwork,actionPath);
    criticNetwork = addLayers(criticNetwork,commonPath);
    criticNetwork = connectLayers(criticNetwork, 'CriticStateFC2', 'add/in1');
    criticNetwork = connectLayers(criticNetwork, 'CriticActionFC1', 'add/in2');
    criticOpts 	  = rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1);
    critic 		  = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'observation'},'Action',{'action'},criticOpts);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %% Create Agent
    agentOpts = rlDDPGAgentOptions();
    agentOpts.NoiseOptions.Variance          = 0.1;
    agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
    agent = rlDDPGAgent(actor, critic, agentOpts);
    % -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
    %% Training
    trainOpts   = rlTrainingOptions(...
        "MaxEpisodes",					1000,...
        "MaxStepsPerEpisode",			100,...
        "ScoreAveragingWindowLength",	10,...
        "Verbose",						true,...
        "Plots",						"training-progress",...
        "StopTrainingCriteria",			"AverageReward",...
        "StopTrainingValue",			-0.01,...
        "SaveAgentCriteria",			"EpisodeReward",...
        "SaveAgentValue",				-0.1);
    train(agent, envObj, trainOpts);                                       	   % Train the agent.
    % -------------------------------------------------------------------------
end
You can choose to put the random data along dimension 1, 2, or 3 by setting k to one of these three values in the first line of the test function. Since fully connected layers are used, the results should not differ. But here's the training plot for k = 1 or 2:
[Figure: training progress for k = 1 or 2]
As can be seen, in this case the agent successfully reduces the cost (increases the reward).
Now, let k = 3 and see the result:
[Figure: training progress for k = 3]
As can be seen, in this case, the agent is simply meandering around the initial value.
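For reference, the forward computation of a fully connected layer depends only on the flattened input, so with the same weights (up to a reordering of the inputs) the arrangement alone should not change what the network computes. The base-MATLAB sketch below checks this for the three arrangements used by k; the sizes, the random weights, and the use of x(:) as the flattening order are illustrative assumptions. It does not by itself explain the difference observed in training.

```matlab
% Sketch: the same data arranged along dimension 1, 2, or 3 gives the same
% fully connected output when the same weights are used.
% Sizes and weights are illustrative; x(:) is assumed as the flattening order.
m = 10;  n = 5;
v = rand(m,1);                          % the underlying data
W = randn(n,m);                         % shared weight matrix
b = randn(n,1);                         % shared bias vector

dims = {[m 1 1], [1 m 1], [1 1 m]};     % k = 1, 2, 3
y = zeros(n, numel(dims));
for k = 1:numel(dims)
    x = reshape(v, dims{k});            % same values, different arrangement
    y(:,k) = W * x(:) + b;              % fully connected: W*flatten(x) + b
end
disp(isequal(y(:,1), y(:,2), y(:,3)))   % true: outputs are identical
```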
This example uses fully connected layers. But when k = 3, you can replace the fully connected layers with 1×1 convolutional layers and you would get the same results. For example,
fullyConnectedLayer(100, 		'Name', 'ActorFC1');
would be replaced by
convolution2dLayer([1,1], 100,   'Name', 'ActorFC1');
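This swap is like-for-like in terms of learnable parameters: with the data in the third dimension, both layers hold one weight per (input channel, output) pair plus one bias per output. The hand count below uses the sizes from the example (10 inputs, 100 outputs for ActorFC1); it is plain arithmetic, not a query of the toolbox layer objects.

```matlab
% Sketch: learnable parameter counts for 'ActorFC1' on a 1x1x10 input,
% computed by hand for both layer types.
m = 10;                         % input channels (dim = [1 1 10], k = 3)
n = 100;                        % outputs of ActorFC1
fcParams   = n*m + n;           % n-by-m weight matrix plus n biases
convParams = 1*1*m*n + n;       % n filters of size 1x1xm plus n biases
fprintf('fully connected: %d, 1x1 convolution: %d\n', fcParams, convParams);
% prints: fully connected: 1100, 1x1 convolution: 1100
```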
====================