How to change activation function for fully connected layer in convolutional neural network?

I'm in the process of implementing a wavelet neural network (WNN) using the SeriesNetwork class of the Neural Network Toolbox (v7). While stepping through a simple network line by line, I can clearly see where the fully connected layer multiplies the inputs by the appropriate weights and adds the bias; however, as best I can tell, no additional calculations are performed for the activations of the fully connected layer. It was my general understanding that standard perceptrons always have an activation/transfer function, and I was fully expecting to see the familiar sigmoid. However, it appears that the fully connected layer, as implemented here, assumes the identity operation as the transfer function (or, equivalently, no transfer function at all).
1) Do fully connected layers use an activation function, or are the outputs simply the weighted sums of the inputs plus the bias? My initial assumption is that they do not, since I see activations greater than +1 (see the example code at the bottom).
2) If an activation function is used, does anyone have suggestions on where I might find and/or alter the source? I have examined the FullyConnected class and definition files and the FullyConnectedGPU(Host)Strategy classes, the latter of which contain the actual multiplication by the weights and addition of the bias.
3) If I want to use a custom activation function (in this case a wavelet), is it safe for me to simply apply said transfer function following the weighting and addition of bias? For example, if I wanted to modify a FullyConnectedLayer to have a tanh activation function, for the forward pass could I simply alter the forward method as follows? (obviously changes to the backward pass and gradient determination would also be required for the full implementation):
classdef FullyConnectedGPUStrategy < nnet.internal.cnn.layer.util.ExecutionStrategy
...
    function [Z, memory] = forward(~, X, weights, bias)
        Z = iForwardConvolveOrMultiply(X, weights); %weighted sum of inputs
        Z = Z + bias;                               %add bias
        Z = tanh(Z);                                %addition of activation function
        memory = [];
    end
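On the backward-pass point, here is only a sketch of the extra chain-rule step (the names below are illustrative; the actual internal backward signature differs between releases): if the forward pass becomes Z = tanh(W*X + b), the gradient arriving from the next layer has to be scaled by the tanh derivative before the existing weight, bias, and input gradient code is applied:
dLdPre = dLdZ .* (1 - Z.^2); %derivative of tanh, using the cached output Z
%dLdPre then takes the place of dLdZ in the original (identity-activation)
%backward computations for the weights, bias, and inputs.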
Example code to illustrate the problem:
%Generate training data
[XTrain, YTrain] = digitTrain4DArrayData;
%Define layers
layers = [ ...
imageInputLayer([28 28 1])
fullyConnectedLayer(10)
softmaxLayer()
classificationLayer()];
%Train network using stochastic gradient descent with momentum
options = trainingOptions('sgdm');
net = trainNetwork(XTrain, YTrain, layers, options);
%View activations of fully connected layer
%Note: When testing this I see activations greater than +1 and
%less than 0, so it can't be using tanh or sigmoid
activations(net,XTrain(:,:,:,1),2)
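One way to confirm this directly (a sketch, assuming the R2017-era API, where imageInputLayer stores its zero-center mean as the AverageImage property) is to compare the reported activations against a manual Weights*x + Bias computation:
%Sanity check (sketch): undo the input layer's zero-center normalization,
%then apply the fully connected weights and bias by hand
x = double(XTrain(:,:,:,1)) - net.Layers(1).AverageImage;
manual = net.Layers(2).Weights * x(:) + net.Layers(2).Bias;
fcOut = activations(net, XTrain(:,:,:,1), 2);
max(abs(fcOut(:) - double(manual(:)))) %should be near zero if no activation is applied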
Note: The reason I chose the SeriesNetwork class used for CNNs, as opposed to the generic neural network classes, is that the output of the WNN will need to act as the input to a CNN, and the two will then be trained together as one unit.
1 Comment
Greg Heath on 18 Aug 2018
Before reading your question, let me state:
I am an engineer, not a mathematician, so my statements below may not be as precise as some would like. However, I believe it will be perfectly clear what I am stating:
The STANDARD UNIVERSAL APPROXIMATOR single hidden layer regression net has
1. A nonlinear hidden layer transfer function
2. A LINEAR output layer transfer function
I'm stating this because it is obvious that some believe that, for a universal approximator, the standard output transfer function has to be nonlinear.
Of course there are additional conditions on finiteness, etc., which I have omitted, but I think I have made my point.
Hope this Helps,
Greg
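In equation form (my notation, not something from this thread), the standard net Greg describes is y = b2 + LW*tanh(IW*x + b1): a nonlinear hidden transfer function such as tanh feeding a purely linear output layer.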


Accepted Answer

Joss Knight on 28 Jun 2017
Activations are added as a separate layer, and in R2017a there is only the ReLU layer (see reluLayer).
Custom layers have not been introduced yet, so you'd have to be hacking or masking the toolbox files, but that's fine. You could take a copy of the ReLULayer classes and modify them, or just edit your MATLAB install directly if you think that's safe.
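Using the layer array from the question, that looks like the following (a sketch of the approach, with reluLayer standing in for the eventual custom wavelet activation):
layers = [ ...
imageInputLayer([28 28 1])
fullyConnectedLayer(10)
reluLayer() %activation applied as its own layer after the fully connected layer
softmaxLayer()
classificationLayer()];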
10 Comments
Maxime Bezanilla on 14 Mar 2019
@wenyi Thank you for your work. However, it has a few mistakes in it.
I know the topic is old, but I am sure this can help some people, so I am posting my code for the sigmoid layer, based on the versions from @wenyi and @Balakrishnan_Rajan. There was also a mistake with the "~".
classdef sigmoidLayer < nnet.layer.Layer
    methods
        function layer = sigmoidLayer(name)
            % Set layer name
            if nargin == 1
                layer.Name = name;
            end
            % Set layer description
            layer.Description = 'sigmoidLayer';
        end
        function Z = predict(layer, X)
            % Forward input data through the layer and output the result
            Z = exp(X)./(exp(X)+1);
        end
        function dLdX = backward(layer, X, Z, dLdZ, memory)
            % Backward propagate the derivative of the loss function
            % through the layer
            dLdX = Z.*(1-Z) .* dLdZ;
        end
    end
end
This is accepted by checkLayer.
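For reference, a quick way to exercise the layer (a sketch; the layer name 'sig1' and the input size [1 1 10], matching a 10-unit fully connected output, are just examples):
checkLayer(sigmoidLayer('sig1'), [1 1 10]) %run the built-in layer validity checks
layers = [ ...
imageInputLayer([28 28 1])
fullyConnectedLayer(10)
sigmoidLayer('sig1')
softmaxLayer()
classificationLayer()];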


More Answers (0)
