
segnetLayers

(To be removed) Create SegNet layer graph for semantic segmentation

segnetLayers will be removed in a future release. Create a SegNet network using a dlnetwork (Deep Learning Toolbox) object instead. For more information, see Compatibility Considerations.

Description


lgraph = segnetLayers(imageSize,numClasses,model) returns a SegNet layer graph, lgraph, preinitialized with layers and weights from a pretrained model.

SegNet is a convolutional neural network for semantic image segmentation. The network uses a pixelClassificationLayer to predict the categorical label for every pixel in an input image.

Use segnetLayers to create the network architecture for SegNet. You must train the network using the Deep Learning Toolbox™ function trainNetwork (Deep Learning Toolbox).

lgraph = segnetLayers(imageSize,numClasses,encoderDepth) returns uninitialized SegNet layers configured using the specified encoder depth.

lgraph = segnetLayers(imageSize,numClasses,encoderDepth,Name,Value) returns SegNet layers with additional options specified by one or more name-value arguments.

Examples


Create SegNet layers with an encoder/decoder depth of 4.

imageSize = [480 640 3];
numClasses = 5;
encoderDepth = 4;
lgraph = segnetLayers(imageSize,numClasses,encoderDepth)
lgraph = 
  LayerGraph with properties:

     InputNames: {'inputImage'}
    OutputNames: {'pixelLabels'}
         Layers: [59x1 nnet.cnn.layer.Layer]
    Connections: [66x2 table]

Display the network.

figure
plot(lgraph)

Load training images and pixel labels.

dataSetDir = fullfile(toolboxdir('vision'),'visiondata','triangleImages');
imageDir = fullfile(dataSetDir,'trainingImages');
labelDir = fullfile(dataSetDir,'trainingLabels');

Create an image datastore holding the training images.

imds = imageDatastore(imageDir);

Define the class names and their associated label IDs.

classNames = ["triangle", "background"];
labelIDs   = [255 0];

Create a pixel label datastore holding the ground truth pixel labels for the training images.

pxds = pixelLabelDatastore(labelDir,classNames,labelIDs);

Combine image and pixel label data for training a semantic segmentation network.

ds = combine(imds,pxds);

Create SegNet layers.

imageSize = [32 32];
numClasses = 2;
lgraph = segnetLayers(imageSize,numClasses,2)
lgraph = 
  LayerGraph with properties:

     InputNames: {'inputImage'}
    OutputNames: {'pixelLabels'}
         Layers: [31x1 nnet.cnn.layer.Layer]
    Connections: [34x2 table]

Set up training options.

options = trainingOptions('sgdm','InitialLearnRate',1e-3, ...
      'MaxEpochs',20,'VerboseFrequency',10);

Train the network.

net = trainNetwork(ds,lgraph,options)
Training on single CPU.
Initializing input data normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:03 |       39.75% |       0.7658 |          0.0010 |
|      10 |          10 |       00:00:25 |       49.98% |       0.7388 |          0.0010 |
|      20 |          20 |       00:00:49 |       66.39% |       0.6910 |          0.0010 |
|========================================================================================|
Training finished: Max epochs completed.


net = 
  DAGNetwork with properties:

         Layers: [31x1 nnet.cnn.layer.Layer]
    Connections: [34x2 table]
     InputNames: {'inputImage'}
    OutputNames: {'pixelLabels'}

Display the network.

plot(lgraph)

Input Arguments


imageSize

Network input image size, specified as a:

  • 2-element vector in the format [height, width].

  • 3-element vector in the format [height, width, depth]. depth is the number of image channels. Set depth to 3 for RGB images, 1 for grayscale images, or to the number of channels for multispectral and hyperspectral images.
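A minimal usage sketch of both forms (the sizes here are illustrative):

imageSize = [480 640 3];  % RGB input: height, width, and 3 channels
% imageSize = [480 640];  % 2-element form omits the channel dimension
lgraph = segnetLayers(imageSize,5,4);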

numClasses

Number of classes in the semantic segmentation, specified as an integer greater than 1.

model

Pretrained network model, specified as 'vgg16' or 'vgg19'. These models have an encoder depth of 5. When you use a 'vgg16' model, you must specify RGB inputs. You can convert grayscale images to RGB using the im2gray function.
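For example, this sketch creates SegNet preinitialized from VGG-16. The image size and class count are illustrative, and the call requires the Deep Learning Toolbox Model for VGG-16 Network support package:

% SegNet with weights from a pretrained VGG-16 network (encoder depth 5)
lgraph = segnetLayers([360 480 3],11,'vgg16');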

encoderDepth

Encoder depth, specified as a positive integer.

SegNet is composed of an encoder subnetwork and a corresponding decoder subnetwork. The depth of these subnetworks determines the number of times the input image is downsampled or upsampled as it is processed. The encoder network downsamples the input image by a factor of 2^D, where D is the value of encoderDepth. The decoder network upsamples the encoder network output by a factor of 2^D.
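For example, with encoderDepth set to 4, the encoder downsamples by a factor of 2^4 = 16:

% A 480-by-640 input reaches the encoder output at 30-by-40,
% and the decoder upsamples it back to 480-by-640.
lgraph = segnetLayers([480 640 3],5,4);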

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'NumConvolutionLayers',1

NumConvolutionLayers

Number of convolutional layers in each encoder and decoder section, specified as a positive integer or a vector of positive integers.

  • scalar: The same number of layers is used for all encoder and decoder sections.

  • vector: The kth element of NumConvolutionLayers is the number of convolution layers in the kth encoder section and corresponding decoder section. Typical values are in the range [1, 3].
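For example, this sketch (with illustrative values) uses the vector form to deepen the later sections; the vector has one element per encoder section, so its length matches encoderDepth:

% Sections 1-2 get two convolution layers each, sections 3-4 get three.
lgraph = segnetLayers([480 640 3],5,4,'NumConvolutionLayers',[2 2 3 3]);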

NumOutputChannels

Number of output channels for each section in the SegNet encoder network, specified as a positive integer or vector of positive integers. segnetLayers sets the number of output channels in the decoder to match the corresponding encoder section.

  • scalar: The same number of output channels is used for all encoder and decoder sections.

  • vector: The kth element of NumOutputChannels is the number of output channels of the kth encoder section and corresponding decoder section.
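For example (illustrative values), doubling the channel count with depth, as in VGG-style encoders:

% 64 output channels in section 1, increasing to 512 in section 4.
lgraph = segnetLayers([480 640 3],5,4,'NumOutputChannels',[64 128 256 512]);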

FilterSize

Convolutional layer filter size, specified as a positive odd integer or a 2-element row vector of positive odd integers. Typical values are in the range [3, 7].

  • scalar: The filter is square.

  • 2-element row vector: The filter has the size [height width].
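For example (illustrative values):

% Square 3-by-3 filters; use [3 5] for 3-by-5 filters instead.
lgraph = segnetLayers([480 640 3],5,4,'FilterSize',3);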

Output Arguments


lgraph

Layers that represent the SegNet network architecture, returned as a layerGraph (Deep Learning Toolbox) object.

Tips

  • The sections within the SegNet encoder and decoder subnetworks are made up of convolutional, batch normalization, and ReLU layers.

  • All convolutional layers are configured such that the bias term is fixed to zero.

  • Convolution layer weights in the encoder and decoder subnetworks are initialized using the 'MSRA' weight initialization method [1]. For 'vgg16' or 'vgg19' models, only the decoder subnetwork is initialized using MSRA.

  • Networks produced by segnetLayers support GPU code generation for deep learning once they are trained with trainNetwork (Deep Learning Toolbox). See Code Generation (Deep Learning Toolbox) for details and examples.

References

[1] He, K., X. Zhang, S. Ren, and J. Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.

[2] Badrinarayanan, V., A. Kendall, and R. Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561, 2015.


Version History

Introduced in R2017b


R2024a: segnetLayers will be removed

The segnetLayers function will be removed in a future release. To update your code, create a dlnetwork (Deep Learning Toolbox) instead. You can use functions such as addLayers (Deep Learning Toolbox) and connectLayers (Deep Learning Toolbox) to build the network.
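For reference, here is a minimal sketch of a SegNet-style encoder-decoder assembled this way. The layer names and sizes are illustrative choices rather than the segnetLayers defaults, and the sketch assumes your release supports max unpooling layers in a dlnetwork:

% Minimal SegNet-style encoder-decoder built as a dlnetwork (sketch).
imageSize = [32 32 1];
numClasses = 2;

layers = [
    imageInputLayer(imageSize,Name="input")
    convolution2dLayer(3,64,Padding="same",Name="enc_conv")
    batchNormalizationLayer(Name="enc_bn")
    reluLayer(Name="enc_relu")
    maxPooling2dLayer(2,Stride=2,Name="pool",HasUnpoolingOutputs=true)
    maxUnpooling2dLayer(Name="unpool")
    convolution2dLayer(3,64,Padding="same",Name="dec_conv")
    batchNormalizationLayer(Name="dec_bn")
    reluLayer(Name="dec_relu")
    convolution2dLayer(1,numClasses,Name="score")
    softmaxLayer(Name="softmax")];

lgraph = layerGraph(layers);

% SegNet unpools using the max pooling indices, so route the pooling
% indices and output size to the unpooling layer.
lgraph = connectLayers(lgraph,"pool/indices","unpool/indices");
lgraph = connectLayers(lgraph,"pool/size","unpool/size");

net = dlnetwork(lgraph);

The sketch ends at the softmax layer because, as noted below, the network must not include output layers.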

Do not include output layers in the network. Instead, define a loss function. Here are some sample loss functions appropriate for pixel classification:

function loss = modelLoss(Y,T)
  % Generalized Dice loss: measures overlap between the predicted
  % scores Y and the one-hot encoded targets T.
  z = generalizedDice(Y,T);
  loss = 1 - mean(z,"all");
end

function loss = modelLoss(Y,T)
  % Cross-entropy loss that ignores undefined (NaN) pixel labels.
  mask = ~isnan(T);
  T(isnan(T)) = 0;
  loss = crossentropy(Y,T,Mask=mask);
end

Specify the loss function when you train the network using the trainnet (Deep Learning Toolbox) function. For example, this code trains a dlnetwork network called net using the training data images and the loss function modelLoss.

netTrained = trainnet(images,net,@modelLoss,options); 
