Logo Recognition Network

This example demonstrates code generation for a logo classification application that uses deep learning. It uses the codegen command to generate a MEX function that runs prediction on a SeriesNetwork object called LogoNet.

Prerequisites

  • CUDA® enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library v7 or higher.

  • Environment variables for the compilers and libraries. For more information, see Environment Variables.

  • Deep Learning Toolbox™ to use a SeriesNetwork object.

  • Image Processing Toolbox™ for reading and displaying images.

  • GPU Coder™ for generating CUDA code.

  • GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.

Verify the GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed for running this example are set up correctly.

coder.checkGpuInstall('gpu','codegen','cudnn','quiet');
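In releases that provide the coder.gpuEnvConfig object, the same check can be expressed through a configuration object, which also returns a results structure that you can inspect. The following is a minimal sketch that assumes such a release and a host GPU; the property values are chosen to mirror the string options used above, not taken from this example.

```matlab
% Sketch: environment check through a configuration object.
% Assumes a release that ships coder.gpuEnvConfig.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen  = 1;        % check basic CUDA code generation
envCfg.DeepLibTarget = 'cudnn';  % deep learning library used in this example
envCfg.DeepCodegen   = 1;        % check deep learning code generation
envCfg.Quiet         = 1;        % suppress the verbose report

results = coder.checkGpuInstall(envCfg);
disp(results);                   % one field per check that was run
```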

About the Network

Logos help users identify and recognize brands. They are used in domains such as advertising and document community, among others. The logo recognition network (logonet) was developed in MATLAB® and can recognize 32 logos under various lighting conditions and camera motions. Its architecture is similar to AlexNet. Because the network performs recognition only, it suits applications that do not require localization.

Training the Network

The network is trained in MATLAB, and the training data used for logo classification contains around 200 images for each logo. Because this number is small, data augmentation is used to increase the number of training samples. Four types of data augmentation are used: contrast normalization, Gaussian blur, random flipping, and shearing. This augmentation helps the network recognize logos in images captured under different lighting conditions and camera motions. The input size for logonet is [227 227 3]. Training uses standard SGDM with a learning rate of 0.0001 for 40 epochs and a mini-batch size of 45. The trainLogonet.m script demonstrates the data augmentation on a sample image, the architecture of logonet, and the training options.
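To make the augmentation step more concrete, the following sketch applies three of the four named augmentations (random flipping, a contrast change, and Gaussian blur) to a stand-in image using only base MATLAB functions. It is not the code from trainLogonet.m: the contrast formula, the contrast range, and the blur kernel size and sigma are assumptions chosen for illustration, and shearing is omitted.

```matlab
rng(0);                                  % fixed seed so the sketch is repeatable
img = rand(227, 227, 3);                 % stand-in for a logo image scaled to [0,1]

% Random horizontal flip, applied to roughly half of the samples.
if rand > 0.5
    img = flip(img, 2);
end

% Contrast change (assumed form): scale each channel about its mean.
alpha = 0.5 + rand;                      % assumed contrast factor in [0.5, 1.5]
for c = 1:3
    ch = img(:,:,c);
    m = mean(ch(:));
    img(:,:,c) = m + alpha * (ch - m);
end
img = min(max(img, 0), 1);               % clip back to the valid range

% Gaussian blur with an assumed 5x5 kernel and sigma = 1.
sigma = 1;
[x, y] = meshgrid(-2:2, -2:2);
kernel = exp(-(x.^2 + y.^2) / (2 * sigma^2));
kernel = kernel / sum(kernel(:));        % normalize so brightness is preserved
augmented = zeros(size(img));
for c = 1:3
    % 'same' keeps the 227x227 size; conv2 zero-pads, so borders darken slightly
    augmented(:,:,c) = conv2(img(:,:,c), kernel, 'same');
end
```

Each augmented copy keeps the [227 227 3] input size, so it can be fed to the network without further resizing.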

Get the Pretrained SeriesNetwork

Download the logonet network and save it to LogoNet.mat if the file does not already exist.

getLogonet();

The saved network contains 22 layers, including convolution, fully connected, and classification output layers.

load('LogoNet.mat');
convnet.Layers
ans = 

  22x1 Layer array with layers:

     1   'imageinput'    Image Input             227x227x3 images with 'zerocenter' normalization and 'randfliplr' augmentations
     2   'conv_1'        Convolution             96 5x5x3 convolutions with stride [1  1] and padding [0  0  0  0]
     3   'relu_1'        ReLU                    ReLU
     4   'maxpool_1'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'conv_2'        Convolution             128 3x3x96 convolutions with stride [1  1] and padding [0  0  0  0]
     6   'relu_2'        ReLU                    ReLU
     7   'maxpool_2'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
     8   'conv_3'        Convolution             384 3x3x128 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'relu_3'        ReLU                    ReLU
    10   'maxpool_3'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    11   'conv_4'        Convolution             128 3x3x384 convolutions with stride [2  2] and padding [0  0  0  0]
    12   'relu_4'        ReLU                    ReLU
    13   'maxpool_4'     Max Pooling             3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    14   'fc_1'          Fully Connected         2048 fully connected layer
    15   'relu_5'        ReLU                    ReLU
    16   'dropout_1'     Dropout                 50% dropout
    17   'fc_2'          Fully Connected         2048 fully connected layer
    18   'relu_6'        ReLU                    ReLU
    19   'dropout_2'     Dropout                 50% dropout
    20   'fc_3'          Fully Connected         32 fully connected layer
    21   'softmax'       Softmax                 softmax
    22   'classoutput'   Classification Output   crossentropyex with 'adidas' and 31 other classes
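The layer listing can be used to work out how the 227x227 input shrinks before it reaches fc_1. The sketch below assumes the usual output-size rule for unpadded convolution and pooling layers, floor((in - kernel)/stride) + 1, and takes the kernel sizes, strides, and the 128 output channels of conv_4 from the listing above.

```matlab
% Each row is [kernelSize stride] for conv_1, maxpool_1, ..., conv_4, maxpool_4.
spec = [5 1; 3 2; 3 1; 3 2; 3 1; 3 2; 3 2; 3 2];

sz = 227;                                % spatial size of the input image
sizes = zeros(1, size(spec, 1));
for i = 1:size(spec, 1)
    % All padding values in the listing are zero, so no padding term appears.
    sz = floor((sz - spec(i, 1)) / spec(i, 2)) + 1;
    sizes(i) = sz;
end

fcInputs = sz * sz * 128;                % conv_4 has 128 filters
disp(sizes);                             % 223 111 109 54 52 25 12 5
fprintf('Inputs to fc_1: %d\n', fcInputs);   % Inputs to fc_1: 3200
```

Under these assumptions, the last pooling layer produces a 5x5x128 activation, which is flattened into 3200 values for the first fully connected layer.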

About the 'logonet_predict' Function

The logonet_predict.m function takes an image as input and runs prediction on it using the deep learning network saved in the LogoNet.mat file. On the first call, the function loads the network object from LogoNet.mat into a persistent variable named logonet. On subsequent calls, the persistent object is reused for prediction.

type('logonet_predict.m')
function out = logonet_predict(in)
%#codegen

% Copyright 2017 The MathWorks, Inc.

% function for predicting the logos
% A persistent object logonet is used to load the series network object.
% At the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is reused 
% to call predict on inputs, thus avoiding reconstructing and reloading the
% network object.

persistent logonet;

if isempty(logonet)
    
    logonet = coder.loadDeepLearningNetwork('LogoNet.mat','logonet');
end

out = logonet.predict(in);

end

Generate CUDA MEX for 'logonet_predict' Function

Create a GPU configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig function to create a cuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. To generate CUDA MEX, use the codegen command and specify an input of size [227,227,3]. This value corresponds to the input layer size of the logonet network.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg logonet_predict -args {ones(227,227,3,'uint8')} -report
Code generation successful: To view the report, open('codegen/mex/logonet_predict/html/report.mldatx').

Run the Generated MEX

Load an input image.

im = imread('test.png');
imshow(im);

Resize the image to the network input size and call logonet_predict_mex on it.

im = imresize(im, [227,227]);
predict_scores = logonet_predict_mex(im);

Map the top five prediction scores to words in the synset dictionary (logos).

synsetOut = {'adidas', 'aldi', 'apple', 'becks', 'bmw', 'carlsberg', ...
    'chimay', 'cocacola', 'corona', 'dhl', 'erdinger', 'esso', 'fedex', ...
    'ferrari', 'ford', 'fosters', 'google', 'guinness', 'heineken', 'hp', ...
    'milka', 'nvidia', 'paulaner', 'pepsi', 'rittersport', 'shell', ...
    'singha', 'starbucks', 'stellaartois', 'texaco', 'tsingtao', 'ups'};

[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
top5labels = synsetOut(indx(1:5));
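The sort-and-index step is easier to follow on a small case. The sketch below uses five of the labels and a made-up score vector (the scores are hypothetical, not output from the network) and picks the top three entries in the same way.

```matlab
labels = {'adidas', 'aldi', 'apple', 'becks', 'bmw'};
scores = single([0.05 0.60 0.10 0.20 0.05]);   % hypothetical softmax scores

% sort returns the sorted values and the positions they came from,
% so idx can be used to look up the matching labels.
[val, idx] = sort(scores, 'descend');
topLabels  = labels(idx(1:3));
topPercent = val(1:3) * 100;                   % convert to percentages

for k = 1:3
    fprintf('%s %.2f%%\n', topLabels{k}, topPercent(k));
end
```

Here the three labels selected are 'aldi', 'becks', and 'apple', in that order.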

Display the top five classification labels next to the image. The insertText function used here is part of Computer Vision Toolbox™.

outputImage = zeros(227,400,3, 'uint8');
for k = 1:3
    outputImage(:,174:end,k) = im(:,:,k);
end

scol = 1;
srow = 20;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow], [top5labels{k},' ',num2str(scores(k), '%2.2f'),'%'], 'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 20;
end

imshow(outputImage);

Use clear mex to remove the static network object loaded in memory.

clear mex;