Code Generation For Object Detection Using YOLO v3 Deep Learning

This example shows how to generate CUDA® MEX for a you only look once (YOLO) v3 object detector with custom layers. YOLO v3 improves upon YOLO v2 by adding detection at multiple scales to help detect smaller objects. Moreover, the loss function used for training is separated into mean squared error for bounding box regression and binary cross-entropy for object classification to help improve detection accuracy. The YOLO v3 network used in this example was trained from the Object Detection Using YOLO v3 Deep Learning example in the Computer Vision Toolbox™. For more information, see Object Detection Using YOLO v3 Deep Learning (Computer Vision Toolbox).

Third-Party Prerequisites


  • CUDA®-enabled NVIDIA® GPU and compatible driver.


For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has additional requirements.

Verify GPU Environment

To verify that the compilers and libraries for running this example are set up correctly, use the coder.checkGpuInstall (GPU Coder) function.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

YOLO v3 Network

The YOLO v3 network in this example is based on squeezenet, and uses the feature extraction network in SqueezeNet with the addition of two detection heads at the end. The second detection head is twice the size of the first detection head, so it is better able to detect small objects. Note that any number of detection heads of different sizes can be specified based on the size of the objects to be detected. The YOLO v3 network uses anchor boxes estimated using training data to have better initial priors corresponding to the type of data set and to help the network learn to predict the boxes accurately. For information about anchor boxes, see Anchor Boxes for Object Detection (Computer Vision Toolbox).
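For instance, anchor boxes can be estimated from a labeled training set with the estimateAnchorBoxes function and then split between the two detection heads. The following sketch is illustrative: the variable trainingData stands for a box label datastore of ground-truth boxes, and the mask split assumes six anchors shared evenly across two heads.

```matlab
% Estimate anchor boxes from ground-truth training data (illustrative sketch).
% trainingData is assumed to be a boxLabelDatastore of labeled bounding boxes.
numAnchors = 6;                             % total anchors across both heads
[anchors,meanIoU] = estimateAnchorBoxes(trainingData,numAnchors);

% Sort anchors by area so the larger anchors go to the first (coarser) head
% and the smaller anchors to the second (finer) head.
area = anchors(:,1).*anchors(:,2);
[~,idx] = sort(area,'descend');
anchors = anchors(idx,:);
anchorBoxMasks = {[1,2,3],[4,5,6]};         % anchor indices assigned per head
```

A higher meanIoU indicates that the estimated anchors overlap the training boxes well, which gives the network better initial priors.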

The YOLO v3 network in this example is illustrated in the following diagram.

Each detection head predicts the bounding box coordinates (x, y, width, height), object confidence, and class probabilities for the respective anchor box masks. Therefore, for each detection head, the number of output filters in the last convolution layer is the number of anchor box masks times the number of prediction elements per anchor box. The detection heads comprise the output layer of the network.
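As a concrete check of that filter count, suppose a detection head is assigned three anchor box masks and the detector has a single object class; each anchor then needs four box coordinates, one objectness score, and one class probability:

```matlab
numMasks = 3;                        % anchor box masks assigned to this head
numClasses = 1;                      % e.g., a single 'vehicle' class
numPredElements = 5 + numClasses;    % x, y, w, h, objectness + class scores
numFilters = numMasks*numPredElements
% numFilters = 18 for this configuration
```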

Pretrained YOLO v3 Network

This example uses the file containing the pretrained YOLO v3 network. The YOLO v3 network used in this example was trained using the steps described in Object Detection Using YOLO v3 Deep Learning (Computer Vision Toolbox). Download the file from the MathWorks website, then unzip the file.

fileName = matlab.internal.examples.downloadSupportFile('vision/data/','');
data = unzip(fileName);
matFile = data{1,1};
vehicleDetector = load(matFile);
net = vehicleDetector.detector.Network
net = 
  dlnetwork with properties:

         Layers: [75×1 nnet.cnn.layer.Layer]
    Connections: [84×2 table]
     Learnables: [66×3 table]
          State: [6×3 table]
     InputNames: {'data'}
    OutputNames: {'customOutputConv1'  'customOutputConv2'}
    Initialized: 1

Note: You can also use the pretrained detector network available through the Computer Vision Toolbox™ Model for YOLO v3 Object Detection support package.

To use this pretrained network, you must first install the Computer Vision Toolbox Model for YOLO v3 Object Detection from the Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

Then, save the network from the yolov3ObjectDetector object to a MAT-file and proceed. For example,

detector = yolov3ObjectDetector('darknet53-coco');
net = detector.Network;
matFile = 'pretrainedYOLOv3Detector.mat';
save(matFile,'net');

The yolov3Detect Entry-Point Function

The yolov3Detect entry-point function takes an image input and runs the detector on the image using the deep learning network saved in the yolov3SqueezeNetVehicleExample_21aSPKG.mat file. The function loads the network object from the yolov3SqueezeNetVehicleExample_21aSPKG.mat file into a persistent variable yolov3Obj and reuses the persistent object on subsequent detection calls.

function outImg = yolov3Detect(in,matFile)

%   Copyright 2021 The MathWorks, Inc.

persistent yolov3Obj;

if isempty(yolov3Obj)
    yolov3Obj = coder.loadDeepLearningNetwork(matFile);
end

% Call to detect method
[bboxes,~,labels] = yolov3Obj.detect(in,'Threshold',0.5);

% Convert categorical labels to cell array of character vectors
labels = cellstr(labels);

% Annotate detections in the image.
outImg = insertObjectAnnotation(in,'rectangle',bboxes,labels);
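Before generating code, you can run the entry-point function directly in MATLAB to confirm the detections look reasonable. This is a sketch; the test image file name is hypothetical, and matFile is the MAT-file path obtained in the download step above.

```matlab
% Run the entry-point function in MATLAB before code generation.
I = imread('testVehicleImage.png');   % hypothetical test image
in = imresize(I,[227,227]);           % match the network input size
out = yolov3Detect(in,matFile);       % matFile from the download step
imshow(out)
```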

Generate CUDA MEX

To generate CUDA code for the entry-point function, create a GPU code configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create a cuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Run the codegen command specifying an input size of 227-by-227-by-3. This value corresponds to the input layer size of the YOLO v3 network.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.GenerateReport = true;
inputArgs = {ones(227,227,3,'uint8'),coder.Constant(matFile)};

codegen -config cfg yolov3Detect -args inputArgs -report
Code generation successful: View report

To generate CUDA® code for a TensorRT target, create and use a TensorRT deep learning configuration object instead of the cuDNN configuration object. Similarly, to generate code for an MKL-DNN target, create a CPU code configuration object and use an MKL-DNN deep learning configuration object as its DeepLearningConfig property.
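For example, the alternative configurations might look like the following sketch, which keeps the default options for each target:

```matlab
% TensorRT target (GPU code generation):
cfgTrt = coder.gpuConfig('mex');
cfgTrt.TargetLang = 'C++';
cfgTrt.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');

% MKL-DNN target (CPU code generation):
cfgMkl = coder.config('mex');
cfgMkl.TargetLang = 'C++';
cfgMkl.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
```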

Run the Generated MEX

Set up the video file reader and read the input video. Create a video player to display the video and the output detections.

videoFile = 'highway_vehicles.mp4';
videoFreader = vision.VideoFileReader(videoFile,'VideoOutputDataType','uint8');
depVideoPlayer = vision.DeployableVideoPlayer('Size','Custom','CustomSize',[640 480]);

Read the video input frame-by-frame and detect the vehicles in the video using the detector.

cont = ~isDone(videoFreader);
while cont
    I = step(videoFreader);
    in = imresize(I,[227,227]);
    out = yolov3Detect_mex(in,matFile);
    step(depVideoPlayer, out);
    % Exit the loop if the video player figure window is closed
    cont = ~isDone(videoFreader) && isOpen(depVideoPlayer);
end
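After processing completes, you can release the video reader and player to free their resources; this cleanup step is a small addition not shown in the original listing.

```matlab
% Release the system objects once the video has been processed.
release(videoFreader);
release(depVideoPlayer);
```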


References

1. Redmon, Joseph, and Ali Farhadi. “YOLOv3: An Incremental Improvement.” Preprint, submitted April 8, 2018.