Pedestrian Detection

This example demonstrates code generation for pedestrian detection application that uses deep learning. Pedestrian detection is a key problem in computer vision, with several applications in the fields of autonomous driving, surveillance, robotics etc.


  • CUDA® enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library v7 or higher.

  • Computer Vision Toolbox™.

  • Deep Learning Toolbox™ to use a SeriesNetwork object.

  • Image Processing Toolbox™ for reading and displaying images.

  • GPU Coder™ for generating CUDA code.

  • GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.

  • Environment variables for the compilers and libraries. For more information, see Environment Variables.

Verify the GPU Environment

Use the coder.checkGpuInstall function and verify that the compilers and libraries needed for running this example are set up correctly.


About the Network

The pedestrian detection network was trained by using images of pedestrians and non-pedestrians. This network is trained in MATLAB® by using the trainPedNet.m script. A sliding window approach is used to crop patches from an image of size [64 32] as shown. Patch dimensions are obtained from heatmap which represents the distribution of pedestrians in the images in the data set. It indicates the presence of pedestrians at various scales and locations in the images. In this example, patches of pedestrians close to the camera are cropped and processed. Finally, Non-Maximal Suppression (NMS) is applied on the obtained patches to merge them together and detect complete pedestrians.

The pedestrian detection network contains 12 layers which include convolution, fully connected, and classification output layers.

ans = 

  12x1 Layer array with layers:

     1   'imageinput'    Image Input                   64x32x3 images with 'zerocenter' normalization
     2   'conv_1'        Convolution                   20 5x5x3 convolutions with stride [1  1] and padding [0  0  0  0]
     3   'relu_1'        ReLU                          ReLU
     4   'maxpool_1'     Max Pooling                   2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'crossnorm'     Cross Channel Normalization   cross channel normalization with 5 channels per element
     6   'conv_2'        Convolution                   20 5x5x20 convolutions with stride [1  1] and padding [0  0  0  0]
     7   'relu_2'        ReLU                          ReLU
     8   'maxpool_2'     Max Pooling                   2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     9   'fc_1'          Fully Connected               512 fully connected layer
    10   'fc_2'          Fully Connected               2 fully connected layer
    11   'softmax'       Softmax                       softmax
    12   'classoutput'   Classification Output         crossentropyex with classes 'NonPed' and 'Ped'

About the 'pedDetect_predict' Function

The pedDetect_predict.m function takes an image input and runs prediction on image using the deep learning network saved in PedNet.mat file. The function loads the network object from PedNet.mat into a persistent variable pednet. On subsequent calls to the function, the persistent object is reused for prediction.

function selectedBbox = pedDetect_predict(img)

% Copyright 2017 The MathWorks, Inc.

% function for predicting the pedestrians
% A persistent object pednet is used to load the series network object.
% At the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is reused 
% to call predict on inputs, thus avoiding reconstructing and reloading the
% network object.


persistent pednet;
if isempty(pednet) 
    pednet = coder.loadDeepLearningNetwork(coder.const('PedNet.mat'),'Pedestrian_Detection');

[imgHt , imgWd , ~] = size(img);
VrHt = [imgHt - 30 , imgHt]; % Two bands of vertical heights are considered

% patchHt and patchWd are obtained from heat maps (heat map here refers to
% pedestrians data represented in the form of a map with different
% colors. Different colors indicate presence of pedestrians at various
% scales).
patchHt = 300; 
patchWd = patchHt/3;

% PatchCount is used to estimate number of patches per image
PatchCount = ((imgWd - patchWd)/20) + 2;
maxPatchCount = PatchCount * 2; 
Itmp = zeros(64 , 32 , 3 , maxPatchCount);
ltMin = zeros(maxPatchCount);
lttop = zeros(maxPatchCount);

idx = 1; % To count number of image patches obtained from sliding window
cnt = 1; % To count number of patches predicted as pedestrians

bbox = zeros(maxPatchCount , 4);
value = zeros(maxPatchCount , 1);

%% Region proposal for two bands
for VrStride = 1 : 2
    for HrStride = 1 : 20 : (imgWd - 60)  % Obtain horizontal patches with stride 20.
        ltMin(idx) = HrStride + 1;
        rtMax = min(ltMin(idx) + patchWd , imgWd);
        lttop(idx) = (VrHt(VrStride) - patchHt);
        It = img(lttop(idx): VrHt(VrStride) , ltMin(idx) : rtMax , :);
        Itmp(:,:,:,idx) = imresize(It,[64,32]);
        idx = idx + 1;

for j = 1 : size (Itmp,4)
    score = pednet.predict(Itmp(:,:,:,j)); % Classify ROI
    % accuracy of detected box should be greater than 0.90
    if (score(1,2) > 0.80)
        bbox(cnt,:) = [ltMin(j),lttop(j), patchWd , patchHt];
        value(cnt,:) = score(1,2);
        cnt = cnt + 1;

%% NMS to merge similar boxes
if ~isempty(bbox)
    [selectedBbox,~] = selectStrongestBbox(bbox(1:cnt-1,:),...

Generate CUDA MEX for the pedDetect_predict Function

Create a GPU Configuration object for MEX target and set the target language to C++. Use the coder.DeepLearningConfig function to create a CuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. To generate CUDA MEX, use the codegen command and specify the size of the input image size. This value corresponds to the input layer size of pedestrian detection network.

% Load an input image.
im = imread('test.jpg');
im = imresize(im,[480,640]);

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg pedDetect_predict -args {im} -report
Code generation successful: To view the report, open('codegen/mex/pedDetect_predict/html/report.mldatx').

Run the Generated MEX


Call pednet predict on the input image.

ped_bboxes = pedDetect_predict_mex(im);

Display the final predictions.

outputImage = insertShape(im,'Rectangle',ped_bboxes,'LineWidth',3);

Classification on a Video

The included example file pedDetect_predict.m grabs frames from a video, invokes prediction, and displays the classification results on each of the captured video frames.

  v = VideoReader('LiveData.avi');
  fps = 0;
  while hasFrame(v)
     % Read frames from video
     im = readFrame(v);
     im = imresize(im,[480,640]);
     % Call MEX function for pednet prediction
     ped_bboxes = pedDetect_predict_mex(im);
     newt = toc;
     % fps
     fps = .9*fps + .1*(1/newt);
     % display
     outputImage = insertShape(im,'Rectangle',ped_bboxes,'LineWidth',3);

Use clear mex to remove the static network object loaded in memory.

clear mex;