Object detection is the process of finding and classifying objects in an image. One deep learning approach, regions with convolutional neural networks (R-CNN), combines rectangular region proposals with convolutional neural network features. R-CNN is a two-stage detection algorithm. The first stage identifies a subset of regions in an image that might contain an object. The second stage classifies the object in each region.
Applications for R-CNN object detectors include:
Smart surveillance systems
Computer Vision Toolbox™ provides object detectors for the R-CNN, Fast R-CNN, and Faster R-CNN algorithms.
Models for object detection using regions with CNNs are based on the following three processes:
Find regions in the image that might contain an object. These regions are called region proposals.
Extract CNN features from the region proposals.
Classify the objects using the extracted features.
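The three processes listed above can be sketched as a pipeline with interchangeable stages. This is a minimal Python illustration, not the Computer Vision Toolbox implementation. The stand-ins are hypothetical and chosen only to keep the example runnable: a sliding-window grid replaces a real proposal algorithm such as Edge Boxes, a mean-intensity "feature" replaces CNN activations, and a brightness threshold replaces a trained classifier. The (x, y, width, height) box layout with 0-based pixels is also an assumption.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in 0-based pixels (assumption)
Image = List[List[float]]        # grayscale image stored as rows of pixel values


def sliding_window_proposals(img: Image, size: int, stride: int) -> List[Box]:
    """Process 1 stand-in: propose every size-by-size window on a regular grid."""
    h, w = len(img), len(img[0])
    return [(x, y, size, size)
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]


def mean_intensity_feature(img: Image, box: Box) -> List[float]:
    """Process 2 stand-in: one hand-made number instead of CNN activations."""
    x, y, bw, bh = box
    pixels = [img[r][c] for r in range(y, y + bh) for c in range(x, x + bw)]
    return [sum(pixels) / len(pixels)]


def threshold_classifier(feature: List[float]) -> str:
    """Process 3 stand-in: call a region an object if it is bright enough."""
    return "object" if feature[0] > 0.5 else "background"


def detect(img: Image,
           propose: Callable[[Image], List[Box]],
           extract: Callable[[Image, Box], List[float]],
           classify: Callable[[List[float]], str]) -> List[Tuple[Box, str]]:
    """Run the three processes in order and keep the non-background regions."""
    detections = []
    for box in propose(img):
        label = classify(extract(img, box))
        if label != "background":
            detections.append((box, label))
    return detections


# Toy 4x4 image with a bright 2x2 patch in the top-left corner.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
result = detect(image, lambda im: sliding_window_proposals(im, 2, 2),
                mean_intensity_feature, threshold_classifier)
print(result)  # [((0, 0, 2, 2), 'object')]
```

Keeping each process behind its own function mirrors how the variants below differ: each one swaps out or restructures one or more of these stages.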
The R-CNN family has three variants: R-CNN, Fast R-CNN, and Faster R-CNN. Each variant attempts to optimize, speed up, or enhance the results of one or more of these processes.
The R-CNN detector [2] first generates region proposals using an algorithm such as Edge Boxes [1]. The proposal regions are cropped out of the image and resized. Then, the CNN classifies the cropped and resized regions. Finally, the region proposal bounding boxes are refined by a support vector machine (SVM) that is trained using CNN features.
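The crop-and-resize step can be illustrated with a short Python sketch, assuming (x, y, width, height) boxes in 0-based pixels. Nearest-neighbor sampling is used only to keep the example short; it is an assumption, not the resizing method used by the toolbox. The point is that every proposal becomes a separate fixed-size input, so R-CNN runs the CNN once per proposal.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), 0-based pixels (assumption)


def crop_and_resize(img: List[List[float]], box: Box,
                    out_h: int, out_w: int) -> List[List[float]]:
    """Crop box out of img and warp it to out_h x out_w with nearest-neighbour sampling."""
    x, y, bw, bh = box
    crop = [row[x:x + bw] for row in img[y:y + bh]]
    # Map each output pixel (r, c) back to the crop pixel it falls on.
    return [[crop[(r * bh) // out_h][(c * bw) // out_w] for c in range(out_w)]
            for r in range(out_h)]


img = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
proposals = [(1, 1, 2, 2), (0, 0, 4, 4)]
# Every proposal becomes a fixed-size input; R-CNN then runs the CNN once per crop.
warped = [crop_and_resize(img, b, 2, 2) for b in proposals]
print(warped)  # [[[5, 6], [9, 10]], [[0, 2], [8, 10]]]
```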
As in the R-CNN detector [2], the Fast R-CNN detector [3] also uses an algorithm like Edge Boxes to generate region proposals. Unlike the R-CNN detector, which crops and resizes each region proposal, the Fast R-CNN detector processes the entire image. Whereas an R-CNN detector runs the CNN separately on every region, Fast R-CNN runs the CNN once and then pools the CNN features corresponding to each region proposal. Fast R-CNN is more efficient than R-CNN because the computations for overlapping regions are shared.
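The feature pooling that Fast R-CNN relies on can be sketched as ROI max pooling in Python. This sketch assumes the feature map has already been computed once for the whole image and that each ROI is already expressed in feature-map cells as (x, y, width, height); mapping image coordinates onto the feature map is omitted. The made-up 4x4 feature map stands in for real CNN activations.

```python
import math
from typing import List, Tuple

Roi = Tuple[int, int, int, int]  # (x, y, width, height) in feature-map cells (assumption)


def roi_max_pool(fmap: List[List[float]], roi: Roi,
                 out_h: int, out_w: int) -> List[List[float]]:
    """Split the ROI into an out_h x out_w grid of sub-windows and max-pool each one."""
    x, y, rw, rh = roi
    pooled = []
    for i in range(out_h):
        r0 = y + math.floor(i * rh / out_h)
        r1 = y + math.ceil((i + 1) * rh / out_h)
        row = []
        for j in range(out_w):
            c0 = x + math.floor(j * rw / out_w)
            c1 = x + math.ceil((j + 1) * rw / out_w)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# The feature map is computed once for the whole image (here: a made-up 4x4 map) ...
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
# ... and every region proposal, including overlapping ones, reuses it.
rois = [(0, 0, 4, 4), (1, 1, 3, 2)]
pooled = [roi_max_pool(fmap, roi, 2, 2) for roi in rois]
print(pooled)  # [[[5, 7], [13, 15]], [[6, 7], [10, 11]]]
```

Regions of different sizes all come out as the same fixed-size grid, which is what lets the per-region layers that follow accept them.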
The Faster R-CNN detector [4] does not rely on an external algorithm like Edge Boxes. Instead, it adds a region proposal network (RPN) that generates region proposals directly in the network. The RPN uses anchor boxes (see Anchor Boxes for Object Detection). Generating region proposals in the network is faster and better tuned to your data.
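To make the anchor-box idea concrete, here is a minimal Python sketch that tiles a fixed set of anchors over every feature-map cell. The parameterization is an assumption for illustration: area is (base_size * scale) squared, ratio is height divided by width, and cell centers map to image coordinates through the network stride. The Faster R-CNN paper places 9 anchors (3 scales by 3 aspect ratios) at each location; the toolbox may specify anchors differently.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in image pixels


def anchors_at(cx: float, cy: float, base_size: float,
               scales: Sequence[float], ratios: Sequence[float]) -> List[Box]:
    """One anchor per (scale, ratio) pair, centred on (cx, cy).

    Area is (base_size * scale) ** 2 and ratio is height / width (both assumptions).
    """
    boxes = []
    for s in scales:
        area = (base_size * s) ** 2
        for r in ratios:
            w = math.sqrt(area / r)
            h = w * r
            boxes.append((cx - w / 2, cy - h / 2, w, h))
    return boxes


def tile_anchors(fmap_h: int, fmap_w: int, stride: int, base_size: float,
                 scales: Sequence[float], ratios: Sequence[float]) -> List[Box]:
    """Place the same anchor set at every feature-map cell, mapped to image coordinates."""
    anchors = []
    for i in range(fmap_h):
        for j in range(fmap_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            anchors.extend(anchors_at(cx, cy, base_size, scales, ratios))
    return anchors


anchors = tile_anchors(2, 3, stride=16, base_size=16, scales=[1, 2, 4], ratios=[0.5, 1, 2])
print(len(anchors))  # 2 * 3 cells * 9 anchors per cell = 54
```

The RPN then scores each anchor and predicts offsets that turn it into a region proposal.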
This family of object detectors uses region proposals to detect objects within images. The number of proposed regions dictates the time it takes to detect objects in an image. The Fast R-CNN and Faster R-CNN detectors are designed to reduce detection time when there is a large number of regions.
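A rough cost model, stated here as an illustrative assumption rather than a measured result, shows why the number of regions matters. Let N be the number of region proposals, t_prop the time of an external proposal algorithm, t_CNN the time of one CNN forward pass (on a warped crop for R-CNN, on the full image for the other two), t_RPN the time of the region proposal network, and t_head the per-region cost of ROI pooling plus the remaining per-region layers. The model ignores overheads such as image loading and postprocessing.

```latex
\begin{aligned}
T_{\text{R-CNN}}  &\approx t_{\text{prop}} + N\, t_{\text{CNN}} \\
T_{\text{Fast}}   &\approx t_{\text{prop}} + t_{\text{CNN}} + N\, t_{\text{head}} \\
T_{\text{Faster}} &\approx t_{\text{CNN}} + t_{\text{RPN}} + N\, t_{\text{head}}
\end{aligned}
```

Because t_head is typically much smaller than a full CNN pass, the N-dependent term shrinks from R-CNN to Fast R-CNN, and Faster R-CNN additionally replaces t_prop with the RPN, which reuses the shared features.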
You can use a pretrained convolutional neural network (CNN) as the basis for an R-CNN detector. This approach is also referred to as transfer learning. See Pretrained Deep Neural Networks (Deep Learning Toolbox).
Use one of the following networks with the trainRCNNObjectDetector, trainFastRCNNObjectDetector, or trainFasterRCNNObjectDetector functions. To use any of these networks, you must install the corresponding Deep Learning Toolbox™ model:
You can also design a custom model based on a pretrained image classification CNN. See the Design an R-CNN, Fast R-CNN, and a Faster R-CNN Model section and the Deep Network Designer app.
To design a custom R-CNN model, start from a pretrained image classification CNN. You can use the Deep Network Designer app to build, visualize, and edit a deep learning network.
The basic R-CNN model starts with a pretrained network. The last three classification layers are replaced with new layers that are specific to the object classes you want to detect.
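The layer replacement can be sketched on a plain list of layer descriptions. In many pretrained classification networks (AlexNet, for example) the last three layers are a fully connected layer, a softmax layer, and a classification output layer. In this Python sketch, the (name, type, params) tuples, the layer names, and the extra "Background" class are illustrative assumptions, not toolbox objects or the toolbox's exact behavior.

```python
from typing import Dict, List, Sequence, Tuple

Layer = Tuple[str, str, Dict]  # (name, type, params): hypothetical stand-in for a real layer


def to_rcnn_layers(pretrained: Sequence[Layer], class_names: Sequence[str]) -> List[Layer]:
    """Drop the last three classification layers and append new ones sized for the
    object classes plus one extra 'Background' class (assumption)."""
    num_outputs = len(class_names) + 1
    new_head = [
        ("rcnnFC", "fullyConnected", {"outputSize": num_outputs}),
        ("rcnnSoftmax", "softmax", {}),
        ("rcnnClassification", "classification",
         {"classes": list(class_names) + ["Background"]}),
    ]
    return list(pretrained[:-3]) + new_head


pretrained = [
    ("input", "imageInput", {}),
    ("conv1", "convolution", {}),
    ("fc8", "fullyConnected", {"outputSize": 1000}),  # 1000 ImageNet classes
    ("prob", "softmax", {}),
    ("output", "classification", {}),
]
layers = to_rcnn_layers(pretrained, ["car", "person"])
print([name for name, _, _ in layers])
```

The feature-extraction layers are kept unchanged, which is what makes this transfer learning: only the new classification layers start from scratch.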
For an example of how to create an R-CNN object detection network, see Create R-CNN Object Detection Network.
The Fast R-CNN model builds on the basic R-CNN model. A box regression layer is added to refine the position of the object in the image by learning a set of box offsets. An ROI pooling layer is inserted into the network to pool CNN features for each region proposal.
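One common way to define "box offsets" is the parameterization from the original R-CNN paper [2]: center shifts scaled by the proposal size, and log-scale width and height ratios. This Python sketch assumes that parameterization and (x, y, width, height) boxes; the toolbox's internal encoding may differ. encode computes the regression target for a proposal and its ground-truth box, and decode applies predicted offsets to a proposal.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]      # (x, y, width, height)
Offsets = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def to_center(box: Box) -> Box:
    x, y, w, h = box
    return x + w / 2, y + h / 2, w, h


def from_center(cx: float, cy: float, w: float, h: float) -> Box:
    return cx - w / 2, cy - h / 2, w, h


def encode(proposal: Box, gt: Box) -> Offsets:
    """Regression target that would move the proposal onto the ground-truth box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return (gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph)


def decode(proposal: Box, offsets: Offsets) -> Box:
    """Apply predicted offsets to a proposal to get the refined box."""
    px, py, pw, ph = to_center(proposal)
    dx, dy, dw, dh = offsets
    return from_center(px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))


proposal, ground_truth = (10, 10, 20, 20), (12, 8, 30, 24)
target = encode(proposal, ground_truth)  # shift right, widen, and heighten the proposal
print(decode(proposal, target))          # recovers ground_truth up to rounding
```

Scaling by the proposal size and using logarithms keeps the targets comparable across small and large proposals.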
For an example of how to create a Fast R-CNN object detection network, see Create Fast R-CNN Object Detection Network.
The Faster R-CNN model builds on the Fast R-CNN model. A region proposal network is added to produce the region proposals instead of getting the proposals from an external algorithm.
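After the region proposal network scores its anchors and decodes their offsets into boxes, the many overlapping candidates are usually thinned out before being passed on as region proposals. A common technique for this is greedy non-maximum suppression (NMS) based on intersection over union (IoU), sketched here in Python. The assumptions are that boxes are (x, y, width, height) and that the IoU threshold and top_k values are illustrative, not toolbox defaults.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes: Sequence[Box], scores: Sequence[float],
                     iou_threshold: float = 0.7, top_k: int = 2000) -> List[Box]:
    """Greedy NMS: keep the highest-scoring boxes, dropping any box that overlaps
    an already-kept box by more than iou_threshold, and stop after top_k boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return [boxes[i] for i in keep]


candidates = [(0, 0, 10, 10), (0.5, 0.5, 10, 10), (20, 20, 10, 10)]
objectness = [0.9, 0.8, 0.7]
print(select_proposals(candidates, objectness))  # [(0, 0, 10, 10), (20, 20, 10, 10)]
```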
For an example of how to create a Faster R-CNN object detection network, see Create Faster R-CNN Object Detection Network.
You can use the Image Labeler, Video Labeler, or Ground Truth Labeler (available in Automated Driving Toolbox™) apps to interactively label ground truth data and export it for training. The apps can label rectangular regions of interest (ROIs) for object detection, scene labels for image classification, and pixels for semantic segmentation.
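During training, labeled ROIs are typically used to decide which region proposals count as object examples and which count as background, based on their overlap with the ground-truth boxes. This Python sketch shows the idea under stated assumptions: boxes are (x, y, width, height), a proposal takes the label of the ground-truth box it overlaps most, and the thresholds (positive at IoU of 0.5 or more, background between 0.1 and 0.5, ignored otherwise) are illustrative values rather than toolbox defaults.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals: Sequence[Box], gt_boxes: Sequence[Box],
                    gt_labels: Sequence[str], pos_min: float = 0.5,
                    neg_range: Tuple[float, float] = (0.1, 0.5)) -> List[Optional[str]]:
    """Assign each proposal a class label, 'Background', or None (ignored in training)."""
    labels: List[Optional[str]] = []
    for p in proposals:
        overlaps = [iou(p, g) for g in gt_boxes]
        best = max(range(len(overlaps)), key=overlaps.__getitem__) if overlaps else -1
        best_iou = overlaps[best] if overlaps else 0.0
        if best_iou >= pos_min:
            labels.append(gt_labels[best])
        elif neg_range[0] <= best_iou < neg_range[1]:
            labels.append("Background")
        else:
            labels.append(None)
    return labels


ground_truth, classes = [(0, 0, 10, 10)], ["car"]
proposals = [(0, 0, 10, 10), (5, 0, 10, 10), (50, 50, 10, 10)]
print(label_proposals(proposals, ground_truth, classes))  # ['car', 'Background', None]
```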
[1] Zitnick, C. Lawrence, and P. Dollár. "Edge Boxes: Locating Object Proposals from Edges." Computer Vision-ECCV 2014. Springer International Publishing. Pages 391-405. 2014.
[2] Girshick, R., J. Donahue, T. Darrell, and J. Malik. "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation." CVPR '14 Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Pages 580-587. 2014.
[3] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[4] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." Advances in Neural Information Processing Systems. Vol. 28, 2015.