Faster RCNN code in Matlab

younghak shin on 30 Mar 2017
Answered: zahir ullah on 29 Sep 2018
I am trying to use trainFasterRCNNObjectDetector in MATLAB 2017. As I understand it, in the original Faster R-CNN paper the input size of the first CNN layer is the image size, for example 256*256. But the MATLAB example at https://se.mathworks.com/help/vision/examples/object-detection-using-faster-r-cnn-deep-learning.html recommends an input size close to the smallest object in the image, such as 32*32. See the passage below:

"Start with the imageInputLayer function, which defines the type and size of the input layer. For classification tasks, the input size is typically the size of the training images. For detection tasks, the CNN needs to analyze smaller sections of the image, so the input size must be similar in size to the smallest object in the data set. In this data set all the objects are larger than [16 16], so select an input size of [32 32]. This input size is a balance between processing time and the amount of spatial detail the CNN needs to resolve."

I don't understand this part. When a CNN is applied, the input layer normally has the full image size. How can the RPN find regions using just a smaller part of the image?
Can anyone help me?

Accepted Answer

Eric Psota on 4 Apr 2017
Edited: Eric Psota on 22 May 2017
The Faster RCNN network is designed to operate on a bunch of small regions of the image. For example, if you're trying to detect people, and they never take up more than 200x200 regions in a 1080x1920 image, you should use a network that takes as input a 200x200 image. If you think about it, convolutional kernels don't care how big the input image is. You can pass an image of any size through the convolutional layers of a network, and the only thing that will change is the spatial dimensions at each layer. That is why Faster RCNN shares the convolutional layers between the region proposal network (RPN) and the classification/regression networks.
Once you get to the fully connected layers, this is a different story, since the connections are trained with a specific spatial dimension in mind. For this reason, Faster RCNN trains the RPN and classification/regression layers separately.
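The point that convolutional kernels do not depend on the input size can be checked with a minimal sketch. The 32x32 patch, the 1080x1920 image, and the random 3x3 kernel below are illustrative assumptions, not values from a trained network, and conv2 stands in for a single convolutional layer:

```matlab
% Illustrative sketch: one 3x3 kernel applied to two different input sizes.
% All sizes and the random weights are assumptions chosen for illustration.
kernel = rand(3, 3);              % a single set of convolutional weights

smallPatch = rand(32, 32);        % patch-sized input
fullImage  = rand(1080, 1920);    % full-frame input

% 'valid' keeps only positions where the kernel fits entirely inside the input,
% so each output dimension is (input dimension - 3 + 1).
smallOut = conv2(smallPatch, kernel, 'valid');
fullOut  = conv2(fullImage,  kernel, 'valid');

disp(size(smallOut));   % 30-by-30
disp(size(fullOut));    % 1078-by-1918
```

The same nine weights process both inputs; only the spatial size of the output changes. A fully connected layer, by contrast, has one weight per input element, so it would only accept one of these two sizes.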
  2 Comments
younghak shin on 5 Apr 2017
Thank you very much for your answer. I still don't clearly understand why the input size is set to 200*200 in your example. How does Faster R-CNN then handle the test image (1080*1920)? Does the RPN check all possible 200*200 regions regardless of the input size?
Eric Psota on 22 May 2017
Edited: Eric Psota on 22 May 2017
It doesn't check all possible 200x200 regions. Instead, it checks a subset of them, which depends on a lot of factors. One of the factors is how much your convolutional layers reduce the spatial dimensionality of the original image. For example, consider a case where your original image is 1080x1920x3 and, after a series of convolution, ReLU, and max-pooling layers, your resulting feature map is 108x192x300. This is effectively a spatial downsampling by a factor of 10, so a 200x200 window in the full-scale image becomes 20x20 in the resulting feature map. At this point, Faster R-CNN might slide a 20x20x300 kernel through the feature map to determine if there are objects present in the spatial regions, effectively stepping by 10 pixels horizontally and vertically through the original image.
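The arithmetic in this example can be worked through in a few lines. This is a sketch under the assumptions stated in the paragraph above (1080x1920 image, 108x192 feature map, 200x200 window), plus the added assumption that the window slides one feature-map cell at a time with no padding:

```matlab
% Worked arithmetic for the example above. Sliding one cell at a time
% without padding is an assumption made for this sketch.
imageSize   = [1080 1920];   % original image (rows, columns)
featureSize = [108 192];     % feature map after the convolutional layers
windowSize  = [200 200];     % object-sized window in the original image

stride = imageSize ./ featureSize;           % effective step in pixels: [10 10]
windowInFeatureMap = windowSize ./ stride;   % window size in the feature map: [20 20]

% Number of positions the window can occupy in each case.
positionsFeatureMap = prod(featureSize - windowInFeatureMap + 1);  % 89 * 173
positionsFullImage  = prod(imageSize - windowSize + 1);            % 881 * 1721

fprintf('Windows checked on the feature map: %d\n', positionsFeatureMap);
fprintf('All possible 200x200 windows:       %d\n', positionsFullImage);
```

Under these assumptions, 15397 window positions are evaluated instead of 1516201, roughly a hundred times fewer, which is the sense in which only a subset of the 200x200 regions is checked.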
After the presence or absence of an object is established at a given location, a spatial chunk of the 108x192x300 feature map is extracted (through ROI pooling) and passed through both the bounding box regressor and the classifier.
The other factors to consider are the height/width ratio of the regions, how many different sizes are considered, etc. But, hopefully this helps to explain how the regions get processed.
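To make the height/width ratio and size factors concrete, here is a minimal sketch that enumerates candidate region shapes. The base size of 200, the three aspect ratios, and the three scales are hypothetical values chosen for illustration; this is not necessarily how trainFasterRCNNObjectDetector selects its boxes:

```matlab
% Hypothetical region shapes from a base size, aspect ratios, and scales.
% Every value below is an assumption chosen for illustration.
baseSize     = 200;          % side length of the reference square, in pixels
aspectRatios = [0.5 1 2];    % width / height
scales       = [0.5 1 2];    % multipliers on the base size

% Every combination of ratio and scale.
[r, s] = meshgrid(aspectRatios, scales);
r = r(:);
s = s(:);

% Keep the area fixed at (baseSize * scale)^2 while changing the shape.
widths  = baseSize .* s .* sqrt(r);
heights = baseSize .* s ./ sqrt(r);

anchors = [widths heights];  % 9-by-2, one row per candidate shape
disp(round(anchors));
```

Each location in the feature map would then be tested against all nine shapes, so the number of regions considered grows with the number of ratios and sizes.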


More Answers (2)

miao wang on 4 Apr 2017
I am confused too. When I train Faster R-CNN on my own dataset and run the resulting detector on a test picture, it usually returns empty bboxes and scores: [bboxes, scores] = detect(detector, I); I don't know what the problem is, and I hope someone can help me.

zahir ullah on 29 Sep 2018
How can I test the Faster R-CNN detector on video?
