From the series: Computer Vision with MATLAB
Avinash Nehemiah, MathWorks
In this introductory webinar you will learn how to use computer vision algorithms to solve real world imaging problems. Computer vision uses images and video to detect, classify, and track objects or events in order to understand a real-world scene.
This webinar assumes some experience with MATLAB and no experience with computer vision. We will focus on the Computer Vision System Toolbox.
About the Presenter: Avinash Nehemiah works on computer vision applications in technical marketing at MathWorks. Prior to joining MathWorks he spent 7 years as an algorithm developer and researcher designing computer vision algorithms for hospital safety and video surveillance. He holds an MSEE degree from Carnegie Mellon University.
Welcome to this webinar on computer vision made easy. My name is Avinash Nehemiah, and I'm a product marketing manager for computer vision here at the MathWorks. Before we get started, here's a quick overview of what I'm going to talk about for the next 40 minutes or so.
I'm going to start by defining computer vision and showing you a few interesting examples of computer vision. And then I'm going to show you how easy it is to use MATLAB and the Computer Vision System Toolbox to solve real computer vision problems. I'm going to show you how to solve problems such as locating an object in an image of a cluttered scene, analyzing the flow of traffic on a busy street, and tracking a person's movements in video.
So what is computer vision? Computer vision extends image processing by using images and video to understand the real world scene by detecting, classifying, and identifying objects and events. In this video, I've used image processing to improve the image quality by adjusting the contrast of the input images and sharpening up the images. I've then used computer vision to detect and count the moving objects. I'd like to point out that this is one of the problems that I will be showing you how to solve later in this webinar.
Our customers have been very successful over the years in using computer vision to solve a wide variety of problems. For example, BMW used computer vision to create a parking assist system. And NASA used computer vision to help land the Mars Rover.
Before we get started, here are a couple of tips to help you get the most out of this webinar. We will make the code available at the end of the webinar, and I really encourage you to check it out. Now let's solve some problems. For my first problem, what I'd like to do is detect an object, the book shown in the image on the left, in an image of a cluttered scene on the right.
Now there are several challenges to solving this problem. The object could appear rotated in the scene. There could be a scale change, where the object appears bigger or smaller than the template. Or there could be clutter: other objects that hide parts of the object you're looking for, or confusers that could mislead your algorithm.
Now what all of this means is you can't use standard image processing techniques, like template matching, to solve this problem. And you need to find an algorithm that's a little bit more sophisticated and robust. The approach I've chosen to solve this problem is a workflow called feature detection, extraction, and matching. Now the reason I've picked this approach is that this is a very fundamental workflow in computer vision and can be used to solve a wide variety of problems.
The first step is to detect interesting features in the object on the left, and this is shown by the red markers. The next step is to take a region surrounding the detected features and encode some information about that region into what you call a feature vector. You then do the same thing for the image on the right. And after that, you look for corresponding matches between the feature vectors that you computed for the two images.
Once you have a set of matched features, you can then estimate the location of the object in the scene. Now before we start working in MATLAB, I want to talk really quickly about what makes a good feature. A good feature detects an image region that is distinct, something that leads to an unambiguous match in other images, and that is repeatable across many images over time. So for this image of a white car, that could be a corner, or maybe a template of the car itself, or something more modern, like a SURF feature that detects blobs, or an MSER feature that detects regions.
So now let's start with MATLAB. Now the first thing I'd like to do is read in an image of the object that I'd like to detect and import that to my MATLAB workspace. So to do this, I'm going to right-click on an image here, click Import Data, and MATLAB is going to give me a wizard that's going to let me import this data to my MATLAB workspace.
And you see a variable called LostBook3 show up in my workspace. Now the next thing I'm going to do is try and visualize this data, just to make sure I read in the right image. So to do this, I'm going to click on this variable, and I'm going to go to the Plots tab in MATLAB. And what MATLAB does is it gives me a list of plots that would work for the variable that I've clicked on. So I'm going to click on imshow, and that's going to display the image that I just read.
Now as a quick side note, this book, Digital Image Processing with MATLAB, is an excellent introduction to image processing. And if you're new to image processing and computer vision, I would highly recommend you check it out.
Now you don't always have to use this point and click interface to import data into MATLAB. You can also read in an image using the imread function. And that does exactly the same thing that I just did with the import wizard.
Now the next step for me is to convert this color image into grayscale. Now the reason I want to do that is a lot of computer vision algorithms, especially feature detection algorithms, prefer grayscale data. So I'm going to do that using the rgb2gray function. Now as I mentioned in my slides, the first step in my workflow is to detect interesting features.
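Before moving on to detection, here's what those import and conversion steps look like in script form; the file name is just a placeholder for whatever image you use:

```matlab
% Read the object image from disk (file name is a placeholder);
% this does the same thing as the import wizard
bookImage = imread('lostBook3.jpg');

% Convert to grayscale, since the feature detectors work on
% single-channel intensity images
bookGray = rgb2gray(bookImage);
```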
Now I know I want to detect features, but I'm not entirely sure what functionality MATLAB has to let me do that. So let me show you an easy way to find that. So I'm going to go to the search bar on the top right hand corner, and I'm going to type, "detect features." And when I press Enter, this opens up the Doc center, and this lists all the functions that I could use in MATLAB to detect features.
So I'm going to select detect SURF features, and this is going to tell me pretty much anything I need to know about detecting SURF features in MATLAB. It's going to give me the function syntax, the description of the function, and more information about the arguments. And if I scroll down a little further, this actually shows me an example of how I could use MATLAB to detect SURF features.
Let me show you a really cool thing about the MATLAB help. What I can do is highlight a line of code here, right-click on it, and hit Evaluate Selection. And what this does is it inserts that code into the command window and runs it. So as you can tell, it's run the detectSURFFeatures function on the image I, and it's returned the variable points. And if I look at this points variable in my workspace, it's actually 2,048 SURF points, or a little over 2,000 SURF points.
So now let me go back to my help, and the help actually shows me how I can visualize the SURF features. And what this section of code does is it just visualizes the 10 strongest features out of the 2,000 features detected. So if I evaluate this section, you can see that the 10 strongest features really lie on the text of Digital Image Processing. You have a number of features on the letter I. You have a couple on the letter M, one on the letter N, and one on the letter R. And these are very distinct regions in the image, as you can imagine, and these will make it very easy to match between images.
Now one thing you've probably noticed is that the radii of the circles around these detected features are different. The reason this happens is that the feature detection algorithm tells you how large the region around each feature is that you need to extract to form your feature vector. So let's visualize this a little better to see what these regions around the detected features are.
So to do that, I'm actually going to visualize the 20 strongest features in the image I. And as you can see, all of those features lie in the text of the Digital Image Processing book. The strongest features seem to be around the letter I. There's a lot in the letters M and N. And again, these are very textured regions, regions that you can imagine would be easy to match between images.
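For reference, the detection and visualization steps boil down to a few lines, along the lines of the help example:

```matlab
% Detect SURF features in the grayscale object image
points = detectSURFFeatures(bookGray);

% Overlay the 20 strongest features; each circle's radius shows the
% scale of the region that will be encoded into the feature vector
imshow(bookGray); hold on;
plot(points.selectStrongest(20));
hold off;
```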
Now as you can see, MATLAB is an excellent environment just to import data, to explore the data, to just get to know more about the problem that you're solving. But MATLAB is also an excellent scripting language. So let me open the script that I've written to help me better explain the rest of this problem. First, I'm going to close out the current folder browser, just so we have more space to see the code.
Now you can see this code has been split into different sections. And this is a very cool feature in MATLAB where you can run these sections out of order. So you can run section 3, and then you can go back and run section 1, and then you can follow that by running section 2. Now this is very different for those of you who come from the C and C++ world, where, if you make a change in your code, you have to recompile your project, and you can't really run anything out of order.
So I'm going to start by clearing my workspace. And then I'm going to read in my two images: the object and the scene. And one thing I'd like to point out is that the template I have of the object I'm trying to detect is very different from how it appears in my scene. The book has a different color cover, and it's actually a different edition of the book.
And then I'm going to detect SURF features, like I showed you on the command line. And after that, I'm going to use the extractFeatures function to extract the features. And you can see two variables have appeared in my workspace, called feats1 and feats2, that are the same size as the detected SURF feature points, but they're 64 elements wide.
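A sketch of those two steps, assuming the scene image has been read and converted to grayscale as sceneGray in the same way as the object:

```matlab
% Detect SURF features in both the object and the scene
points1 = detectSURFFeatures(bookGray);
points2 = detectSURFFeatures(sceneGray);

% Encode the region around each point into a 64-element SURF
% descriptor; the valid points are returned alongside the features
[feats1, validPts1] = extractFeatures(bookGray, points1);
[feats2, validPts2] = extractFeatures(sceneGray, points2);
```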
So what this means is I've used 64 elements to represent the region around each detected feature, and my feature vectors are 64 elements long. Now since I've already displayed these features, I'm going to skip over this section and go to the next one. Now as I mentioned before, the final step is to match the detected features between the two images. So to do that, I'm going to use the matchFeatures function.
Now let me show you another quick way to find out more information about the function that you're using in your editor. To do that, you could just right-click on a function and click Help on selection, or hit F1, and that gives you the help on that function. So you can see the help for how to match features, and it gives you a function description and also gives you examples on how you could use the function.
So what I'm going to do is use the matchFeatures function to match the two sets of feature vectors, feats1 and feats2. And then I'm going to display the matched features using the showMatchedFeatures function. Great! So as you can see, we've done a really good job of matching features between the book on the left and the book in the scene.
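Those two calls look roughly like this:

```matlab
% Find corresponding feature vectors between the two images
indexPairs = matchFeatures(feats1, feats2);

% Look up the matched point locations in each image
matched1 = validPts1(indexPairs(:, 1));
matched2 = validPts2(indexPairs(:, 2));

% Display the putative matches side by side
figure;
showMatchedFeatures(bookGray, sceneGray, matched1, matched2, 'montage');
```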
But there's a slight problem here. As you can see, there's a ton of spurious matches that appear outside of the book. And this is a problem, because if you use these matches to estimate where the book was, you'd basically have a bounding box that was half this image, which is not entirely accurate. So let me show you how you can solve this problem.
To solve this problem, we're going to use an algorithm called RANSAC, which is short for random sample consensus. RANSAC is a method to estimate a mathematical model from data while filtering out outliers. So what does this mean? For the data in this image on the left, if you wanted to estimate the equation of the line, RANSAC would help you estimate the equation of the blue line on the right, while filtering out all the points in red, the outliers that don't really contribute to finding the equation of the line.
Now in our case, what we actually have to do is estimate the geometric transform of the points from the object into the scene. So let's go back to MATLAB. And here I'm using the estimateGeometricTransform function, which uses RANSAC under the hood to figure out which matches are inliers and to tell me the transform between the object and the scene. So let me run that, and then I'm going to use the showMatchedFeatures function again to display the matches.
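Here's a sketch of those two lines:

```matlab
% Estimate the object-to-scene transform; RANSAC runs under the
% hood, and only the inlier matches are returned
[tform, inlier1, inlier2] = estimateGeometricTransform( ...
    matched1, matched2, 'affine');

% Display only the matches that survived RANSAC
figure;
showMatchedFeatures(bookGray, sceneGray, inlier1, inlier2, 'montage');
```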
Great! And you can see that RANSAC's done a fantastic job of filtering out the spurious matches. I now have fewer matches, but all of them are very good matches: exact correspondences between the object on the left and the scene on the right.
Now the last step for me is to locate the object in the scene. To do this, I'm going to use the transform that I calculated using RANSAC, and transform the bounding box of the object into the scene. I'm going to use a function called transformPointsForward to do this. So let's see how that looks. And you can see it's done a fantastic job of locating the book in the scene. And this is in spite of the fact that the template we used for the book was very different from how it appears here: the book's cover was a completely different color.
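A sketch of that last step, mapping the corners of the template image into the scene:

```matlab
% Corners of the object image, listed in order around its boundary
[h, w] = size(bookGray);
bookCorners = [1 1; w 1; w h; 1 h];

% Map the corners into scene coordinates with the RANSAC transform
sceneCorners = transformPointsForward(tform, bookCorners);

% Draw the resulting bounding polygon on the scene
figure; imshow(sceneGray); hold on;
line([sceneCorners(:, 1); sceneCorners(1, 1)], ...
     [sceneCorners(:, 2); sceneCorners(1, 2)], 'Color', 'y');
hold off;
```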
Now the other great thing about having a script is I can very quickly change one variable and run different data through this to see how well my algorithm works. So I'm going to just read in a different image. I'm just going to run the whole thing. I'm not going to step through it step by step like I did previously. And as you can see, for this book on the left, the algorithm has done a fantastic job of locating it in the same scene on the right.
As I mentioned earlier, this workflow of detecting, extracting, and matching features is used in a lot of applications in computer vision. So let me show you an example of a different application. To do that, I'm going to open the Doc center by clicking on the question mark there. I'm going to click on the Computer Vision System Toolbox. And then I'm going to open the Examples tab, and this lists a huge number of examples that show you a lot of different computer vision problems.
And I'm going to click on Video Mosaicking. In video mosaicking, you take images of a scene from different angles and stitch them all together to form a mosaic of that scene. Feature detection, extraction, and matching is used in video mosaicking to figure out how to stitch the images together. So the features are matched from frame to frame in the images on the left, and that is used to estimate the transformation between frames, which is then used to create the nice video mosaic that you see in the image here.
So in conclusion, to locate an object in an image of a cluttered scene, we used feature detection, extraction, and matching. We found that this algorithm was really robust to rotation, scale change, and even slight occlusions. We learned about using RANSAC to help remove outliers. We also learned that this algorithm is used in many applications, like image registration and autonomous robotics, where robots actually use this exact algorithm to help find their charging stations.
Now in the last example, I showed you how you can detect specific objects in images. In this next example, I'm going to show you how to detect moving objects. And we're going to use that information to help analyze the flow of traffic on a busy street. To do this in MATLAB, I'm going to detect the moving objects and then count the number of moving objects in each frame of video.
Now one thing you've probably noticed is different between this example and the last: in this example, I'm using a stream of video, as opposed to the last example, where I used a pair of still images. To help deal with streaming data in MATLAB, we use something called system objects. System objects are MATLAB objects designed to handle streaming data, like video. And they can represent algorithms or input-output capabilities.
Now, the key thing to remember is system objects have a step method that is used to process data and run the system object. Now the really cool thing about system objects is they form a bridge between MATLAB and Simulink. So the system objects that you use in MATLAB can be used in Simulink as well. So let me show you a quick example as to how system objects work.
So I have this script where I'm going to use the vision.VideoFileReader system object to read a video file. I'm then going to use the vision.VideoPlayer system object to display the video. I'm also going to use the isDone method. Now the isDone method is specific to the VideoFileReader object, and it lets you know when you've reached the end of the video.
So now I'm going to loop till I reach the end of the video. I'm going to use the step function to read the next frame of video. And I'm then going to use the step function to display that video. So let me run this to see what I get.
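The whole script is only a few lines; the file name here is a placeholder:

```matlab
% System objects for reading and displaying a video stream
videoReader = vision.VideoFileReader('traffic.avi');
videoPlayer = vision.VideoPlayer;

% Loop until the reader reaches the end of the file
while ~isDone(videoReader)
    frame = step(videoReader);   % read the next frame
    step(videoPlayer, frame);    % display it
end

release(videoReader);
release(videoPlayer);
```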
And as you can tell, MATLAB has read the video using the system object and displayed it. Now you might have noticed that this video is actually playing a lot faster than real time. The reason for this is that MATLAB is reading the data as fast as it can from the hard drive and displaying it to the screen.
So let's get back to our problem. Now, the first thing you need to do is find the moving pixels in each frame of video. To do this, we're going to use an algorithm called background subtraction. What background subtraction does is estimate a statistical model of the background. So the background in this image would normally be the buildings, the sky, and parts of the road: basically, pixels that aren't changing over time and can be considered part of the background.
So if you take an input image and subtract the model of the background that you just learned, what you get is a binary mask. This is also known as a foreground mask, or a motion mask, where the pixels set to one are the pixels that differ from the background. So in this case, you'll see that the location of the car is represented by ones. So let's jump into MATLAB to see how to do this.
I'm going to open another script that I have called countingCars. I'm going to start by clearing my workspace. I'm then going to create a video file reader to read in my data. And then I'm creating two video players, one to display the foreground, and one to display the video.
The next thing for me to do is to create a foreground detector system object. Now the foreground detector system object performs background subtraction. So let's open up the help just to get a better look at what it does. As you can see, the vision.ForegroundDetector system object uses Gaussian mixture models to detect the foreground.
And what I'm going to do here is run the first 75 frames of video through the foreground detector system object. The reason I'm doing this is that I want to give it enough time to learn the background. And after that, I'm going to stop and look at the 75th frame of video, and I'm also going to look at its foreground.
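The section I'm about to run looks roughly like this; the detector's parameter values here are illustrative rather than the exact settings from the webinar:

```matlab
% Background subtraction via a Gaussian mixture model
foregroundDetector = vision.ForegroundDetector('NumGaussians', 3, ...
    'NumTrainingFrames', 75);

% Feed in the first 75 frames so the detector can learn a
% statistical model of the background
for i = 1:75
    frame = step(videoReader);
    foreground = step(foregroundDetector, frame);
end

figure; imshow(frame);       % the 75th frame
figure; imshow(foreground);  % its binary foreground mask
```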
So let me run that section. And as you can see, for this image here, the foreground actually is a pretty good representation of which pixels are moving and which pixels are not. So the background subtraction's done a really good job of learning the background and figuring out which pixels are the foreground.
There is, however, one little problem. If you look at all these little specks of noise, if you use this data from this foreground to count the number of moving objects, you'd have hundreds of objects, because the noise would be counted as moving objects, too. And that would be very inaccurate. So the next thing to do is actually filter out the noise from the foreground.
Now to do that, I'm going to use something called image morphology. Morphology is a form of image filtering that helps clean up binary images and remove noise. To do this, I'm going to use the morphological opening operator, which is implemented in the imopen function, with a structuring element that is a disk of radius 1. So let me run that to see how it looks. Wow, and you can see it's done a great job of removing all the noise, and it's kept all the objects that I want to count intact.
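That cleanup is a single call to imopen:

```matlab
% Opening (erosion followed by dilation) with a disk of radius 1
% removes the specks of noise while keeping the vehicles intact
cleanForeground = imopen(foreground, strel('disk', 1));
figure; imshow(cleanForeground);
```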
So now that we know that we need to use image morphology, let me show you how I came to the conclusion that I wanted to use an opening operation with that particular structuring element. To do this, I'm going to click on the Apps tab in MATLAB, which lists all the apps that I have installed. Now most MATLAB toolboxes come with a set of apps, but I'm actually going to use an app called MorphTool. This app was created by Brett Shoelson, who's an application engineer here at the MathWorks, and it's available on the MATLAB File Exchange for free.
So what I'm going to do is import a variable from my workspace, which in this case would be the foreground. And what the MorphTool app does is it lets me try out all the different morphological operations available in MATLAB, so I can figure out what works for my problem. So I'm going to try the dilate function first. And as you can tell, that's made my noise situation much worse. That's not ideal!
The erosion function gets rid of all the noise, but it damages some of the moving pixels that I want to keep. So that's also not ideal. I then tried the imopen function, which is an erosion followed by a dilation, and it does a fantastic job of removing the noise while keeping all my moving objects intact. I'm also going to try the other operations available, but as you can see, they don't do quite as good a job as imopen.
So I'm going to select imopen, and then the next thing I can do is try all the different structuring elements at different sizes, before I finally settle on a disk of radius 1, because that seems to give me the best results. And that is exactly how I figured out that I wanted to use the opening operation with a disk-shaped structuring element of radius 1.
Now once you have a clean foreground, the next thing to do is to group the moving pixels together so that you can segment them into different moving objects. To do this, I'm going to use a method called connected component analysis, which is implemented in another system object called blob analysis. So let's look at the help to learn a little bit more about it.
So as you can see, blob analysis computes the statistics of connected regions in a binary image. So in this case, the binary image is going to be the clean foreground. Let's look at the statistics it gives us. So it tells you things like the area of the regions, the centroids, the bounding boxes, the size of the major and minor axes, and the orientation.
So I'm going to create the system object. I'm going to have it list out the bounding boxes of all the connected components. And I'm also going to tell it to filter out all objects whose pixel area is less than 150. So this is really going to give me some extra nice filtering that I wasn't able to achieve with my morphology. So let me run that section.
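Creating that object looks like this:

```matlab
% Report only bounding boxes, and ignore any connected component
% smaller than 150 pixels
blobAnalysis = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...
    'AreaOutputPort', false, 'CentroidOutputPort', false, ...
    'MinimumBlobArea', 150);
```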
And now let's move on to our main processing loop. As you can see, the first thing I do is read in my video frame by stepping the video file reader. I then pass that frame through the foreground detector, take the output, and perform the opening operation on it. I then pass the clean foreground through the blob analysis system object. And basically what this does is list all the bounding boxes of the connected components in each frame.
So then the next thing for me to do is to just simply count the number of bounding boxes, and that tells me how many moving objects are in each frame. I'm also going to use the information from these bounding boxes to draw green rectangles around each of the moving objects, just so I have a better visualization as to which objects are moving. So let's run this processing loop and see what we get.
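Putting it all together, the loop looks roughly like this; I'm assuming the two players created earlier are named videoPlayer and foregroundPlayer:

```matlab
while ~isDone(videoReader)
    frame = step(videoReader);                     % next video frame
    foreground = step(foregroundDetector, frame);  % moving pixels
    cleanForeground = imopen(foreground, strel('disk', 1));
    bbox = step(blobAnalysis, cleanForeground);    % one box per object

    % Count the moving objects and annotate the frame
    numCars = size(bbox, 1);
    result = insertShape(frame, 'Rectangle', bbox, 'Color', 'green');
    result = insertText(result, [10 10], numCars, 'BoxColor', 'white');

    step(foregroundPlayer, cleanForeground);
    step(videoPlayer, result);
end
```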
And there's a video of our clean foreground. And there are results where you can see I've drawn green rectangles around the moving objects. And I'm actually displaying a count of the moving objects in the top left hand corner of the screen there.
Now detecting moving objects is actually just a first step in analyzing the flow of traffic. The next thing to do would be to actually track these moving objects through the video to figure out which direction they're moving and how fast they're going. I'm going to talk about object tracking a little bit more in the next section. Now let me show you how I extended the current example, where we simply detected the objects.
Now what I did was I used a Kalman filter to track the detected objects. And the Kalman filter actually gives me a prediction of the direction the cars are moving. And I use this information to change the color of the bounding boxes around the cars.
So in summary, to analyze the flow of traffic, we learned about using background subtraction to detect the moving pixels. We then learned about how you can use morphology to clean up the foreground mask, and then how to use connected component analysis to segment these objects. And then we also talked really briefly about how you can extend this analysis with object tracking.
Now for my final example, I'm going to show you how to track a person's movement from frame to frame in video. So I'm going to take this video of Dima here. Now Dima is one of the developers of the Computer Vision System Toolbox. And I'm going to detect his face and then track how he's moving from frame to frame in this video.
Now doing this is a two-step process. I first need to detect the object I'm trying to track, which in this case would be Dima's face. I'm then going to use a point tracking algorithm to track his movements from frame to frame. Now to perform the object detection and detect Dima's face, I'm going to use something called a cascade object detector.
Now this is based on the Viola-Jones algorithm, and it's particularly good for detecting faces and other facial features, like your nose and your eyes. The Viola-Jones algorithm detects categories of objects, so it could detect all the faces in an image. Now one thing I'd like to point out is that this algorithm does not do face recognition. It can detect faces, but it can't recognize a specific face.
So now let's jump into MATLAB and see how to do this. I have a script here called facetrack. I'm going to start by clearing my workspace. Then I'm going to create a video file reader to read in the video and a video player to display my results. I'm then going to read in the first frame of video and display it.
So you see, that's an image of Dima's face there. I'm then going to create a cascade object detector system object to perform the face detection. I'm going to step through this detector, and I'm going to draw a bounding box around whatever results it returns. So let me try that.
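In script form, this step might look like the following, assuming a video file reader has been created as before:

```matlab
% The default model detects upright frontal faces
faceDetector = vision.CascadeObjectDetector;

% Detect faces in the first frame and draw the resulting boxes
videoFrame = step(videoReader);
bbox = step(faceDetector, videoFrame);
detected = insertShape(videoFrame, 'Rectangle', bbox);
figure; imshow(detected);
```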
And as you can see, it's done a great job detecting Dima's face. Now as I mentioned in my slides, this cascade object detector is also good for detecting other facial features. So let's look at the help to see what else I can detect using the cascade object detector. And let me scroll down, and there, if I change the classification model, I can detect different features. So I can detect upper bodies, and this sounds interesting: eye pair. OK, there's a classification model called 'EyePairBig'. So let me see if I can use that to detect Dima's eyes.
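Switching models is a one-line change:

```matlab
% Swap the classification model to detect pairs of eyes instead
eyeDetector = vision.CascadeObjectDetector('EyePairBig');
bboxEyes = step(eyeDetector, videoFrame);
```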
Wow, there you go. As you can see, the cascade object detector's done a really good job of detecting Dima's eyes, and this is in spite of the fact that Dima's wearing glasses. So let me step back and set the classification model back to faces.
Another thing I wanted to mention really quickly is that the Computer Vision System Toolbox ships with a number of pre-trained object detectors. But we also have functions and an app that you can use to train a detector of your own. So if you wanted to train a detector to detect coffee mugs, we've provided the framework to help you do that.
So let's get back to our example, and let me run this section again. OK. Now the next thing for me to do is to detect interesting features to track using my point tracking algorithm. To perform this feature detection, I'm going to use something called min eigen features. Now, the reason I'm using min eigen features, rather than the SURF features like in my first example, is that min eigen features are known to be particularly good for tracking. In some other tools, min eigen features are referred to as good features to track.
So I'm going to use the detectMinEigenFeatures function to detect the features. Here's something interesting: I'm passing an argument called ROI, or region of interest. What this does is restrict detection to features within that ROI, which in this case is the bounding box of the object I detected, Dima's face. So let me run that really quickly. And here you can see the points have been detected only within the bounding box around Dima's face.
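In code, that call might look like this, taking the first detected face box as the region of interest:

```matlab
% Detect corner-like features only inside the face bounding box
points = detectMinEigenFeatures(rgb2gray(videoFrame), 'ROI', bbox(1, :));
```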
Now the next step is to initialize a point tracker system object, and this is what's going to help us track that group of points from frame to frame, which in turn lets us keep track of Dima's movements through this video. So let's open up the help again just to see what this point tracker does. And you can see it tracks points in video using the Kanade-Lucas-Tomasi algorithm, or KLT.
For those of you who are getting more into computer vision, you're going to see the KLT algorithm a lot, because this really is one of the more robust point tracking algorithms available. And if you scroll down in the help, it actually tells you exactly how to construct the system object, how to initialize it with feature points, and pretty much everything you need to do to use that system object.
So I'm going to go ahead, I'm going to initialize the point tracker system object with the points that I just detected, the min eigen feature points. So let me run this real quick. Let's take a look at our processing loop before I run it. So I'm going to start by reading in an image using my video file reader. I'm then going to pass in that frame, through my point tracker system object.
When I pass it through, it gives me two variables as an output. It gives me a set of points and a set of flags called validity that tells me which points are visible and which points are not. So I'm going to use the validity flags to remove the points that are not visible anymore in the next frame, so that I'm only keeping track of points that are visible from frame to frame.
And this is actually very common when you're tracking points: if, say, Dima turns his face, a whole set of points on one side of his face will stop being visible, which is why you need to keep track of which points are still visible from frame to frame. I then insert a green plus sign to mark each point that I'm tracking, so I can get a better feel for how well the algorithm is working. Let me run the processing loop to see how it works.
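Here's a sketch of the initialization and the loop; the MaxBidirectionalError setting is illustrative:

```matlab
% Initialize the tracker with the detected points and first frame
pointTracker = vision.PointTracker('MaxBidirectionalError', 2);
initialize(pointTracker, points.Location, videoFrame);

while ~isDone(videoReader)
    frame = step(videoReader);
    [points, validity] = step(pointTracker, frame);

    % Keep only the points the tracker could still find
    visiblePoints = points(validity, :);

    % Mark each tracked point with a green plus sign
    out = insertMarker(frame, visiblePoints, '+', 'Color', 'green');
    step(videoPlayer, out);
end
```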
As you can see, it's doing a really good job of tracking most of the features on Dima's face. But there is one problem. As you can see here, we have a few outliers sticking to points that aren't really on Dima's face. To filter these out, we can rely on the same method, RANSAC, that we used in the first example to remove the outliers that don't belong to Dima's face. So we can filter out the outliers from frame to frame. Let me show you how that's done.
So to do this, I actually have a separate script called faceTrackRANSAC. It's identical to the previous script right up until this point, where I take my two sets of points, from my previous frame and my current frame, and use RANSAC to estimate the geometric transform of the points from frame to frame. What this does is RANSAC gives me back a set of inliers, after filtering out the outliers that don't lie on Dima's face. So let me run this to see how this works.
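The extra step looks roughly like this; oldPoints is a hypothetical name for the tracked locations from the previous frame:

```matlab
% Estimate a transform between the previous and current points;
% RANSAC discards the matches that don't fit (the outliers)
[xform, oldInliers, visiblePoints] = estimateGeometricTransform( ...
    oldPoints, visiblePoints, 'similarity', 'MaxDistance', 4);

% Carry only the inliers forward into the next frame
oldPoints = visiblePoints;
setPoints(pointTracker, oldPoints);
```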
And as you can see, now it's doing a much better job of tracking Dima's face, without too many outliers outside of the region around Dima's face. Now this is one way to do tracking, but let me show you another example from the Computer Vision System Toolbox that illustrates a completely different way of doing tracking.
So to do that, I'm going to go back to my Doc center, I'm going to click on the Computer Vision System Toolbox, and I'm going to go to the Examples tab. Let me scroll down to where all the tracking examples are, and you can see there's a huge variety of tracking examples. Now I'm going to click on this motion based multiple object tracking.
Now this is a very cool example. And the reason I think it's so cool is that it gives you a framework that you can use to track moving objects. So you could just take this example and use it to solve some fairly complicated problems. The tracking algorithm it uses is the same one I mentioned after my last example: it uses Kalman filters to track objects.
Now Kalman filters are really popular for object tracking in computer vision, and that's for a couple of reasons. One, Kalman filters work pretty well in real time. And two, Kalman filters give you a prediction of where an object is, even when you can't see the object, which is great in computer vision, where very often the object could be occluded by something else.
So let's just open this example and see what it looks like. So let me run that. And you can see I have my background subtraction result on the right and my tracking results on the left. It's going to start tracking people as they move into the scene, and it's going to give them unique IDs (those are the numbers at the top). As you can see, as they go out of view and stop being visible, the Kalman filter tries to predict their locations. And this is really one of the most interesting things about using Kalman filters: they let you continue tracking, even when the object is not visible for a few frames.
Now some of you are probably thinking to yourselves: I've learned three ways to do object detection, so why don't I just detect the object from frame to frame, instead of going through the hassle of object tracking? Well, there are three main reasons to use tracking. One, object detectors are not perfect. They do fail from time to time. Two, if you had an application that needed to maintain the identity of an object from frame to frame, an object detector just cannot do that.
And three, and this is key: in general, object tracking algorithms require less processing power than object detection. So if you wanted to deploy your algorithm onto a mobile device, you might want to do less object detection and more object tracking.
And here's a quick summary of the algorithm we used to track a person's movements. We used the cascade object detector to detect Dima's face. We then used min eigen features to find the feature points on the detected object. And we used the KLT point tracker, which is implemented in the vision.PointTracker system object to actually track the object from frame to frame.
So in summary, why would you use MATLAB for computer vision? Well, for starters, MATLAB is a great environment for exploration and discovery. Then there's the simple programming syntax that enables you to implement fairly complicated systems in as little as 30 lines of code, as witnessed by the face tracking example. There's the comprehensive documentation that I've been leveraging throughout the course of this webinar. And there's a huge catalog of examples that you can use to learn more about computer vision and that you could even adapt for your own work.
Now here are some recommended next steps for you. I highly encourage you to try out the code from this webinar. Please explore more computer vision examples on the links shown. And also check out these two webinars, Image Processing Made Easy and Computer Vision with MATLAB for Object Detection and Tracking. And lastly, if you have any more questions, please send us your questions at the link below, and we will be happy to answer them.
Recorded: 6 Mar 2014