Image Processing and Computer Vision with MATLAB
Overview
In this presentation, you'll discover how to use computer vision and image processing techniques in MATLAB to solve practical image analysis, automation, and detection problems using real-world examples. Explore the latest features in image processing and computer vision such as interactive apps, new image enhancement algorithms, data preprocessing for deep learning, and 3D algorithms.
Highlights
Learn what’s new in Image Processing and Computer Vision:
- Apps for Exploration and Preprocessing
- Object Detection, including deep learning techniques
- 3D Volume Visualization and Segmentation
- Using OpenCV algorithms
About the Presenter
Johanna Pingel joined the MathWorks team in 2013, specializing in Image Processing and Computer Vision applications with MATLAB. She has an M.S. degree from Rensselaer Polytechnic Institute and a B.A. degree from Carnegie Mellon University. She has been working in the Computer Vision application space for over 5 years, with a focus on object detection and tracking.
Gabriel Ha is a Product Marketing Manager supporting MathWorks’ Big Data and Enterprise Integration tools, and is the video consultant and personality for many different MATLAB product areas. He specializes in creating informative yet entertaining videos to explain technical concepts to beginners and advanced users of MATLAB alike. Prior to MathWorks, Gabriel worked as a Program Manager for Microsoft’s Visual C++ Team focusing on IDE features, and holds a B.S. in Electrical Engineering and Computer Science from MIT. He enjoys performing music and consuming tasty foods.
Recorded: 23 Apr 2020
Hi, everyone. I'm Johanna Pingel.
I'm Gabriel Ha.
And we'll be showcasing lots of new features and demos in Image Processing and Computer Vision with MATLAB. Many algorithms in image processing and computer vision (IPCV) use deep learning. So while this isn't a deep-learning webinar, we'll definitely touch on that too.
We have three items in our agenda. First is image preprocessing. Before you go about developing or testing out any algorithm in image processing, computer vision, or deep learning, you need to make sure you have clean data.
Then we'll move on to the main part of our webinar covering object detection. You'll learn a variety of ways to implement it, from classic image processing to new deep-learning techniques.
And finally, we'll conclude with a brief mention of using MATLAB with OpenCV, the Open Source Computer Vision library.
Image Processing, Computer Vision, and deep learning are related but distinct areas. We won't go into the differences between them in detail. Instead, our goal is to show when to use each one, so you can pick the right technique for a given problem.
Our first two items correspond to a typical image processing workflow. First, you prepare and clean your data, which is important for any successful vision application. After that you can get down to the real work of developing the algorithm-- in our case, object detection.
We've chosen to cover object detection because there are different ways to implement it, which lets us talk about new features across a number of applications, all of which you can use as an image-processing person.
As a quick aside, I have to say when I first think about Image Processing versus Computer Vision, I mean, they seem kind of similar. If a computer is going to see something, it's probably going to be an image. And it has to process that image. So Computer Vision, Image Processing, what's the difference?
We'll get into it more in the demos, but it's safe to say many people use these terms interchangeably. Here's a diagram that you might find useful of how these terms are used. Image Processing deals with manipulating pixels in an image. Computer Vision is tightly related, but instead deals with understanding the scene.
Some techniques can be accomplished by both, such as segmentation or image alignment. Deep learning is also expanding into this world. Some classic IPCV techniques can be replaced with a deep learning algorithm. And we'll show that in our first demo.
So for this webinar, you could say that everything is image processing in one form or another. But we use the term computer vision when we start talking about understanding what the thing in the image actually is, as opposed to just saying, hey, there's something in this image.
First up is image preprocessing. As we've said before, it's important to do it regardless of your final application. Image Processing and Computer Vision algorithms will work better if you clean, sharpen, and remove noise from the images.
And especially in the case of deep learning, you'll get much better results using clean, cropped, properly labeled images before spending hours training a deep-learning algorithm.
Classic image processing algorithms for denoising and contrast enhancement have been around for a while. Let's look at a new way to do preprocessing using deep learning. We'll take our original image and add some noise, which is very easy to do in MATLAB. Then we'll use a pretrained deep-learning network to denoise the image.
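The add-noise, denoise, compare loop can be sketched outside MATLAB as well. This is a plain-Python illustration, not the webinar's code: it swaps the pretrained deep-learning denoiser for a classic 3x3 median filter and uses salt-and-pepper noise, which a median filter handles well. The function names are hypothetical, and the mean absolute error against the clean image is used as a simple measure of how much the denoising helped.

```python
import random


def add_salt_and_pepper(img, fraction, seed=0):
    """Return a noisy copy of img: roughly `fraction` of pixels forced to 0 or 255."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    noisy = [row[:] for row in img]
    h, w = len(img), len(img[0])
    for _ in range(int(fraction * h * w)):
        r, c = rng.randrange(h), rng.randrange(w)
        noisy[r][c] = rng.choice((0, 255))
    return noisy


def median_filter3(img):
    """3x3 median filter; at the border only the neighbors inside the image are used."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = sorted(
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            )
            out[r][c] = window[len(window) // 2]
    return out


def mean_abs_error(a, b):
    """Average absolute per-pixel difference between two same-sized images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


clean = [[100 + 2 * c for c in range(32)] for _ in range(32)]  # smooth horizontal ramp
noisy = add_salt_and_pepper(clean, 0.05)
denoised = median_filter3(noisy)
print("noisy error:", mean_abs_error(clean, noisy))
print("denoised error:", mean_abs_error(clean, denoised))
```

A learned denoiser earns its keep on noise that simple filters blur away, such as Gaussian noise on textured images. The comparison step stays the same either way.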
That's great. Just two lines of code for denoising and we got great results. And on a side note, it's amazing that we're using deep learning to prepare something that could be used for deep learning.
I guess so. And just to finish the example, here are the original and the final images side by side. And you can see that they are indeed different.
As you mentioned before, this process is very important, especially for deep learning. Cropping and labeling training data is perhaps the least glamorous part of deep learning. But good training data is the difference between ending up with a model that's 99% accurate versus just 90%.
And MATLAB provides many tools to accelerate this time sink. For example, you can quickly crop objects in your image for object detection and recognition algorithms. And we also have an app for semantic segmentation so you can quickly label individual pixels in an image.
And that wraps up our first item. So just keep three things in mind. First, preprocessing equals good no matter your application. Second, you can preprocess using deep learning. And third, MATLAB offers a variety of these models and apps that you can use to quickly get started, whether it's preprocessing or learning more about deep learning itself.
Image Processing Toolbox does more than just preprocessing. And you'll see that as we move into object detection. You can use three different techniques-- Image Processing, Computer Vision, or deep learning.
We'll start with object detection using Image Processing. Our first example is simple. How many noteworthy objects are there in an image? We'll start by analyzing a single image. And you'll see that whatever we come up with can quickly be used as a starting point for live video.
So here we have a box of--
Cylindrical breath mints
--that we can-- what are you doing?
Just in case they don't want the free advertising.
But they're Tic Tacs.
Yes, they're clearly Tic Tacs. They're Tic Tacs, all right. We have a box of Tic Tacs which we will be dumping out in arbitrary amounts and counting them with an object detection algorithm. And in the spirit of showing off what's new, we will do this with a minimal amount of actual code writing.
Which-- which we can accomplish using MATLAB apps. You can open apps by clicking on the icon or by calling them programmatically. We'll start with an app for image segmentation, which lets you experiment with sliders and built-in segmentation functions for separating your objects from the background.
Here, I've chosen an adaptive thresholding method which looks promising. We can export the images generated or generate MATLAB code if we plan to repeat this process, which is definitely the case.
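Adaptive thresholding is a good fit here because lighting across a tabletop is rarely even. This plain-Python sketch is not the code the app generates. It shows one common variant of the idea, a local-mean threshold: a pixel counts as foreground when it is brighter than the mean of its neighborhood by more than an offset. The `radius` and `offset` values are assumptions for this toy image.

```python
def adaptive_threshold(img, radius=2, offset=10):
    """Foreground = pixel brighter than the mean of its local window by more than `offset`.

    The window is (2*radius+1) x (2*radius+1), clipped at the image border.
    """
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            mask[r][c] = img[r][c] > sum(window) / len(window) + offset
    return mask


# Background brightens from left to right (uneven lighting), so no single global
# threshold separates it from the two bright 2x2 "objects".
img = [[40 + 4 * c for c in range(20)] for _ in range(10)]
for r, c in [(3, 3), (3, 4), (4, 3), (4, 4), (5, 15), (5, 16), (6, 15), (6, 16)]:
    img[r][c] += 60

mask = adaptive_threshold(img)
print(sum(map(sum, mask)), "foreground pixels")
```

In this image the left object's dimmest pixel (112) is darker than the brightest background pixel (116). A single global cutoff therefore either misses part of the object or picks up background. The local comparison handles both.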
Next, we'll look at the Image Region Analyzer app, which gives us properties of the objects-- things like area and perimeter, and more obscure attributes like eccentricity, orientation, and Euler number, all of which you can use to filter your results. For example, you can see that some noise in the image was also detected as objects. One way to remove it is to sort the results by area and then filter out the small regions.
To quickly reproduce these results, we'll export this code. And now we have two functions-- one that segments the image and one that can quickly get the specified properties which we can use to get a count of the number of objects.
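To make the counting step concrete, here is a plain-Python sketch of what the two generated functions accomplish together once a binary mask exists. It groups foreground pixels into 8-connected regions, then keeps only regions whose area reaches a minimum, which is the area filter described above. The function names and the `min_area` value are hypothetical and not the MATLAB-generated code.

```python
from collections import deque


def label_regions(mask):
    """Group foreground pixels into 8-connected regions; returns a list of pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from (r, c)
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(pixels)
    return regions


def count_objects(mask, min_area):
    """Count regions with at least `min_area` pixels; smaller regions are treated as noise."""
    return sum(1 for region in label_regions(mask) if len(region) >= min_area)


rows = [
    "##....#...",
    "##........",
    "......###.",
    ".#....###.",
]
mask = [[ch == "#" for ch in row] for row in rows]
print(len(label_regions(mask)), "regions,", count_objects(mask, min_area=3), "objects")
```

The two single-pixel specks are labeled as regions but are dropped by the area filter, leaving the two real objects.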
And to visually confirm that we've detected the objects, we can display them in MATLAB with their bounding boxes.
Now, we implicitly promised a live demo, so let's turn our code into something that can count objects using a simple webcam. We already have our two generated functions that segment and detect the objects. And we can add code to display the final result.
Now, we just need to add the webcam. You'll note a line of code with a call to a function called webcam. This is from a support package that you can download through the Add-On Manager.
In fact, if you don't have it, you can quickly find the installation page by calling the webcam function and intentionally generating an error. As you can see, the error message contains a hyperlink that takes you directly there.
Once it's installed, we can call webcam. This opens the connection, after which you can preview the stream and then use snapshot to get the current frame from the webcam. Now all you have to do is process the image and display the output. And if you want to run it continuously, just put that code inside a loop. So now we can throw a bunch of--
Mint flavored capsules.
What are you doing?
I'm just covering our bases.
Let's just get the demo going.
All right, so I have a bunch of Tic Tacs here. I've got my webcam, and we are going to look at it.
All right, let's see what we got. So, seven. Oh, let me--
There are actually seven. Let's go ahead and dock that.
All right, great.
Flashy numbers, I don't know what's happening there. Yeah, that looks good. So we've got seven there.
Seven, success.
Dump a few more on, see here-- there we go. How many is that? 2, 4, 6, 8, 10, 12. Let's see if we get 12. We do.
12, and if I take one away--
That's way faster than I did. Did you just eat that?
Yep.
All right, so there's-- yeah, way faster than I could ever count, and it's pretty accurate.
All right.
That's great, yeah.
And that wraps things up for this demo. It was a lot of info in a short period of time, but we've got lots of documentation for those who want to learn more about webcam processing. Here's the example about acquiring images in a loop as we just did.
Image Processing is used not only for quantifying over-the-counter halitosis relief tablets but also in lots of real-world applications. And everything you've seen done in two dimensions can also be done in three. Here's a medical image in which we want to segment out the lungs. Using new functions in the Image Processing Toolbox, we can not only visualize in 3D but also segment in 3D with techniques like active contours.
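Going from 2D to 3D mostly means adding a third axis to every neighborhood. This plain-Python sketch does not implement active contours. It swaps in a simpler technique, seeded region growing, to show the idea on a volume indexed `[z][y][x]`: starting from a seed voxel, it adds 6-connected neighbors whose intensity is within a tolerance of the seed's. The toy "lung" values and the tolerance are assumptions.

```python
from collections import deque

# Offsets to the 6 face-adjacent neighbors of a voxel: the 2D 4-neighborhood plus up/down in z.
NEIGHBORS_6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def region_grow_3d(volume, seed, tol):
    """Return the set of (z, y, x) voxels connected to `seed` whose intensity is
    within `tol` of the seed intensity (6-connectivity)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    z0, y0, x0 = seed
    ref = volume[z0][y0][x0]
    region = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBORS_6:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                    and (nz, ny, nx) not in region
                    and abs(volume[nz][ny][nx] - ref) <= tol):
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region


# 4x4x4 "tissue" volume (bright) with a dark 2x2x2 "air" pocket and one stray dark voxel.
vol = [[[200] * 4 for _ in range(4)] for _ in range(4)]
for z in (1, 2):
    for y in (1, 2):
        for x in (1, 2):
            vol[z][y][x] = 30
vol[3][3][3] = 30  # dark, but not connected to the pocket

lung = region_grow_3d(vol, seed=(1, 1, 1), tol=20)
print(len(lung), "voxels segmented")
```

Because growth only follows connected voxels, the stray dark voxel in the corner is excluded even though its intensity matches.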
So Image Processing can be used for a variety of things, but it does have its limitations. Let's say I want to detect numbers on a piece of paper. Using Image Processing, I can easily write the code to do this and process it with a webcam like we just did.
But if I want the algorithm to tell me which numbers are in the image, we need computer vision. So let's move on to computer vision techniques, which help answer the question "What are the objects?" in addition to locating them in an image.
Some of you might know about optical character recognition. And you might say, just use an OCR algorithm. Yeah, that's true. You can recognize handwriting using OCR, and we even have an app for OCR to recognize custom languages or fonts as well. Here's the app right here. And you can indeed use it to train a model that recognizes our examples.
Handwriting is a highly researched topic in the area of computer vision. For our example, we want to apply it to something for which there isn't already an app or a function to help solve this problem.
Many of you might be familiar with the game called Pictionary in which players must guess the word or phrase being drawn by their teammate. What?
You said Pictionary.
Yeah, so?
Wouldn't that be visual art inference contest?
Why would I say that? We're going to use computer vision to play Pictionary with our live hand drawings in real time. That being said, to temper your expectations, we'll have to confine our problem space to a fixed set of categories. But hey, if you have time on your hands, take what we're doing and go big.
So how do we do that? First, let's talk about features and how we can use them to create a custom object detector using a machine-learning model. Features are a significant part of Computer Vision. A simple example of a feature is something like color or size-- anything that helps identify the object with unique characteristics.
Other features may be more abstract. You may have heard of SURF or a newer one called KAZE, both of which are in the Computer Vision Toolbox.
An example of a feature that works quite well when dealing with simple digits and drawings is HOG, Histogram of Oriented Gradients. You can see how this feature works when we plot it on a set of random images. The arrows indicate the flow or orientation of the gradient.
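The core of HOG is a histogram of gradient orientations weighted by gradient magnitude. This plain-Python sketch computes that histogram for a single cell, using central differences and 9 unsigned orientation bins over 0 to 180 degrees. A full HOG descriptor also tiles the image into cells and normalizes over blocks, which is omitted here. In MATLAB, Computer Vision Toolbox provides extractHOGFeatures for the complete descriptor.

```python
import math


def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)
    over the interior pixels of one cell: the core building block of HOG."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    h, w = len(img), len(img[0])
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = img[r][c + 1] - img[r][c - 1]  # central difference along x (columns)
            gy = img[r + 1][c] - img[r - 1][c]  # central difference along y (rows, pointing down)
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % bins] += mag
    return hist


vertical_edge = [[0] * 4 + [255] * 4 for _ in range(8)]                  # dark | bright
horizontal_edge = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]
v_hist = orientation_histogram(vertical_edge)
h_hist = orientation_histogram(horizontal_edge)
print("vertical edge peak bin:", v_hist.index(max(v_hist)))
print("horizontal edge peak bin:", h_hist.index(max(h_hist)))
```

A vertical edge produces horizontal gradients (bin 0, around 0 degrees), and a horizontal edge produces vertical gradients (bin 4, around 90 degrees). That difference is what lets a classifier tell stroke directions apart.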
And we can use these features as a representation of the images. Rather than pass in the images themselves, we'll pass in their feature representations, which a machine-learning algorithm can use directly.
Our biggest takeaway from this example is to use the Classification Learner app, which lets you apply many different machine learning models on the same data and see which one has the best accuracy. We pass the features of our training images to the app, select a bunch of models to try out, and train them all with one click. After it's done, you can easily see how all of them did and then export the best one to perform recognition on new images.
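The Classification Learner app does this interactively. The underlying idea is to train several candidate models on the same features, score each on held-out data, and keep the best. This plain-Python sketch shows that loop with three deliberately tiny, hypothetical models: a majority-class baseline, a nearest-centroid classifier, and 1-nearest-neighbor. It runs on made-up 2D feature vectors that stand in for HOG features. It is not what the app runs internally.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_majority(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common


def train_nearest_centroid(train):
    """Predict the label whose mean feature vector is closest."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def train_1nn(train):
    """Predict the label of the single closest training example."""
    return lambda x: min(train, key=lambda item: sq_dist(x, item[0]))[1]


def accuracy(model, data):
    """Fraction of (features, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def pick_best(trainers, train, test):
    """Train every candidate on the same data and score each on held-out examples."""
    scores = {name: accuracy(fit(train), test) for name, fit in trainers.items()}
    return max(scores, key=scores.get), scores


train = [((0, 0), "ant"), ((1, 0), "ant"), ((0, 1), "ant"),
         ((5, 5), "cake"), ((6, 5), "cake"), ((5, 6), "cake"), ((6, 6), "cake")]
test = [((0.5, 0.5), "ant"), ((1, 1), "ant"), ((5.5, 5.5), "cake"), ((6, 5), "cake")]
trainers = {"majority": train_majority,
            "nearest centroid": train_nearest_centroid,
            "1-NN": train_1nn}
best, scores = pick_best(trainers, train, test)
print(scores, "->", best)
```

Including a trivial baseline is a useful habit. If a sophisticated model barely beats "always guess the most common class", the features probably need work.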
Some of you have probably made the connection that this sort of object recognition is something that deep learning helps solve as well. We have a number of other webinars and videos that talk about the difference between deep learning and machine learning. Here, we'll just show an example of using a deep-learning model to identify drawings in an image.
Deep learning can be complicated if you're just getting started. So afterwards, feel free to use the example you're about to see, or take a look at the many other examples provided in the documentation.
With any deep-learning model, we need lots of training data. Fortunately, there's a dataset available that has thousands of drawings of common items. The ones we are going to use are ant, umbrella, cake, wristwatch, and wine glass-- basically a list of my favorite things.
All datasets are going to be structured differently. For this dataset, we have one large text file for each category that contains the x,y locations of the drawing. Each drawing looks something like this.
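Before any image processing can happen, stroke coordinates have to become pixels. The exact file format isn't described here, so this plain-Python sketch assumes each drawing has already been parsed into a list of strokes. Each stroke is a list of integer (x, y) points, and consecutive points are joined with Bresenham line segments. The function name and input layout are hypothetical.

```python
def rasterize_strokes(strokes, height, width):
    """Draw strokes (lists of integer (x, y) points) into a height x width binary image,
    connecting consecutive points with Bresenham line segments."""
    img = [[0] * width for _ in range(height)]

    def plot(x, y):
        if 0 <= y < height and 0 <= x < width:  # silently clip points outside the canvas
            img[y][x] = 1

    for stroke in strokes:
        if len(stroke) == 1:  # a lone dot has no segment to draw
            plot(*stroke[0])
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            dx, dy = abs(x1 - x0), -abs(y1 - y0)
            sx = 1 if x0 < x1 else -1
            sy = 1 if y0 < y1 else -1
            err = dx + dy
            x, y = x0, y0
            while True:
                plot(x, y)
                if x == x1 and y == y1:
                    break
                e2 = 2 * err
                if e2 >= dy:
                    err += dy
                    x += sx
                if e2 <= dx:
                    err += dx
                    y += sy
    return img


drawing = [[(0, 0), (4, 0)], [(0, 0), (0, 3)], [(1, 1), (3, 3)]]
for row in rasterize_strokes(drawing, 5, 5):
    print("".join("#" if v else "." for v in row))
```

The result is a one-pixel-wide line drawing. That is exactly the kind of thin training image that looks different from a marker drawing seen through a webcam.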
Here's a great example of how crucial preprocessing is to deep learning. We want our training images to look as similar as possible to the images we plan to use as live test images. So if our training data looks like this, and our test images look like that, our model probably won't work very well.
So all we need to do to start training is redraw all our sample images with thicker lines using MS Paint. So I just need to pull up MS Paint here. I'm just kidding, guys. It's a joke. It's a joke. Of course, we'll be using MATLAB's Image Processing Tools.
Those of you familiar with Image Processing already know how to take an image and transform its pixels. Those of you new to it can take a look at the documentation, where there are examples of how to enlarge the boundary of our object. We can also use MATLAB to center the images and then quickly apply this process to all the images in our data set-- in our case, 5,000 per category.
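One standard way to thicken strokes is binary dilation, which is what MATLAB's imdilate does with a structuring element. This plain-Python sketch is not the webinar's code. It pairs a square-element dilation with a simple centering step that shifts the drawing's bounding box to the middle of the frame. Both functions are hypothetical helpers that would be applied to each image in the data set.

```python
def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element: thickens strokes."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def center(img):
    """Shift the foreground so its bounding box sits in the middle of the frame."""
    h, w = len(img), len(img[0])
    pts = [(r, c) for r in range(h) for c in range(w) if img[r][c]]
    out = [[0] * w for _ in range(h)]
    if not pts:
        return out  # nothing to center
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    # Choose the shift that makes the top/bottom and left/right margins (nearly) equal.
    dr = (h - 1 - max(rows) - min(rows)) // 2
    dc = (w - 1 - max(cols) - min(cols)) // 2
    for r, c in pts:
        out[r + dr][c + dc] = img[r][c]
    return out


img = [[0] * 7 for _ in range(7)]
img[1][1] = 1  # a single off-center dot
processed = center(dilate(img))
for row in processed:
    print("".join("#" if v else "." for v in row))
```

The dot grows into a 3x3 block and moves to the middle of the 7x7 frame. Applying the same two steps to every training image makes the data look more like the thick, roughly centered drawings the webcam will see.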
We then pass our images and our deep-learning model into the training algorithm and let it do its thing. This training takes approximately 20 to 30 minutes, but we'll fast forward to the end of that where we've created our new recognition model.
We've set aside some test images that the model has never seen before. And it gets roughly 90% of them correct. Not a bad start, but let's try it on live images.
One thing that will improve accuracy is incorporating an object detector so we can extract out the object of interest. Now, we can definitely do this with deep learning. And there's new research on a variety of deep-learning object detection algorithms. Popular ones include R-CNN, Fast R-CNN, and Faster R-CNN, all of which are implemented and ready to use with our computer vision tools.
All we need to do is retrain our deep-learning model as an object detector, and we'll be ready to play Pictionary. Here's the thing, though. While deep-learning object detectors can be very powerful, let's be honest. There's a lot of trial and error that has to happen before you get a really solid, robust solution.
And in light of that, you might recall we already solved this problem earlier using Image Processing. So rather than train an object detector using deep learning, let's just use Image Processing to find the object.
You should definitely utilize the deep-learning object detectors. But remember that we mentioned using the right technique to solve the problem. We have a very simple scene and a simple task here.
And in some ways, it's too simple for deep learning. Ironically, that makes it not a good fit. And you can see this from footage of an earlier test we did. We'll get better results extracting the drawing just using Image Processing.
So that's what we plan to do-- Image Processing to extract, deep learning to identify. Let's try it out.
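The "Image Processing to extract" half of this plan can be as simple as cropping to the foreground. This plain-Python sketch assumes a binary mask is already available, for example from thresholding as earlier. It crops to the bounding box of the foreground with a small margin. The resizing and the call to the trained network are not shown, and the function name is hypothetical.

```python
def crop_to_foreground(img, margin=1):
    """Crop to the bounding box of nonzero pixels, padded by `margin` and clipped at the border.
    Returns None when the image has no foreground."""
    h, w = len(img), len(img[0])
    pts = [(r, c) for r in range(h) for c in range(w) if img[r][c]]
    if not pts:
        return None
    r0 = max(0, min(r for r, _ in pts) - margin)
    r1 = min(h - 1, max(r for r, _ in pts) + margin)
    c0 = max(0, min(c for _, c in pts) - margin)
    c1 = min(w - 1, max(c for _, c in pts) + margin)
    return [row[c0:c1 + 1] for row in img[r0:r1 + 1]]


page = [[0] * 6 for _ in range(6)]
page[2][2] = 1
page[3][4] = 1
crop = crop_to_foreground(page)
print(len(crop), "x", len(crop[0]))
```

Returning None for an empty page gives the live loop an easy way to skip classification when nothing has been drawn yet.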
All right, it is Pictionary time-- a marker for you.
Thank you.
And I only have one piece of paper, so I guess you'll draw--
OK.
--on half. I'll draw on this one. So let's see-- ant, wristwatch, umbrella.
Cake.
Cake.
Yes.
OK, there's probably one more. That's OK. You like ants.
Yeah, I can do the ants.
I will do umbrella and cake. All right, let's see here-- loop, loop, loop, big arch, the letter J. All right, that is my umbrella. A little triangle for the top of the cake, I'm going to go with--
And I've got two ants because you're taking a while.
All right, sounds good. And here is my cherry on top of the cake-- umm.
There.
Oh, that's cute. That's really cute. All right, so here are the images that we have drawn. And now we will see how well our object detector does. All right, trusty webcam, let's see what we've got here. So we'll do your ants first because they look adorable. And that is definitely an ant. And that is--
Yes.
--definitely an ant. Slight wristwatch there, that's OK. All right, let's see how my umbrella drawings are-- doot de do.
Uh, thinks it's an ant too. That's weird.
Curious-- all right, say it isn't a cake. Hopefully it gets this one.
There we go.
There it is. Cake for that one. See if I can get it to--
So you're 50/50.
I'm 50/50. Yeah, maybe my umbrellas just look like ants.
OK.
So, you know, not everything is perfect, but that is what happens.
So as a reminder, there are many reasons to use deep learning to solve a problem. In many cases, deep learning gives impressive results, especially with objects in a real-world scene like vehicle detection.
But in simpler cases, Image Processing techniques can be a great fit. As a general guideline, if something can be solved with Image Processing techniques, start with that. And then if it's not enough, work your way up to more sophisticated solutions.
Let's summarize what we've talked about in terms of when to use Image Processing, Computer Vision, and deep learning techniques. Image Processing is great if it seems like you can easily segment the image into a foreground and background. You can quickly try it out with MATLAB apps and see if color, intensity, texture, or size gets the job done.
Next, Computer Vision-- use this when you need to understand what an object is. Also, consider how many objects you're trying to distinguish. Computer vision typically works well for fewer categories, especially if there are obvious external features that differentiate them.
We used HOG features in our example, but definitely look into other feature detectors like SURF and KAZE to find the best approach for your application.
At the other end of the spectrum is deep learning. It's powerful, and it handles complex scenarios. But the trade-off is that you need to be prepared to spend lots of time training and refining your algorithm.
Yeah, your first-time results generally won't be as accurate as you might find using the other two techniques for their relevant scenarios. You need to spend time getting the right combination of training options and network configurations. But your solution will be more robust and refined so long as you're willing to put in the time.
Remember, you can always combine approaches like we just did. Another example is if you're trying to classify objects in motion and they move in a predictable way, you can do motion detection using Image Processing and then use deep learning to recognize them.
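As one illustration of the motion-detection half of such a combined pipeline, here is a plain-Python frame-differencing sketch. It marks pixels whose intensity changed by more than a threshold between two grayscale frames and returns the bounding box of the change. That region is what would be cropped and handed to a recognition model. Frame differencing is just one simple choice, and the threshold value is an assumption.

```python
def motion_box(prev_frame, frame, threshold=25):
    """Return (top, left, bottom, right) around pixels that changed by more than
    `threshold` between two same-sized grayscale frames, or None if nothing moved."""
    changed = [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(prev_frame, frame))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) > threshold
    ]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return min(rows), min(cols), max(rows), max(cols)


prev = [[50] * 6 for _ in range(6)]
curr = [row[:] for row in prev]
for r, c in [(2, 1), (2, 2), (3, 1), (3, 2)]:
    curr[r][c] = 200  # an object moved into this patch
curr[0][5] = 60       # small flicker, below the threshold
print(motion_box(prev, curr))
```

The threshold keeps sensor flicker from triggering detections. In a real scene, a background model that adapts over time would usually be more robust than comparing just two frames.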
Finally, we want to mention how you can use MATLAB with open-source solutions. Sometimes you'll want to use open-source code to solve a problem while also leveraging capabilities in MATLAB. As an example, we'll look at OpenCV, a popular open-source library for computer vision applications.
MATLAB makes it easy to work with OpenCV through a support package that provides a C++ interface. This is great for any custom functions that you or a colleague wrote in OpenCV and don't want to rewrite in MATLAB.
So thanks for joining us in our Image Processing and Computer Vision webinar. You can quickly get started by downloading the code from the links below. If you have any questions or feedback, feel free to email us.