Hyperparameter Optimization | Applied Machine Learning, Part 3

From the series: Applied Machine Learning

Machine learning is all about fitting models to data. This process typically involves using an iterative algorithm that minimizes the model error. The parameters that control a machine learning algorithm’s behavior are called hyperparameters. Depending on the values you select for your hyperparameters, you might get a completely different model. So, by changing the values of the hyperparameters, you can find different, and hopefully better, models.

This video walks through techniques for hyperparameter optimization, including grid search, random search, and Bayesian optimization. It explains why random search and Bayesian optimization are often a better choice than the standard grid search, and it describes how hyperparameter optimization relates to feature engineering as a way of improving a model.

Published: 18 Jan 2019

Machine learning is all about fitting models to data. The models consist of parameters, and we find the value for those through the fitting process. This process typically involves some type of iterative algorithm that minimizes the model error. That algorithm has parameters that control how it works, and those are what we call hyperparameters.

In deep learning, we also call the parameters that determine the layer characteristics hyperparameters. Today, we’ll be talking about techniques for both.

So, why do we care about hyperparameters? Well, it turns out that most machine learning problems are non-convex, which means the error can have many local minima rather than a single best solution. Depending on the values we select for the hyperparameters, the fitting process might land on a completely different model. By changing the values of the hyperparameters, we can find different, and hopefully better, models.

OK, so we know that we have hyperparameters, and we know we want to tweak them, but how do we do that? Some hyperparameters are continuous, some are binary, and others might take on any number of discrete values. This makes for a tough optimization problem. An exhaustive search of the hyperparameter space is almost never feasible, because it takes too long.
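To get a feel for why an exhaustive search is out of reach, here is a rough back-of-the-envelope sketch. The hyperparameter names, the number of levels per hyperparameter, and the one-minute training time are all assumptions made up for illustration; they do not come from the video.

```python
import math

# Hypothetical search space: each entry is the number of candidate values
# we would try for that hyperparameter. Continuous hyperparameters are
# assumed to be discretized into 10 levels each.
candidate_counts = {
    "box_constraint": 10,      # continuous, discretized
    "kernel_scale": 10,        # continuous, discretized
    "learning_rate": 10,       # continuous, discretized
    "regularization": 10,      # continuous, discretized
    "tolerance": 10,           # continuous, discretized
    "standardize": 2,          # binary
    "kernel_function": 3,      # discrete choice
}

# An exhaustive search trains one model per combination of values.
total_combinations = math.prod(candidate_counts.values())

# Assumed cost of one training run, in minutes.
minutes_per_fit = 1
days_needed = total_combinations * minutes_per_fit / (60 * 24)

print(f"{total_combinations} combinations, about {days_needed:.0f} days of training")
```

Even with this coarse discretization, the count grows multiplicatively with every hyperparameter added, which is why the methods that follow evaluate only a subset of the space.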

So, traditionally, engineers and researchers have used techniques for hyperparameter optimization like grid search and random search. In this example, I’m using a grid search method to vary 2 hyperparameters – Box Constraint and Kernel Scale – for an SVM model. As you can see, the error of the resulting model is different for different values of the hyperparameters. After 100 trials, the search has found 12.8 and 2.6 to be the most promising values for these hyperparameters.
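A grid search like the one described above can be sketched in a few lines. This is a minimal illustration and not the code used in the video: `validation_error` is a hypothetical stand-in for training an SVM and measuring its error, and the grid ranges are assumptions.

```python
import itertools
import math


def validation_error(box_constraint, kernel_scale):
    """Hypothetical stand-in for training an SVM and measuring its error.

    A real objective would fit a model with these two hyperparameters and
    return something like its cross-validated loss.
    """
    return (math.log10(box_constraint) - 1.0) ** 2 + (math.log10(kernel_scale) - 0.5) ** 2


def log_grid(low, high, n):
    """Return n values spaced evenly on a log scale from low to high."""
    step = (math.log10(high) - math.log10(low)) / (n - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(n)]


def grid_search(objective, values_a, values_b):
    """Evaluate the objective at every combination and keep the best one."""
    trials = [((a, b), objective(a, b)) for a, b in itertools.product(values_a, values_b)]
    best_params, best_error = min(trials, key=lambda trial: trial[1])
    return best_params, best_error, trials


# A 10 x 10 grid gives 100 trials (assumed ranges).
box_values = log_grid(1e-3, 1e3, 10)
scale_values = log_grid(1e-3, 1e3, 10)
best_params, best_error, trials = grid_search(validation_error, box_values, scale_values)
print("best (box constraint, kernel scale):", best_params)
```

The log-spaced grid is a design choice for this sketch, since hyperparameters such as these often range over several orders of magnitude.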

Recently, random search has become more popular than grid search.

“How could that be?” you may be asking.

Wouldn’t grid search do a better job of evenly exploring the hyperparameter space?

Let’s imagine you have 2 hyperparameters, “A” and “B”. Your model is very sensitive to “A,” but not sensitive to “B.” If we did a 3x3 grid search, we would only ever evaluate 3 different values of “A.” But if we did a random search, we would probably get 9 different values of “A”, even though some may be close together. As a result, we have a much better chance of finding a good value for “A.” In machine learning, we often have many hyperparameters. Some have a big influence over the results, and some don’t. So random search is typically a better choice.
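The "A" and "B" argument can be checked with a small sketch. The grid values and the unit-interval ranges are assumptions chosen for illustration.

```python
import itertools
import random


def grid_points(values_a, values_b):
    """All combinations of the given values: a 3x3 grid yields 9 points."""
    return list(itertools.product(values_a, values_b))


def random_points(n, rng):
    """n points with A and B each drawn uniformly from [0, 1)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
grid = grid_points([0.1, 0.5, 0.9], [0.1, 0.5, 0.9])
rand = random_points(9, rng)

# Count how many different values of the sensitive hyperparameter "A"
# each strategy actually tries with the same budget of 9 trials.
distinct_a_grid = len({a for a, _ in grid})
distinct_a_random = len({a for a, _ in rand})

print("grid search tries", distinct_a_grid, "values of A")
print("random search tries", distinct_a_random, "values of A")
```

Both strategies spend nine evaluations, but the grid repeats each value of "A" three times, while the random draws cover "A" at nine different values.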

Grid search and random search are nice because it’s easy to understand what’s going on. However, they still require many function evaluations. They also don’t take advantage of the fact that, as we evaluate more and more combinations of hyperparameters, we learn how those values affect our results. For that reason, you can use techniques that create a surrogate model – or an approximation of the error as a function of the hyperparameters.

Bayesian optimization is one such technique. Here we see an example of a Bayesian optimization algorithm running, where each dot corresponds to a different combination of hyperparameters. We can also see the algorithm’s surrogate model, shown here as the surface, which it is using to pick the next set of hyperparameters.
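The loop below is a minimal sketch of the surrogate-model idea for one hyperparameter, not the algorithm used in the video. It assumes a Gaussian process surrogate with a fixed RBF kernel, and it picks the next point with a lower confidence bound rule, which is swapped in here as a simple acquisition function. The objective `toy_error`, the kernel length scale, and `kappa` are all illustrative assumptions.

```python
import math


def rbf_kernel(a, b, length_scale=0.3):
    """Similarity between two hyperparameter values (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def surrogate_predict(xs, errors, x_new, noise=1e-4):
    """Gaussian process prediction (mean, std) of the error at x_new."""
    offset = sum(errors) / len(errors)
    centered = [e - offset for e in errors]
    gram = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    cross = [rbf_kernel(a, x_new) for a in xs]
    weights = solve_linear(gram, centered)
    mean = offset + sum(k * w for k, w in zip(cross, weights))
    reduction = sum(k * v for k, v in zip(cross, solve_linear(gram, cross)))
    return mean, math.sqrt(max(1.0 - reduction, 0.0))


def bayesian_optimize(objective, initial_xs, n_iterations, kappa=2.0):
    """Repeatedly fit the surrogate and evaluate the most promising candidate."""
    xs = list(initial_xs)
    errors = [objective(x) for x in xs]
    candidates = [i / 200 for i in range(201)]  # candidate values in [0, 1]

    def lower_confidence_bound(c):
        mean, std = surrogate_predict(xs, errors, c)
        return mean - kappa * std  # low predicted error or high uncertainty

    for _ in range(n_iterations):
        untried = [c for c in candidates if c not in xs]
        next_x = min(untried, key=lower_confidence_bound)
        xs.append(next_x)
        errors.append(objective(next_x))
    best = min(range(len(xs)), key=lambda i: errors[i])
    return xs[best], errors[best], xs


def toy_error(x):
    """Hypothetical model error as a function of one hyperparameter."""
    return (x - 0.7) ** 2


best_x, best_error, evaluated = bayesian_optimize(toy_error, [0.0, 0.5, 1.0], n_iterations=8)
print("best hyperparameter value found:", best_x)
```

Each iteration uses every result gathered so far to decide where to evaluate next, which is the information that grid search and random search leave unused.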

One other really cool thing about Bayesian optimization is that it doesn’t just look at how accurate a model is. It can also take into account how long it takes to train. There could be sets of hyperparameters that cause the training time to increase by factors of 100 or more, and that might not be so great if we’re trying to hit a deadline. You can configure Bayesian optimization in a number of ways, including expected improvement per second, which penalizes hyperparameter values that are expected to take a very long time to train.
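One way to picture a time-aware criterion is to divide the expected improvement of a candidate by its predicted training time. The sketch below assumes Gaussian predictions of the error and a separate predicted training time per candidate. The function names and the numbers are hypothetical, and this is an illustration of the idea rather than the exact formula any particular toolbox uses.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_error):
    """Expected reduction below best_error for a Gaussian error prediction."""
    if std <= 0.0:
        return max(best_error - mean, 0.0)
    z = (best_error - mean) / std
    return (best_error - mean) * normal_cdf(z) + std * normal_pdf(z)


def ei_per_second(mean, std, best_error, predicted_seconds):
    """Expected improvement divided by the predicted training time."""
    return expected_improvement(mean, std, best_error) / predicted_seconds


# Two hypothetical candidates with the same predicted error but very
# different predicted training times.
best_so_far = 0.20
fast = ei_per_second(mean=0.18, std=0.05, best_error=best_so_far, predicted_seconds=10.0)
slow = ei_per_second(mean=0.18, std=0.05, best_error=best_so_far, predicted_seconds=1000.0)
print("fast candidate score:", fast)
print("slow candidate score:", slow)
```

With equal predicted accuracy, the candidate that trains 100 times slower scores 100 times lower, so the optimizer is steered toward hyperparameter values that are cheaper to evaluate.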

Now, the main reason to do hyperparameter optimization is to improve the model. And, although there are other things we could do to improve it, I like to think of hyperparameter optimizations as being a low-effort, high-compute type of approach. This is in contrast to something like feature engineering, where you have higher effort to create the new features, but you need less computational time. It’s not always obvious which activity is going to have the biggest impact, but the nice thing about hyperparameter optimization is it lends itself well to “overnight runs,” so you can sleep while your computer works.

That was a quick explanation of hyperparameter optimization. For more information, check out the links in the description.

Related Products

Statistics and Machine Learning Toolbox

Learn More

Bayesian Optimization Workflow

Model Building and Assessment

Bayesian Optimization Documentation

What Is AutoML?
