Applied Machine Learning, Part 1: Feature Engineering

From the series: Applied Machine Learning

Adam Filion, MathWorks

Explore how to perform feature engineering, a technique for transforming raw data into features that are suitable for a machine learning algorithm. 

Feature engineering starts with your best guess about what features might influence the action you’re trying to predict. After that, it’s an iterative process where you create new features, add them to your model, and see if your results have improved.   

This video provides a high-level overview of the topic and uses several examples to illustrate the basic principles behind feature engineering, along with established ways of extracting features from signals, text, and images.

Full Transcript

Machine learning algorithms don’t always work so well on raw data. Part of our job as engineers and scientists is to transform the raw data to make the behavior of the system more obvious to the machine learning algorithm. This is called feature engineering.

Feature engineering starts with your best guess about what features might influence the thing you’re trying to predict.  After that, it’s an iterative process where you create new features, add them to your model, and see if the result improved. 

Let’s take a simple example where we want to predict whether a flight is going to be delayed or not. 

In the raw data, we have information such as the month of the flight, the destination, and the day of the week.  

If I fit a decision tree to just this data, I get an accuracy of 70%. What else could we calculate from this data that might help improve our predictions?

Well, how about the number of flights per day? Some days have more flights than others, and flights on busier days may be more likely to be delayed.

I already have this feature in my dataset in the app, so let’s add it and retrain the model. You can see the model accuracy improved to 74%. Not bad for adding just one feature.
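The flights-per-day feature is a group-and-count over the raw records. The sketch below is an illustrative Python version, not the workflow used in the video; the record layout (a hypothetical `date` key alongside the raw columns) and the sample rows are invented for the example.

```python
from collections import Counter
from datetime import date

def add_flights_per_day(flights):
    """Return copies of the records with a 'flights_per_day' feature:
    the number of flights that share each record's date."""
    counts = Counter(f["date"] for f in flights)
    return [{**f, "flights_per_day": counts[f["date"]]} for f in flights]

# Hypothetical raw records: month, destination, and day of week are already present.
flights = [
    {"date": date(2021, 3, 1), "dest": "BOS", "month": 3, "weekday": "Mon"},
    {"date": date(2021, 3, 1), "dest": "ORD", "month": 3, "weekday": "Mon"},
    {"date": date(2021, 3, 2), "dest": "BOS", "month": 3, "weekday": "Tue"},
]

for row in add_flights_per_day(flights):
    print(row["date"], row["dest"], row["flights_per_day"])
```

Each output record keeps its original columns and gains the new count, so the model can be retrained on the wider table without touching the raw data.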

Feature engineering is often described as a creative process, more of an art than a science. There’s no single correct way to do it, but if you have domain expertise and a solid understanding of the data, you’ll be in a good position to perform feature engineering. As you’ll see later, the techniques used for feature engineering are things you may already be familiar with, even if you haven’t thought about them in this context before.

Let’s see another example that’s a bit more interesting.  Here, we’re trying to predict whether a heart is behaving normally or abnormally by classifying the sounds it makes.

The sounds come in the form of audio signals.  Rather than training on the raw signals, we can engineer features and then use those values to train a model.  

Deep learning approaches have recently become popular because they require less manual feature engineering; instead, the features are learned as part of the training process. While these approaches have often shown very promising results, deep learning models require more data, take longer to train, and are typically less interpretable than models built on manually engineered features.

The features we used to classify heart sounds come from the signal processing field.  We calculated things such as skewness, kurtosis, and dominant frequencies.  These calculations extract characteristics that make it easier for the model to distinguish between an abnormal heart sound and a normal one.
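As a rough illustration of these signal features, the Python sketch below computes skewness, kurtosis, and a dominant frequency with NumPy. It is not the feature set used in the video; the function name, the synthetic test tone, the use of biased (population) moment estimates, and taking a single FFT peak as "the" dominant frequency are all assumptions made for the example.

```python
import numpy as np

def signal_features(x, fs):
    """Skewness, kurtosis, and dominant frequency (Hz) of a 1-D signal.

    x  : samples of the signal (e.g. one audio recording)
    fs : sampling rate in Hz
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    std = centered.std()
    # Population moment estimates; this is plain kurtosis, not excess kurtosis,
    # so Gaussian noise gives roughly 3.
    skewness = np.mean(centered ** 3) / std ** 3
    kurtosis = np.mean(centered ** 4) / std ** 4
    # Dominant frequency: the FFT bin with the largest magnitude, skipping DC (bin 0).
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    dominant = freqs[1:][np.argmax(spectrum[1:])]
    return {"skewness": skewness, "kurtosis": kurtosis, "dominant_freq_hz": dominant}

# Synthetic stand-in for a recording: a 50 Hz tone sampled at 1 kHz for 1 s.
fs = 1000
t = np.arange(fs) / fs
print(signal_features(np.sin(2 * np.pi * 50 * t), fs))
```

For a pure sine, skewness is close to 0, kurtosis is close to 1.5, and the dominant frequency is 50 Hz. Each of these numbers becomes one column in the table used to train the classifier.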

So what other features do people use? Many use traditional statistics such as the mean, median, and mode, as well as simple counts of how many times something happens.
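These statistical and counting features need nothing beyond Python's standard `statistics` module. A minimal sketch; the readings and the `threshold` used for the event count are hypothetical.

```python
import statistics

def summary_features(values, threshold):
    """Basic statistical features for one window of numeric readings.

    threshold is a hypothetical cutoff; counting how often it is exceeded
    is one example of an event-count feature.
    """
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "count_above": sum(1 for v in values if v > threshold),
    }

readings = [3, 7, 7, 2, 9, 7, 4]
print(summary_features(readings, threshold=5))
```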

Lots of data has a timestamp associated with it, and there are a number of features you can extract from a timestamp that might improve model performance. What was the month, the day of the week, or the hour of the day? Was it a weekend or a holiday? Such features play a big role in human behavior, which matters if, for example, you’re trying to predict how much electricity people use.
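A sketch of the calendar features listed above, using Python's `datetime`. The `HOLIDAYS` set is a hypothetical placeholder; a real model would need an actual holiday calendar for the region the data describes.

```python
from datetime import date, datetime

# Hypothetical holiday calendar, for illustration only.
HOLIDAYS = {date(2021, 1, 1), date(2021, 12, 25)}

def timestamp_features(ts: datetime) -> dict:
    """Calendar features derived from a single timestamp."""
    return {
        "month": ts.month,
        "day_of_week": ts.weekday(),      # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
        "is_holiday": ts.date() in HOLIDAYS,
    }

# 25 Dec 2021 was a Saturday, so is_weekend and is_holiday are both True.
print(timestamp_features(datetime(2021, 12, 25, 18, 30)))
```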

Another class of feature engineering techniques deals with text data. One technique is counting the number of times certain words occur in a text, which is often combined with a weighting scheme such as term frequency-inverse document frequency (TF-IDF). Another popular technique for text is word2vec, which converts words into dense, fixed-length vector representations.
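Word counts with TF-IDF weighting fit in a few lines. This is a from-scratch Python sketch using lowercase whitespace tokenization and the unsmoothed weighting idf = log(N / df); library implementations (for example, scikit-learn's `TfidfVectorizer`) apply smoothing and normalization by default, so their numbers will differ. Word2vec is not shown, since it requires training an embedding model.

```python
import math
from collections import Counter

def tfidf(docs):
    """Turn a list of documents into TF-IDF weighted word-count features.

    tf  = raw count of the word in the document (bag of words)
    idf = log(N / df), where df is the number of documents containing the word
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each word at most once per document.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    features = []
    for tokens in tokenized:
        counts = Counter(tokens)
        features.append({w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()})
    return features

docs = ["the flight was delayed", "the flight left on time", "delayed again"]
for vec in tfidf(docs):
    print(vec)
```

A word that appears in every document gets a weight of zero, which is the purpose of the IDF term: it down-weights words that do not distinguish one document from another.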

The last class of techniques I’ll talk about deals with images. Images contain lots of information, so you often need to extract the important parts. Traditional techniques compute color histograms or apply transforms such as the Haar wavelet. More recently, researchers have started using convolutional neural networks to extract features from images.
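The two traditional image techniques mentioned here, color histograms and the Haar wavelet, can be sketched with NumPy alone. This is illustrative: the bin count, the one-level Haar transform built from 2x2 block averages and differences (scaled by 1/4 rather than the orthonormal 1/2), and the use of detail-band energy as the final feature are choices made for the example. The random image stands in for real data.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Per-channel intensity histograms of an H x W x 3 uint8 image, concatenated."""
    per_channel = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    hist = np.concatenate(per_channel).astype(float)
    return hist / hist.sum()  # normalize so images of different sizes are comparable

def haar_level1(gray):
    """One level of a 2-D Haar transform; gray must have even height and width.

    Each non-overlapping 2x2 block [[a, b], [c, d]] yields one value per band.
    """
    g = np.asarray(gray, dtype=float)
    a, b = g[0::2, 0::2], g[0::2, 1::2]  # top-left, top-right
    c, d = g[1::2, 0::2], g[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 4         # local average
    col_diff = (a - b + c - d) / 4       # left column minus right column
    row_diff = (a + b - c - d) / 4       # top row minus bottom row
    diag = (a - b - c + d) / 4           # diagonal difference
    return approx, col_diff, row_diff, diag

# Random stand-in for a real RGB image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)

hist_feats = color_histogram(img)                          # 3 channels x 8 bins = 24 values
_, *details = haar_level1(img.mean(axis=2))                # grayscale, then Haar
wavelet_feats = [np.mean(band ** 2) for band in details]   # energy of each detail band
features = np.concatenate([hist_feats, wavelet_feats])
print(features.shape)  # (27,)
```

The result is a short, fixed-length vector per image (27 numbers here instead of 64 x 64 x 3 pixel values) that a conventional classifier can consume.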

Depending on the type of data you’re working with, it may make sense to use a variety of the techniques we’ve discussed. Feature engineering is a trial-and-error process. The only way to know whether a feature is any good is to add it to a model and check whether it improves the results.
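The add-a-feature-and-check loop can be automated as greedy forward selection, which is one common way (among several) to run this trial-and-error process. In this sketch, `score` is a hypothetical callable standing in for "train the model on these features and return its validation accuracy"; the toy gains below are invented and are not results from the video.

```python
def forward_select(candidates, score, min_gain=0.0):
    """Greedy forward selection.

    Repeatedly add the single candidate feature that most improves
    score(features); stop when no addition improves it by more than min_gain.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial_scores = {f: score(selected + [f]) for f in remaining}
        f_best = max(trial_scores, key=trial_scores.get)
        if trial_scores[f_best] - best <= min_gain:
            break  # no remaining feature improves the result
        selected.append(f_best)
        best = trial_scores[f_best]
        remaining.remove(f_best)
    return selected, best

# Toy stand-in for "retrain and measure accuracy"; the gains are made up.
GAINS = {"month": 0.10, "dest": 0.08, "weekday": 0.0, "flights_per_day": 0.04}

def toy_score(features):
    return 0.5 + sum(GAINS[f] for f in features)

print(forward_select(GAINS, toy_score))
```

Scoring every candidate on the same data used to choose features can overfit, so in practice `score` would typically use cross-validation or a held-out set.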

To wrap up, that was a brief explanation of feature engineering. We have many more examples on our site, so check them out.

 

Related Products

  • Statistics and Machine Learning Toolbox

Learn More

  • Feature Extraction for Signals
  • Feature Extraction for Images
  • Text Feature Extraction
  • Statistical Feature Extraction

Related Information

  • MATLAB for Machine Learning

Up Next:

Part 2: ROC Curves (4:43)

Use ROC curves to assess classification models. Walk through several examples that illustrate what ROC curves are and why you’d use them.

Full series: 4 videos

© 1994-2021 The MathWorks, Inc.