Video length: 4:15

Introduction to Machine Learning, Part 2: Unsupervised Machine Learning

From the series: Introduction to Machine Learning

Get an overview of unsupervised machine learning, which looks for patterns in datasets that don’t have labeled responses. You’d use this technique when you want to explore your data but don’t yet have a specific goal, or you’re not sure what information the data contains. It’s also a good way to reduce the dimensionality of your data. 

Most unsupervised learning techniques are a form of cluster analysis. Clustering algorithms fall into two broad groups: 

  • Hard clustering, where each data point belongs to only one cluster 
  • Soft clustering, where each data point can belong to more than one cluster 

This video uses examples to illustrate hard and soft clustering algorithms, and it shows why you’d want to use unsupervised machine learning to reduce the number of features in your dataset.

Unsupervised machine learning looks for patterns in datasets that don’t have labeled responses.

You’d use this technique when you want to explore your data but don’t yet have a specific goal, or you’re not sure what information the data contains.

It’s also a good way to reduce the dimensionality of your data.

As we’ve previously discussed, most unsupervised learning techniques are a form of cluster analysis, which separates data into groups based on shared characteristics.

Clustering algorithms fall into two broad groups:

  • Hard clustering, where each data point belongs to only one cluster
  • Soft clustering, where each data point can belong to more than one cluster

For context, here’s a hard clustering example: 

Say you’re an engineer building cell phone towers. You need to decide where, and how many, towers to construct. To make sure you’re providing the best signal reception, you need to locate the towers within clusters of people.

To start, you need an initial guess at the number of clusters. To do this, compare scenarios with three towers and four towers to see how well each is able to provide service.

Because a phone can only talk to one tower at a time, this is a hard clustering problem.

For this, you could use k-means clustering, because the k-means algorithm treats each observation in the data as an object having a location in space. It finds cluster centers, or means, that minimize the total distance from the data points to their assigned cluster centers.
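
To make the tower example concrete, here is a minimal sketch of k-means in plain Python. It is not the code used in the video: the subscriber coordinates are made up, the names `kmeans`, `init_centers`, and `homes` are hypothetical, and the farthest-point initialization is an assumption chosen to keep the result repeatable. The sketch scores each scenario by the total squared distance from every point to its assigned center, which is one common way to compare three towers against four.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centers(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins exactly one cluster (hard clustering).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    total = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, total

# Hypothetical subscriber locations forming three neighborhoods.
homes = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 0), (10, 1), (11, 0), (11, 1),
         (5, 9), (5, 10), (6, 9), (6, 10)]

centers3, labels3, total3 = kmeans(homes, 3)
centers4, labels4, total4 = kmeans(homes, 4)
print("3 towers:", centers3, "total squared distance:", total3)
print("4 towers:", centers4, "total squared distance:", total4)
```

Adding a center never needs to increase the total distance, so the interesting question is how much a fourth tower improves on three. On this toy data the gain is small, which suggests three towers already match the three neighborhoods.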

So, that was hard clustering. Let’s see how you might use a soft clustering algorithm in the real world.

Pretend you’re a biologist analyzing the genes involved in normal and abnormal cell division. You have data from two tissue samples, and you want to compare them to determine whether certain patterns of gene features correlate with cancer.

Because the same genes can be involved in several biological processes, no single gene is likely to belong to one cluster only.

Apply a fuzzy c-means algorithm to the data, and then visualize the clusters to see which groups of genes behave in similar ways.
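
As a rough illustration of what fuzzy c-means computes, the sketch below implements the standard membership and center updates in plain Python. It is not the code from the video: the two-feature "gene" vectors are invented, the names `fuzzy_c_means`, `update_memberships`, `update_centers`, and `genes` are hypothetical, the fuzziness exponent `m = 2` is a common default rather than a value given in the source, and starting the centers at the first and last data points is an assumption made for repeatability.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def update_memberships(points, centers, m):
    """Membership of each point in each cluster; every row sums to 1."""
    U = []
    for p in points:
        d2 = [sq_dist(p, c) for c in centers]
        if min(d2) == 0.0:
            # A point sitting exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            row = [h / sum(hits) for h in hits]
        else:
            # u_i is proportional to distance^(-2/(m-1)) = d2^(-1/(m-1)).
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            row = [v / sum(inv) for v in inv]
        U.append(row)
    return U

def update_centers(points, U, m, n_clusters):
    """Each center is the mean of all points, weighted by membership^m."""
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        w = [row[i] ** m for row in U]
        total = sum(w)
        centers.append(tuple(sum(wj * p[d] for wj, p in zip(w, points)) / total
                             for d in range(dim)))
    return centers

def fuzzy_c_means(points, centers, m=2.0, n_iter=100, tol=1e-12):
    for _ in range(n_iter):
        U = update_memberships(points, centers, m)
        new_centers = update_centers(points, U, m, len(centers))
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(points, centers, m)

# Invented feature vectors: two groups plus one gene halfway between them.
genes = [(0, 0), (0, 1), (1, 0), (2, 2), (3, 4), (4, 3), (4, 4)]
fcm_centers, U = fuzzy_c_means(genes, [genes[0], genes[-1]])
for g, row in zip(genes, U):
    print(g, [round(u, 3) for u in row])
```

Unlike the k-means labels, each row here is a set of degrees of membership. The point halfway between the two groups ends up shared about equally, which is the behavior you want when one gene takes part in several processes.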

You can then use this model to help see which features correlate with normal or abnormal cell division.

This covers the two main techniques (hard and soft clustering) for exploring data that has no labeled responses.

Remember, though, that you can also use unsupervised machine learning to reduce the number of features, or the dimensionality, of your data.

You’d do this to make your data less complex – especially if you’re working with data that has hundreds or thousands of variables. By reducing the complexity of your data, you’re able to focus on the important features and gain better insights.

Let's look at three common dimensionality reduction algorithms:

  • Principal Component Analysis (PCA) performs a linear transformation on the data so that most of the variance in your dataset is captured by the first few principal components. This could be useful for developing condition indicators for machine health monitoring.
  • Factor Analysis identifies underlying correlations between variables in your dataset. It provides a representation of unobserved latent, or common, factors. Factor analysis is sometimes used to explain stock price variation.
  • Nonnegative Matrix Factorization is used when model terms must represent nonnegative quantities, such as physical quantities. If you need to compare a lot of text on webpages or documents, this would be a good method to start with, because a given word either does not appear in a document or appears a positive number of times.
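
To show what "most of the variance is captured by the first few principal components" means for the first method in the list, here is a small PCA sketch in plain Python. It is an illustration under stated assumptions, not the video's code: the two-column readings are invented, the names `pca_first_component` and `readings` are hypothetical, and power iteration on the sample covariance matrix is just one simple way to find the leading component.

```python
import math

def pca_first_component(rows, n_iter=200):
    """Return (direction, explained variance ratio, scores) for the first PC."""
    n, d = len(rows), len(rows[0])
    # Center each column so the covariance is measured about the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue, i.e. the first principal axis.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total_var = sum(cov[a][a] for a in range(d))
    # Project the centered data onto the first component.
    scores = [sum(x[j] * v[j] for j in range(d)) for x in X]
    return v, eigval / total_var, scores

# Invented readings from two strongly correlated sensors (second is about 2x the first).
readings = [(-2.0, -3.9), (-1.0, -2.1), (0.0, 0.1), (1.0, 1.9), (2.0, 4.0)]
direction, ratio, scores = pca_first_component(readings)
print("first component:", direction)
print("share of variance explained:", ratio)
```

Because the two columns move together, a single component explains almost all of the variance, so the one-dimensional `scores` could stand in for both original features.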

In this video, we took a closer look at hard and soft clustering algorithms, and we also showed why you’d want to use unsupervised machine learning to reduce the number of features in your dataset.

As for your next steps:

Unsupervised learning might be your end goal. If you’re just looking to segment data, a clustering algorithm is an appropriate choice.

On the other hand, you might want to use unsupervised learning as a dimensionality reduction step for supervised learning. In our next video we’ll take a closer look at supervised learning.

For now, that wraps up this video. Don’t forget to check out the description below for more resources and links.

Related Products

  • Statistics and Machine Learning Toolbox
