Description

Unsupervised Machine Learning | Introduction to Machine Learning, Part 2

From the series: Introduction to Machine Learning

Get an overview of unsupervised machine learning, which looks for patterns in datasets that don’t have labeled responses. You’d use this technique when you want to explore your data but don’t yet have a specific goal, or you’re not sure what information the data contains. It’s also a good way to reduce the dimensionality of your data.

Most unsupervised learning techniques are a form of cluster analysis. Clustering algorithms fall into two broad groups:

Hard clustering, where each data point belongs to only one cluster
Soft clustering, where each data point can belong to more than one cluster

This video uses examples to illustrate hard and soft clustering algorithms, and it shows why you’d want to use unsupervised machine learning to reduce the number of features in your dataset.

Published: 6 Dec 2018

Full Transcript

Unsupervised machine learning looks for patterns in datasets that don’t have labeled responses.

You’d use this technique when you want to explore your data but don’t yet have a specific goal, or you’re not sure what information the data contains.

It’s also a good way to reduce the dimensionality of your data.

As we’ve previously discussed, most unsupervised learning techniques are a form of cluster analysis, which separates data into groups based on shared characteristics.

Clustering algorithms fall into two broad groups:

Hard clustering, where each data point belongs to only one cluster
Soft clustering, where each data point can belong to more than one cluster

For context, here’s a hard clustering example:

Say you’re an engineer building cell phone towers. You need to decide where, and how many, towers to construct. To make sure you’re providing the best signal reception, you need to locate the towers within clusters of people.

To start, you need an initial guess at the number of clusters. To do this, compare scenarios with three towers and four towers to see how well each is able to provide service.

Because a phone can only talk to one tower at a time, this is a hard clustering problem.

For this, you could use k-means clustering, because the k-means algorithm treats each observation in the data as an object having a location in space. It looks for cluster centers, or means, that minimize the total distance from each data point to its assigned cluster center.
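To make the tower example concrete, here is a minimal sketch of k-means using Lloyd's algorithm, which is one common way to run it. The video does not say which implementation it has in mind, and the phone locations, the `towns` coordinates, and the function names below are all made up for illustration. The loop at the end compares the three-tower and four-tower scenarios by total squared distance.

```python
import math
import random


def assign(points, centers):
    """Hard assignment: each point gets the index of its single nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm. Returns (centers, labels, total squared distance)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # initial guess: k data points
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of the points assigned to it.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    labels = assign(points, centers)
    cost = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, cost


# Hypothetical phone locations scattered around four made-up towns.
rng = random.Random(42)
towns = [(0, 0), (8, 1), (4, 7), (9, 9)]
phones = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for cx, cy in towns for _ in range(25)]

for k in (3, 4):
    centers, labels, cost = kmeans(phones, k)
    print(k, "towers -> total squared distance:", round(cost, 1))
```

Because the result depends on the random starting centers, a practical run would usually repeat the algorithm with several seeds and keep the lowest-cost solution.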

So, that was hard clustering. Let’s see how you might use a soft clustering algorithm in the real world.

Pretend you’re a biologist analyzing the genes involved in normal and abnormal cell division. You have data from two tissue samples, and you want to compare them to determine whether certain patterns of gene features correlate with cancer.

Because the same genes can be involved in several biological processes, no single gene is likely to belong to one cluster only.

Apply a fuzzy c-means algorithm to the data, and then visualize the clusters to see which groups of genes behave in similar ways.

You can then use this model to help see which features correlate with normal or abnormal cell division.
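The idea of partial membership can be sketched with a small fuzzy c-means implementation. This is an illustrative sketch of the standard update rules with fuzziness exponent m = 2, not the workflow used in the video. The `genes` data (two made-up features per gene) and the function name are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iterations=100, seed=0):
    """Returns (centers, memberships); memberships[i][j] is point i's degree in cluster j."""
    rng = random.Random(seed)
    # Start from random memberships, normalized so each point's degrees sum to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        memberships.append([u / total for u in row])
    dims = len(points[0])
    centers = []
    for _ in range(iterations):
        # Each center is the mean of all points, weighted by membership ** m.
        centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            wsum = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dims)))
        # Memberships fall off with distance to each center and sum to 1 per point.
        for i, p in enumerate(points):
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships[i] = [v / total for v in inv]
    return centers, memberships


# Hypothetical gene features: two tight groups plus one gene in between.
genes = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
         (5.0, 5.1), (5.2, 4.8), (4.9, 5.3),
         (3.0, 3.0)]

centers, memberships = fuzzy_c_means(genes, 2)
for gene, row in zip(genes, memberships):
    print(gene, [round(u, 2) for u in row])
```

Unlike k-means, every gene receives a degree of membership in every cluster, so a gene that sits between the two groups is shared between them rather than forced into one.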

This covers the two main techniques (hard and soft clustering) for exploring data without labeled responses.

Remember, though, that you can also use unsupervised machine learning to reduce the number of features, or the dimensionality, of your data.

You’d do this to make your data less complex – especially if you’re working with data that has hundreds or thousands of variables. By reducing the complexity of your data, you’re able to focus on the important features and gain better insights.

Let’s look at three common dimensionality reduction algorithms:

Principal component analysis (PCA) performs a linear transformation on the data so that most of the variance in your dataset is captured by the first few principal components. This could be useful for developing condition indicators for machine health monitoring.
Factor analysis identifies underlying correlations between variables in your dataset. It provides a representation of unobserved latent, or common, factors. Factor analysis is sometimes used to explain stock price variation.
Nonnegative matrix factorization is used when model terms must represent nonnegative quantities, such as physical quantities. If you need to compare a lot of text on webpages or documents, this would be a good method to start with, because a given word either does not appear in a document or occurs a positive number of times.
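As a rough illustration of the first of these, the sketch below finds the first principal component by power iteration on the covariance matrix and reports how much of the total variance it captures. Power iteration is just one simple way to do this, and libraries typically use a singular value decomposition instead. The `readings` data (three made-up sensor features, two of them strongly correlated) and the function name are assumptions for the example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Power iteration on the covariance matrix.

    Returns (unit direction, fraction of total variance along that direction).
    """
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
           for a in range(dims)]
    # Starting direction; assumed not to be orthogonal to the leading component.
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v (Rayleigh quotient) relative to the total variance.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dims))
                     for a in range(dims))
    total_variance = sum(cov[d][d] for d in range(dims))
    return v, eigenvalue / total_variance


# Hypothetical sensor readings: the second feature is roughly twice the first,
# and the third barely changes.
readings = [
    (1.0, 2.1, 0.5),
    (2.0, 3.9, 0.4),
    (3.0, 6.2, 0.6),
    (4.0, 7.8, 0.5),
    (5.0, 10.1, 0.4),
    (6.0, 11.9, 0.6),
]

direction, explained = first_principal_component(readings)
print("direction:", [round(x, 3) for x in direction])
print("variance explained:", round(explained, 4))
```

When one component captures nearly all of the variance, as it does in this toy data, projecting each row onto that direction replaces three features with one while losing very little information.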

In this video, we took a closer look at hard and soft clustering algorithms, and we also showed why you’d want to use unsupervised machine learning to reduce the number of features in your dataset.

As for your next steps:

Unsupervised learning might be your end goal. If you’re just looking to segment data, a clustering algorithm is an appropriate choice.

On the other hand, you might want to use unsupervised learning as a dimensionality reduction step for supervised learning. In our next video we’ll take a closer look at supervised learning.

For now, that wraps up this video. Don’t forget to check out the description below for more resources and links.
