Tall Arrays and `mapreduce`

Analyze big data sets in parallel using MATLAB® tall arrays and datastores or mapreduce, on Spark™ and Hadoop® clusters and parallel pools.

Parallel Computing Toolbox™ lets you run tall array expressions in parallel on your desktop using a parallel pool. With tall arrays, you can run big data applications whose data does not fit in your computer's memory. You can also use Parallel Computing Toolbox to scale up tall array processing by connecting to a parallel pool running on a MATLAB Parallel Server™ cluster. Alternatively, you can use a Spark-enabled Hadoop cluster running MATLAB Parallel Server. For more information, see Big Data Workflow Using Tall Arrays and Datastores.
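As a minimal sketch of the desktop workflow described above, the following evaluates a tall array expression on a local parallel pool. It assumes that Parallel Computing Toolbox is installed and that the sample file `airlinesmall.csv` that ships with MATLAB is on the path; the variable names (`pool`, `ds`, `tt`, `meanDelay`) are illustrative only.

```matlab
% Reuse the current parallel pool, or start one with the default profile.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;
end

% Direct tall array (and mapreduce) evaluation to the parallel pool.
mapreducer(pool);

% Create a datastore over the sample file, reading only one variable.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay'});

% Wrap the datastore in a tall table; no data is read yet.
tt = tall(ds);

% Operations on tall arrays are deferred until gather is called.
meanDelay = mean(tt.ArrDelay, 'omitnan');
meanDelay = gather(meanDelay);
```

To return to serial evaluation in the local MATLAB session, call `mapreducer(0)`. Pointing `mapreducer` at a pool on a cluster should let the same tall array code run at a larger scale without changes.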

Functions

Key Functions

`tall`	Create tall array
`datastore`	Create datastore for large collections of data
`mapreduce`	Programming technique for analyzing data sets that do not fit in memory
`mapreducer`	Define parallel execution environment for mapreduce and tall arrays
`partition`	Partition a datastore
`numpartitions`	Number of datastore partitions
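The `partition` and `numpartitions` functions listed above can be combined with `parfor` to process a datastore in pieces. The following is a sketch under stated assumptions: Parallel Computing Toolbox is installed, the sample file `airlinesmall.csv` is on the path, and the variable names are illustrative rather than part of any API.

```matlab
% Create a datastore over the sample file, reading only one variable.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay'});

% Reuse the current parallel pool, or start one with the default profile.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;
end

% Ask for a partition count suited to the pool.
n = numpartitions(ds, pool);

% Accumulate a running sum and count across partitions.
% total and count are parfor reduction variables.
total = 0;
count = 0;
parfor ii = 1:n
    subds = partition(ds, n, ii);      % ii-th of n partitions
    while hasdata(subds)
        t = read(subds);               % read one chunk as a table
        d = t.ArrDelay(~isnan(t.ArrDelay));
        total = total + sum(d);
        count = count + numel(d);
    end
end

meanDelay = total / count;
```

Each worker reads only its own partition, so no single worker needs the whole data set in memory at once.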

Classes

Key Classes

`parallel.Pool`	Parallel pool of workers
`parallel.cluster.Hadoop`	Hadoop cluster for mapreducer, mapreduce and tall arrays
`parallel.cluster.Spark`	Spark cluster for mapreducer, mapreduce and tall arrays (Since R2022b)

Examples and How To

Big Data Workflow Using Tall Arrays and Datastores
Learn about typical workflows using tall arrays to analyze big data sets.
Use Tall Arrays on a Parallel Pool
Discover tall arrays in Parallel Computing Toolbox and MATLAB Parallel Server.
Process Big Data in the Cloud
This example shows how to access a large data set in the cloud and process it in a cloud cluster using MATLAB® capabilities for big data.
Use Parallel Computing to Optimize Big Data Set for Analysis
This example shows how to optimize data preprocessing for analysis using parallel computing. (Since R2024a)
Use Tall Arrays on a Spark Cluster
Create and use tall tables on Spark clusters without changing your MATLAB code.
Run mapreduce on a Parallel Pool
Try mapreduce for advanced analysis of big data using Parallel Computing Toolbox.
Run mapreduce on a Hadoop Cluster
Learn about mapreduce for advanced big data analysis on a Hadoop cluster.
Partition a Datastore in Parallel
Use partition to split your datastore into smaller parts.