Tall Arrays and mapreduce

Analyze big data sets in parallel using MATLAB® tall arrays and datastores, or mapreduce, on Spark® and Hadoop® clusters and on parallel pools.

You can use Parallel Computing Toolbox™ to evaluate tall-array expressions in parallel using a parallel pool on your desktop. Tall arrays let you run big data applications on data sets that do not fit in memory on your machine. You can also use Parallel Computing Toolbox to scale up tall-array processing by connecting to a parallel pool running on a MATLAB Parallel Server™ cluster. Alternatively, you can use a Spark-enabled Hadoop cluster running MATLAB Parallel Server. For more information, see Big Data Workflow Using Tall Arrays and Datastores.
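
The desktop workflow described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes Parallel Computing Toolbox is installed and uses the sample file airlinesmall.csv that ships with MATLAB. The variable name ArrDelay comes from that sample file, and gather is the standard function that triggers evaluation of a tall expression.

```matlab
% Open a parallel pool with the default profile if none is running.
if isempty(gcp('nocreate'))
    parpool;
end

% Direct tall-array evaluation to the current pool.
mapreducer(gcp);

% Create a datastore over the sample file (assumption: airlinesmall.csv
% is on the MATLAB path, as it is in a standard installation).
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay');

% Wrap the datastore in a tall table; no data is read yet.
tt = tall(ds);

% Build a deferred computation, then evaluate it on the pool.
meanDelayTall = mean(tt.ArrDelay, 'omitnan');
meanDelay = gather(meanDelayTall);
```

To run the same code on a cluster, only the pool creation step would change (for example, calling parpool with a cluster profile name); the tall-array code itself stays the same.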

Functions

tall: Create tall array
datastore: Create datastore for large collections of data
mapreduce: Programming technique for analyzing data sets that do not fit in memory
mapreducer: Define parallel execution environment for mapreduce and tall arrays
partition: Partition a datastore
numpartitions: Number of datastore partitions
parpool: Create parallel pool on cluster
gcp: Get current parallel pool
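
As an illustration of how mapreduce, mapreducer, and datastore fit together, the following sketch finds the maximum arrival delay in the sample file airlinesmall.csv. It is a minimal example under stated assumptions: the file and its ArrDelay variable come from the MATLAB sample data, the function names maxDelayMapper and maxDelayReducer are hypothetical, and mapreducer(0) is used so that the example runs serially in the client session without a pool.

```matlab
% Run mapreduce serially in the MATLAB client session.
mr = mapreducer(0);

ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay');

% The mapper runs once per chunk of data; the reducer runs once per key.
result = mapreduce(ds, @maxDelayMapper, @maxDelayReducer, mr);

% The output is a key-value datastore; read it back as a table.
tbl = readall(result);
maxDelay = tbl.Value{1};

function maxDelayMapper(data, ~, intermKVStore)
    % data is a table holding one chunk of the datastore.
    partMax = max(data.ArrDelay, [], 'omitnan');
    add(intermKVStore, 'MaxArrDelay', partMax);
end

function maxDelayReducer(key, intermValIter, outKVStore)
    % Combine the per-chunk maxima into a single overall maximum.
    maxVal = -Inf;
    while hasnext(intermValIter)
        maxVal = max(maxVal, getnext(intermValIter));
    end
    add(outKVStore, key, maxVal);
end
```

In a script file, the two local functions go at the end of the file, as shown. Passing a parallel pool or cluster to mapreducer instead of 0 would change where the map and reduce functions execute; in that case the functions generally need to be available to the workers, for example as separate files on the path.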

Classes

parallel.Pool: Parallel pool of workers
parallel.cluster.Hadoop: Hadoop cluster for mapreducer, mapreduce and tall arrays
parallel.cluster.Spark: Spark cluster for mapreducer, mapreduce and tall arrays
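
The parallel.Pool object returned by gcp or parpool can also be passed to numpartitions to choose a partition count suited to the pool. The sketch below counts the rows of the sample file airlinesmall.csv by reading each partition on a worker. It is an illustration under assumptions: Parallel Computing Toolbox is installed, the sample file is on the path, and the variable names are arbitrary.

```matlab
% Get the current pool, or start one with the default profile.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;
end
fprintf('Pool has %d workers\n', pool.NumWorkers);

ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay');

% Ask for a partition count appropriate for this pool.
n = numpartitions(ds, pool);

total = 0;
parfor k = 1:n
    % Each iteration reads only its own partition of the datastore.
    subds = partition(ds, n, k);
    count = 0;
    while hasdata(subds)
        count = count + height(read(subds));
    end
    total = total + count;   % reduction across iterations
end
```

Here total should equal the number of rows returned by reading the whole datastore at once; the partitioned version simply spreads the reading across the workers.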

Examples and How To

Concepts