partition

Partition a datastore

Syntax

subds = partition(ds,n,index)

subds = partition(ds,'Files',index)

subds = partition(ds,'Files',filename)

Description

subds = partition(ds,n,index) partitions datastore ds into the number of parts specified by n and returns the partition corresponding to the index index.


subds = partition(ds,'Files',index) partitions the datastore by files and returns the partition corresponding to the file of index index in the Files property.


subds = partition(ds,'Files',filename) partitions the datastore by files and returns the partition corresponding to the file specified by filename.

Examples


Partition Datastore into Specific Number of Parts


Create a datastore for a large collection of files. For this example, use ten copies of the sample file airlinesmall.csv. To handle missing fields in the tabular data, specify the name-value pairs TreatAsMissing and MissingValue.

files = repmat({'airlinesmall.csv'},1,10);
ds = tabularTextDatastore(files,...
                 'TreatAsMissing','NA','MissingValue',0);

Partition the datastore into three parts and return the first partition. The partition function returns approximately the first third of the data from the datastore ds.

subds = partition(ds,3,1);

The Files property of the datastore contains a list of files included in the datastore. Check the number of files in the Files property of the datastore ds and the partitioned datastore subds. The datastore ds contains ten files and the partition subds contains the first four files.

length(ds.Files)

ans = 
10

length(subds.Files)

ans = 
4

Partition Datastore into Default Number of Parts


Create a datastore from the sample file, mapredout.mat, which is the output file of the mapreduce function.

ds = datastore('mapredout.mat');

Get the default number of partitions for ds.

n = numpartitions(ds);

Partition the datastore into the default number of partitions and return the datastore corresponding to the first partition.

subds = partition(ds,n,1);

Read the data in subds.

while hasdata(subds)
	data = read(subds);
end

Partition Datastore by Files


Create a datastore that contains three image files.

ds = imageDatastore({'street1.jpg','peppers.png','corn.tif'})

ds = 

  ImageDatastore with properties:

       Files: {
              ' ...\matlab\toolbox\matlab\demos\street1.jpg';
              ' ...\matlab\toolbox\matlab\imagesci\peppers.png';
              ' ...\matlab\toolbox\matlab\imagesci\corn.tif'
              }
    ReadSize: 1
      Labels: {}
     ReadFcn: @readDatastoreImage

Partition the datastore by files and return the part corresponding to the second file.

subds = partition(ds,'Files',2)

subds = 

  ImageDatastore with properties:

       Files: {
              ' ...\matlab\toolbox\matlab\imagesci\peppers.png'
              }
    ReadSize: 1
      Labels: {}
     ReadFcn: @readDatastoreImage

subds contains one file.
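The same per-file syntax can be applied to every file in turn. The following is a minimal sketch, assuming the three-image datastore from this example; the names numFiles, filesPerPart, k, and img are illustrative and are not part of the partition interface.

```matlab
% Sketch: visit a datastore one file at a time by using the 'Files' syntax.
ds = imageDatastore({'street1.jpg','peppers.png','corn.tif'});

numFiles = numel(ds.Files);          % number of files in the datastore
filesPerPart = zeros(1,numFiles);    % illustrative bookkeeping only

for k = 1:numFiles
    % Partition by files and keep the part for the k-th file.
    subds = partition(ds,'Files',k);
    filesPerPart(k) = numel(subds.Files);

    % Read whatever the partition holds.
    while hasdata(subds)
        img = read(subds);
    end
end
```

Each pass through the loop works on a partition that holds a single file, so the work for one file does not depend on the others.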

Partition Data in Parallel

Create a datastore from the sample file, mapredout.mat, which is the output file of the mapreduce function.

ds = datastore('mapredout.mat');

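Open a parallel pool with three workers. Use numpartitions to get a reasonable number of partitions for parallelizing over that pool, and then read each partition on the workers.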

numWorkers = 3;
p = parpool('local',numWorkers);
n = numpartitions(ds,p);

parfor ii=1:n
    subds = partition(ds,n,ii);
    while hasdata(subds)
        data = read(subds);
    end
end

Compare Data Granularities


Compare a coarse-grained partition with a fine-grained subset.

Read all the frames in the video file xylophone.mp4 and construct an ArrayDatastore object to iterate over it. The resulting object has 141 frames.

v = VideoReader("xylophone.mp4");
allFrames = read(v);
arrds = arrayDatastore(allFrames,IterationDimension=4,OutputType="cell",ReadSize=4);

To extract a specific set of adjacent frames, create four coarse-grained partitions of arrds. Extract the second partition, which has 35 frames.

partds = partition(arrds,4,2);
imshow(imtile(partds.readall()))


Extract six nonadjacent frames from arrds at specified indices using a fine-grained subset.

subds = subset(arrds,[67 79 82 69 89 33]);
imshow(imtile(subds.readall()))


Input Arguments


`ds` — Input datastore
datastore

Input datastore. You can use the datastore function to create a datastore object from your data.

`n` — Number of partitions
positive integer

Number of partitions, specified as a positive integer.

If n does not evenly divide the number of files in the datastore, partition distributes the remaining files among the existing partitions, one file per partition, starting with the first partition.

The number of partitions that receive an additional file equals the remainder obtained when dividing the number of files in the datastore by n. For example, if your datastore contains 23 files and you partition it into 3 parts, the first two partitions each contain 8 files, and the last partition contains 7 files.

Example: 3

Data Types: double
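The file distribution described for n can be worked through with plain arithmetic. The following is a minimal sketch that reproduces the 23-file example; it does not call partition, and the variable names numFiles, base, extra, and counts are illustrative.

```matlab
% Sketch: expected number of files per partition for 23 files and n = 3.
numFiles = 23;
n = 3;

base  = floor(numFiles/n);   % files that every partition receives
extra = mod(numFiles,n);     % leftover files, given to the first partitions

counts = base*ones(1,n);
counts(1:extra) = counts(1:extra) + 1;

disp(counts)                 % displays 8 8 7
```

With numFiles = 10 and n = 3, the same arithmetic gives 4 files for the first partition, which matches the first example on this page.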

`index` — Index
positive integer

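Index of the partition to return, specified as a positive integer. With the 'Files' syntax, index is the position of the file in the Files property of the datastore.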

Example: 1

Data Types: double

`filename` — File name
character vector | string scalar

File name, specified as a character vector or string scalar.

The value of filename must exactly match a file name in the Files property of the datastore. To ensure an exact match, specify filename as ds.Files{N}, where N is the index of the file in the Files property. For example, ds.Files{3} specifies the third file in the datastore ds.

Example: ds.Files{3}

Example: 'file1.csv'

Example: '../dir/data/file1.csv'

Example: 'hdfs://myserver:7867/data/file1.txt'

Data Types: char
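As a short illustration of the exact-match requirement, the following sketch takes the file name directly from the Files property instead of typing it. It assumes the three sample images used in the examples on this page; targetFile is an illustrative variable name.

```matlab
% Sketch: select a partition by file name taken from the Files property.
ds = imageDatastore({'street1.jpg','peppers.png','corn.tif'});

% Copy the name exactly as the datastore stores it (full resolved path).
targetFile = ds.Files{3};

% Partition by files and keep the part that corresponds to that file.
subds = partition(ds,'Files',targetFile);
```

Reading the name from ds.Files avoids mismatches between a short name such as 'corn.tif' and the full path that the datastore stores.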

Output Arguments


`subds` — Output datastore
datastore

Output datastore. The output datastore is of the same type as the input datastore ds.

Extended Capabilities


Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

Usage notes and limitations:

In a thread-based environment, you can use partition only with the following datastores:

- ImageDatastore objects
- CombinedDatastore, SequentialDatastore, or TransformedDatastore objects you create from ImageDatastore objects by using combine or transform

You can use partition with other datastores if you have Parallel Computing Toolbox™. To do so, run the function using a process-backed parallel pool (either ProcessPool or ClusterPool) instead of backgroundPool or ThreadPool.

For more information, see Run MATLAB Functions in Thread-Based Environment.

Version History

Introduced in R2015a
