matlab.io.datastore.FileSet

File-set for collection of files in datastore

Since R2020a

Description

The matlab.io.datastore.FileSet object helps you process a large collection of files when moving through the files iteratively. Use the FileSet object together with the DsFileReader object to manage and read files from your datastore.
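As a rough illustration of moving through files iteratively, the sketch below builds a file-set over a scratch folder and visits each file with hasNextFile and nextfile. The folder and the file names are made up for the example. The loop only totals the file sizes; in a custom datastore, that is the point where you would open the file, for instance with a DsFileReader object.

```matlab
% Create a scratch folder with three small text files (hypothetical names).
folder = tempname;
mkdir(folder);
names = ["a.txt","b.txt","c.txt"];
for k = 1:numel(names)
    fid = fopen(fullfile(folder,names(k)),'w');
    fprintf(fid,'%s',repmat('x',1,10*k));   % files of 10, 20, and 30 bytes
    fclose(fid);
end

fs = matlab.io.datastore.FileSet(folder);

% Walk the file-set one file at a time and total the file sizes.
totalBytes = 0;
while hasNextFile(fs)
    info = nextfile(fs);                    % FileInfo for the next file
    totalBytes = totalBytes + info.FileSize;
end

rmdir(folder,'s');                          % remove the scratch folder
```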

Creation

Description

fs = matlab.io.datastore.FileSet(location) creates a FileSet object for a collection of files based on the specified location.

fs = matlab.io.datastore.FileSet(location,Name,Value) specifies additional options using one or more name-value arguments: the file extensions to include, whether to include subfolders, or object properties. Enclose each name in quotes.

Input Arguments

location

Files or folders to include in the FileSet object, specified as a character vector, cell array of character vectors, string array, or a structure. If the files are not in the current folder, then location must be a full or relative path. Files within subfolders of the specified folder are not automatically included in the FileSet object.

Typically for a Hadoop® workflow, when you specify location as a structure, it must contain the fields FileName, Offset, and Size. This requirement enables you to use the location argument directly with the initializeDatastore method of the matlab.io.datastore.HadoopLocationBased class. For an example, see Add Support for Hadoop.

You can use the wildcard character (*) when specifying location. Specifying this character includes all matching files or all files in the matching folders in the file-set object.

If the files are not available locally, then the full path of the files or folders must be a uniform resource locator (URL), such as hdfs://hostname:portnumber/path_to_file.

Data Types: char | cell | string | struct
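The different forms of location can be compared side by side. This is a minimal sketch that assumes a scratch folder with three made-up files; it passes a folder, a wildcard pattern, and an explicit cell array of file names, and records how many files each file-set picks up.

```matlab
% Scratch folder with two .csv files and one .txt file (hypothetical names).
folder = tempname;
mkdir(folder);
names = ["run1.csv","run2.csv","notes.txt"];
for k = 1:numel(names)
    fid = fopen(fullfile(folder,names(k)),'w');
    fprintf(fid,'data');
    fclose(fid);
end

% A folder: every file directly inside it.
fsFolder = matlab.io.datastore.FileSet(folder);

% A wildcard: only the files that match the pattern.
fsWild = matlab.io.datastore.FileSet(fullfile(folder,'*.csv'));

% A cell array of character vectors: an explicit list of files.
fsList = matlab.io.datastore.FileSet({fullfile(folder,'run1.csv'), ...
                                      fullfile(folder,'notes.txt')});

counts = [fsFolder.NumFiles, fsWild.NumFiles, fsList.NumFiles];

rmdir(folder,'s');   % remove the scratch folder
```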

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: fs = matlab.io.datastore.FileSet(location,'IncludeSubfolders',true)

IncludeSubfolders

Subfolder inclusion flag, specified as a numeric or logical 1 (true) or 0 (false). Specify true to include all files and subfolders within each folder or false to include only the files within each folder.

Example: 'IncludeSubfolders',true

FileExtensions

File extensions, specified as a character vector, cell array of character vectors, or string array. You can use the empty quotes '' to represent files without extensions.

If 'FileExtensions' is not specified, then FileSet automatically includes all file extensions.

Example: 'FileExtensions','.jpg'

Example: 'FileExtensions',{'.txt','.csv'}
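To see how the two name-value arguments change what a file-set contains, consider the sketch below. It assumes a made-up folder tree with two files at the top level and one file in a subfolder. The quoted-name syntax is used so that the code also applies to releases before R2021a.

```matlab
% Scratch tree (hypothetical names): root/top.txt, root/top.csv, root/sub/deep.txt
root = tempname;
mkdir(fullfile(root,'sub'));             % creates root and root/sub
files = {fullfile(root,'top.txt'), ...
         fullfile(root,'top.csv'), ...
         fullfile(root,'sub','deep.txt')};
for k = 1:numel(files)
    fid = fopen(files{k},'w');
    fprintf(fid,'x');
    fclose(fid);
end

% Default: only the files directly inside root.
fsTop = matlab.io.datastore.FileSet(root);

% Include the files in subfolders as well.
fsAll = matlab.io.datastore.FileSet(root,'IncludeSubfolders',true);

% Include subfolders, but keep only the .txt files.
fsTxt = matlab.io.datastore.FileSet(root, ...
    'IncludeSubfolders',true,'FileExtensions','.txt');

counts = [fsTop.NumFiles, fsAll.NumFiles, fsTxt.NumFiles];

rmdir(root,'s');                         % remove the scratch tree
```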

Properties

AlternateFileSystemRoots

Alternate file system root paths, specified as a string array or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine, but need to access and process the data on another machine (possibly of a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines that run a different platform, you must use 'AlternateFileSystemRoots' to associate the root paths.

  • To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string array. For example,

    ["Z:\datasets","/mynetwork/datasets"]

  • To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string array or a cell array of character vectors. For example:

    • Specify 'AlternateFileSystemRoots' as a cell array of string arrays.

      {["Z:\datasets", "/mynetwork/datasets"];...
       ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}

    • Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell array of character vectors.

      {{'Z:\datasets','/mynetwork/datasets'};...
       {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}

The value of 'AlternateFileSystemRoots' must satisfy these conditions:

  • Contains one or more rows, where each row specifies a set of equivalent root paths.

  • Each row specifies multiple root paths and each root path must contain at least two characters.

  • Root paths are unique and are not subfolders of one another.

  • Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell

NumFiles

This property is read-only.

Number of files in the file-set object, specified as a numeric scalar.

Example: fs.NumFiles

Data Types: double

NumFilesRead

This property is read-only.

Number of files read from the FileSet object, specified as a numeric scalar.

Example: fs.NumFilesRead

Data Types: double

FileInfo

This property is read-only.

Information about files in the matlab.io.datastore.FileSet object, returned as a matlab.io.datastore.FileInfo object with these properties:

  • Filename — Name of the file in the FileSet object. The name contains the full path of the file.

  • FileSize — Size of the file in number of bytes.

For information about a specific file, specify the file index. For example, fs.FileInfo(2) returns the file name and file size for the second file. If you call fs.FileInfo specifying (:) or without specifying an index, it returns information for all of the files.

Example: fs.FileInfo(2)
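The read-only properties can be inspected as the file-set is consumed. The following sketch assumes a scratch folder with three made-up files; it checks NumFiles, watches NumFilesRead change after one call to nextfile, and pulls the entry for one file out of FileInfo by index.

```matlab
% Scratch folder with three small files (hypothetical names).
folder = tempname;
mkdir(folder);
names = ["first.txt","second.txt","third.txt"];
for k = 1:numel(names)
    fid = fopen(fullfile(folder,names(k)),'w');
    fprintf(fid,'%s',repmat('x',1,5*k));
    fclose(fid);
end

fs = matlab.io.datastore.FileSet(folder);

numFiles   = fs.NumFiles;        % total number of files in the set
readBefore = fs.NumFilesRead;    % nothing has been read yet
nextfile(fs);                    % advance past one file
readAfter  = fs.NumFilesRead;    % one file has now been read

% Index into FileInfo to get the entry for a single file.
second     = fs.FileInfo(2);
secondName = second.Filename;    % full path of that file
secondSize = second.FileSize;    % size of that file in bytes

rmdir(folder,'s');               % remove the scratch folder
```

Indexing into FileInfo does not advance the file-set, so it is a convenient way to look ahead without affecting NumFilesRead.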

Object Functions

hasNextFile Determine if file-set has another file
nextfile Information on next file or file chunk
hasPreviousFile Determine if a file-set has a previous file
previousfile Information on previous file in file-set
progress Determine how many blocks or files have been read
maxpartitions Maximum number of partitions
partition Partition file-set object
subset Create subset of datastore or FileSet
reset Reset the file-set object
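A few of these functions can be tried together on a small file-set. The sketch below assumes a scratch folder with four made-up files; it splits the set into two partitions, selects a subset by index, and resets the set after reading from it. The choice of two partitions and of indices 1 and 3 is arbitrary.

```matlab
% Scratch folder with four small files (hypothetical names).
folder = tempname;
mkdir(folder);
for k = 1:4
    fid = fopen(fullfile(folder,sprintf('file%d.txt',k)),'w');
    fprintf(fid,'x');
    fclose(fid);
end

fs = matlab.io.datastore.FileSet(folder);

% Upper bound on the number of partitions for this file-set.
maxParts = maxpartitions(fs);

% Split the set into two partitions; together they cover every file.
part1 = partition(fs,2,1);
part2 = partition(fs,2,2);
partTotal = part1.NumFiles + part2.NumFiles;

% Select the first and third files by index.
sub = subset(fs,[1 3]);
subCount = sub.NumFiles;

% Read two files, then return the set to its initial state.
nextfile(fs);
nextfile(fs);
reset(fs);
readAfterReset = fs.NumFilesRead;

rmdir(folder,'s');   % remove the scratch folder
```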

Examples

Create a file-set and query information for specific files in the file-set.

Create a file-set fs for a collection of files.

folder = {'accidents.mat','airlineResults.mat','census.mat','earth.mat'}
folder = 1x4 cell
    {'accidents.mat'}    {'airlineResults.mat'}    {'census.mat'}    {'earth.mat'}

fs = matlab.io.datastore.FileSet(folder)
fs = 
  FileSet with properties:

                    NumFiles: 4
                NumFilesRead: 0
                    FileInfo: FileInfo for all 4 files
    AlternateFileSystemRoots: {}

Obtain information for specific files either by calling the nextfile function or by querying the FileInfo property with an index. Use nextfile to obtain information for consecutive files. For example, obtain information for the first two files in the set.

file1 = nextfile(fs)
file1 = 
  1x1 FileInfo
                                                        Filename                                                         FileSize
    _________________________________________________________________________________________________________________    ________

    "/mathworks/devel/bat/filer/batfs2561-0/Bdoc24b.2679053/build/runnable/matlab/toolbox/matlab/demos/accidents.mat"      7343  


file2 = nextfile(fs)
file2 = 
  1x1 FileInfo
                                      Filename                                       FileSize 
    ____________________________________________________________________________    __________

    "/tmp/Bdoc24b_2679053_31096/tpfbab165a/matlab-ex98758341/airlineResults.mat"    1.5042e+05


Query the FileInfo property to get information about the last file in the set.

lastfile = fs.FileInfo(4)
lastfile = 
  1x1 FileInfo
                                                      Filename                                                       FileSize
    _____________________________________________________________________________________________________________    ________

    "/mathworks/devel/bat/filer/batfs2561-0/Bdoc24b.2679053/build/runnable/matlab/toolbox/matlab/demos/earth.mat"     32522  


Version History

Introduced in R2020a