List S3 Files and Subdirectories

Nick on 19 May 2023
Commented: Stephane on 12 Mar 2024
I am trying to extract only a list of filenames with a certain extension (*.nc) from an S3 bucket, including all of its subdirectories. All of the fileDatastore examples that I have found seem to load or read the files, which is not of interest to me and is slow for very large datasets when all I need is the filenames.
If it helps with time efficiency, I also know the specific subdirectories of interest within the S3 bucket, although searching through all of the subdirectories will work too if necessary. If S3 bucket subdirectories are indexable, I would also be interested in sorting by that information.
  2 Comments
Nick on 21 May 2023
I was never able to get fileDatastore or webread to work the way I wanted, so I ended up using system commands in MATLAB after installing the AWS CLI locally. Something like:
[status, cmdout] = system('aws s3 ls s3://[your s3 bucket]');
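The text returned in cmdout still has to be reduced to filenames. Below is a minimal sketch of that step, assuming the listing was produced with the CLI's --recursive flag so that every line has the form "date time size key". The sample entries are made up for illustration, and the variable names (lines, keys, ncKeys) are not from the original comment.

```matlab
% Sample text in the format printed by "aws s3 ls --recursive":
% date, time, size in bytes, then the object key (made-up entries).
cmdout = sprintf('%s\n', ...
    '2023-05-01 10:15:02       2048 data/2021/a.nc', ...
    '2023-05-01 10:15:03        512 data/2021/readme.txt', ...
    '2023-05-02 08:00:00       4096 data/2022/b.nc');

% In practice cmdout would come from the CLI call, for example:
% [status, cmdout] = system('aws s3 ls s3://[your s3 bucket] --recursive');

% Split the output into lines and drop the empty ones.
lines = strtrim(splitlines(string(cmdout)));
lines = lines(strlength(lines) > 0);

% Strip the date, time and size columns, leaving only the object key.
keys = regexprep(lines, '^\S+\s+\S+\s+\d+\s+', '');

% Keep only the NetCDF files and sort them by key, which groups
% them by subdirectory.
ncKeys = sort(keys(endsWith(keys, ".nc", 'IgnoreCase', true)));
disp(ncKeys);
```

Because only the listing is parsed, none of the files are opened or downloaded. If the subdirectories of interest are known, the same parsing could be applied to one listing per prefix instead of the whole bucket.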
Stephane on 12 Mar 2024
You may want to use matlab.io.datastore.FileSet (https://au.mathworks.com/help/matlab/ref/matlab.io.datastore.fileset.html). Its constructor accepts the name-value argument 'IncludeSubfolders',true.
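A rough sketch of how this suggestion might look, using a temporary local folder as a stand-in for the bucket so that it runs without credentials. The folder layout and file names are made up; pointing the same call at an s3:// location is an assumption that depends on AWS credentials being configured for MATLAB and has not been verified here.

```matlab
% Build a small temporary folder tree to stand in for the bucket
% (hypothetical layout, for illustration only).
root = tempname;
mkdir(fullfile(root, 'sub'));
fclose(fopen(fullfile(root, 'a.nc'), 'w'));
fclose(fopen(fullfile(root, 'sub', 'b.nc'), 'w'));
fclose(fopen(fullfile(root, 'sub', 'c.txt'), 'w'));

% Collect file information only; the file contents are not read.
fs = matlab.io.datastore.FileSet(root, ...
    'IncludeSubfolders', true, ...
    'FileExtensions', '.nc');

% Full paths of the matching files.
names = fs.FileInfo.Filename;
disp(fs.NumFiles);
disp(names);
```

Unlike fileDatastore, a FileSet needs no read function, which fits the case where only the names are wanted.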


Answers (1)

Askic V on 19 May 2023
Edited: Askic V on 19 May 2023
Well, this small code snippet creates a file datastore that includes all CSV files whose names start with 'd' or 'D' in the given folder and all of its subfolders.
Not sure how large your data sets are, but you can give it a try:
% Set the folder path where your files and subfolders are located
folderPath = 'C:\Users\AsV\Documents\Matlab';
% Define the file pattern to match specific file names
filePattern = fullfile(folderPath, '**', 'd*.csv'); % ** includes subfolders
% Create a file data store using the file pattern
fDatastore = fileDatastore(filePattern,'ReadFcn',@readtable,'ReadMode','file');
% Display the files in the data store
disp('Files in the data store:');
disp(fDatastore.Files);
As I understand it, fileDatastore is designed in MATLAB to handle large datasets efficiently.
I am not really sure how to make it even more efficient in MATLAB. Perhaps some other programming language would be a better match.
