MATLAB Answers

Reading data from Amazon S3 on Matlab Parallel Cloud Worker

Asked by Jhon Wine on 24 January 2018
Commented: Jhon Wine on 5 July 2018
Hi, I'm trying to process a large dataset that is stored on Amazon S3. My code is structured as follows.
The MATLAB client runs a parfor loop on MATLAB Parallel Cloud (my default cluster is Parallel Cloud, with 16 workers):
r = zeros(100,1);          % one result per file
readTimes = zeros(100,1);  % read time (s) per file
parfor i = 1:100
    [ri, readTimesi] = myProcess(i);
    r(i) = ri;
    readTimes(i) = readTimesi;
end
fprintf('Mean Read Time %.1f sec\n', mean(readTimes));
Each worker accesses Amazon S3 independently and retrieves the data it processes through a datastore (fileDatastore):
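function [r, readTime] = myProcess(i)
    % Set S3 credentials on the worker ('ID' and 'Key' are placeholders)
    setenv('AWS_ACCESS_KEY_ID', 'ID');
    setenv('AWS_SECRET_ACCESS_KEY', 'Key');
    setenv('AWS_REGION', 'us-west-2');   % region the bucket lives in
    % Load data: one S3 object per iteration, timed
    fp = ['s3://mybucket/data/file' num2str(i) '.data'];
    t = tic;
    ds = fileDatastore(fp, 'ReadFcn', @AWSRead);
    data = read(ds);
    readTime = toc(t);
    % Process
    % ...
    r = mean(data);
end
function data = AWSRead(fileName)
    % Read the whole file as 16-bit signed integers ('short'), returned as double
    fid = fopen(fileName);
    if fid == -1
        error('AWSRead:openFailed', 'Could not open %s', fileName);
    end
    data = fread(fid, inf, 'short');
    fclose(fid);
end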
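Because myProcess calls setenv on every parfor iteration, one tidier option is to set the S3 variables once per worker before the loop. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed and a pool is already open; 'ID' and 'Key' are the same placeholders as above. It uses parfevalOnAll to run setenv on every worker and also sets the variables on the client.

```matlab
% Sketch: set the S3 environment variables once per worker instead of on
% every parfor iteration. 'ID' and 'Key' are placeholders, as in the question.
names  = {'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION'};
values = {'ID', 'Key', 'us-west-2'};

% Set them on the client as well
for k = 1:numel(names)
    setenv(names{k}, values{k});
end

% If Parallel Computing Toolbox is available and a pool is already open,
% run setenv once on every worker in that pool.
if exist('gcp', 'file')
    p = gcp('nocreate');          % returns [] if no pool is running
    if ~isempty(p)
        for k = 1:numel(names)
            f = parfevalOnAll(p, @setenv, 0, names{k}, values{k});
            wait(f);              % block until every worker has finished
        end
    end
end
```

This only removes repeated work inside the loop. It is not expected to change the S3 transfer time itself.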
I'm trying to troubleshoot why my Mean Read Time is slow and how I can speed it up.
I noticed that Mean Read Time is much faster when I use my local machine as the parallel pool (parpool('local')) than when I use MATLAB Parallel Cloud. I read in MATLAB's documentation that MATLAB Parallel Cloud runs on Amazon EC2, which should give very good data transfer speeds to S3 when EC2 and S3 are in the same region.
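Raw read time depends on file size, so a local-versus-cloud comparison is easier to interpret as throughput. The following is a minimal sketch, assuming each file is read with fread(fid, inf, 'short'), so every returned element corresponds to 2 bytes on disk. The name readMBps is hypothetical.

```matlab
% Sketch: convert a read time into throughput (MiB/s).
% Assumes data was read with fread(..., 'short'): 2 bytes per sample.
bytesPerSample = 2;
readMBps = @(nSamples, readTime) (nSamples * bytesPerSample) ./ readTime / 2^20;

% Example: 2^25 samples (64 MiB) read in 4 s
rate = readMBps(2^25, 4);
fprintf('Read throughput: %.1f MiB/s\n', rate);
```

To use this inside the parfor loop, myProcess would also have to return numel(data). The elementwise ./ then works on the vector of per-file counts and times.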
My questions are: Which region should I use to get maximum data transfer performance? Where is MATLAB Parallel Cloud hosted? Or, apart from running locally (I need many more workers than that), how can I speed up data transfer?
I did not use MATLAB Drive to host my files, as they are too large to fit within MATLAB Drive's 5 GB limit.

Accepted Answer

Jhon Wine on 26 January 2018
After looking into this further, I think MATLAB Parallel Cloud runs in the US East (N. Virginia) region, which is a different region from the one where I stored my data. After moving the data to that region, the problem was solved.
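A minimal sketch of that fix on the MATLAB side, assuming (as the answer suggests) that the workers run in us-east-1 (US East, N. Virginia) and that the data has been copied to a bucket in that region. The bucket name mybucket-us-east-1 is hypothetical. AWS_REGION is the same variable the question sets.

```matlab
% Sketch: read from a bucket located in the same AWS region as the workers.
% Assumes the compute runs in us-east-1; the bucket name is hypothetical.
computeRegion = 'us-east-1';
bucket = 'mybucket-us-east-1';

setenv('AWS_REGION', computeRegion);   % region of the bucket being read
i = 7;                                 % example file index
fp = sprintf('s3://%s/data/file%d.data', bucket, i);
disp(fp)
```

In myProcess, this would replace the 'us-west-2' region and the original bucket path.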
  2 Comments
Jhon Wine on 5 July 2018
Hi, thank you for the comment. It was a typo: instead of 'spectralFilePath' it should be 'fp'. I have corrected the code above.
