Parallel server dir() for files local to the server

Christopher McCausland, 25 April 2022
Raymond Norris, 10 May 2022
I wish to move computation from a local machine, with .m files stored locally, to a parallel compute server, with the .m files stored on the server.
When I process the files sequentially on my local machine, it usually looks something like this:
Files = dir('C:\my_data'); % Retrieve the names of all patient files
for i=1:length(Files)
load(strcat('C:\my_data,Files(i).name')) % Load each file in turn
    % Put functions to run on data
end
I now want to move this computation to a parallel server. I have a PS license, the server is validated, and I have uploaded the files to the server.
However, I cannot figure out how to call the dir() command so that it queries the files on the server (they are about 1 TB in total, so too large to transfer to the remote server each time). I had thought it would look something like this:
Files = dir('~/home/user/Database/Physionet/training/'); % Rather than query locally, query the data on the server
However, the directory isn't found correctly. Can anyone explain how to point to this data on the parallel compute server? If anyone has suggestions for better ways to do this, please let me know!
Raymond Norris, 25 April 2022
For starters, you don't want to hard-code files/paths in your code. Your code should be written as functions so that you can pass in the root folder locations you want to read from and write to. I'll show you an example, but first a couple of questions.
How do you submit your code to the cluster? Are you using parpool or batch? For example:
% Using parpool
c = parcluster('cluster');
pool = c.parpool(16);
Files = dir('~/home/user/Database/Physionet/training/');
parfor i=1:length(Files)
    % Had a typo in your line. Also, will want to make sure Files(i).name
    % is always a MAT-file (think at least about . and ..)
end

% Using batch
c = parcluster('cluster');
job = c.batch(@mycode,...,'Pool',16);
I'm guessing you want the former, but you're probably going to need the latter. It also depends on what you're going to do with the data after the parfor finishes (or while it's running). I have a thought, but you might need to update to R2022a.
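The point about passing in root folders rather than hard-coding them can be sketched as follows. This is only a minimal illustration, assuming the data are MAT-files; the function name processFolder and the per-file work are placeholders, not anything from the original post.

```matlab
function results = processFolder(rootDir)
% processFolder  Hypothetical example: load every MAT-file under rootDir.
% rootDir is passed in by the caller, so the same code can run against
% 'C:\my_data' locally or against a folder that the cluster workers can see.

    % Asking for *.mat skips '.' and '..' and any non-MAT-files.
    files = dir(fullfile(rootDir, '*.mat'));
    results = cell(numel(files), 1);

    parfor i = 1:numel(files)
        % Calling load with an output keeps the parfor body transparent.
        data = load(fullfile(files(i).folder, files(i).name));

        % Placeholder for the real analysis: record the variable names.
        results{i} = fieldnames(data);
    end
end
```

When a function like this is submitted with batch, the whole function, including the dir call, runs on a cluster worker, so rootDir is resolved against the server's file system. With an interactive parpool, the code outside the parfor loop still runs on the local machine.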
Raymond Norris, 10 May 2022
Keep in mind that if you place the additional folder names in the profile, they will be used for each job you submit to the cluster. Adding it to the call to batch explicitly sets it for that job. In the case of adding paths, there's no overhead to speak of. However, wait until you need to debug a job where you can't understand why a job fails, only to discover that you included another path (listed in the profile) that was shadowing your other function. Listing the additional paths in the call to batch doesn't solve this issue, but it hopefully at least puts it in your face that you are adding /home/cmcausland/work/... to your job.
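As a rough sketch of the per-job option, the call below passes the extra folder through the 'AdditionalPaths' argument of batch. The profile name 'cluster' and the function mycode come from earlier in the thread; dataRoot and codeDir are hypothetical placeholders, and the call assumes mycode takes the data folder as its only input and returns one output.

```matlab
% Hypothetical values: replace with folders that exist on the cluster.
dataRoot = '/path/on/server/to/data';   % assumed: folder the workers can read
codeDir  = '/path/on/server/to/code';   % assumed: folder containing mycode.m

c = parcluster('cluster');

% One output, one input; codeDir is added to the path for this job only.
job = batch(c, @mycode, 1, {dataRoot}, ...
    'Pool', 16, ...
    'AdditionalPaths', {codeDir});

wait(job);                   % block until the job finishes
out = fetchOutputs(job);     % cell array holding the job's outputs
```

Because the path is spelled out at the call site, it is visible whenever the job is submitted, which is the trade-off described above compared with listing it in the profile.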

