Hello dear masterminds!
I have working code that collects strings from several .txt files (approx. 53,000 files) in a network folder. I have one problem, though: identifying the filenames with the dir command is quite time-consuming (17-19 seconds). Is there a better way of doing this? I will attach a part of the code and some of the filenames.
Thank you!
addpath \\winfs\data\prod\maprod\arkivering\TAF;
DS = StartDTstr;
DE = StopDTstr;
TI = duration(timein,'InputFormat','hh:mm');
TO = duration(timeout,'InputFormat','hh:mm');
P = '\\winfs\data\prod\maprod\arkivering\TAF';
S = dir(fullfile(P,'*.txt'));
D = regexp({S.name},'\d{10}','match','once');
T = datetime(D, 'InputFormat','yyMMddHHmm');
X = isbetween(T,DS,DE);
tod = timeofday(T);
Y = tod>=TI & tod<=TO;
Z = S(X&Y);
Z.name;
numfiles = length(Z(:,1)); %create empty cell
for h=1:numfiles
filename=Z(h).name;
fileID=fopen(filename); %open filename to create fileID
Data{h}=textscan(fileID,'%s','delimiter','=','headerlines',1);
fclose(fileID); %close fileID
end
rmpath \\winfs\data\prod\maprod\arkivering\TAF;
Utcell = cell(1,length(Data)); %output data of all TAF

6 Comments

Stephen23 on 10 Oct 2021 (edited 10 Oct 2021)
"Is there a better way of doing this?"
Don't add folders of data files to the MATLAB search path; using absolute/relative filenames is faster.
The MATLAB search path is for folders which contain code, not data.
Linus Dock on 10 Oct 2021
Hi Stephen!
Ok so I removed the addpath command. I now get the following error message:
"Error using textscan
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in METARochTAF (line 98)
Data{h}=textscan(fileID,'%s','delimiter','=','headerlines',1); %read all characters in fileID"
Any ideas why this happens?
Regards
/Linus
Stephen23 on 10 Oct 2021 (edited 10 Oct 2021)
"Any ideas why this happens?"
You need to use an absolute/relative filename, for example:
fnm = fullfile(P,Z(h).name);
fileID = fopen(fnm);
This is exactly what Jan's answer shows too.
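The "Invalid file identifier" error above means that fopen failed and returned -1, which textscan then rejected. A minimal sketch of checking the fopen result before reading is shown here. The folder and file name are hypothetical stand-ins for P and Z(h).name and are chosen so that the file does not exist.

```matlab
% Hypothetical stand-ins for P and Z(h).name from the thread.
P    = tempname;                 % a folder path that does not exist yet
name = 'no_such_TAF_file.txt';   % made-up file name
fnm  = fullfile(P, name);        % full filename, as suggested above

% fopen returns -1 (and an error message) when the file cannot be opened.
[fileID, errmsg] = fopen(fnm, 'r');
if fileID < 0
    fprintf('Could not open "%s": %s\n', fnm, errmsg);
else
    Data = textscan(fileID, '%s', 'Delimiter', '=', 'HeaderLines', 1);
    fclose(fileID);
end
```

With a check like this, a wrong or incomplete filename is reported at the fopen call instead of surfacing later inside textscan.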
Linus Dock on 10 Oct 2021
Thank you Stephen! I see that now. So, now it works without the addpath command!
Great thanks!
I'm still stuck on the 17-second runtime of the dir command, though, which I was hoping to improve. Is it the network and the size of that folder that's the issue then?
Best regards
/Linus
Stephen23 on 11 Oct 2021
"Is it the network and the size of that folder that's the issue then?"
If the files are being accessed over a network then that is likely to be a bottleneck.
But only measuring the properties of your storage+network+local machine would answer that question.
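One way to take such a measurement is to time the listing step in isolation with tic/toc. The sketch below does that on a small temporary folder with made-up file names standing in for the network share. It also times one possible alternative that is not from this thread: asking the operating system for bare file names only. Whether that is any faster on a given network share is an assumption that only a measurement on the real folder can confirm.

```matlab
% Self-contained timing sketch: a small temporary folder stands in for the
% network share (P in the thread). All file names here are made up.
P = tempname;
mkdir(P);
demoNames = {'TAF_2110100530.txt', 'TAF_2110101130.txt', 'TAF_2110111730.txt'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(P, demoNames{k}), 'w');
    fclose(fid);
end

% Variant 1: MATLAB's dir, which also returns size and date for every file.
t1 = tic;
S = dir(fullfile(P, '*.txt'));
namesDir = {S.name};
timeDir = toc(t1);

% Variant 2 (assumption, not from the thread): let the operating system
% list bare names only, then filter by extension in MATLAB.
t2 = tic;
if ispc
    cmd = sprintf('dir /b "%s"', P);
else
    cmd = sprintf('ls -1 "%s"', P);
end
[status, out] = system(cmd);
namesSys = regexp(strtrim(out), '\r?\n', 'split');
namesSys = namesSys(~cellfun(@isempty, regexpi(namesSys, '\.txt$', 'once')));
timeSys = toc(t2);

fprintf('dir: %.3f s, system listing: %.3f s\n', timeDir, timeSys);
rmdir(P, 's');   % remove the temporary demo folder
```

On the real share, P would be the UNC path from the question. Running each variant a few times is advisable, because the first call may be slowed or sped up by caching on the server or client.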
Linus Dock on 11 Oct 2021
Ok thanks!
Really great to have your help!
/Linus


Accepted Answer

Jan on 10 Oct 2021 (edited 10 Oct 2021)

addpath \\winfs\data\prod\maprod\arkivering\TAF;
Why do you append this folder to Matlab's path? Are the files stored in this folder or some required Matlab functions? If this folder does not contain Matlab functions, there is no reason to append it to the path. Simply omit the addpath/rmpath commands. See below.
What is the purpose of this:
DS = StartDTstr;
DE = StopDTstr;
TI = duration(timein,'InputFormat','hh:mm');
TO = duration(timeout,'InputFormat','hh:mm');
Are these variables or functions?
What exactly takes 17-19 seconds?
Is Data pre-allocated? If the final size is known, allocate it instead of letting the array grow iteratively:
numfiles = size(Z, 1); % Better than: length(Z(:,1));
Data = cell(1, numfiles);
for h=1:numfiles
filename = fullfile(P, Z(h).name); % Absolute path
fileID = fopen(filename);
Data{h} = textscan(fileID, '%s', 'delimiter', '=', 'headerlines', 1);
fclose(fileID);
end
It is more efficient to use the absolute path name because with fopen('file.txt') Matlab's complete path is searched.
Reading 53'000 files over a potentially slow network connection takes time. How large are the files? Maybe the runtime is limited by the network connection. Then there is no way to accelerate it magically.

4 Comments

Thank you Jan!
I removed the addpath command and got the above-mentioned error message (see my reply to Stephen).
Yes the files are stored in that folder and it does not contain any matlab functions.
DS, DE, TI, TO are variables based on input from the user via UIfigure. If the filename contains the date and time information that is between these start and stop variables I would like to open and read these files.
The line of code that takes 17-19 seconds is:
S = dir(fullfile(P,'*.txt'));
I have preallocated Data now.
What exactly do you mean by absolute path name? Is this a wrong way of doing it:
numfiles = length(Z(:,1)); %create empty cell
for h=1:numfiles
filename=Z(h).name;
fileID=fopen(filename); %open filename to create fileID
Data{h}=textscan(fileID,'%s','delimiter','=','headerlines',1);
fclose(fileID); %close fileID
end
Could you give me an example?
Yes I understand the limitations of network connection and that it must affect the runtime. I'm just looking for improvements anywhere I can :)
Many thanks!
Linus
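The selection step described in this comment (keep a file only if the 10-digit timestamp in its name lies between the start and stop dates and inside the daily time window) can be tried without touching the network. The following is a minimal sketch with made-up file names and made-up values for DS, DE, TI and TO. The guard for names without a timestamp is an addition that the original code does not have.

```matlab
% Made-up file names; the real ones come from dir on the network folder.
names = {'TAF_2110100530.txt', 'TAF_2110101130.txt', ...
         'TAF_2110111730.txt', 'readme.txt'};

% Made-up user input (in the thread these come from a uifigure).
DS = datetime(2021, 10, 10, 0, 0, 0);    % start date and time
DE = datetime(2021, 10, 10, 23, 59, 0);  % stop date and time
TI = duration(5, 0, 0);                  % earliest time of day
TO = duration(12, 0, 0);                 % latest time of day

% Extract the yyMMddHHmm stamp; names without one yield an empty match.
D = regexp(names, '\d{10}', 'match', 'once');
hasStamp = ~cellfun(@isempty, D);

% Convert only the names that carry a stamp; the rest stay NaT.
T = NaT(size(names));
T(hasStamp) = datetime(D(hasStamp), 'InputFormat', 'yyMMddHHmm');

% Combine the date-range test with the time-of-day window.
tod  = timeofday(T);
keep = isbetween(T, DS, DE) & tod >= TI & tod <= TO;
selected = names(keep);
disp(selected);
```

Because this filtering works on names only, it costs very little compared with the directory listing and the file reads; it mainly reduces how many files have to be opened afterwards.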
Linus Dock on 10 Oct 2021
This is from the profiler when using Run and Time: [profiler screenshot not included]
Stephen23 on 10 Oct 2021
"Is this a wrong way of doing it"
Yes, that is the wrong way of doing it.
"Could you give me an example?"
Jan's answer already shows you an example of how to use an absolute filename.
Linus Dock on 10 Oct 2021
I see that now, sorry :)
/Linus

Release: R2018b
