I have a main folder on a network containing a lot of subfolders (~1000), and each subfolder contains ~1000 DICOM files. My code needs to find a string in the DICOM header fields. All the files in a subfolder have the same field value, so I only need to check one file per subfolder. The problem is that for each subfolder I have to use the dir command, and that is time consuming.
My code is:
all_folders = dir(path_browse);          % struct containing every folder
no_folders = length(all_folders) - 2;    % number of folders, excluding '.' and '..'
for i = 1:no_folders
    name_folder = all_folders(i+2).name;          % subfolder to find match
    aux_dir = dir(name_folder);                   % files in subfolder
    cd(name_folder)                               % moves to subfolder
    test_file = dicominfo(aux_dir(3,1).name);     % DICOM header from first file in the folder
    search_field(i) = strcmp(lower(test_file.field), field_query); % compare fields
    cd(path_browse)                               % back to main folder
end
Then I would just need to find the 1s in search_field. Is there any option to open a file without using dir or ls? The code works but I want it to be more efficient.
Regards,
Sergio

1 Comment

Stephen23 on 15 Feb 2018
Edited: Stephen23 on 15 Feb 2018
"I have to user the dir command and that is time consuming."
How do you know that dir is the bottleneck? I can see two cd calls in that code: cd makes debugging harder and is slower than using relative/absolute filepaths.
"Is there any option to open a file without using dir or ls"
You do not need to call dir or ls before opening a file: you can also generate the filenames from some sequence. Which method to use depends on those filenames and on how much you know about them. The MATLAB documentation has more on this.
"The code works but I want it to be more efficient."
Then get rid of cd by using absolute/relative paths, and run the profiler so that you can show us which lines are taking the most time.
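To check where the time really goes before changing anything, one option is to time the folder listing and the header read separately. The function below is only a rough sketch: time_dir_vs_read and its read_header argument are hypothetical names, not part of MATLAB, and read_header defaults to dicominfo (which needs the Image Processing Toolbox). Running the whole loop under the profiler (profile on, then profile viewer) gives the same information line by line.

```matlab
function [t_dir, t_read] = time_dir_vs_read(folder, read_header)
% TIME_DIR_VS_READ  Time the folder listing and one header read separately.
%   Hypothetical helper (not a MATLAB built-in). FOLDER should be the
%   absolute path of one subfolder, e.g. fullfile(path_browse, name_folder).
%   READ_HEADER is optional and defaults to @dicominfo.
if nargin < 2
    read_header = @dicominfo;
end

t = tic;
listing = dir(folder);                 % this is the call suspected of being slow
t_dir = toc(t);

listing = listing(~[listing.isdir]);   % keep files only (drops '.', '..', subfolders)
if isempty(listing)
    error('time_dir_vs_read:noFiles', 'No files found in %s', folder);
end

t = tic;
read_header(fullfile(folder, listing(1).name));   % read one header via absolute path
t_read = toc(t);

fprintf('dir: %.3f s, header read: %.3f s\n', t_dir, t_read);
end
```

If t_dir dominates on the network drive but not on a local copy of the same folder, that points at the connection rather than at dir itself.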


Accepted Answer

Jan on 15 Feb 2018


Do you have any evidence that dir is the time-consuming command? This is unlikely, but it could happen if you work on a network drive over a slow connection. Even then the problem is the connection, not dir.
It is not documented that '.' and '..' are always the first two entries returned by dir, so it is better to remove these special names explicitly.
% UNTESTED CODE!
all_folders = dir(path_browse);
all_folders(ismember({all_folders.name}, {'.', '..'})) = []; % exclude '.' and '..'
no_folders = numel(all_folders);
search_field = false(1, no_folders); % Pre-allocate!!!
for k = 1:no_folders
    name_folder = fullfile(path_browse, all_folders(k).name); % subfolder to find match
    aux_dir = dir(name_folder); % files in subfolder
    aux_dir(ismember({aux_dir.name}, {'.', '..'})) = [];
    test_file = dicominfo(fullfile(name_folder, aux_dir(1).name));
    search_field(k) = strcmpi(test_file.field, field_query); % compare fields
end
This version uses absolute paths instead of hopping around the disk with cd().
strcmpi(a,b) is faster and nicer than strcmp(lower(a), b).
I assume that this is not much faster than your version, because most of the time is spent in dicominfo. But the code is safer.
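The code above still assumes that every subfolder contains at least one file and no nested folders. A small helper can make that explicit. This is just a sketch: first_file_in is a hypothetical name, not a MATLAB function, and filtering on the isdir field is a swapped-in alternative to the ismember test, since it drops '.', '..' and any real subfolders in one step.

```matlab
function file_path = first_file_in(folder)
% FIRST_FILE_IN  Full path of the first regular file in FOLDER, or '' if none.
%   Hypothetical helper (not a MATLAB built-in).
listing = dir(folder);
listing = listing(~[listing.isdir]);   % drops '.', '..' and any subfolders
if isempty(listing)
    file_path = '';                    % empty or missing folder: nothing to read
else
    file_path = fullfile(folder, listing(1).name);   % absolute path, no cd needed
end
end
```

Inside the loop, the result could be passed straight to dicominfo, skipping folders for which it returns an empty char.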

1 Comment

Thanks for the answer. It seems to be the aux_dir line (the dir call on each subfolder), as the times for each loop are:
Elapsed time is 42.927361 seconds.
Elapsed time is 44.147963 seconds.
Elapsed time is 44.151739 seconds.
Elapsed time is 44.198647 seconds.
Elapsed time is 44.198661 seconds.
So it must be a problem with the connection. I've realized I can index the folders (folder name and its string) so I just need to run this code once to create a database.
Thanks All,
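For anyone who wants to try the "index once" idea, here is one possible sketch, not the code actually used in this thread. It scans the subfolders a single time, saves the folder names and their values to a MAT-file, and loads that file on later calls instead of touching the network drive. load_or_build_index, cache_file and read_value are hypothetical names. cache_file should include the .mat extension, and 'Modality' in the example comment is only a stand-in for whichever header field you need.

```matlab
function index = load_or_build_index(path_browse, cache_file, read_value)
% LOAD_OR_BUILD_INDEX  Scan the subfolders once, then reuse the saved result.
%   Hypothetical helper (not a MATLAB built-in). READ_VALUE is a function
%   handle that maps a file path to the text to store for its folder.
%
%   Example (assumes the Image Processing Toolbox and a 'Modality' field):
%     read_value = @(f) getfield(dicominfo(f), 'Modality');
%     index = load_or_build_index(path_browse, 'dicom_index.mat', read_value);
%     matches = index.folders(strcmpi(index.values, field_query));
if exist(cache_file, 'file') == 2
    loaded = load(cache_file, 'index');   % fast path: no folder scan at all
    index = loaded.index;
    return
end

subfolders = dir(path_browse);
subfolders = subfolders([subfolders.isdir]);                   % folders only
subfolders(ismember({subfolders.name}, {'.', '..'})) = [];     % drop special names

index.folders = {subfolders.name};
index.values = repmat({''}, 1, numel(subfolders));             % pre-allocate
for k = 1:numel(subfolders)
    folder = fullfile(path_browse, subfolders(k).name);
    files = dir(folder);
    files = files(~[files.isdir]);                             % files only
    if ~isempty(files)
        index.values{k} = read_value(fullfile(folder, files(1).name));
    end
end
save(cache_file, 'index');   % later calls load this file instead of rescanning
end
```

The cache does not notice new or changed subfolders, so the MAT-file has to be deleted (or the function extended) whenever the folder tree changes.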

Asked: 15 Feb 2018
Commented: 16 Feb 2018
