Test existence of files with EXIST

Jan on 4 Nov 2012
The command exist(FileName, 'file') seems sufficient to check whether a file exists. Therefore I used this code to check whether the input of a function is an existing file (thanks to David, who found the bug):
function Hash = DataHash(Data)
...
if exist(Data, 'file') ~= 2
    error('File not found: %s.', Data);
end
The help text of exist explains when the value 2 is returned:
2 if A is an M-file on MATLAB's search path. It also returns 2 when A is
the full pathname to a file or when A is the name of an ordinary file on
MATLAB's search path
But "when A is the full pathname to a file" does not hold when A is a MEX-, MDL- or P-file, because in these cases 3, 4 or 6 is returned, respectively. So let's try to improve the check:
if ~any(exist(Data, 'file') == [2, 3, 4, 6])
    error('File not found: %s.', Data);
end
But even then, exist() is smarter than expected:
File1 = fullfile(matlabroot, '\toolbox\matlab\graph2d\plot')
File2 = fullfile(matlabroot, '\toolbox\matlab\graph2d\plot.m')
File3 = fullfile(matlabroot, '\toolbox\signal\signal\@dspdata\plot')
File4 = fullfile(matlabroot, '\toolbox\signal\signal\@dspdata\plot.m')
exist(File1, 'file') % 0 !
exist(File2, 'file') % 2
exist(File3, 'file') % 2 !
exist(File4, 'file') % 2
I guess that File1 is not recognized because plot is a built-in function, while @dspdata\plot (File3) is not. But File3 is not an existing file:
fopen(File1, 'r') % -1
fopen(File2, 'r') % 3
fopen(File3, 'r') % -1 !! in spite of: exist(File3, 'file') ~= 0
fopen(File4, 'r') % 4
fclose('all')
So how can we check the existence of a file in a simple and reliable way?
function Ex = FileExist(FileName)
FID = fopen(FileName, 'r');
if FID == -1
    Ex = false;
else
    Ex = true;
    fclose(FID);
end
But there are still exceptions, because fopen() is smart as well:
cd(tempdir);
fopen('plot.m', 'r') % 3, file is *found*!
Here fopen() searches all folders of the Matlab PATH, although only the current folder should be searched. As a side effect, fopen(name, 'r') is relatively slow. Another idea:
cd(tempdir);
fopen('plot.m', 'r+') % -1, file is not found
This is faster than the 'r' mode, especially if folders of the PATH are stored on network drives. And requesting write access restricts the search to the current folder. But it fails if the current user does not have write privileges for the file.
The next approach:
function Ex = FileExist(FileName)
dirFile = dir(FileName);
if length(dirFile) == 1
    Ex = ~(dirFile.isdir);
else
    Ex = false;
end
I could not find a file for which this test fails. It is very slow if FileName is a folder on a network drive that contains a great many files. But this is a rare case, so I prefer this test.
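A minimal usage sketch of the dir-based test, assuming tempdir is writable. The anonymous function isExistingFile is a hypothetical inline stand-in for FileExist, and the temporary file name comes from tempname:

```matlab
% Inline equivalent of the dir-based FileExist (hypothetical helper name).
% For a folder, dir() lists its contents, so the result is usually not a
% 1x1 struct; if it is 1x1, the isdir flag decides.
isExistingFile = @(name) isscalar(dir(name)) && ~getfield(dir(name), 'isdir');

tmpFile = [tempname, '.txt'];              % unique name, file not created yet
beforeCreate = isExistingFile(tmpFile);    % the file is missing at this point

fid = fopen(tmpFile, 'w');                 % create an empty file
fclose(fid);
afterCreate = isExistingFile(tmpFile);     % now the file is present

isFolderAFile = isExistingFile(tempdir);   % a folder must not count as a file

delete(tmpFile);                           % clean up
```

Because tmpFile is a full path, no search through the Matlab PATH is involved in any of the three calls.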
Finally, a C-Mex using either GetFileAttributes under Windows or _open or _wopen under Linux/MacOS is faster: 10% for existing files, 90% for missing files. But handling the Unicode strings is not trivial: a wchar has 2 bytes under Windows and 4 bytes under Linux and MacOS, and under Linux wchars are not in common use; UTF-8 encoded strings with 1-byte chars are used instead. See Answers: Matlab string to wchar under Linux. I'm going to publish the Mex functions in the FEX, together with a DirExist(), because exist(name, 'dir') has similar problems.
  • Did you consider such effects caused by the smartness of exist() in your programs?
  • Did a user of your programs run into troubles due to weak tests of file existence, e.g. when the resulting error messages are misleading?
  • Do you or your programs profit or suffer from the smartness of exist() and fopen()?
  • Do you think the behavior of these functions is explained clearly enough in the help and doc text?
  • Do you want standard jobs solved reliably by simple commands in Matlab?
NOTE: Usage of the recursive font: I mean that smart is not smart.

Accepted Answer

Malcolm Lidierth on 4 Nov 2012
Edited: Malcolm Lidierth on 4 Nov 2012
Easy with Java:
File1 = fullfile(matlabroot,'toolbox','matlab','graph2d','plot');
File2 = fullfile(matlabroot, 'toolbox','matlab','graph2d','plot.m');
File3 = fullfile(matlabroot, 'toolbox','signal','signal','@dspdata','plot');
File4 = fullfile(matlabroot, 'toolbox','signal','signal','@dspdata','plot.m');
file=java.io.File(File1);
file.exists()
file=java.io.File(File2);
file.exists()
file=java.io.File(File3);
file.exists()
file=java.io.File(File4);
file.exists()
ans =
0
ans =
1
ans =
0
ans =
1
  Comments: 5
Jan on 5 Nov 2012
@Malcolm: Fine, now I understand your hint "File.isFile()". Timings for testing the existence of 981 files with 10 repetitions, for existing / non-existing files:
  • File = java.io.File(Name); Ex = File.isFile(); 0.90 / 0.80 sec
  • dirFile = dir(Name); Ex = (length(dirFile) == 1) && ~(dirFile.isdir); 0.70 / 0.60 sec
  • C-Mex: 0.29 / 0.21 sec
My conclusion concerning speed: These three methods are equivalent in practice, because usual applications do not test millions of files. So we have good workarounds for the weak EXIST. Still, I'm disappointed by the built-in EXIST, because it is too overloaded with features to perform the simple test of whether a file exists.
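The difference between exists() and isFile() can be sketched as follows. This assumes MATLAB runs with Java support and that tempdir is writable; the variable names are chosen for illustration only:

```matlab
% Create a temporary file with a full path (name from tempname).
tmpFile = [tempname, '.txt'];
fid = fopen(tmpFile, 'w');
fclose(fid);

jFile      = java.io.File(tmpFile);
fileIsFile = jFile.isFile();     % an ordinary file passes isFile()

jDir       = java.io.File(tempdir);
dirExists  = jDir.exists();      % exists() also accepts folders
dirIsFile  = jDir.isFile();      % isFile() rejects folders

delete(tmpFile);                 % clean up
```

As far as I know, Java may resolve relative names against its own working directory rather than Matlab's current folder, so full paths are the safer input for this method.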

More Answers (1)

Daniel Shub on 5 Nov 2012
I am not sure you are using EXIST as it was intended to be used. The H1 line is: %EXIST Check if variables or functions are defined. The documentation says little about checking whether files exist. I agree that the argument names and output values are confusing. I think, however, that EXIST should not be used for checking whether a file exists. Determining whether a function exists seems harder than determining whether a file exists, so I wouldn't expect it to compete in terms of speed.
  Comments: 3
Jan on 6 Nov 2012
I do not believe that the level of hugeness can be measured. Any unexpected behavior can have severe effects.
A user of DataHash got problems because the check for existence rejected P-files. Without the chance to modify the code, e.g. if DataHash were P-coded, the user would need tedious workarounds like renaming the file before calculating the hash. In the real world there can even be files like "D:\MFiles\file.m.p.mex", which should not confuse the detection of the file's existence either. The reliability of a function must be proved using non-standard input, because "reliable for standard input" is a very weak label.
I assume the smartness of fopen() is more dangerous: It opens a file anywhere in the path when the file name is relative. Luckily this does not concern opening the file with write access. And again the workaround is standard good programming practice anyway: never work with relative paths, but always use fully qualified path names - this is why I spent so much time on GetFullPath.
So perhaps all I want to say is:
Do not use exist(Name, 'file') with relative paths!!!
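A rough sketch of that advice, assuming Java support is available: anchor a relative name in a chosen folder before testing it. The variable baseFolder is an assumption made for this example (pwd would be the usual choice), and the logic is a simplification, not the GetFullPath function mentioned above:

```matlab
name       = 'plot.m';           % relative name that also exists on the path
baseFolder = tempdir;            % folder the relative name should refer to

% java.io.File.isAbsolute() tells whether the name already is a full path.
jName = java.io.File(name);
if jName.isAbsolute()
    fullName = name;
else
    fullName = fullfile(baseFolder, name);   % does not resolve '.' or '..'
end

% Test the full name with the dir-based check instead of exist().
dirFile = dir(fullName);
found   = isscalar(dirFile) && ~dirFile.isdir;
```

Here found is true only if a plot.m is actually stored in baseFolder; the copy in the toolbox folders is not considered. On releases from R2017b on, the built-in isfile may be another candidate for the final test, but I have not compared its behavior with the methods discussed here.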
