Downloading files from a website with conditions on the file names

alpedhuez on 9 Feb 2022
Commented: Walter Roberson on 12 Mar 2022
Question: I am working with the website https://www.somecompany.com/xml/.
This directory contains files whose names start with the letter "A" or "B".
The filenames in the directory look like:
A_20080403.xml
A_20080403_1.xml
A_20080403_2.xml
A_20080404_1.xml
B_20080403_1.xml
That is:
  • Filenames are of the form "Capital letter" + "_" + "date" + "_" + "number" + ".xml" or "Capital letter" + "_" + "date" + ".xml"
  • Some dates have no corresponding files
I would like to download all the files whose names start with the letter "A".
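The naming rule above can be written as a regular expression. The following is only a sketch: it assumes the date is always 8 digits and that a list of names is already at hand (here, the sample names from the question).

```matlab
% Sample names taken from the question; in practice this list would come
% from whatever listing of the directory is available.
names = ["A_20080403.xml", "A_20080403_1.xml", "A_20080403_2.xml", ...
         "A_20080404_1.xml", "B_20080403_1.xml"];

% "A", underscore, 8-digit date, optional "_<digits>", then ".xml".
pattern = '^A_(\d{8})(_\d+)?\.xml$';

% regexp returns [] for non-matching cells, so keep the non-empty ones.
isWanted = ~cellfun(@isempty, regexp(cellstr(names), pattern, 'once'));
wanted = names(isWanted);
disp(wanted)
```

Only the names beginning with "A_" survive the filter; the "B_" file is dropped.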
What has been tried:
(a) I was able to save a single file using the "websave" command. To fetch many files I tried:
for k = 20080401:20100101
    filename = sprintf('A%d.xml', k);
    url = ['https://www.somecompany.com/xml/' filename];
    outfilename = websave(filename,url);
end
Problems with the above code: it does not work because
  • It assumes filenames of the form "Capital letter" + "date" + ".xml" rather than the forms described above
  • It throws an error, and stops, at the first date that has no corresponding file
How should the code be improved?
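A further issue with looping over 20080401:20100101 is that most of those integers are not calendar dates at all (20080432, for example). A small sketch of the difference, assuming the two endpoints are meant as 1 April 2008 and 1 January 2010:

```matlab
% Integers in the question's loop range versus real calendar dates.
nIntegers = numel(20080401:20100101);

% Assumption: the same two endpoints, interpreted as dates.
firstDay = datetime(2008, 4, 1);
lastDay  = datetime(2010, 1, 1);
days = firstDay:lastDay;          % steps by one calendar day
nDates = numel(days);

fprintf('%d integers, but only %d valid dates\n', nIntegers, nDates);

% The names also need the underscore after the letter:
k = 20080403;
fprintf('%s vs %s\n', sprintf('A%d.xml', k), sprintf('A_%d.xml', k));
```

Iterating over datetime values avoids the wasted requests for dates that cannot exist.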

Answers (1)

Walter Roberson on 9 Feb 2022
It would be more robust and faster if the site provided a way to list the available files, instead of having to proceed by trial and error.
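If the server does expose an HTML index page for the directory (an assumption; many sites do not), the listing could be parsed roughly as follows. The sample HTML is made up so that the sketch runs offline.

```matlab
% Assumption: the server returns an HTML index page for the directory.
% On a real site this text would come from: html = webread(baseurl);
html = ['<a href="A_20080403.xml">A_20080403.xml</a>' newline ...
        '<a href="A_20080404_1.xml">A_20080404_1.xml</a>' newline ...
        '<a href="B_20080403_1.xml">B_20080403_1.xml</a>'];

% Capture the href targets that match the "A_" naming pattern.
tokens = regexp(html, 'href="(A_\d{8}(?:_\d+)?\.xml)"', 'tokens');
listed = string(cellfun(@(t) t{1}, tokens, 'UniformOutput', false));

baseurl = "https://www.somecompany.com/xml/";
urls = baseurl + listed;   % each of these could then be passed to websave
disp(urls)
```

Without such a listing, trial and error over candidate names remains the fallback.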
baseurl = "https://www.somecompany.com/xml/";
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd');
subfile_limit = 5; %no more than _5 -- adjust as appropriate
subfile_modifier = ["", "_" + (1:subfile_limit)] + ".xml";
for Day = datelimits(1):datelimits(2)
    daystr = string(Day);
    for Sub = subfile_modifier
        filename = "A_" + daystr + Sub;
        url = baseurl + filename;
        try
            outfilename = websave(filename,url);
            fprintf('fetched %s\n', filename);
        catch
            break; %skip remaining subfiles for this date upon first failure
        end
    end
end
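One thing to watch with the break: the sample names in the question include A_20080404_1.xml without an A_20080404.xml, so stopping at the first failure could skip files that exist. A possible variant tries every suffix instead. In this sketch, tryFetch and available are hypothetical stand-ins for websave and the server contents, used only so the loop can run offline.

```matlab
% Stand-in for websave so the sketch runs offline: it "succeeds" only for
% names in this hypothetical set and throws an error otherwise.
available = ["A_20080403.xml", "A_20080403_1.xml", "A_20080404_1.xml"];
tryFetch = @(name) assert(any(name == available), ...
    'demo:missing', 'no such file: %s', char(name));

daystrs  = ["20080403", "20080404"];
suffixes = ["", "_1", "_2"] + ".xml";
fetched  = strings(0, 1);

for daystr = daystrs
    for suffix = suffixes
        filename = "A_" + daystr + suffix;
        try
            tryFetch(filename);   % real code: websave(filename, baseurl + filename)
            fetched(end+1, 1) = filename; %#ok<SAGROW>
        catch
            continue;  % try the next suffix instead of abandoning the date
        end
    end
end
disp(fetched)
```

The cost is one request per suffix per date, so the suffix limit is worth keeping small.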
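  Comments: 2
Walter Roberson on 12 Mar 2022
The datetime values also need an output 'Format' of yyyyMMdd; otherwise string(Day) uses the default display format and the generated file names do not match:
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');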
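A quick way to check the effect of 'Format' on the generated names, as a minimal sketch (the "_1" suffix is just an example):

```matlab
% 'InputFormat' controls parsing; 'Format' controls how the value is
% converted back to text by string() or char().
d = datetime('20080401', 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');
daystr = string(d);
filename = "A_" + daystr + "_1.xml";
disp(filename)

% The display format can also be set after construction.
d2 = datetime(2008, 4, 1);
d2.Format = 'yyyyMMdd';
disp(string(d2))
```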

Release: R2021b
