MATLAB Answers

Failed to read xml error when using xmlread

Sarah Immanuel 13 Aug 2020
Commented: Walter Roberson 17 Aug 2020
I am trying to read several XML files in a loop using xmlread. An error 'Failed to read xml file' occurs. On examining an XML file I noticed that if I change ISO8859-1 to ISO-8859-1 in the first line, <?xml version="1.0" encoding="ISO8859-1"?>, xmlread works. Is there an automated way to correct this, or any other way to read the files in bulk without having to manually correct the header in each file?


Answers (3)

dpb 13 Aug 2020
...
try
    DOMnode=xmlread(filename(i));     % try to read the file
catch ME                              % catch the failure; fixup
    fidi=fopen(filename(i),'r');      % open the file
    fido=fopen('tmp','w');            % open a scratch temp file for writing
    while ~feof(fidi)
        l=fgetl(fidi);
        if ~isempty(strfind(l,'ISO8859'))
            l=strrep(l,'ISO8859','ISO-8859');   % fixup the record
        end
        fprintf(fido,'%s\n',l);       % output to temp file; fgetl strips the newline
    end
    fidi=fclose(fidi);
    fido=fclose(fido);
    copyfile('tmp',filename(i))       % and copy over the original
    DOMnode=xmlread(filename(i));     % and try again with corrected file...
end

Comments: 2

Sarah Immanuel 14 Aug 2020
Thanks a lot for your help.
Just a clarification: in the command fprintf(fid0,l), is only the content of 'l' written to the tmp file? How do we get back all the remaining content of the original file?
Walter Roberson 14 Aug 2020
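It is within the loop, so eventually the entire content is written, one line per iteration.
However, the
fprintf(fid0, l)
should be
fwrite(fido, l)
because fprintf treats the line as a format string and would misinterpret any % or \ in the data. The handle opened in the answer is fido, not fid0. Since fgetl strips the line terminator, fprintf(fido, '%s\n', l) is an alternative that keeps the line breaks.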


Walter Roberson 14 Aug 2020
Edited: Walter Roberson 14 Aug 2020
filename = 'InputFileName.xml';
S = fileread(filename);
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
if strcmp(S, SS)
    remove = false; % optimization: do not write a new file if not needed
    tname = filename;
else
    tname = tempname();
    fid = fopen(tname, 'w');
    fwrite(fid, SS); % write the corrected content
    fclose(fid);
    remove = true;
end
DOMnode = xmlread(tname);
if remove; delete(tname); end
This code is deliberate in narrowing down to encoding= and only doing the first instance, so as to avoid accidentally changing any ISO8859 that might happen to be part of the data.
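For reading the files in bulk, the approach above could be wrapped in a function and called in a loop. The following is only a sketch: the helper name readXmlFixEncoding is hypothetical, the '.xml' suffix on the temporary name is an assumption, and reading raw bytes with fread instead of fileread is a design choice of this sketch so that non-ASCII characters are written back unchanged.

```matlab
% Bulk usage: parse every XML file in the current folder.
files = dir('*.xml');
docs  = cell(numel(files), 1);
for k = 1:numel(files)
    docs{k} = readXmlFixEncoding(fullfile(files(k).folder, files(k).name));
end

% Hypothetical helper (not from the thread): parse one XML file, patching
% the encoding name in a temporary copy when the header needs it.
function DOMnode = readXmlFixEncoding(filename)
    % Read the file as raw bytes so that no character conversion happens.
    fid = fopen(filename, 'r');
    assert(fid ~= -1, 'Cannot open %s', filename);
    S = char(fread(fid, [1 Inf], '*uint8'));
    fclose(fid);

    % Same narrow, first-match-only replacement as in the answer above.
    SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');
    if strcmp(S, SS)
        DOMnode = xmlread(filename);   % nothing changed: read the original
        return
    end

    % Write the corrected bytes to a temporary copy and parse that instead.
    tname = [tempname() '.xml'];       % '.xml' suffix is an assumption
    fid = fopen(tname, 'w');
    assert(fid ~= -1, 'Cannot create %s', tname);
    cleaner = onCleanup(@() delete(tname));  % remove the copy even on error
    fwrite(fid, uint8(SS));
    fclose(fid);
    DOMnode = xmlread(tname);
end
```

The original files are left untouched; only the temporary copy carries the corrected header.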

Comments: 3

Sarah Immanuel 14 Aug 2020
Hi Walter, thanks a lot for your response.
I tried the above code. To clarify: the replaced content is still within 'SS', and assuming strcmp(S,SS) is false, 'tname = filename' is executed with filename still referring to the original (faulty) file with ISO8859, isn't it? How are the contents actually replaced within the faulty XML file? Can you clarify this for me?
Walter Roberson 14 Aug 2020
tname is set to filename when strcmp is true, not when it is false.
The comparison is true when the two strings S and SS are exactly the same, which happens if regexprep did not make a change, for example for a file that already has the right pattern or that has a different encoding. In this situation the original file name is used directly for the later xmlread.
When the strcmp is false, the original and regexprep versions are different, which means that regexprep produced a new string. In that situation a temporary file name is fetched, the file is opened, the new content is written, and the temporary file is closed. It is this temporary file whose name is passed to xmlread. After the read, the temporary file is deleted.
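The two branches can be checked on sample header strings without touching any file. This is a small illustration only; the variable names and the sample strings are made up for the example.

```matlab
% Two sample documents: one with the faulty encoding name, one already correct.
bad  = '<?xml version="1.0" encoding="ISO8859-1"?><r/>';
good = '<?xml version="1.0" encoding="ISO-8859-1"?><r/>';

% The same replacement as in the answer, applied to a string.
fix = @(s) regexprep(s, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

changedBad  = ~strcmp(bad,  fix(bad));   % true: text was rewritten, temp file needed
changedGood = ~strcmp(good, fix(good));  % false: original file can be read directly
```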
Walter Roberson 14 Aug 2020
See also https://www.mathworks.com/matlabcentral/answers/101632-how-can-i-use-a-function-such-as-xmlread-to-parse-xml-data-from-a-string-instead-of-from-a-file-i#comment_972999 which shows a Java related method. To use it you would do the fileread(), regexprep(), and then java.io.StringBufferInputStream() the result, and xmlread() what you get from that.
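A minimal sketch of that sequence, avoiding the temporary file entirely. It assumes that xmlread accepts a Java InputStream in place of a file name, as the linked answer describes. A literal string stands in for the result of fileread() so the example runs on its own. Note that java.io.StringBufferInputStream is deprecated in Java and keeps only the low byte of each character, so this is best treated as a workaround for single-byte encodings such as ISO-8859-1.

```matlab
% Sample text standing in for S = fileread(filename).
S = '<?xml version="1.0" encoding="ISO8859-1"?><root><item>1</item></root>';

% Correct the encoding name in memory (first match only).
SS = regexprep(S, 'encoding="ISO8859-', 'encoding="ISO-8859-', 'once');

% Wrap the corrected text in a Java input stream and parse it directly.
stream  = java.io.StringBufferInputStream(SS);
DOMnode = xmlread(stream);

% Inspect the result: name of the document's root element.
rootName = char(DOMnode.getDocumentElement().getNodeName());
```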



Sarah Immanuel 14 Aug 2020
Thanks a lot Walter, yes, that makes sense. One last question, I hope! I am using MATLAB R2020a. The command tempfile() doesn't seem to work?

Comments: 3

Walter Roberson 14 Aug 2020
Sorry, should be tempname() instead of tempfile()
Sarah Immanuel 17 Aug 2020
Hi Walter, thanks. tempname() creates a temporary file name, but xmlread does not handle the resulting file and shows an error again. Can you help?
Walter Roberson 17 Aug 2020
Maybe
tname = [tempname() '.xml'];
