Encoding problem reading data using fread

Michael Liedlgruber on 24 May 2023
Commented: Michael Liedlgruber on 6 Sep 2023
Hi,
I'm using the following code to read data from a file that contains both text and binary data (European Data Format, to be more specific):
fid = fopen('test.edf', 'r', 'l');
fileType = fread(fid, 1, 'uint8');
id = char(fread(fid, [1 7], 'char'));
fclose(fid);
On my machine (Windows 10, MATLAB R2020a Update 6) this code runs fine and the values returned (i.e. fileType and id) are correct.
However, when this code is run on a different machine (one of our customers'; also running Windows 10, but with MATLAB R2020a Update 1) using the same input file, the value of id seems to be read incorrectly (the encoding used seems to be UTF-16BE). In fact, I get the same incorrect results on my machine if I specify UTF-16BE as the file encoding in the fopen call.
More interestingly, if I open the file on my machine without specifying an encoding and query the encoding in use with
[filename, permission, machineformat, encoding] = fopen(fid);
then the encoding UTF-16BE is returned.
The default encoding in Windows is the same on both machines.
So, to me it seems that MATLAB on my machine detects an incorrect encoding because the file contains a BOM somewhere in the data, but nevertheless returns the correct values. On the customer's machine, however, the detected encoding seems to actually be used, yielding different results.
My question is now: how is it possible that MATLAB apparently detects a wrong encoding but still reads the data correctly on my machine? And why do I get incorrect data if I explicitly specify the incorrect encoding (the one detected by MATLAB)? And why does the customer get different results although the same input file is used and MATLAB detects the same (incorrect) encoding?
Is it possible that something has changed between Update 1 and Update 6 of MATLAB R2020a which causes MATLAB to behave differently? Unfortunately, I did not find any hint in the release notes of the updates with respect to the behavior of fopen.
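For anyone trying to reproduce this, one way to take encoding detection out of the picture is to read the header as raw bytes and convert to characters afterwards. The sketch below is a minimal, self-contained example. The temporary file and its eight ASCII bytes are made up for illustration (they are not a real EDF header), and the sketch assumes that the 'uint8=>char' precision maps each byte to one character regardless of the encoding that fopen(fid) reports.

```matlab
% Hypothetical test file: eight ASCII bytes written as raw uint8 values.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8('0EDFTEST'), 'uint8');
fclose(fid);

% Read the header byte-wise so the result does not depend on the
% character encoding MATLAB associates with the file.
fid = fopen(fname, 'r', 'l');
fileType = fread(fid, 1, 'uint8');          % first byte as a number
id = fread(fid, [1 7], 'uint8=>char');      % next seven bytes as a 1x7 char
[~, ~, ~, enc] = fopen(fid);                % encoding associated with the file
fclose(fid);
delete(fname);

fprintf('fileType = %d, id = "%s", reported encoding = %s\n', fileType, id, enc);
```

If id comes out the same on both machines with this variant, the difference would most likely lie in how the 'char' precision is decoded rather than in the file itself.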
Best,
Michael
  2 Comments
Mathieu NOE on 25 May 2023
You may want to contact TMW support for that.
Michael Liedlgruber on 25 May 2023
Thank you. Yes, if nobody in the community has an idea of what may cause these inconsistencies, I will contact TMW support.
Fortunately, a fix is quite easy: by specifying UTF-8 encoding explicitly, everything works as expected on all machines.
But I'm still curious what's going on here.
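The explicit-encoding workaround mentioned above could look roughly like the sketch below. The test file and its contents are hypothetical stand-ins for the real EDF file. The only change relative to the original code is the fourth fopen argument, which names the encoding so that no detection is needed.

```matlab
% Hypothetical test file: eight ASCII bytes written as raw uint8 values.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8('0EDFTEST'), 'uint8');
fclose(fid);

% Same reads as in the original code, but with the encoding given
% explicitly as the fourth argument of fopen.
fid = fopen(fname, 'r', 'l', 'UTF-8');
fileType = fread(fid, 1, 'uint8');
id = char(fread(fid, [1 7], 'char'));
fclose(fid);
delete(fname);
```

Note that this only behaves like a byte-wise read as long as the text fields are plain ASCII, which is an assumption about the header contents.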
Best,
Michael


Answers (1)

Ayush on 4 Sep 2023
  1 Comment
Michael Liedlgruber on 6 Sep 2023
Thank you. But this does not really answer my question. And, funnily, the page you linked to says "For more information, see ."
So, I already know that MATLAB defaults to UTF-8. But as you can see in my original post, the behavior is inconsistent between Update 1 and Update 6. And I have no explanation for why, on my machine, the incorrect encoding is returned by fopen(fid) while the correct encoding is used when reading the data.
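One way to check which encoding is actually applied when reading, as opposed to the one fopen(fid) reports, is to compare a byte-wise read with a 'char' read under different explicit encodings. This is only a sketch under stated assumptions: the test file is made up, and the two encoding names are simply the ones discussed in this thread.

```matlab
% Hypothetical test file: eight ASCII bytes written as raw uint8 values.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8('0EDFTEST'), 'uint8');
fclose(fid);

encodings = {'UTF-8', 'UTF-16BE'};
for k = 1:numel(encodings)
    fid = fopen(fname, 'r', 'l', encodings{k});
    rawBytes = fread(fid, [1 8], 'uint8');   % byte values, no decoding involved
    frewind(fid);
    decoded = fread(fid, [1 4], 'char');     % characters decoded per the file encoding
    fclose(fid);
    fprintf('%-9s bytes: %s | char: %s\n', ...
        encodings{k}, mat2str(rawBytes), mat2str(decoded));
end
delete(fname);
```

Running the same loop on both machines, and once more without the fourth fopen argument, should show whether the 'char' values follow the detected encoding on one installation but not on the other.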
