Import data from a bad format

Question

J T, 10 May 2023

https://kr.mathworks.com/matlabcentral/answers/1961019-import-data-from-a-bad-format

Commented: dpb, 11 May 2023

Hello, I have a set of data that was saved in a bad format (basically saved from Python as lists of NumPy arrays).

An example data file looks like this. Each file is supposed to be imported into MATLAB as a matrix, where the contents of each [...] group go into one row, for as many rows as the number of [...] groups the file contains. I am having trouble importing these, and it is too expensive to regenerate the data. Could anyone help me, please?

*Note: I attached a zip file of an example data file .dat

*Note: I also converted an example from the source data from .dat to .txt to upload here

[0.01643466 0.014102 0.00989389 0.00854453 0.00811339 0.00641578

0.00615053 0.00540413 0.00452342 0.00427268 0.0041174 0.00352849

0.00273467 0.00265508 0.00239323 0.00225965 0.00199268 0.00180934

0.00174052 0.00154865 0.00143824 0.00140056 0.00130063 0.00111959

0.00085831]

[0.01242517 0.00959429 0.00663475 0.00480379 0.0041159 0.00370299

0.00346792 0.00315736 0.00289833 0.00248943 0.00233303 0.00205719

0.00184254 0.0016187 0.00137933 0.00123405 0.00114122 0.00100773

0.00094038 0.00088898 0.00078643 0.00077108 0.0006717 0.00062967

0.00058109]

[ 2.71704623e-03 2.10584618e-03 8.72114136e-04 7.73112590e-04

5.71653378e-04 5.33790412e-04 3.39630885e-04 2.40184459e-04

1.30327127e-04 8.07570547e-05 4.93676189e-05 3.99133858e-05

-6.96552090e-05 -8.84689362e-05 -1.73745252e-04 -1.92295775e-04

-2.88978292e-04 -3.33804546e-04 -4.48600012e-04 -5.03108816e-04

-6.09854318e-04 -6.76489121e-04 -7.41927073e-04 -8.22272102e-04

-1.01214861e-03]

[ 2.48950496e-03 1.32848678e-03 7.77518243e-04 4.46048853e-04

1.82546718e-04 5.68524734e-05 -2.03947611e-05 -1.22789817e-04

-1.42331199e-04 -2.27905262e-04 -2.54901789e-04 -3.21797964e-04

-4.10908018e-04 -4.31102320e-04 -5.76116105e-04 -6.20647464e-04

-6.61513106e-04 -8.03798804e-04 -8.85422390e-04 -9.60254905e-04

-1.05730808e-03 -1.21679564e-03 -1.29680491e-03 -1.65221752e-03

-1.89191346e-03]

[0.01148437 0.00831067 0.00569898 0.00435051 0.00369133 0.00313336

0.00282179 0.00252201 0.00221526 0.0020089 0.00178797 0.00135555

0.00117 0.00106878 0.00099295 0.00081433 0.00073677 0.00068778

0.00068557 0.00063079 0.00057153 0.00053233 0.0004835 0.00046683

0.00042318]

[0.01074849 0.00739927 0.00473212 0.00377076 0.00318848 0.00255984

0.00228395 0.00197474 0.00166971 0.00144228 0.00128842 0.00088904

0.00081689 0.00072367 0.00064738 0.00060256 0.00053549 0.00049838

0.00046984 0.00042499 0.0003706 0.00034885 0.00028414 0.0002643

0.00023334]

........

Comments: 4 (2 earlier comments not shown)

J T, 10 May 2023

@Walter Roberson each [] is split across multiple lines (depending on the number format?), but yes, there are always 25 entries in one [] group.

J T, 10 May 2023

@Walter Roberson It appears that in decimal format the 25 entries are split across 5 lines, and in scientific format across 7 lines.

Answer 1 (accepted answer)

Stephen23, 11 May 2023

https://kr.mathworks.com/matlabcentral/answers/1961019-import-data-from-a-bad-format#answer_1233154

Edited: Stephen23, 11 May 2023

example.txt

TEXTSCAN is very efficient, and imports numeric data as numeric (i.e. no fiddling around with text):

fmt = repmat('%f',1,25);
fid = fopen('example.txt');
out = textscan(fid,fmt,'EndOfLine',']','Whitespace',' \b\t\r\n[', 'CollectOutput',true);
fclose(fid);
mat = out{1}
mat = 250×25
    0.0164    0.0141    0.0099    0.0085    0.0081    0.0064    0.0062    0.0054    0.0045    0.0043    0.0041    0.0035    0.0027    0.0027    0.0024    0.0023    0.0020    0.0018    0.0017    0.0015    0.0014    0.0014    0.0013    0.0011    0.0009
    0.0124    0.0096    0.0066    0.0048    0.0041    0.0037    0.0035    0.0032    0.0029    0.0025    0.0023    0.0021    0.0018    0.0016    0.0014    0.0012    0.0011    0.0010    0.0009    0.0009    0.0008    0.0008    0.0007    0.0006    0.0006
    0.0115    0.0083    0.0057    0.0044    0.0037    0.0031    0.0028    0.0025    0.0022    0.0020    0.0018    0.0014    0.0012    0.0011    0.0010    0.0008    0.0007    0.0007    0.0007    0.0006    0.0006    0.0005    0.0005    0.0005    0.0004
    0.0107    0.0074    0.0047    0.0038    0.0032    0.0026    0.0023    0.0020    0.0017    0.0014    0.0013    0.0009    0.0008    0.0007    0.0006    0.0006    0.0005    0.0005    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0002
    0.0103    0.0073    0.0041    0.0034    0.0030    0.0026    0.0023    0.0019    0.0016    0.0014    0.0011    0.0008    0.0007    0.0006    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002
    0.0103    0.0072    0.0039    0.0032    0.0028    0.0026    0.0022    0.0019    0.0016    0.0014    0.0010    0.0008    0.0008    0.0006    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002    0.0001
    0.0105    0.0071    0.0037    0.0027    0.0026    0.0023    0.0021    0.0018    0.0015    0.0013    0.0010    0.0008    0.0008    0.0006    0.0005    0.0005    0.0004    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002    0.0002    0.0001
    0.0108    0.0070    0.0035    0.0026    0.0024    0.0022    0.0019    0.0016    0.0013    0.0012    0.0009    0.0008    0.0007    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002    0.0001    0.0001
    0.0110    0.0068    0.0035    0.0026    0.0023    0.0022    0.0018    0.0014    0.0012    0.0010    0.0009    0.0008    0.0007    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0002    0.0002    0.0002    0.0001    0.0001   -0.0000
    0.0111    0.0065    0.0036    0.0028    0.0023    0.0021    0.0018    0.0013    0.0013    0.0010    0.0009    0.0008    0.0007    0.0007    0.0005    0.0004    0.0004    0.0003    0.0003    0.0002    0.0002    0.0001    0.0001    0.0001   -0.0000
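
The call above works because `]` is declared as the end-of-line character, while real line breaks and `[` are treated as whitespace, so each bracketed group becomes one record no matter how it was wrapped. A minimal self-contained sketch of the same idea, assuming a made-up sample of two groups with three numbers each, written to a temporary file (the sample values and the names `tmpFile` and `nPerGroup` are illustrative, not taken from the original data):

```matlab
% Toy sample: two bracketed groups of three numbers, wrapped across lines
% (hypothetical values; the real files hold 25 numbers per group).
txt = sprintf('[0.1 0.2\n 0.3]\n[ 1.0e-03 -2.0e-03\n 3.0e-03]');

tmpFile = [tempname '.txt'];           % hypothetical temporary file
fid = fopen(tmpFile,'w');
fprintf(fid,'%s',txt);
fclose(fid);

nPerGroup = 3;                         % 25 for the data in the question
fmt = repmat('%f',1,nPerGroup);        % one %f per entry in a group
fid = fopen(tmpFile,'r');
out = textscan(fid,fmt,'EndOfLine',']','Whitespace',' \b\t\r\n[','CollectOutput',true);
fclose(fid);
delete(tmpFile);

mat = out{1};                          % one row per bracketed group
```

The same options are used as in the answer; only the group length and the input file differ.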

Automagically detecting the matrix size also works, but is not documented:

fid = fopen('example.txt');
out = textscan(fid,'','EndOfLine',']','Whitespace',' \b\t\r\n[', 'CollectOutput',true);
fclose(fid);
mat = out{1}
mat = 250×25
    0.0164    0.0141    0.0099    0.0085    0.0081    0.0064    0.0062    0.0054    0.0045    0.0043    0.0041    0.0035    0.0027    0.0027    0.0024    0.0023    0.0020    0.0018    0.0017    0.0015    0.0014    0.0014    0.0013    0.0011    0.0009
    0.0124    0.0096    0.0066    0.0048    0.0041    0.0037    0.0035    0.0032    0.0029    0.0025    0.0023    0.0021    0.0018    0.0016    0.0014    0.0012    0.0011    0.0010    0.0009    0.0009    0.0008    0.0008    0.0007    0.0006    0.0006
    0.0115    0.0083    0.0057    0.0044    0.0037    0.0031    0.0028    0.0025    0.0022    0.0020    0.0018    0.0014    0.0012    0.0011    0.0010    0.0008    0.0007    0.0007    0.0007    0.0006    0.0006    0.0005    0.0005    0.0005    0.0004
    0.0107    0.0074    0.0047    0.0038    0.0032    0.0026    0.0023    0.0020    0.0017    0.0014    0.0013    0.0009    0.0008    0.0007    0.0006    0.0006    0.0005    0.0005    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0002
    0.0103    0.0073    0.0041    0.0034    0.0030    0.0026    0.0023    0.0019    0.0016    0.0014    0.0011    0.0008    0.0007    0.0006    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002
    0.0103    0.0072    0.0039    0.0032    0.0028    0.0026    0.0022    0.0019    0.0016    0.0014    0.0010    0.0008    0.0008    0.0006    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002    0.0001
    0.0105    0.0071    0.0037    0.0027    0.0026    0.0023    0.0021    0.0018    0.0015    0.0013    0.0010    0.0008    0.0008    0.0006    0.0005    0.0005    0.0004    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002    0.0002    0.0001
    0.0108    0.0070    0.0035    0.0026    0.0024    0.0022    0.0019    0.0016    0.0013    0.0012    0.0009    0.0008    0.0007    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0003    0.0002    0.0002    0.0002    0.0001    0.0001
    0.0110    0.0068    0.0035    0.0026    0.0023    0.0022    0.0018    0.0014    0.0012    0.0010    0.0009    0.0008    0.0007    0.0006    0.0005    0.0004    0.0004    0.0003    0.0003    0.0002    0.0002    0.0002    0.0001    0.0001   -0.0000
    0.0111    0.0065    0.0036    0.0028    0.0023    0.0021    0.0018    0.0013    0.0013    0.0010    0.0009    0.0008    0.0007    0.0007    0.0005    0.0004    0.0004    0.0003    0.0003    0.0002    0.0002    0.0001    0.0001    0.0001   -0.0000

Avoid unnecessary complexity in your code.

Comments: 2

J T, 11 May 2023

This is amazing! It also works in R2020a! Thank you!

dpb, 11 May 2023

Good thinking to use the closing bracket as the newline, @Stephen23; that didn't occur to me in my initial response to Walter's counted attempt, which fails because the count changes; hence the text processing...

Answer 2

Walter Roberson, 10 May 2023

https://kr.mathworks.com/matlabcentral/answers/1961019-import-data-from-a-bad-format#answer_1232534

If the data is stored in a file and there are always exactly 25 entries per logical row, then you could use textscan:

PerRow = 25;                                  % entries per bracketed group
fmt = "[" + repmat('%f', 1, PerRow) + "]";    % literal brackets around 25 numeric fields
FID = fopen(FILENAME, 'r');                   % FILENAME: path to the data file
output = cell2mat( textscan(FID, fmt) );
fclose(FID);

Comments: 3 (1 earlier comment not shown)

dpb, 10 May 2023

As requested, attach a section of the text file in a usable format, not as a zipped file... "help us help you!"

J T, 10 May 2023

@dpb Hi, it doesn't allow me to upload the raw .dat file; that's why I zipped it. I am going to convert it to .txt and give it a try as well.

Answer 3

dpb, 11 May 2023

https://kr.mathworks.com/matlabcentral/answers/1961019-import-data-from-a-bad-format#answer_1233019

Edited: dpb, 11 May 2023

example.txt

The '%g' format has struck again -- that's what killed @Walter Roberson's approach. While not the most efficient, a simple way in MATLAB would be

f=readlines('example.txt');     % import as string array
f=strrep(f,"[","");             % remove the brackets
f=strrep(f,"]","");             % remove the brackets
f=join(f);                      % turn into long string
f=strtrim(split(f));            % convert to array
f=f(strlength(f)>0);
data=str2double(strtrim(split(f)));             % convert
whos data
  Name         Size            Bytes  Class     Attributes

  data      6250x1             50000  double              
data=reshape(data,25,[]).';     % 25 values per group: fill columns, then transpose
data(1:3,:)
ans = 3×25
    0.0164    0.0141    0.0099    0.0085    0.0081    0.0064    0.0062    0.0054    0.0045    0.0043    0.0041    0.0035    0.0027    0.0027    0.0024    0.0023    0.0020    0.0018    0.0017    0.0015    0.0014    0.0014    0.0013    0.0011    0.0009
    0.0124    0.0096    0.0066    0.0048    0.0041    0.0037    0.0035    0.0032    0.0029    0.0025    0.0023    0.0021    0.0018    0.0016    0.0014    0.0012    0.0011    0.0010    0.0009    0.0009    0.0008    0.0008    0.0007    0.0006    0.0006
    0.0115    0.0083    0.0057    0.0044    0.0037    0.0031    0.0028    0.0025    0.0022    0.0020    0.0018    0.0014    0.0012    0.0011    0.0010    0.0008    0.0007    0.0007    0.0007    0.0006    0.0006    0.0005    0.0005    0.0005    0.0004
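
One detail worth checking in the reshape step: MATLAB fills matrices column by column, so a flat vector of consecutive groups has to be reshaped with the group length as the number of rows and only then transposed. A small sketch with toy numbers (two groups of three values; `v`, `n`, `good`, and `bad` are illustrative names):

```matlab
v = (1:6).';                 % flat column: group 1 = 1 2 3, group 2 = 4 5 6
n = 3;                       % entries per group (25 in the question)

good = reshape(v,n,[]).';    % n-by-groups, then transpose: one group per row
bad  = reshape(v,[],n).';    % groups-by-n, then transpose: rows have the wrong length

disp(good)                   % rows are [1 2 3] and [4 5 6]
disp(bad)                    % 3-by-2: each row holds as many values as there are groups
```

With 250 groups of 25 values, the second form yields a 25-by-250 matrix whose rows each span ten consecutive groups, which is why the orientation of the reshape matters.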

Alternatively,

f=readlines('example.txt');     % import as string array
f=split(join(f),']');           % turn into array by section
f=f(strlength(f)>0);
f=strtrim(f);
f=extractAfter(f,"[");
f=f(strlength(f)>0);
data=cell2mat(arrayfun(@(l)str2double(split(strtrim(l))).',f,'uni',0));
ans = 2×1 string array
    "0.01643466 0.014102   0.00989389 0.00854453 0.00811339 0.00641578  0.00615053 0.00540413 0.00452342 0.00427268 0.0041174  0.00352849  0.00273467 0.00265508 0.00239323 0.00225965 0.00199268 0.00180934  0.00174052 0.00154865 0.00143824 0.00140056 0.00130063 0.00111959  0.00085831"
    "0.00837889 0.00778493 0.00703359 0.00646615 0.00562914 0.00480321  0.00431015 0.00361664 0.00308546 0.00267183 0.0022049  0.00195752  0.00153126 0.00126947 0.00105491 0.00103345 0.0009544  0.00088708  0.00083375 0.000798   0.00070615 0.00065023 0.00062888 0.00053734  0.00047918"
[data(1:3,:);data(end-3:end,:)]
  Name        Size            Bytes  Class     Attributes

  data      250x25            50000  double              
ans = 7×25
    0.0164    0.0141    0.0099    0.0085    0.0081    0.0064    0.0062    0.0054    0.0045    0.0043    0.0041    0.0035    0.0027    0.0027    0.0024    0.0023    0.0020    0.0018    0.0017    0.0015    0.0014    0.0014    0.0013    0.0011    0.0009
    0.0124    0.0096    0.0066    0.0048    0.0041    0.0037    0.0035    0.0032    0.0029    0.0025    0.0023    0.0021    0.0018    0.0016    0.0014    0.0012    0.0011    0.0010    0.0009    0.0009    0.0008    0.0008    0.0007    0.0006    0.0006
    0.0115    0.0083    0.0057    0.0044    0.0037    0.0031    0.0028    0.0025    0.0022    0.0020    0.0018    0.0014    0.0012    0.0011    0.0010    0.0008    0.0007    0.0007    0.0007    0.0006    0.0006    0.0005    0.0005    0.0005    0.0004
    0.0084    0.0078    0.0070    0.0065    0.0056    0.0048    0.0043    0.0036    0.0031    0.0027    0.0022    0.0020    0.0015    0.0013    0.0011    0.0010    0.0010    0.0009    0.0008    0.0008    0.0007    0.0006    0.0006    0.0005    0.0005
    0.0084    0.0078    0.0070    0.0065    0.0056    0.0048    0.0043    0.0036    0.0031    0.0027    0.0022    0.0020    0.0015    0.0013    0.0011    0.0010    0.0010    0.0009    0.0008    0.0008    0.0007    0.0006    0.0006    0.0005    0.0005
    0.0084    0.0078    0.0070    0.0065    0.0056    0.0048    0.0043    0.0036    0.0031    0.0027    0.0022    0.0020    0.0015    0.0013    0.0011    0.0010    0.0010    0.0009    0.0008    0.0008    0.0007    0.0006    0.0006    0.0005    0.0005
    0.0084    0.0078    0.0070    0.0065    0.0056    0.0048    0.0043    0.0036    0.0031    0.0027    0.0022    0.0020    0.0015    0.0013    0.0011    0.0010    0.0010    0.0009    0.0008    0.0008    0.0007    0.0007    0.0006    0.0005    0.0005

Comments: 2

J T, 11 May 2023

Hi, thank you so much for your input! However, I cannot run any version of MATLAB other than R2020a, and the readlines function doesn't seem to be available there.

dpb, 11 May 2023

Then read the file this way instead of using readlines:

f=textread('example.txt','%s','delimiter','\n','whitespace','');
f=string(strtrim(f));

textread has been deprecated, but it's often still of real use/value where textscan is more trouble to deal with...

Answer 4

J T, 11 May 2023

https://kr.mathworks.com/matlabcentral/answers/1961019-import-data-from-a-bad-format#answer_1233134

Based on @dpb's and @Walter Roberson's answers, I worked out the following code, which works in R2020a:

FID = fopen('example.txt');
data = textscan(FID,'%s');                  % read whitespace-delimited tokens as text
fclose(FID);
stringData = string(data{:});               % convert to a string array
f = strrep(stringData,"[","");              % remove the opening brackets
f = strrep(f,"]","");                       % remove the closing brackets
f = join(f);                                % turn into one long string
f = strtrim(split(f));                      % split back into individual tokens
f = f(strlength(f)>0);                      % drop empty tokens
data = str2double(f);                       % convert to double
data = reshape(data,[],length(data)/25)';   % 25 values per row
data(1:3,:)
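
For releases without readlines, another option, not taken from the answers in this thread, is to read the whole file as text, blank out the brackets, and let sscanf convert everything in one pass. A minimal sketch under the assumption that every group holds the same number of entries; here the text is a toy literal with 3 numbers per group, whereas in practice it would presumably come from `fileread('example.txt')` with `n = 25`:

```matlab
% Toy stand-in for txt = fileread('example.txt'); the values are hypothetical.
txt = sprintf('[0.1 0.2\n 0.3]\n[ 1.0e-03 -2.0e-03\n 3.0e-03]\n');
n = 3;                                   % entries per group (25 in the question)

clean = regexprep(txt,'[\[\]]',' ');     % replace every bracket with a space
vals  = sscanf(clean,'%f');              % column vector of all numbers
assert(mod(numel(vals),n)==0,'Unexpected number of values.');
data  = reshape(vals,n,[]).';            % one group per row
```

The divisibility check is only a guard against a truncated or malformed file; it does not verify that each individual group has the expected length.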
