Why does MATLAB save strings from a delimited text file as individual characters, and how can I prevent it?

I have a cell structure in MATLAB (containing words, dates and numbers separated by ";", loaded from a very large file). I take certain lines from it, do some calculations on them, and finally write each field to a separate file as a table (the words being the headers, the dates and numbers the data).
The script works more or less, except that I keep running into one particular problem: after splitting the lines with strsplit, all entries are treated as sequences of individual characters. So when I select a cell entry and add a position, for example A.a{1,1}(2), it returns the second letter of the string. The same happens for numbers, which makes manipulation difficult. Because the split pieces are strings, MATLAB treats each digit of a multi-digit number as a separate character, so A.a{1,2} returns 122, but A.a{1,2}*2 gives ans = 98 100 100 rather than 244. I could use str2num, but that doesn't work for words or dates, so it becomes pretty cumbersome. I'm having a hard time finding the right command to convert all entries to single 'words'. I've also tried the cell2array and array2table commands, but I keep running into issues. Any help would be appreciated!
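The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the sample line and the variable names (txt, parts, nums) are made up for illustration and are not taken from the original script. It shows that each split piece is a char vector, that arithmetic on it acts on character codes, and that str2double can convert a whole cell array at once, returning NaN for pieces that are not numbers.

```matlab
% Made-up sample line in the same ';'-delimited layout as the question.
txt = '07-09-2017 08:25:33;122;3.5';
parts = strsplit(txt, ';');        % 1x3 cell array of char vectors

secondField = parts{2};            % the text '122', not the number 122
secondChar  = secondField(2);      % indexing a char vector returns one character
asCodes     = secondField * 2;     % arithmetic works on the character codes

% str2double accepts a cell array of char vectors and returns a numeric
% array of the same size; text that is not a number becomes NaN.
nums    = str2double(parts);
doubled = nums(2) * 2;             % ordinary numeric arithmetic
```

Unlike str2num, str2double does not fail on words or dates, so the non-numeric columns can be identified afterwards with isnan.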
Comments: 4
Stephen23 on 8 Sep 2017
@Sjouke Rinsma: Thank you for uploading some sample data. I note that all of the columns appear to be numeric, except for the date in the first column. I have no idea why you are wasting your time with importing that data as characters. Why not simply import the data directly as numeric?
Sjouke Rinsma on 8 Sep 2017
Hi Stephen, I get what you're saying, though I'm somewhat fuzzy on how to import a ;-delimited text file as numeric data, since this one also contains the non-numeric dates. dlmread does not recognize these, and readtable still imports everything as chars. But maybe I'm just not familiar with the right function to use in this case, or I'm completely overlooking something.
Nevertheless, as far as I can see, by the time I reach line 22 I have a completely numeric array (if I remove the ; at the end), in which I then rewrite the date. For the files I've uploaded the script seems to work fine, but as I mentioned before, when I work with the larger file I somehow get a matrix where, toward the rightmost columns of a field, the data types become mixed (randomly quoted and non-quoted entries in the same column). This also results in written files where some numbers are written as numeric and others as chars (?), with different numbers of digits, which makes everything look really messy (I've uploaded the mat-file of the result structure and the final text file for one field, if you're interested). That last part especially has me puzzled. I would assume it's not because of the large data set, since that is actually the reason I'm using MATLAB in the first place.


Accepted Answer

Stephen23 on 8 Sep 2017
Edited: Stephen23 on 8 Sep 2017
Rather than wasting time importing the data as character, you would be much better off using textscan to import numeric values as numeric data. For example, this reads your entire example file:
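opt = {'Delimiter',';', 'CollectOutput',true};  % group adjacent numeric columns into one matrix
fid = fopen('merged.txt','rt');
hdr = fgetl(fid);                               % read the header line
fmt = ['%s',repmat('%f',1,nnz(hdr==';'))];      % one date string, then one %f per ';' in the header
C = textscan(fid,fmt,opt{:});                   % C{1}: date strings, C{2}: numeric matrix
fclose(fid);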
and checking:
>> size(C{1}) % the number of date strings
ans =
6076 1
>> size(C{2}) % the size of the numeric matrix
ans =
6076 47
>> C{1}{[1,end]} % the first and last dates
ans = 07-09-2017 08:25:33
ans = 07-09-2017 10:40:54
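If the date strings are needed for calculations, they can be converted after the import. The following is a minimal sketch and not part of the original answer: it uses two made-up strings in the layout shown above, and the day-first format 'dd-MM-yyyy HH:mm:ss' is an assumption about the file (use 'MM-dd-yyyy HH:mm:ss' if the month comes first).

```matlab
% Two sample strings in the same layout as C{1}; with real data, pass C{1}.
dates = {'07-09-2017 08:25:33'; '07-09-2017 10:40:54'};

% Assumed day-first input format; adjust if the file is month-first.
t = datetime(dates, 'InputFormat', 'dd-MM-yyyy HH:mm:ss');

elapsed = t(end) - t(1);   % a duration between the first and last timestamps
```

This keeps the dates in a single datetime column rather than as text, so differences and comparisons work directly.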
"I work with a 200M+ lines file"
If you have a very large file that cannot be imported at once, then you can adapt the code shown above using the method given in the MATLAB documentation, which reads the data one block at a time.
Basically, the trick is to use the optional third input of textscan to specify how many lines to read, and to call textscan in a loop.
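The loop described above might look roughly like this. It is a sketch under stated assumptions, not the code from the documentation: the tiny temporary file, the block size of 2, and the running totals (nRows, colSum) are all made up so that the example runs on its own. With real data you would open your own file and use a much larger block size.

```matlab
% Write a tiny sample file so the sketch is self-contained (made-up data).
fname = [tempname '.txt'];
fid = fopen(fname,'wt');
fprintf(fid,'date;a;b\n');
fprintf(fid,'07-09-2017 08:25:33;1;2\n');
fprintf(fid,'07-09-2017 08:25:34;3;4\n');
fprintf(fid,'07-09-2017 08:25:35;5;6\n');
fclose(fid);

blockSize = 2;   % rows per textscan call; assumed value, far larger in practice
opt = {'Delimiter',';', 'CollectOutput',true};
fid = fopen(fname,'rt');
hdr = fgetl(fid);                            % header is read once, before the loop
fmt = ['%s',repmat('%f',1,nnz(hdr==';'))];
nRows  = 0;
colSum = zeros(1,nnz(hdr==';'));
while ~feof(fid)
    C = textscan(fid,fmt,blockSize,opt{:});  % apply the format at most blockSize times
    if isempty(C{1}), break, end             % stop if nothing more could be parsed
    nRows  = nRows + numel(C{1});            % process the block; here just running totals
    colSum = colSum + sum(C{2},1);
end
fclose(fid);
delete(fname);
```

Only one block is held in memory at a time, so whatever is needed from each block (sums, selected rows, output written to file) has to be extracted inside the loop.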
Comments: 1
Sjouke Rinsma on 8 Sep 2017
Edited: Sjouke Rinsma on 12 Sep 2017
Should've refreshed before answering that previous post... nevertheless thanks for this, I will definitely look into it!
And so I did. Seems to be working fine now, thanks :)
