Hi everyone,
I am trying to organize a text file with 12000 lines, which seemed too large for readtable, so I chose to use textscan.
The problem is that textscan skips all the empty lines, but I need the exact line numbers of certain elements in the original file.
I searched a lot online without success. I tried code like this to turn off whitespace handling, but it doesn't help:
default = textscan(fid,'%s%s','Delimiter','=','whitespace', '')
Thank you for your help!

Comments: 2

Rik on 11 Apr 2019
Did you try either suggested solution? If you still have issues, we'll be happy to help.
Jeremy Hughes on 11 Apr 2019
I know someone has already added a solution, and it's a fine solution for what you're doing. But I'm surprised that READTABLE has a problem. Can you attach a sample?
12,000 lines isn't all that large especially if there are only two columns.
If you have R2019a, you might also try:
M = readmatrix(filename,'OutputType','string','Delimiter','=','Whitespace','')


Accepted Answer

Rik on 10 Apr 2019
Edited: Rik on 10 Apr 2019

2 votes

If your file doesn't contain any special characters, you could try fileread (which reads a file as one long char array), then split it with regexp. If you aren't sure about the encoding of special characters, you may consider my readfile function (which returns a cell array with 1 element per line, also for empty lines).
default = fileread(filename);
default = regexp(default,'\n','split');
%or:
default = readfile(filename);
The output of those two methods is equivalent if there are no special characters encoded in the file. The allowed characters are shown below. (readfile doesn't have this restriction)
% $%&'()*+,-./0123456789:;<=>?@
% ABCDEFGHIJKLMNOPQRSTUVWXYZ
% [\]^_`abcdefghijklmnopqrstuvwxyz{|}~
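As a minimal sketch of how the fileread approach keeps line numbers intact: the sample file contents and the variable names (fileLines, lineNo, isEmptyLine) below are made up for illustration, and the file is assumed to use plain \n line endings and only basic ASCII characters.

```matlab
% Write a small sample file (hypothetical content) that contains an empty line.
filename = [tempname '.txt'];
fid = fopen(filename,'w');
fprintf(fid,'[Header]\nkey1=value1\n\n[Data]\nkey2=value2');
fclose(fid);

txt = fileread(filename);                  % whole file as one char row vector
fileLines = regexp(txt,'\n','split');      % one cell per line, empty lines are kept

lineNo = find(strcmp(fileLines,'[Data]')); % cell index equals the line number in the file
isEmptyLine = cellfun('isempty',fileLines);% logical mask of the empty lines

delete(filename);                          % clean up the sample file
```

Because the empty line still occupies its own cell, the index returned by find matches the line number in the original file.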

Comments: 5

Adam Danz on 10 Apr 2019
Edited: Adam Danz on 10 Apr 2019
Another alternative is to use fgetl() and read the file line by line within a while-loop but try fileread() first.
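A rough sketch of the fgetl() loop, in case it helps. The sample file and the variable names are assumptions for illustration only. fgetl returns an empty char for an empty line and -1 at the end of the file, so empty lines keep their position.

```matlab
% Write a small sample file (hypothetical content) with an empty second line.
filename = [tempname '.txt'];
fid = fopen(filename,'w');
fprintf(fid,'[Header]\n\nkey=value\n');
fclose(fid);

fid = fopen(filename,'r');
fileLines = {};                   % grows by one cell per line read
tline = fgetl(fid);               % line without its newline, or -1 at end of file
while ischar(tline)
    fileLines{end+1} = tline;     % an empty line is stored as an empty char
    tline = fgetl(fid);
end
fclose(fid);
delete(filename);                 % clean up the sample file
```

For a file of 12000 lines the growing cell array should not be a problem, but fileread is likely to be faster.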
zhiwen wan on 11 Apr 2019
Thank you very very much Rik, it works perfectly with the function readfile you created.
I also tried using
default = fileread(filename);
default = regexp(default,'\n','split');
But that failed; the cells were read as
{'[Header]←'}
with an arrow at the end. Then I tried to use
a = strcmp(default,'[Header]')
a = strcmp(default,'[Header]←')
a = strcmp(default,{'[Header]'})
a = strcmp(default,{'[Header]←'})
None of them work. Could you kindly give me some information about this?
Rik on 11 Apr 2019
If you do a unicode lookup, you can see that the arrow has a decimal value above 127. That means that fileread will not read that value correctly (for some encodings it may read it correctly up to 255). My function should be able to read it, as your file is likely UTF-8 encoded if it contains these characters.
This will probably be the result for fileread:
input_char=char(8592);
output=unicode2native(input_char,'UTF-8');
char(output)%226 134 144
ans =
'←'
So a strcmp call should probably be something like this:
a = strcmp(default{1},['[Header]' 226 134 144])
But using readfile instead should allow you to use the ← symbol in your strcmp.
Note that default is a cell array, so to compare the full array you should use ismember instead of strcmp.
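One way to check what the trailing character actually is, rather than relying on how it is displayed, is to look at the numeric character codes of the line. The sketch below uses a made-up sample file with Windows (\r\n) line endings, which is only an assumption about the original file since no sample was attached; the variable names are illustrative.

```matlab
% Write a sample file (hypothetical content) with \r\n line endings.
filename = [tempname '.txt'];
fid = fopen(filename,'w');
fprintf(fid,'[Header]\r\nkey=value\r\n');
fclose(fid);

txt = fileread(filename);
fileLines = regexp(txt,'\n','split');   % splitting on \n leaves the \r behind

codes = double(fileLines{1});           % numeric code of every character in line 1
lastCode = codes(end);                  % 13 would mean a trailing carriage return

isMatchRaw = strcmp(fileLines{1},'[Header]');              % fails because of the extra character
isMatchTrimmed = strcmp(strtrim(fileLines{1}),'[Header]'); % strtrim removes trailing whitespace

delete(filename);                       % clean up the sample file
```

If lastCode turns out to be a value other than 13, the extra character is something else and the encoding explanation above is the more likely one.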
Jeremy Hughes on 11 Apr 2019
Edited: Jeremy Hughes on 11 Apr 2019
default = regexp(default,'\n','split');
This won't work cleanly if the file uses \r\n Windows newlines: you'll end up with trailing \r characters.
If you're using R2016b or later, try:
default = splitlines(default);
It's a little more robust, and since it has only one job to do, probably slightly faster than regexp.
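A small sketch of the difference, using an in-memory string with \r\n line endings instead of a file. The text content and variable names are made up for illustration.

```matlab
% Sample text (hypothetical content) with Windows line endings and one empty line.
txt = sprintf('[Header]\r\n\r\nkey=value');

viaRegexp = regexp(txt,'\n','split');   % 1x3 cell, cells still end in char(13)
viaSplitlines = splitlines(txt);        % 3x1 cell, \r\n handled as one line break

hasCR = any(viaRegexp{1} == 13);        % true: the carriage return was left behind
lineNo = find(strcmp(viaSplitlines,'key=value')); % line number in the original text
```

Note that splitlines returns a column cell array for a char input, while regexp with 'split' returns a row.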
Rik on 11 Apr 2019
Edited: Rik on 11 Apr 2019
To make the regexp splitting more robust (which will be in my next version of readfile):
CRLF=[13 10];
CRLF=CRLF([any(default==13) any(default==10)]);
if isempty(CRLF),CRLF=10;end
default = regexp(default,char(CRLF),'split'); % char() so the pattern is text, not a numeric array
splitlines will probably be faster, while the code I showed here is backwards compatible to R14 (v7.0, which was when regexp was expanded to support outkeys).
Edit:
I just noticed I had this line already in my function:
str(str==13)='';
So readfile already splits it correctly for \r\n files.
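For anyone who wants the same effect without readfile, the idea of dropping char(13) before splitting can be sketched like this. This is not the actual readfile code, only an illustration of the one line quoted above, with made-up sample text and variable names.

```matlab
% Sample text (hypothetical content) with \r\n line endings and one empty line.
txt = sprintf('[Header]\r\n\r\nkey=value');

txt(txt==13) = '';                      % delete every carriage return
fileLines = regexp(txt,'\n','split');   % now a plain \n split is enough

lineNo = find(strcmp(fileLines,'key=value'));         % still the original line number
anyCR = any(cellfun(@(s) any(s==13), fileLines));     % no carriage returns remain
```

This would misbehave for old Mac-style files that use \r alone as the line break, since all line breaks would be removed.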


More Answers (1)

Bob Thompson on 10 Apr 2019
Edited: Rik on 10 Apr 2019

0 votes

I'm going to guess that the extra lines are not consistent?
Generally, I would suggest reading the entire file in as one string, then splitting it at the new line characters. The exact coding may be a bit off from the below example, but it should put you on the right track.
default = fileread(filename); % Read the whole file as one char array (textscan with '%s' would split at whitespace)
default = regexp(default,'\n','split'); % Split the text into one cell per line at each newline character

Comments: 3

Rik on 10 Apr 2019
I suspect you mean regexp instead of repmat.
Bob Thompson on 10 Apr 2019
Yes, I do. Thank you for catching that, I was using repmat for other things recently.
zhiwen wan on 11 Apr 2019
Thank you very much Bob, problem solved:)



Release: R2018b

Asked: 10 Apr 2019

Edited: Rik on 11 Apr 2019
