How to use textscan to split a string containing numbers, NaN, and strings with or without quotes?

Edit: the final purpose is to use textscan on a large file (~1 GB), so preprocessing the string before applying textscan is not possible.
This is the string I want to split with "textscan":
s = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';
I have tried different syntax:
- test 1
textscan(s, '%f%f%f%f%s', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
This is the best result; the only problem is the leftover quote at the end of the string. I don't understand why: aren't the characters listed in "Whitespace" supposed to be removed?
- test 2
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
same as above
- test 3
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
- test 4
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {0x1 cell}
Fails to read the string.
- test 5
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
- test 6
textscan(s, '%f"%f""%f"%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [NaN] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the second NAN.
- test 7
textscan(s, '%f"%f""%f"%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
Any suggestions? Thanks.

Accepted Answer

Walter Roberson on 29 Sep 2017
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'treat','"NAN"')
  Comments: 2
david on 29 Sep 2017
Yes, it works! But the name is 'TreatAsEmpty', at least in my MATLAB version (R2014b):
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'TreatAsEmpty','"NAN"')
Thanks a lot!
Walter Roberson on 29 Sep 2017
In the version I tested in, 'TreatAsEmpty' can be abbreviated -- most parameter names can be abbreviated to their leading unique portion.
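
For the large-file case mentioned in the question, the same call can take a file identifier instead of a string. The following is a minimal sketch, assuming the file has the same five-column layout as s. The temporary sample file and the variable names fname, fid, and C are illustrative only and stand in for the real ~1 GB file. The full option name 'TreatAsEmpty' is used because the abbreviated form was not accepted in R2014b.

```matlab
% Write a small sample file (stand-in for the real large file).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"');
fprintf(fid, '%s\n', '1.5,2,"NAN",0.7,"22/09/17 23:00"');
fclose(fid);

% Read it back: quoted "NAN" fields are treated as empty (NaN for %f),
% and %q strips the double quotes around the timestamp.
fid = fopen(fname, 'r');
C = textscan(fid, '%f%f%f%f%q', 'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
fclose(fid);
delete(fname);
```

C is a 1-by-5 cell array: four numeric columns followed by a cell array holding the timestamps.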


More Answers (1)

Guillaume on 28 Sep 2017
textscan always annoys me; it seems to have lots of hidden rules that are not explicitly stated. I would guess the problem is caused by your NaNs being enclosed in quotes. The %f tells textscan to expect numbers, yet it gets strings. And if you ignore the quotes, it throws the string detection off.
The easiest fix might be to just replace the quoted NaNs with unquoted ones:
textscan(regexprep(s, '"NAN"', 'NAN', 'ignorecase'), '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' ')
This works with your example.
  Comments: 3
Walter Roberson on 29 Sep 2017
Do you not have a spare gigabyte of memory that you could read the entire file into with fileread()? Though I guess you would need a second gigabyte to temporarily store the modified version.
david on 29 Sep 2017
To be honest, I did not even try; it does not sound like the best solution for large files.
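
If holding the whole file in memory is the concern, textscan also accepts a repeat count after the format, so the file can be read in blocks without any preprocessing. This is a sketch under assumptions: the block size, the sample file, and the row counter nRows are illustrative, and the real per-block processing is left as a comment.

```matlab
% Build a small sample file (stand-in for the real large file).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:5
    fprintf(fid, '%g,"NAN","NAN",0.6,"22/09/17 22:59"\n', k);
end
fclose(fid);

fmt = '%f%f%f%f%q';
blockSize = 2;      % rows per block; illustrative, use a far larger value in practice
nRows = 0;          % running count of rows read so far

fid = fopen(fname, 'r');
while ~feof(fid)
    % Apply the format at most blockSize times, then return.
    C = textscan(fid, fmt, blockSize, 'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
    if isempty(C{1})
        break;      % nothing parsed: stop rather than loop forever
    end
    % ... process this block here ...
    nRows = nRows + numel(C{1});
end
fclose(fid);
delete(fname);
```

Only one block is held in memory at a time, which avoids both the fileread() copy and the modified copy discussed above.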
