How to use textscan to split a string containing numbers, NaN, and strings with or without quotes?
Edit: the final purpose is to use textscan on a large file (~1 GB), so preprocessing the string before applying textscan is not possible.
This is the string I want to split with "textscan":
s = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';
I have tried different syntax:
- test 1
textscan(s, '%f%f%f%f%s', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
This is the best result; the only problem is the leftover quote at the end of the string. I don't understand why: aren't the characters listed in "Whitespace" supposed to be removed?
- test 2
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
Same as above.
- test 3
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
- test 4
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {0x1 cell}
Fails to read the string.
- test 5
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
- test 6
textscan(s, '%f"%f""%f"%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [NaN] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the second NAN.
- test 7
textscan(s, '%f"%f""%f"%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
Any suggestions? Thanks.
Accepted Answer
Walter Roberson
29 Sep 2017
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'treat','"NAN"')
Comments: 2
Walter Roberson
29 Sep 2017
In the version I tested in, 'TreatAsEmpty' can be abbreviated -- most parameter names can be abbreviated to their leading unique portion.
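Since the stated goal is to read a large file directly, the accepted call can be sketched against a file identifier, with the parameter names spelled out in full in case abbreviations are not accepted in a given release. The temporary file and its two rows are made up for illustration; only the textscan call comes from the answer.

```matlab
% Hypothetical sample file standing in for the large data file.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"');
fprintf(fid, '%s\n', '1.5,2.5,"NAN",0.7,"22/09/17 23:00"');
fclose(fid);

% Read straight from the file; the text is not preprocessed.
% 'TreatAsEmpty' marks the quoted "NAN" fields as empty, and empty
% numeric fields are filled with NaN by default.
fid = fopen(fname, 'r');
C = textscan(fid, '%f%f%f%f%q', 'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
fclose(fid);
delete(fname);
```

Here C is a 1-by-5 cell array: four numeric columns and one cell column of text, with the surrounding quotes removed by %q.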
More Answers (1)
Guillaume
28 Sep 2017
textscan always annoys me; it seems to have lots of hidden rules that are not explicitly stated. I would guess the problem is caused by your NaNs being enclosed in quotes. The %f tells textscan to expect numbers, yet it gets strings. And if you ignore the quotes, it throws the string detection off.
The easiest fix might be to just replace the quoted NaNs with unquoted ones:
textscan(regexprep(s, '"NAN"', 'NAN', 'ignorecase'), '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' ')
This works with your example.
Comments: 3
Walter Roberson
29 Sep 2017
Do you not have a spare gigabyte of memory into which you could read the entire file with fileread()? Though I guess you would need a second gigabyte to temporarily store the modified version.
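The fileread() suggestion can be combined with the regexprep replacement from the answer above. This is a minimal sketch, assuming the whole file (plus its modified copy) fits in memory; the temporary file and its contents are hypothetical stand-ins for the real data.

```matlab
% Hypothetical sample file standing in for the real data file.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"');
fprintf(fid, '%s\n', '0.31,"NAN",1.2,0.7,"22/09/17 23:00"');
fclose(fid);

txt = fileread(fname);                              % whole file as one char vector
txt = regexprep(txt, '"NAN"', 'NAN', 'ignorecase'); % unquote the NaN markers
D = textscan(txt, '%f%f%f%f%q', 'Delimiter', ',');  % %f reads the unquoted NAN as NaN
delete(fname);
```

The trade-off is the one raised in the comment: the original and the modified text both live in memory for a moment, which matters at the ~1 GB scale.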