How to use textscan to split a string containing numbers, NaN and strings with quotes (or not)?

Question

david, 28 Sep 2017

Link: https://kr.mathworks.com/matlabcentral/answers/358791-how-to-use-textscan-to-split-a-string-containing-numbers-nan-and-strings-with-quotes-or-not

Commented: Walter Roberson, 29 Sep 2017

Edit: the final purpose is to use textscan on a large file (~1 GB), so processing the string before applying textscan is not possible.

This is the string I want to split with "textscan":

s = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';

I have tried several different syntaxes:

- test 1

textscan(s, '%f%f%f%f%s', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}

This is the best result. The only problem is the leftover quote at the end of the string. I don't understand why: aren't the characters listed in 'Whitespace' supposed to be removed?

- test 2

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}

Same as above.

- test 3

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')

Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the NANs.

- test 4

textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [NaN] [NaN] [0.6000] {0x1 cell}

Fails to read the string.

- test 5

textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')

Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the NANs.

- test 6

textscan(s, '%f"%f""%f"%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')

Result: [-0.2700] [NaN] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the second NAN.

- test 7

textscan(s, '%f"%f""%f"%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the NANs.

Any suggestions? Thanks.


Answer 1 (accepted answer)

Walter Roberson, 29 Sep 2017

Link: https://kr.mathworks.com/matlabcentral/answers/358791-how-to-use-textscan-to-split-a-string-containing-numbers-nan-and-strings-with-quotes-or-not#answer_283622

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'treat','"NAN"')

david, 29 Sep 2017

Yes, it works! But the name is 'TreatAsEmpty', at least in my MATLAB version (R2014b):

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'TreatAsEmpty','"NAN"')

Thanks a lot!

Walter Roberson, 29 Sep 2017

In the version I tested, 'TreatAsEmpty' can be abbreviated; most parameter names can be abbreviated to their leading unique portion.
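Since the stated goal is to read a large file rather than a string, the same call can be given a file identifier in place of s. The following is only a minimal sketch: the temporary file, its second row, and the variable names are made up for illustration, and it assumes the file has the same five-column layout as s.

```matlab
% Hypothetical sample file with the same layout as the string s above.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"');
fprintf(fid, '%s\n', '1.5,2.5,"NAN",0.7,"22/09/17 23:00"');
fclose(fid);

% Read straight from the file; the text is not preprocessed.
fid = fopen(fname, 'r');
C = textscan(fid, '%f%f%f%f%q', 'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
fclose(fid);
delete(fname);

% Fields matching "NAN" are treated as empty and should be filled with
% the default EmptyValue (NaN); %q removes the enclosing double quotes.
nums  = [C{1:4}];   % numeric columns side by side
dates = C{5};       % cell array of date strings
```

The full name 'TreatAsEmpty' is used here because it is the spelling confirmed to work in R2014b.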


Answer 2

Guillaume, 28 Sep 2017

Link: https://kr.mathworks.com/matlabcentral/answers/358791-how-to-use-textscan-to-split-a-string-containing-numbers-nan-and-strings-with-quotes-or-not#answer_283508

textscan always annoys me; it seems to have lots of hidden rules that are not explicitly stated. I would guess the problem is caused by your NaNs being enclosed in quotes. The %f tells textscan to expect numbers, yet it gets strings. And if you ignore the quotes, it throws the string detection off.

The easiest fix might be to just replace the quoted NaNs with unquoted ones:

textscan(regexprep(s, '"NAN"', 'NAN', 'ignorecase'), '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' ')

This works with your example.
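To make the two steps visible, the replacement and the scan can be separated. This is just an illustrative sketch on the example string; the variable name cleaned is not from the original post, and the extra textscan options are left at their defaults.

```matlab
s = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';

% Step 1: strip the quotes around NAN only (case-insensitive match).
cleaned = regexprep(s, '"NAN"', 'NAN', 'ignorecase');
% cleaned is now: -0.27,NAN,NAN,0.6,"22/09/17 22:59"

% Step 2: the numeric fields are plain text again, so %f can parse them,
% while %q handles the quoted date string.
C = textscan(cleaned, '%f%f%f%f%q', 'Delimiter', ',');
```

Note that this needs the whole text in memory before textscan is called, which is the constraint the question rules out for the ~1 GB file.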

Walter Roberson, 29 Sep 2017

Don't you have a spare gigabyte of memory into which you could read the entire file with fileread()? Though I guess you would need a second gigabyte to temporarily store the modified version.

david, 29 Sep 2017

To be honest, I did not even try; it does not sound like the best solution for large files.
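If memory is the concern, one alternative worth considering is to let textscan consume the file in blocks by passing a repeat count after the format. The sketch below is an assumption-laden illustration, not something tested on a 1 GB file: the temporary file stands in for the real one, blockSize is an arbitrary value, and the row counter is a placeholder for real per-block processing. It reuses the 'TreatAsEmpty' option from the other answer.

```matlab
% Hypothetical small file standing in for the large one.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:5
    fprintf(fid, '%g,"NAN","NAN",0.6,"22/09/17 22:59"\n', k);
end
fclose(fid);

fmt = '%f%f%f%f%q';
blockSize = 2;      % rows per block (assumption; tune to available memory)
nRows = 0;

fid = fopen(fname, 'r');
while ~feof(fid)
    % The third argument applies the format at most blockSize times.
    C = textscan(fid, fmt, blockSize, 'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
    if isempty(C{1})
        break;      % nothing was parsed: stop instead of looping forever
    end
    nRows = nRows + numel(C{1});   % placeholder for real per-block work
end
fclose(fid);
delete(fname);
```

Only one block of rows is held in memory at a time, so neither fileread() nor a modified copy of the text is needed.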
