textscan difficulties with mixed datatypes

Question

Phillip on 27 May 2014


https://kr.mathworks.com/matlabcentral/answers/131389-textscan-difficulties-with-mixed-datatypes

Commented: Phillip on 28 May 2014

Accepted Answer: Cedric


Hi

I am having difficulty solving a particular problem. I might just be missing the wood for the trees, but here goes:

I have a large (> 1 million rows) cellstr with the following format (only a 3-row example is shown):

    blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

I then call textscan on each cell in a for loop (textscan is not "vectorized" for a cellstr), using one of the following two syntaxes:

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',0)

or

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',1)

Now, the problem is that temp comes out as a cell array that contains cells and matrices, i.e. indexing within indexing on different data types. I can't afford to index each element individually inside the loop (the dataset is large, as mentioned), but I need the output to come out as:

   ans = 
    'record1'    [  2]    [  3]    'string4'    's5'  
    'rec2'       [ 22]    [ 33]    'str4'       'str5'
    'r3'         [222]    [333]    's4'         'st5'

[Edited for clarity (hopefully)]: Instead I get something like (CollectOutput is false):

ans =

    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}

or (CollectOutput is true):

ans =

    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}
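For reference, a minimal sketch of a per-row loop that produces the CollectOutput = false layout above. It assumes each 1x5 textscan result is stored as one row of a preallocated N-by-5 cell array; the name t is borrowed from dpb's answer, and stacking the rows this way is an assumption about what the loop looked like.

```matlab
% Sample input from the question (the real data has > 1 million rows)
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

n = numel(blockCSV);
t = cell(n, 5);                 % one row per CSV line, one column per field
for i = 1:n
    % With CollectOutput = 0, textscan returns a 1x5 cell:
    % {1x1 cell, double, double, 1x1 cell, 1x1 cell}
    t(i,:) = textscan(blockCSV{i}, '%s%f%f%s%s', 'Delimiter', ',', 'CollectOutput', 0);
end
disp(t)
```

Each string field ends up as a 1x1 cell nested inside t, which is exactly the "cell within a cell" described below.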

With CollectOutput == false I would expect to see the layout I stated above, instead of a cell within a cell, which makes any indexing very difficult.

I hope this makes sense. I'm sure I'm missing something simple.

PS: I think textscan is inconsistent, because when I read the same example from an actual file (instead of a cellstr), it works exactly the way I want, without any for loop or indexing.

Regards, Phillip

2 Comments

per isakson on 27 May 2014

Why use textscan in the first place?

Phillip on 28 May 2014

Why not? I have tried a couple of things and it seemed to be the best. Please elaborate if you think it's not, so that I can reply appropriately.
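per isakson's question suggests textscan is not the only option. As one textscan-free sketch (not taken from any answer in this thread), regexp can split every row of the cellstr in a single call and str2double can convert the numeric columns. The variable name parts is hypothetical, the code assumes every row has exactly five comma-separated fields with no quoted commas, and its speed relative to textscan on > 1 million rows is not established.

```matlab
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

% Split every row on commas in one call; returns an Nx1 cell of 1x5 cells
parts = regexp(blockCSV, ',', 'split');
parts = vertcat(parts{:});            % Nx5 cell of char (assumes 5 fields per row)

% Convert the two numeric columns; str2double returns NaN for non-numeric text
parts(:, 2:3) = num2cell(str2double(parts(:, 2:3)));
disp(parts)
```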


Answer 1

Cedric on 28 May 2014


https://kr.mathworks.com/matlabcentral/answers/131389-textscan-difficulties-with-mixed-datatypes#answer_138554

Edited: Cedric on 28 May 2014


Why do you get the CSV content as a cell array of rows? If you cannot change this, you could just concatenate all these rows, inserting line breaks between them, and call TEXTSCAN once on the whole text.

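 % Pair each row with a newline; transposing makes merger{:} alternate row, newline, row, ...
 merger = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].' ;
 % Parse the whole block with a single TEXTSCAN call
 data   = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',') ;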

With that you get

 >> data
 data = 
    {3x1 cell}    [3x1 double]    [3x1 double]    {3x1 cell}    {3x1 cell}

which is most appropriate memory-wise and for further indexing, as numeric entries are stored in numeric arrays, and non-numeric entries in cell arrays.
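If downstream code still needs the N-by-5 cell layout from the question, the column-wise result can be converted afterwards. A minimal sketch (the num2cell step and the name out are additions, not part of Cedric's answer); keep in mind that wrapping every number in its own cell costs far more memory than keeping the numeric columns as arrays:

```matlab
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

% Cedric's approach: interleave rows with newlines and parse once
merger = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].';
data   = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',');

% Wrap the numeric columns with num2cell so every column is an Nx1 cell,
% then concatenate horizontally into the Nx5 layout from the question
out = [data{1}, num2cell(data{2}), num2cell(data{3}), data{4}, data{5}];
disp(out)
```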

1 Comment

Phillip on 28 May 2014

Nice use of the "inconsistency". Should have thought of that. Speeds up the code nicely and now I can finally generalise the larger code. Thanks!


Answer 2

dpb on 27 May 2014


https://kr.mathworks.com/matlabcentral/answers/131389-textscan-difficulties-with-mixed-datatypes#answer_138540


It's only one of the many inconsistencies/quirks in textscan...

AFAIK, about the best you can do is to add a post-processing step that replaces the 1x1 cell in each of the three string columns with the string it contains. With a for loop, it's

>> for i=1:3,t(i,1)=t{i,1};t(i,4)=t{i,4};t(i,5)=t{i,5};end
>> t
t = 
  'record1'    [  2]    [  3]    'string4'    's5'  
  'rec2'       [ 22]    [ 33]    'str4'       'str5'
  'r3'         [222]    [333]    's4'         'st5'
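The same post-processing can also be written without an explicit index loop, using cellfun over the three string columns at once. A sketch, assuming t was built row by row as in the question (the building loop is included so the example runs on its own; strCols is a hypothetical name). Note that cellfun still iterates internally, so it is not guaranteed to be faster than the for loop:

```matlab
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

% Build t one row per CSV line, as in the question
n = numel(blockCSV);
t = cell(n, 5);
for i = 1:n
    t(i,:) = textscan(blockCSV{i}, '%s%f%f%s%s', 'Delimiter', ',');
end

% Unwrap every 1x1 cell in the string columns (1, 4 and 5) in one call
strCols = [1 4 5];
t(:, strCols) = cellfun(@(c) c{1}, t(:, strCols), 'UniformOutput', false);
disp(t)
```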

1 Comment

Phillip on 28 May 2014

Yes, it's a bit frustrating, to be honest. Cedric's solution uses that inconsistency nicely to get it working, though. Thanks for the response.
