Read file with non-uniform lines?

Question

bene1 on 25 October 2020

https://kr.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines

Commented: bene1 on 27 October 2020

Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.

% Coordinates
%   Code    ID      X         Y
    C       101     0.001     0.001
    C       102     1.002     0.002
    C       103     1.003     1.003
    C       104     0.004     1.004
% Distances
%   Code    ID      From      To      Dist
    D       201     101       103     1.417
    D       202     102       104     1.414

If the first character is C, use...

A = textscan(fid,'%c %d %f %f')

If the first character is D, use...

A = textscan(fid,'%c %d %d %d %f')

Afterwards, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic while reading the file? Thank you.
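The branching asked about here can also be done without `textscan`. The sketch below is one possible approach, not taken from either answer: it reads the file one line at a time with `fgetl`, looks at the first non-blank character, and parses the numbers with `sscanf`. It assumes the layout shown above (one record per line, `%` marking header lines); the temporary file is written only so the example runs on its own, and `sampleLines` and `fname` are illustrative names.

```matlab
% Demo setup (assumption): write the question's sample data to a temporary
% file so the example is self-contained. In practice, open your own file.
sampleLines = { ...
    '% Coordinates'
    '%   Code    ID      X         Y'
    '    C       101     0.001     0.001'
    '    C       102     1.002     0.002'
    '    C       103     1.003     1.003'
    '    C       104     0.004     1.004'
    '% Distances'
    '%   Code    ID      From      To      Dist'
    '    D       201     101       103     1.417'
    '    D       202     102       104     1.414'};
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

% Output structs with one column vector per field, as described in the question.
c = struct('id', [], 'x', [], 'y', []);
d = struct('id', [], 'from', [], 'to', [], 'dist', []);

fid = fopen(fname, 'r');
tline = fgetl(fid);                 % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    t = strtrim(tline);             % also strips a trailing carriage return
    if ~isempty(t)
        switch t(1)                 % branch on the first non-blank character
            case 'C'
                v = sscanf(t(2:end), '%f');   % [ID; X; Y]
                c.id(end+1, 1) = v(1);
                c.x(end+1, 1)  = v(2);
                c.y(end+1, 1)  = v(3);
            case 'D'
                v = sscanf(t(2:end), '%f');   % [ID; From; To; Dist]
                d.id(end+1, 1)   = v(1);
                d.from(end+1, 1) = v(2);
                d.to(end+1, 1)   = v(3);
                d.dist(end+1, 1) = v(4);
            otherwise
                % '%' header lines (and anything else) are skipped
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                      % remove the demo file
```

After the loop, `c.id` is `[101; 102; 103; 104]` and `d.dist` is `[1.417; 1.414]`. Growing the arrays line by line is simple and adequate for small files; a whole-file approach such as the regexp answer below is likely faster for large ones.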

5 comments (3 older comments not shown)

Walter Roberson on 26 October 2020

'^\s*C.*$', 'dotexceptnewline', 'lineanchors'

or

'(?<=(^|\n))\s*C[^\n]*'

with no additional options needed
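For context, a pattern like these is passed to `regexp` together with the `'match'` option, which returns each matching line as one cell. The sketch below uses the first pattern and assumes the file text is already in a variable; here `FileText` is built from the question's sample lines (with LF line endings) instead of being read with `fileread`. `'dotexceptnewline'` keeps `.*` from running past the end of a line, and `'lineanchors'` makes `^` and `$` match at every line boundary.

```matlab
% Assumption: FileText stands in for fileread(YourFileName).
FileText = sprintf('%s\n', ...
    '% Coordinates', ...
    '%   Code    ID      X         Y', ...
    '    C       101     0.001     0.001', ...
    '    C       102     1.002     0.002', ...
    '    C       103     1.003     1.003', ...
    '    C       104     0.004     1.004', ...
    '% Distances', ...
    '%   Code    ID      From      To      Dist', ...
    '    D       201     101       103     1.417', ...
    '    D       202     102       104     1.414');

% Every line whose first non-blank character is C, one cell per line.
Clines = regexp(FileText, '^\s*C.*$', 'match', 'dotexceptnewline', 'lineanchors');
% Clines is a 1x4 cell array of character vectors.
```

With Windows (CR LF) line endings each match would keep a trailing carriage return, because `.` matches `\r`; that is likely the trailing character shown in the next comment.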

bene1 on 26 October 2020

Great, thanks again. I now have...

C =
  4×1 cell array
    {'    C       101     0.001     0.001←'}
    {'    C       102     1.002     0.002←'}
    {'    C       103     1.003     1.003←'}
    {'    C       104     0.004     1.004←'}

With C as a 4×1 cell array, I believe my next step is to extract the columns. My first thought was

A = textscan(C,'%c %d %f %f')

but I see I can't do that. Should I be looking into cell2struct?
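`textscan` reads from a file identifier or from a single character vector or string, so it cannot take the cell array `C` directly. One way to pull the columns out, sketched here as an illustration rather than what either answer does, is to parse each cell with `sscanf` through `cellfun`: the `%*s` conversion reads and discards the `C` code, and the three `%f` conversions read ID, X and Y. The cell contents are copied from the output above without the trailing carriage returns.

```matlab
% C as displayed above (trailing carriage returns left out here).
C = {'    C       101     0.001     0.001'
     '    C       102     1.002     0.002'
     '    C       103     1.003     1.003'
     '    C       104     0.004     1.004'};

% Parse each line: %*s skips the code letter, then read three numbers.
rows = cellfun(@(s) sscanf(s, '%*s %f %f %f').', C, 'UniformOutput', false);
M = cell2mat(rows);        % 4x3 double: columns are ID, X, Y

% Assign the columns to a struct as the question describes.
c.id = M(:, 1);
c.x  = M(:, 2);
c.y  = M(:, 3);
```

Walter Roberson's answer below avoids this step entirely by capturing the fields with named tokens while matching.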


Answer 1 (accepted)

Walter Roberson on 26 October 2020

https://kr.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines#answer_524468

Named tokens, I said. Do not extract the lines ahead of time.

FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each of which is a character vector.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, and Dist, each of which is a character vector.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});

The amount of processing work is minimal. Almost all of the effort goes into figuring out the proper regexp patterns, which can be tricky when lines vary in format.
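If the per-field `str2double` lines feel repetitive, they can be folded into one step that loops over whatever named tokens the pattern defines. This is an illustrative variation, not part of the original answer: `toNumeric` is a hypothetical helper name, and `FileText` is built from the question's sample lines in place of `fileread`.

```matlab
% Assumption: FileText stands in for fileread(YourFileName).
FileText = sprintf('%s\n', ...
    '% Distances', ...
    '%   Code    ID      From      To      Dist', ...
    '    D       201     101       103     1.417', ...
    '    D       202     102       104     1.414');

Dtokens = regexp(FileText, ...
    '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', ...
    'names', 'lineanchors');

% Hypothetical helper: apply str2double to every field of a 'names' struct
% array, giving one struct whose fields are numeric row vectors.
toNumeric = @(tok) cell2struct( ...
    cellfun(@(f) str2double({tok.(f)}), fieldnames(tok), 'UniformOutput', false), ...
    fieldnames(tok), 1);

D = toNumeric(Dtokens);   % fields ID, From, To, Dist in pattern order
```

The field names come straight from the `(?<name>...)` groups, so adding a column to the pattern needs no extra conversion line.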

1 comment

bene1 on 27 October 2020

Cool, thank you kindly!

Answer 2

per isakson on 26 October 2020

https://kr.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines#answer_524413

>> S = cssm( 'd:\m\cssm\cssm.txt' )
S = 
  1×2 struct array with fields:
    header
    colhead
    Code
    data
>> S(1)
ans = 
  struct with fields:
     header: "Coordinates"
    colhead: ["Code"    "ID"    "X"    "Y"]
       Code: [4×1 string]
       data: [4×3 double]
>> S(2)
ans = 
  struct with fields:
     header: "Distances"
    colhead: ["Code"    "ID"    "From"    "To"    "Dist"]
       Code: [2×1 string]
       data: [2×4 double]

where

function    sas = cssm( ffs )
    
    chr = fileread( ffs );
    str = string( chr );
    str = replace( str, char([13,10]), newline );   % get rid of the carriage return
   
    % split the string into blocks. Use the block header as delimiter. 
    [blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n'  ...      
                        , 'DelimiterType','RegularExpression' );
                    
    blk(1) = [];  % remove empty block before the first delimiter                    
    
    len = numel( del );
    sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
    
    for jj = 1 : len    % loop over all blocks
        
        sas(jj).header = regexp( del(jj), '\w+', 'match','once' );  % match the name
        
        cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
        tmp = strsplit( string(cac{1}) );       % split the row into column headers
        tmp(1) = [];                            % remove the comment character, "%"
        sas(jj).colhead = tmp;
        
        cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
                    ,   'Headerlines',1, 'CollectOutput',true );
        sas(jj).Code = string(cac{1});
        sas(jj).data = cac{2};
    end
end

and where cssm.txt contains the data given in your question.
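To get from `S` to the `c` and `d` structs the question describes, the column headers can serve as field names. This sketch is an illustration, not part of the answer above: it fills `S` by hand with the question's sample values (mirroring the display above) instead of calling `cssm`, lower-cases the headers so `ID` becomes `id`, and skips the `Code` column because `cssm` stores it separately. It assumes a MATLAB release with string arrays; `out` is an illustrative name.

```matlab
% Stand-in for S = cssm('cssm.txt'), filled with the question's sample values.
S(1).header  = "Coordinates";
S(1).colhead = ["Code" "ID" "X" "Y"];
S(1).data    = [101 0.001 0.001; 102 1.002 0.002; 103 1.003 1.003; 104 0.004 1.004];
S(2).header  = "Distances";
S(2).colhead = ["Code" "ID" "From" "To" "Dist"];
S(2).data    = [201 101 103 1.417; 202 102 104 1.414];

% One struct per block; field names are the lower-cased column headers.
out = struct();
for k = 1:numel(S)
    names = lower(cellstr(S(k).colhead(2:end)));  % drop "Code"
    blk = struct();
    for n = 1:numel(names)
        blk.(names{n}) = S(k).data(:, n);         % column n of the numeric data
    end
    out.(lower(char(S(k).header))) = blk;
end

c = out.coordinates;   % c.id, c.x, c.y
d = out.distances;     % d.id, d.from, d.to, d.dist
```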

1 comment

bene1 on 27 October 2020

Thank you for the idea. :-)
