Reading an N-column table that sometimes has N+1 columns
Hi Everyone,
I searched the forum a lot without finding a solution. And pardon my English, I am French :)
My problem is the following:
I have a log file containing a lot of information. The file can be up to 500 MB (sometimes even bigger). It is separated into 2 main parts: a header part, from which it is easy and fast to retrieve info line by line, and a data part.
The data part is composed of several tasks, each with a table of data and text before and after it.
The data table has the following structure:
     1     25.870     1.000     lhc       0.000     0.000   -20.140    24.449      1.42061512
     2     25.870     1.000     lhc       0.000     0.000   -20.520    24.519      1.35075912
     3     25.870     1.000     lhc       0.000     0.000   -20.951    24.582      1.28833133
     4     25.870     1.000     lhc       0.000     0.000   -21.434    24.638      1.23204173
     5     25.870     1.000     lhc       0.000     0.000   -21.958    24.689      1.18086597
     6     25.870     1.000     lhc       0.000     0.000   -22.503    24.735      1.13498198
     7     25.870     1.000     lhc       0.000     0.000   -23.148    24.781      1.08854135
     8     25.870     1.000     lhc       0.000     0.000   -23.741    24.824      1.04623596
     9     25.870     1.000     lhc       0.000     0.000   -24.244    24.863      1.00744521
    10     25.870     1.000     lhc       0.000     0.000   -24.626    24.898      0.97159033
    11     25.870     1.000     lhc       0.000     0.000   -24.876    24.932      0.93839531
    12     25.870     1.000     lhc       0.000     0.000   -25.010    24.962      0.90779039
    13     25.870     1.000     lhc       0.000     0.000   -25.057    24.990      0.87971152
    14     25.870     1.000     lhc       0.000     0.000   -25.063    25.016      0.85443812
    15     25.870     1.000     lhc       0.000     0.000   -25.072    25.038      0.83238819
    16     25.870     1.000     lhc       0.000     0.000   -25.115    25.056      0.81396378
    17     25.870     1.000     lhc       0.000     0.000   -25.220    25.070      0.79981872
    18     25.870     1.000     lhc       0.000     0.000   -25.406    25.079      0.79060410
    19     25.870     1.000     lhc       0.000     0.000   -25.611    25.078      0.79173920
    20     25.870     1.000     lhc       0.000     0.000   -25.936    25.068      0.80208976
    21     25.870     1.000     lhc       0.000     0.000   -26.373    25.047      0.82291587
    22     25.870     1.000     lhc       0.000     0.000   -26.891    25.014      0.85576164
    23     25.870     1.000     lhc       0.000     0.000   -27.437    24.969      0.90124460
    24     25.870     1.000     lhc       0.000     0.000   -27.928    24.910      0.96048807
    25     25.870     1.000     lhc       0.000     0.000   -28.254    24.835      1.03468974
    26     25.870     1.000     lhc       0.000     0.000   -28.317    24.746      1.12353854
    27     25.870     1.000     lhc       0.000     0.000   -28.070    24.642      1.22847010
    28     25.870     1.000     lhc       0.000     0.000   -27.662    24.552      1.31821801
    29     25.870     1.000     lhc       0.000     0.000   -27.101    24.452      1.41784749
    30     25.870     1.000     lhc       0.000     0.000   -26.466    24.343      1.52711338
    31     25.870     1.000     lhc       0.000     0.000   -25.820    24.224      1.64568471 **   
As you can see in row 31, ** appears at random as an extra (10th) column. This is only an excerpt of the data; it goes on for thousands of lines.
I am using the following code to retrieve the data. It works, but performance is a problem with big files: it takes too long. Do you have a suggestion to help me improve performance? The issue is the interruption caused by these **: the more of them there are, the slower it gets.
Here fid is the identifier of the currently open file.
    % Store the whole file in one variable so that the first and last lines
    % of each task can be found more quickly
    outFile = textscan(fid, '%s', 'Delimiter', '\n');
    frewind(fid);
    %Variable
    taskSummaryFlagOn='No.     goal     weight     pol.      rot.      att.    1. comp.  2. comp.      residue';
    taskSummaryFlagOff='Maximum of 1. component:';
    % Find the rows where tasks results are
    needle=strfind(outFile{1}, taskSummaryFlagOn);
    rowsStartTask= find(~cellfun('isempty', needle));
    needle=strfind(outFile{1}, taskSummaryFlagOff);
    rowsEndTask= find(~cellfun('isempty', needle));
    nbStartLine=0;nbEndLine=2;
    % Preallocate the variables for better performance
    nbLineData=zeros(numel(rowsStartTask),1);% nbLineData is used to check that all the data are correctly retrieved
    dataSimu=cell(numel(rowsStartTask),9);
    % Loop
    for i=1:max(size(rowsStartTask))
        nbLineData(i)=rowsEndTask(i)-rowsStartTask(i)-nbStartLine-nbEndLine;
        dataSimu(i,:)=textscan(fid,'%f %f %f %s %f %f %f %f %f','headerlines', rowsStartTask(i));
        % Exception when a line of data ends with **
        while size(dataSimu{i,1},1)~=nbLineData(i)
            fgetl(fid);% reading the final '**'
            buff=textscan(fid,'%f %f %f %s %f %f %f %f %f');
            for j=1:max(size(buff))
                dataSimu{i,j}=[dataSimu{i,j};buff{:,j}];
            end
        end
        frewind(fid);
    end
If you need more information to understand my problem, I can provide more details.
Thanks for the time you spend helping me :)
Accepted Answer
  Sindar
  7 May 2020
Assuming you don't need the '**' info, you could try this solution from the fscanf examples, which skips the remainder of the line after the data you expect:
    dataSimu(i,:)=textscan(fid,'%f %f %f %s %f %f %f %f %f %*[^\n]','headerlines', rowsStartTask(i));
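To see the effect of the `%*[^\n]` specifier in isolation, here is a minimal sketch that parses a small in-memory sample instead of the real log file. The three rows are copied from the table in the question and reordered so that the row ending in `**` sits in the middle; the variable names (`rows`, `sample`, `fmt`, `parsed`, `rowCounts`) are only illustrative and do not come from the original code.

```matlab
% Rows 29, 31 and 30 copied from the table in the question. The order is
% changed on purpose so that the row ending in '**' is followed by another
% row, which shows whether parsing continues past the marker.
rows = { ...
    '    29     25.870     1.000     lhc       0.000     0.000   -27.101    24.452      1.41784749', ...
    '    31     25.870     1.000     lhc       0.000     0.000   -25.820    24.224      1.64568471 **   ', ...
    '    30     25.870     1.000     lhc       0.000     0.000   -26.466    24.343      1.52711338'};

% Join the rows into one newline-terminated character vector;
% textscan accepts text directly in place of a file identifier.
sample = sprintf('%s\n', rows{:});

% Nine expected fields, then %*[^\n] discards whatever is left on the line.
fmt = '%f %f %f %s %f %f %f %f %f %*[^\n]';
parsed = textscan(sample, fmt);

% Each cell of parsed holds one column of the table.
% Count the entries per column to check that no row was dropped.
rowCounts = cellfun(@numel, parsed);
disp(rowCounts)
```

If every column reports the same count as the number of input rows, the marker was consumed without stopping the scan. In that case the `while` loop in the question, which restarts `textscan` after each `**`, should no longer be triggered, although that is worth confirming on a real file by comparing against `nbLineData(i)`.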