How to use readtable when skipping every second (non-data) line?

Hi all,
I'm trying to load a file using readtable (see attached snip). The file has 314 columns and many rows. Every second row contains actual data (e.g. lines 10, 12, 14, ...), and the other rows are either header lines (lines < 8) or non-data lines (e.g. lines 9, 11, 13, ...). How can I read this file so that it skips all the non-data lines?
I know how to skip header lines: data = readtable('file.dat', 'NumHeaderLines', 10);
But I don't know how to skip every second line. I've been playing around with detectImportOptions and setvaropts, but so far (very) unsuccessfully. Help! :)


@Sjoukje de Lange: please upload a sample data file by clicking the paperclip button.
I attached a sample file now :)
dpb on 9 May 2022 (edited 9 May 2022)
readtable doesn't have such an option; it wasn't designed with this kind of file structure in mind.
You can try setting the
'MissingRule', 'omitrow', 'ExpectedNumVariables', 13
parameters in the options object and see whether that helps enough.
Otherwise, do it in two steps: use readcell to read the whole file (minus the header), then use cell2table() to convert the long rows, indexing with a step of 2.
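The two-step readcell/cell2table idea can be sketched as follows. This is a minimal sketch on a small synthetic file, not the original sample: it assumes tab-delimited records, 10 header lines, and that the first row after the header is a non-data line, so the data end up in rows 2, 4, 6, ... of the readcell output. The file name and contents are hypothetical, and on the real file readcell may split the fields differently, as the later posts in this thread show.

```matlab
% Write a small synthetic file (hypothetical content) so the sketch runs on its own
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for k = 1:10
    fprintf(fid, 'header line %d\n', k);           % 10 header lines
end
for k = 1:5
    fprintf(fid, 'status line %d\n', k);           % non-data line
    fprintf(fid, '%d\t%d\t%d\n', k, 10*k, 100*k);  % data line (tab-delimited)
end
fclose(fid);

% Step 1: read everything after the header into a cell array
C = readcell(fname, 'NumHeaderLines', 10, 'Delimiter', '\t');
% Step 2: keep every second row (assumed to be the data lines) and convert
C = C(2:2:end, :);
T = cell2table(C);   % one table variable per cell column
delete(fname);
```

If the first line after the header is a data line instead, the index would be 1:2:end.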
Thanks for your suggestion! However, using readcell (skipping 10 header lines) produces this:
And with some delimiter options ('Delimiter', ' ', 'ConsecutiveDelimitersRule', 'join') I don't get any further than this, where all my data ends up together in column 12.
I'm now trying to figure something out with:
fid = fopen('data.txt');
formatspec = ['%s%d%d%d%s', repmat('%d', 1, 309)];
c = textscan(fid, formatspec, 'headerLines', 10);
fclose (fid);
But I'm not quite there yet...
This is an abomination. How was this file created? No chance of fixing it at the source, I suppose?
Yeah... the way this file is saved is absolutely painful. But indeed there's no way of fixing it; it's straight output from a measurement instrument.


Accepted Answer

Sjoukje de Lange on 10 May 2022 (edited 10 May 2022)
I think I fixed it (posting for future reference)! It might not be as clean and professional as it could be, but I think this works:
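%read data ("file" holds the path to the instrument output file)
fid = fopen(file);
% 314 fields per record: a datetime, three numbers, one text field, then 309 numbers
formatspec = ['%D%f%f%f%s', repmat('%f', 1, 309)];
c = textscan(fid, formatspec, 'HeaderLines', 10);
fclose(fid);
% build a table with one variable per textscan column
% (avoids naming the table "line", which shadows the built-in line function)
data = table(c{:});
%delete every second row (the non-data lines)
data(1:2:end,:) = [];
clear formatspec c fid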

More Answers (2)

Mathieu NOE on 9 May 2022
Hello, try this:
T = readtable ('samplefile.txt', 'NumHeaderLines', 8,"Delimiter",' ');
[m,n] = size(T);
select_rows = 1:2:m;
TT = T(select_rows,:)


Thanks for your suggestion! The file has 314 columns and many rows. The problem is that readtable does not read the file as having 314 columns, but only 4, because of the non-data rows. The remaining data is therefore merged into one column, without any clear delimiters.
Hello, can you share the entire file and tell us which data needs to be retrieved (important) and what can be left out?


dpb on 9 May 2022
>> C=readcell('samplefile.txt');
>> C=C(9:2:end,:);
>> C(1:12,:)
ans =
12×3 cell array
{[16]} {[26]} {'52.7 1900 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'52.7 1900 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'52.9 1902 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.0 1904 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.1 1906 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.2 1908 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.3 1910 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.4 1912 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.6 1914 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.7 1916 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.8 1918 200 450→→ RGC→154→0…'}
{[16]} {[26]} {'53.9 1920 200 450→→ RGC→154→0…'}
>> strlength(C(1,3))
ans =
2199
>> sum(C{1,3}==9)
ans =
618
>> D=split(C(1,3));
>> whos D
Name Size Bytes Class Attributes
D 314x1 35256 cell
>> D=str2double(split(C(1,3)));
>> whos D
Name Size Bytes Class Attributes
D 314x1 2512 double
>> D(1:20)
ans =
52.7
1900
200
450
NaN
154
0
0
0
0
0
0
0
0
464.2
459.9
456.5
453.7
450.7
447.8
>> sum(isfinite(D))
ans =
313
>> C{1,3}(1:50)
ans =
'52.7 1900 200 450 RGC 154 0.0'
>>
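The split/str2double step above can be applied to every row of C in a loop. This is a minimal sketch with a small hypothetical stand-in for C (the real C has 314 tokens per row); it assumes every row splits into the same number of tokens, otherwise the row assignment errors. Text fields such as RGC come back as NaN, as in the transcript.

```matlab
% Hypothetical stand-in for C: column 3 holds the rest of each record as text,
% with runs of tabs between the fields
C = {16, 26, sprintf('52.7\t\t1900\t200\tRGC\t154');
     16, 26, sprintf('52.9\t1902\t\t200\tRGC\t\t155')};

nRows = size(C, 1);
nTok  = numel(split(C{1,3}));   % fields per record (314 in the real file)
vals  = nan(nRows, nTok);       % preallocate; text fields stay NaN
for k = 1:nRows
    % split() with no delimiter splits at whitespace and collapses runs of
    % tabs into one separator, matching the 314x1 result in the transcript
    vals(k, :) = str2double(split(C{k,3})).';
end
```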
Do you know how many variables are in each file a priori, somehow?
The multiple tabs really mess things up, although it looks like split() can deal with them.
I haven't had much direct success with an import options object, although with enough time and effort it can undoubtedly be done.
I might just resort to the old "one at a time" here with fgetl and split(), though, and be done with it.
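A minimal sketch of the "one line at a time" approach with fgetl and split(). It writes a small synthetic file so it runs on its own; the 10 header lines, the strict alternation starting with a non-data line, and the whitespace-separated fields are assumptions taken from the question, not a verified description of the instrument format. The file name, its contents, and the variable names are hypothetical.

```matlab
% Write a small synthetic file (hypothetical content)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for k = 1:10, fprintf(fid, 'header %d\n', k); end      % header lines
for k = 1:3
    fprintf(fid, 'status %d\n', k);                     % non-data line
    fprintf(fid, '%d\t\t%d\tRGC\t%d\n', k, 10*k, 100*k); % data line
end
fclose(fid);

nHeader  = 10;
dataRows = {};                    % one cell per data line
fid = fopen(fname, 'r');
for k = 1:nHeader
    fgetl(fid);                   % skip the header lines
end
isData = false;                   % assume the first line after the header is non-data
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end % fgetl returns -1 at end of file
    if isData
        % split at whitespace (runs of tabs collapse); text fields become NaN
        dataRows{end+1, 1} = str2double(split(strtrim(tline))).'; %#ok<AGROW>
    end
    isData = ~isData;             % alternate non-data / data
end
fclose(fid);
delete(fname);
vals = vertcat(dataRows{:});      % one row per data line
```

For 10,000+ lines, preallocating vals with the known 314 columns would avoid growing the cell array.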


Thanks for your suggestion! Playing around with fgetl and split seems to get me quite far!
By the way, the file is supposed to have 314 columns (the variables) and many (more than 10,000) lines (the measurement points).


Release: R2021b
