Splitting a file into multiple files based on trigger words in the first column.

Scott Spurgeon on 2 Aug 2021
Edited: dpb on 2 Aug 2021
I've got a large set of data (.dat) that I need to split whenever a specific text string is mentioned. For example, I've got:
Dataset_1_1 Set Number
1234 1234
.... ....
Dataset_1_2 Set Number2
5678 5678
.... ....
[I need to make the split here]
Dataset_2_1 Set Number
1234 1234
.... ....
Dataset_2_2 Set Number2
5678 5678
.... ....
and so on. I need to keep all of the "Dataset_1" sets together, meaning "Dataset_1_1" needs to stay with "Dataset_1_34", but the split needs to be made as soon as "Dataset_2_1" is read. Unfortunately, the number of rows between "Dataset_1" and "Dataset_2" isn't known (millions of rows) and each dataset is a different size, so I need to split them based on the names.
Can MATLAB read the first column of each line, find where "Dataset_1_1", "Dataset_2_1", "Dataset_3_1", etc. occur, split the file at those points, and save each part to a new .dat file?

Answers (1)

dpb on 2 Aug 2021
Edited: dpb on 2 Aug 2021
This is where a filter is probably best, given the size of the file and the unknown number of rows between sections, and since you don't need to hold anything but a single record at a time...
fid=fopen('inputfile.dat','r'); % open input file for reading
fnum=1;
fout=compose("Dataset%04d.dat",fnum); % initial output file
fod=fopen(fout,'w'); % open it for writing
fnum=fnum+1; % ready for next file
linechk=compose("Dataset_%d_",fnum); % next set indicator string, e.g. "Dataset_2_"
while ~feof(fid)
  tline=fgets(fid); % get input line w/ \n
  if ~ischar(tline), break, end % fgets returns -1 if nothing is left to read
  if contains(tline,linechk) % found the first record of the next set
    fclose(fod); % close the finished output file
    fout=compose("Dataset%04d.dat",fnum); % next output file
    fod=fopen(fout,'w'); % open it for writing
    fnum=fnum+1; % get ready for next file
    linechk=compose("Dataset_%d_",fnum); % next set indicator string
  end
  fprintf(fod,'%s',tline); % echo line from input to output file
end
fclose(fid); fclose(fod); % close both files
"Air code", untested but think it's close...
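If the set numbers aren't guaranteed to run 1, 2, 3, ... with no gaps, a variation is to pull the set number out of the header line itself and switch output files whenever it changes. The following is only a sketch along those lines, not tested against the real data. The function name splitByGroup, the outDir argument and the "DatasetNNNN.dat" output naming are illustrative choices. It assumes header lines start with "Dataset_N_M" (leading whitespace allowed), that all records of one set are contiguous in the file, and that any lines before the first header can be skipped.

```matlab
function nFiles = splitByGroup(inFile, outDir)
% splitByGroup  Hypothetical helper: stream inFile one line at a time and
% start a new output file whenever the set number N in a leading
% "Dataset_N_M" header changes. Returns the number of files written.
fid = fopen(inFile, 'r');
assert(fid ~= -1, 'Cannot open %s', inFile);
closeIn = onCleanup(@() fclose(fid));    % input is closed even on error

fod = -1;        % no output file open yet
current = NaN;   % set number of the file currently being written
nFiles = 0;
while true
    tline = fgets(fid);                  % keeps the line terminator
    if ~ischar(tline), break, end        % fgets returns -1 at end of file
    % Capture N from a header such as "Dataset_12_3" at the start of the line
    tok = regexp(tline, '^\s*Dataset_(\d+)_\d+', 'tokens', 'once');
    if ~isempty(tok)
        grp = str2double(tok{1});
        if grp ~= current                % also true for the first header (NaN)
            if fod ~= -1, fclose(fod); end
            fout = fullfile(outDir, sprintf('Dataset%04d.dat', grp));
            fod = fopen(fout, 'w');
            assert(fod ~= -1, 'Cannot open %s', fout);
            current = grp;
            nFiles = nFiles + 1;
        end
    end
    if fod ~= -1
        fprintf(fod, '%s', tline);       % echo the line unchanged
    end
end
if fod ~= -1, fclose(fod); end
end
```

Saved as splitByGroup.m, it could be called as n = splitByGroup('inputfile.dat', pwd); to write the pieces into the current folder. Note that if a set number reappears later in the file, its earlier output file would be overwritten, since each file is opened with 'w'.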

Release

R2021a
