how to split a file

Carolina Sainz on 25 Oct 2019
Commented: Guillaume on 30 Oct 2019
Hi, I want to read a long file (with an unknown number of lines), split it into files of 'x' lines each, and then read the data in each of the new split files. Any help on how to do this would be very much appreciated.
Thanks in advance
Guillaume on 28 Oct 2019
Do you have an example text file we can test code against?


Accepted Answer

Guillaume on 28 Oct 2019
Your file has a bit of an odd format; in particular, some lines have an extra *** at the end. It's not clear whether that is significant or whether it needs to be preserved. Since your textscan format wasn't reading it, I assume not.
It's also not clear what formatting your output files should have. Since you're using textscan, I assume they don't need to be exactly identical to the input.
A very easy way to read a file in blocks of fixed size is with the datastore family of functions, which does all the work for you. The following works in R2019b. There have been many improvements to datastores since R2017b, so it may not work as well for you:
%create a datastore (tabulartext in this case) to read the file
%note that * is treated as a delimiter simply so that it is ignored
ds = tabularTextDatastore('input_file.TXT', 'Delimiter', {' ', '*'}, 'NumHeaderLines', 0, 'MultipleDelimitersAsOne', true);
%specify how many rows to read at once
ds.ReadSize = 36000;
%output folder, and basename (%03d zero-pads the file number: split_001.txt, split_002.txt, ...)
outfolder = 'C:\somewhere\somefolder';
basename = 'split_%03d.txt';
%read blocks in a loop, save each one, and do any extra processing
blockindex = 0;
while hasdata(ds)
    blockindex = blockindex + 1;
    data = read(ds); %read the next block of up to ds.ReadSize rows
    outname = fullfile(outfolder, sprintf(basename, blockindex)); %construct full name of output file
    writetable(data, outname, 'WriteVariableNames', false); %write to output file
    %... some more processing
end
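For releases where the datastore approach may not work as well (the answer mentions R2017b), a lower-level alternative is to call textscan in a loop with its repeat-count argument N, so that each call reads at most N lines. This sketch is not from the thread: it first writes a small hypothetical sample file so it runs on its own, uses a block size of 2 instead of 36000, reuses the textscan format from Carolina's SplitFile function further down, and adds the 'CommentStyle', '***' option Guillaume suggests at the end of the thread. The file names and the space-separated output format are assumptions. It also assumes every line is complete; a truncated final record could leave block{1} and block{2} with different lengths.

```matlab
%--- hypothetical sample input so the sketch runs standalone; set infile to your own file ---
infile = fullfile(tempdir, 'sample_input.txt');
fid = fopen(infile, 'wt');
for r = 1:5
    %9 made-up label/value pairs per line; odd lines end with ***
    fprintf(fid, 'a %d b 0 c 0 d 0 e 0 f 0 g %d h 0 i %d', r, 10*r, 100*r);
    if mod(r, 2) == 1
        fprintf(fid, ' ***');
    end
    fprintf(fid, '\n');
end
fclose(fid);

%--- read the file in blocks of at most blocksize lines ---
blocksize = 2;          %rows per output file (hypothetical; the answer above uses 36000)
outfolder = tempdir;    %hypothetical output folder
%keep only the 7th and 9th numeric values of each line, as in SplitFile
fmt = '%*s %*f %*s %*f %*s %*f %*s %*f %*s %*f %*s %*f %*s %f %*s %*f %*s %f';

fid = fopen(infile, 'rt');
if fid == -1
    error('Could not open %s', infile);
end
blockindex = 0;
while ~feof(fid)
    %apply fmt at most blocksize times, i.e. read up to blocksize lines;
    %'CommentStyle', '***' makes textscan skip a *** and the rest of its line
    block = textscan(fid, fmt, blocksize, 'CommentStyle', '***');
    if isempty(block{1})
        break;          %nothing more could be read (e.g. only a final newline left)
    end
    blockindex = blockindex + 1;
    outname = fullfile(outfolder, sprintf('split_%03d.txt', blockindex));
    out = fopen(outname, 'wt');
    fprintf(out, '%g %g\n', [block{1}, block{2}].');   %one "x z" pair per line
    fclose(out);
    %... some more processing on block{1} (x values) and block{2} (z values)
end
fclose(fid);
```

With the 5-line sample and a block size of 2, this writes three files, the last one holding the single leftover line.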
Guillaume on 30 Oct 2019
Carolina Sainz's comment, mistakenly posted as an answer, moved here:
Hi again,
I've created a function that splits the file the way I want:
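function [x, z, hora] = SplitFile(input_file, lines)
%SPLITFILE Read two numeric columns from input_file and split them into blocks of 'lines' rows.
%   Column k of x and z holds the k-th block; hora holds the block indices 0..n-1.
fid = fopen(input_file, "rt");
if fid == -1
    error('Could not open %s', input_file);
end
%skip the text labels and keep only the 7th and 9th numeric values on each line
fmt = "%*s %*f %*s %*f %*s %*f %*s %*f %*s %*f %*s %*f %*s %f %*s %*f %*s %f";
data = textscan(fid, fmt);
fclose(fid);
aux_x = data{1};
aux_z = data{2};
%number of complete blocks (floor wraps the whole division; leftover rows are dropped)
n = floor(length(aux_x)/lines);
x = zeros(lines, n);   %preallocate the outputs
z = zeros(lines, n);
j = 1;
for i = 0:n-1
    for k = 1:lines
        x(k,i+1) = aux_x(j);
        z(k,i+1) = aux_z(j);
        j = j+1;
    end
end
hora = 0:n-1;
end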
But I have a problem: sometimes the data is saved as NaN. I think it might be because of the *** at the end of some lines. Since the *** is not present on every line, can you help me overcome this NaN issue?
Many thanks
Guillaume on 30 Oct 2019
can you help me overcome this NaN issue?
I've already given you two ways to overcome the issue: the datastore option and the readtable option, both of which I wrote specifically to cope with the *** issue.
I've also given you a much more efficient and faster way of producing your x and z matrices, one that doesn't need a loop and does the job in just 3 lines.
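The 3-line version referred to here is not shown in this thread. One way to build the same x and z matrices without loops is reshape, which fills a matrix column by column, exactly the order in which the nested loops in SplitFile fill x and z. A minimal sketch, with made-up vectors standing in for aux_x and aux_z:

```matlab
%hypothetical column vectors standing in for aux_x and aux_z from SplitFile
aux_x = (1:10).';
aux_z = (101:110).';
lines = 3;                                  %rows per block, as in SplitFile

n = floor(numel(aux_x) / lines);            %number of complete blocks
x = reshape(aux_x(1:n*lines), lines, n);    %column k holds the k-th block
z = reshape(aux_z(1:n*lines), lines, n);
hora = 0:n-1;
```

As in the loop version, rows beyond the last complete block (here the 10th value) are dropped.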
So, I'm a bit puzzled by your replies. It doesn't appear you take my answers on board.
If you want to continue using textscan, the easiest way to cope with the *** is with:
data = textscan(fid, format, 'CommentStyle', '***');
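A minimal check of the 'CommentStyle' option on a hypothetical two-line file, where only the first line ends with ***. The format is shortened to two label/value pairs (instead of the nine in SplitFile) purely for illustration:

```matlab
%hypothetical two-line sample: only the first line ends with ***
sample = fullfile(tempdir, 'commentstyle_demo.txt');
fid = fopen(sample, 'wt');
fprintf(fid, 'a 1 b 2 ***\n');
fprintf(fid, 'c 3 d 4\n');
fclose(fid);

fid = fopen(sample, 'rt');
%'CommentStyle', '***' makes textscan ignore everything from *** to the end of that line
data = textscan(fid, '%*s %f %*s %f', 'CommentStyle', '***');
fclose(fid);
vals = [data{1}, data{2}]   %one row per input line
```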
