How would I 'trim-the-fat' off of individual text files that are part of a loop?

Views: 1 (last 30 days)
EL
EL on 26 Sep 2019
Commented: dpb on 30 Sep 2019
Hello,
I'm working on a script that's going to read single-column, headerless .txt files. In a perfect world each file would be an exact multiple of 36,000,000 lines of data; however, the data gets stored with an additional 1 to 5,000,000 lines. I do not need this data.
What I'm currently using is a file splitter on the Linux command line that splits the data into 36,000,000-line chunks and removes anything less than that. Here's what that looks like:
clear
echo "Hello Human. Please enter the date of the data to be analyzed [mmddyyyy]"
echo
read DataAnalysis
echo
echo "Would you like to analyze DEF, LFM, or SUM?"
echo
read DataType
echo
echo "Thank you Human, please wait......."
echo
cd "$DataAnalysis"
split -d -l 36000000 *Live*"$DataType"* x0
split -d -l 36000000 *Dead*"$DataType"* x1
# Below: remove any chunk with fewer lines than the bin time, i.e. the excess data
find . -name 'x*' | xargs -I{} bash -c 'if [ $(wc -l {} | cut -d" " -f1) -lt 36000000 ]; then rm -f {}; fi'
mkdir Chopped
mv x0* x1* Chopped
# Below: turn all chunks into .txt files by adding the .txt suffix
find . -name 'x*' -print0 | xargs -0 -I% mv % %.txt
echo
echo
echo "*****Data Chop Complete Human*****"
echo
echo
Now this script is dependent on there being a single "LIVE" file and a single "DEAD" file, which isn't always going to be the case. I'm going to have multiple files with arbitrary names that need to be analyzed and concatenated in a specific order. What I currently have for file selection in MATLAB is the following:
%% Populate filenames for LINUX command line operation
clear
close all
clc
[FileNames, PathNames] = uigetfile('Y:\Data\*.txt', 'Choose files to load:', 'MultiSelect', 'on'); % opens the window for file selection
prompt = 'Enter save-name according to: file_mmddyyyy_signal ';
Filenamesave = input(prompt, 's');
Filenamesave = strcat(PathNames, Filenamesave, '.mat');
PathNames = strrep(PathNames, 'L:', 'LabData');
PathNames = strrep(PathNames, '\', '/');
PathNamesSave = strcat('/', PathNames);
save(Filenamesave, 'FileNames', 'PathNames', 'PathNamesSave');
When I load the .mat file produced by this script, how would I write a script to scan every file and ignore excess data points that don't make up a full 36,000,000-line block?
1 Comment
dpb
dpb on 27 Sep 2019
Edited: dpb on 27 Sep 2019
Have pointed this out numerous times but will try yet again... use fullfile() to build file names from the pieces-parts instead of string catenation operations, and you won't have to mess with what the file-separator character is -- ML will take care of it automagically at runtime.
A corollary of the above is to not store a system-specific separator character in the default base names, but to build them at runtime as well from the name strings alone using fullfile so they'll also match the OS you're running on.
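For instance, a minimal sketch against the variables in the question's script:
% fullfile() inserts the correct separator for the OS at runtime,
% so no strrep() of '\' vs '/' is needed:
Filenamesave = fullfile(PathNames, [Filenamesave '.mat']);
thisFile = fullfile(PathNames, FileNames{1}); % first of the selected files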


Accepted Answer

dpb
dpb on 27 Sep 2019
Edited: dpb on 28 Sep 2019
" I'm going to have multiple files with arbitrary names, that need to be analyzed and concat[e]nated "
Presuming this is related to the previous topic, I'd (yet again) suggest it's probably not necessary (or even desirable) to generate all the arbitrary intermediate files...
N=yourbignumber;
fid=fopen('yourreallyreallyreallybigfile.txt','r');
while ~feof(fid)
  [data,nread]=fscanf(fid,'%f',N);
  if nread==N
    % whatever you want to do with the full-section results goes here
  else
    % whatever you want to do with the short-section results goes here
  end
end
fid=fclose(fid);
Inside that full-section clause can go the other loop we just went through that uses the second magic number of 400K records to process.
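For illustration only, a sketch of what that inner loop might look like, assuming the full section in data is processed in 400,000-record blocks and N is an exact multiple of that size:
M=400000; % the second magic number from the other thread
for i=1:N/M % assumes N is an exact multiple of M
  block=data((i-1)*M+1:i*M); % i-th 400K-record block of the full section
  % ...per-block processing goes here...
end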
2 Comments
EL
EL on 29 Sep 2019
This makes sense. So, each file which will have a tail end of data I don't need will simply be ignored, if I'm understanding this correctly?
The files being loaded aren't split files. These are the raw data files that are separate because data acquisition had to stop due to a mandatory change in conditions. I always have live and dead data, and sometimes I have to stop data acquisition to adjust the instrument or conditions. Each time I stop data acquisition, new files are generated. It's just how our software works.
dpb
dpb on 30 Sep 2019
"each file which will ahve a tail end of data I don't need will simply be ignored"
Depends. The above will read up to N records -- there could be fewer records in the file, there could be an error inside the file or there could be N or more records but had an out-of-memory problem reading the full N.
In the above, you'll read however many sets there are in the file before the loop quits but you'll know how many records were read each time and can take action accordingly.
If you have only a fixed number of total records that are wanted (some multiple of N), then would need to use a counter to keep track of how many sets you've read and break when that's done.
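A minimal sketch of that counter idea, reusing the loop above (nWanted is a made-up name for however many full sets are wanted):
nWanted=10; % hypothetical: how many full N-record sets to keep
nSets=0;
fid=fopen('yourreallyreallyreallybigfile.txt','r');
while ~feof(fid)
  [data,nread]=fscanf(fid,'%f',N);
  if nread<N, break, end % short trailing section -- ignore it
  nSets=nSets+1;
  % ...process the full section here...
  if nSets==nWanted, break, end % read everything wanted; quit early
end
fid=fclose(fid);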
In the other thread, it was presumed that N is the total number of records wanted, and there's no need in that case for the while loop. This would be how to read the N=400K blocks if you don't read the whole set wanted in one go.
How you do this is up to you in the end; I was just trying to get you past the original postings of some time ago that were to break up the big file into a zillion little ones.


More Answers (1)

Guillaume
Guillaume on 26 Sep 2019
Edited: Guillaume on 26 Sep 2019
If I understood correctly:
opt = detectImportOptions(yourtextfile);
opt.DataLines = [1 36e6]; %only read the first 36000000 lines if there are more
data = readtable(yourtextfile, opt); %R2019a or later, use readmatrix instead if you want a plain matrix
If the files are guaranteed to have at least 36,000,000 lines then this would work as well:
data = csvread(yourtextfile, 0, 0, [0 0 36e6-1 0]); % range is [R1 C1 R2 C2], zero-based
but it will error if there are fewer than 36,000,000 lines, unlike the first option, which will read whatever is there.
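(Not part of the answer as posted: to tie this back to the question's file-selection script, the same read can be wrapped in a loop over the selected files, assuming FileNames and PathNames are the outputs of the uigetfile call above.)
for k = 1:numel(FileNames)
  thisFile = fullfile(PathNames, FileNames{k}); % FileNames is a cell array with 'MultiSelect','on'
  opt = detectImportOptions(thisFile);
  opt.DataLines = [1 36e6]; % cap each file at its first 36,000,000 lines
  data = readtable(thisFile, opt);
  % ...analyze/concatenate data in the required order here...
end
One caveat: uigetfile returns a char rather than a cell array when only one file is selected, so wrap it with cellstr() first if that can happen.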
1 Comment
dpb
dpb on 27 Sep 2019
Edited: dpb on 27 Sep 2019
One can always put the read in a try...catch block to handle the short file section case.
N=yourbignumber;
fid=fopen(yourtextfile,'r');
try
  data=fscanf(fid,'%f',N);
catch ME
  % anything you want to do with the short-section results goes here
end
fid=fclose(fid);
The above also will not error no matter the file size (well, it might, but you've anticipated it and have a way to handle it gracefully).
The other benefit of this way is that you get a direct 1D double array; the readtable option above returns the data in a MATLAB table object which, for just one variable, doesn't have much benefit.
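(If you do use readtable, the single column can still be pulled out of the data table from the answer above as a plain double vector:)
x = data{:,1}; % brace indexing extracts the lone table variable as a double column vector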


Release: R2018a
