Extract columns from multiple csv files

조회 수: 20 (최근 30일)
Paul Broadley
Paul Broadley 2019년 8월 14일
답변: dpb 2019년 8월 14일
I have a folder containing 27 csv files, of which contain 6 columns each. (I have 32 of these folders, hence my reluctance to do this manually).
I must do some signal processing on one column (column 3) of each file, so I want to write a code allowing me to extract all 3rd columns from each file in the given folder. This can either be saved seperately or as one large table/ array.
My code reads like this (for the first folder: A100):
A100dir = 'C:\Work\TESTING\A100'
A100 = dir(fullfile(A100dir, '*.csv'))
A100 = {A100}
String = cell(1,numel(A100))
n = length(A100)
for i = 1:n
data = csvread(A100(i))
data = readtable(files{i})
colL = data(:,3)
end
  댓글 수: 2
dpb
dpb 2019년 8월 14일
편집: dpb 2019년 8월 14일
Are the record lengths in each file the same or different? If the latter, is wanted to have all same or keep differences? If each is same length, then a 2D array is the simplest storage mechanism by far.
Or, of course, if you're processing these files for this particular column just once, there would be no need to actually create the secondary file/data array altogether unless it requires more than one signal at a time to do the computation. If not, just go ahead and process each when you read it and go on and save all the memory needed to build the composite array.
Of course, if you are going to need to reprocess all these data multiple times it makes sense to consolidate it...in that case, one presumes that perhaps there would be a need to pick another column besides the third? So, one should write the code to be able to pick the desired column(s) with user input, not hardcoded in the code itself (which is a general tenet to not bury "magic numbers" in code whatever their purpose if there's any chance at all the number could change).
Paul Broadley
Paul Broadley 2019년 8월 14일
They are of different lengths in each circumstance.
I will need to do a similar process for another column (column 6) and the other columns can be ignored.
It will be required for further analysis later on but for now it will be just the one time analysis. Each aforementioned folder corresponds to a different instance of parameters.

댓글을 달려면 로그인하십시오.

채택된 답변

Neuropragmatist
Neuropragmatist 2019년 8월 14일
Why not just concatenate the data - or am I missing something?
If in the end you want a single vector to analyse you can just concatenate without preallocation (forgive me matlab):
all_data = [];
for i = 1:n
data = csvread(A100(i));
data = readtable(files{i});
all_data = [all_data; data(:,3)]; % if you know the size of data before you can preallocate all_data for better performance
end
Although I would suggest adding a column to keep track of which file what data came from:
all_data = [];
for i = 1:n
data = csvread(A100(i));
data = readtable(files{i});
all_data = [all_data; ones(size(data(:,3))).*i data(:,3)]; % if you know the size of data before you can preallocate all_data for better performance
end
Or you could collect the data in a cell array and use cellfun later to apply your analysis to each cell/data set:
all_data = cell(n,1);
for i = 1:n
data = csvread(A100(i));
data = readtable(files{i});
all_data{i,1} = [ones(size(data(:,3))).*i data(:,3)];
end
Is any of that close to what you want?
I think I would agree with dpb that you should probably load all the data into one big array, probably in a table for convenience. Then just extract the third column from that. This might save time in the long run because you only have to loop over and load files once and it would be easier for users to make modifications to the code in future.
M.

추가 답변 (1개)

dpb
dpb 2019년 8월 14일
targetColumn=3; % don't bury magic numbers, use input() to set, maybe...
rootdir = 'C:\Work\TESTING'; % set the root directory location (could use uigetfile)
dirName = 'A*'; % base pattern for the directories to traverse
dDir=dir(fullfile(rootdir,dirName,'.')); % return the directories of interest A*
for i=1:numel(dDir) % iterate over directories found
d=dir(fullfile(rootdir,dDir(i).Name,'*.csv')); % and pick up all the .csv files therein...
for j = 1:numel(d)
data = csvread(fullfile(rootdir,dDir(i).Name,d(j).name)); % read the jth file in ith directory
res=youranalysisfunction(data(:,targetColumn)); % do analysis of target column
% insert code to save the analysis results here...
...
end
The above does the one-pass, do the analysis while traversing method...warning air-code may be typos, etc., ...
If you do want to save the given column across all for future reference, you can just create a 2D cell array and then inside the inner loop after csvread
alldata{i,j}=data(:,targetColumn)); % save target column given subdir/file as cell array

카테고리

Help CenterFile Exchange에서 Logical에 대해 자세히 알아보기

태그

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by