Merge multiple XLSX files with one header mismatch

조회 수: 2 (최근 30일)
Nicolò Dall'Acqua
Nicolò Dall'Acqua 2021년 10월 27일
댓글: Walter Roberson 2021년 10월 28일
Hello everyone, I'm stuck with this problem at the moment, and I would be really grateful if you could help me.
I'm reading a considerable number of XLSX files, with multiple sheets per file, everyone with the same layout: 16 columns and n entries. All these files are stored inside a table named fileID.
What I did with the vertcat function worked fine because all the files inside my directory had the same headers. The problem is that now I need to add other files with identical headers as the previous ones, except for one that varies slightly (EPS Density vs EPP Density, in column #6). Since there is this mismatch on one column header of these new files, the vertcat approach is useless at the moment.
Therefore, what I wanted to achieve was:
  • Specify a common "Density" column without considering the EPS and EPP columns of the various files, so replace column #6 in the image attached and merge everything.
or
  • Having two different columns close by, adding a new EPP Density column between columns #6 and #7.
close all
clear all %#ok<CLALL>
clc
%warning('off')
%% IMPORT
D = '/Users/nda/Documents/Helmet/Cube/Data/C2/Impacts';
J = fullfile(D,'*.xlsx');
S = dir(J);
% READ ALL XLSX IN DIRECTORY, READ ALL SHEETS PER FILE
% ADD EXTRA COLUMN AND STORE FILENAMES IN IT
for j = 1:numel(S)
F = fullfile(D,S(j).name);
[~,sheet_name]=xlsfinfo(F);
opts = detectImportOptions(F,'NumHeaderLines',0,'ImportErrorRule','omitrow');
for k=1:length(sheet_name)
fileID{j,k} = readtable(F,opts,'Sheet',k);
fileID{j,k}{:,end+1} = string(S(j).name);
end
end
fileID = fileID(~cellfun('isempty',fileID));
fileID = vertcat(fileID{:});
%% ELABORATE
...
Thanks for any help you gurus can provide!

채택된 답변

Cris LaPierre
Cris LaPierre 2021년 10월 27일
Consider manually naming your variables. Then, as long as you have the same number of columns in each sheet, you are able to import without worrying about the names used in the file.
...
opts = detectImportOptions(F,'NumHeaderLines',0,'ImportErrorRule','omitrow');
opts.VariableNames = ["Name1","Name2"]
  댓글 수: 4
Nicolò Dall'Acqua
Nicolò Dall'Acqua 2021년 10월 28일
Understood, thank you! Updated code:
close all
clear all %#ok<CLALL>
clc
%warning('off')
%% IMPORT
D = '/Users/nda/Documents/Helmet/Cube/Data/C2/Impacts';
J = fullfile(D,'*.xlsx');
S = dir(J);
% READ ALL XLSX IN DIRECTORY, READ ALL SHEETS PER FILE
% ADD EXTRA COLUMN AND STORE FILENAMES IN IT
for j = 1:numel(S)
F = fullfile(D,S(j).name);
[~,sheet_name]=xlsfinfo(F);
opts = detectImportOptions(F,'NumHeaderLines',0,'ImportErrorRule','omitrow');
opts.VariableNames = ["date","codeID","model","size","headform","density","certification",...
"condition","type_test","type_anvil","location","peakG","CSF","SF","result","notes","path"];
opts.VariableTypes = ["string","string","string","string","string","string","string",...
"string","string","string","string","double","double","double","string","string","string"];
for k=1:length(sheet_name)
fileID{j,k} = readtable(F,opts,'Sheet',k);
fileID{j,k}{:,end} = {S(j).name};
end
end
fileID = fileID(~cellfun('isempty',fileID));
fileID = vertcat(fileID{:});
%% ELABORATE
...
Walter Roberson
Walter Roberson 2021년 10월 28일
Ah, changing the variable type to always be string is an interesting approach that should work too.
Side note: you can construct those two string() vectors outside the loop, since they are not changing inside the loop.
I think you might even be able to construct the entire opts outside the loop (construct it on the first file.)

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Walter Roberson
Walter Roberson 2021년 10월 28일
편집: Walter Roberson 2021년 10월 28일
%% IMPORT
D = '/Users/nda/Documents/Helmet/Cube/Data/C2/Impacts';
J = fullfile(D,'*.xlsx');
S = dir(J);
% READ ALL XLSX IN DIRECTORY, READ ALL SHEETS PER FILE
% ADD EXTRA COLUMN AND STORE FILENAMES IN IT
for j = 1:numel(S)
F = fullfile(D,S(j).name);
[~,sheet_name]=xlsfinfo(F);
opts = detectImportOptions(F, 'VariableNamingRule', 'preserve', ImportErrorRule','omitrow');
for k=1:length(sheet_name)
T = readtable(F,opts,'Sheet',k);
T.Filename = repmat(string(S(j).name), height(T), 1);
if ismember('EPS Density', T.Properties.VariableNames)
T.('EPP Density') = repmat("N/A", height(T), 1);
else
T.('EPP Density') = string(T.('EPP Density'));
T.('EPS Density') = nan(height(T),1);
end
T = movevars(T, {'EPS Density', 'EPP Density'}, 'After', 'HeadFormSize');
fileID{j,k} = T;
end
end
fileID = fileID(~cellfun('isempty',fileID));
fileID = vertcat(fileID{:});
%% ELABORATE
...
  댓글 수: 1
Nicolò Dall'Acqua
Nicolò Dall'Acqua 2021년 10월 28일
I realized it will be more easy for me using Cris's approach, but thanks a lot Walter for your reply.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Spreadsheets에 대해 자세히 알아보기

제품


릴리스

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by