이 질문을 팔로우합니다.
- 팔로우하는 게시물 피드에서 업데이트를 확인할 수 있습니다.
- 정보 수신 기본 설정에 따라 이메일을 받을 수 있습니다.
Extracting specific data from multiple excel files and create a single matrix from those
조회 수: 53 (최근 30일)
이전 댓글 표시
Hi, I have a file on my computer with close to 1000 excel files and I don't want to manually extract the second row from every excel file manually and combine into a single excel file.
I was wondering how I can do this on Matlab through reading every Excel file in that specific file and taking out all of the columns from the second rows and then combining all that into a single matrix?
Thanks for the help
채택된 답변
Mathieu NOE
2021년 8월 31일
hello
this is one example if you want to work out the entire folder
I assumed it would be numeric data so I used importdata (faster)
I also sorted the files names in natural order in case it might be relevant
It works even if your files have different size (number of columns)
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(cd,'*.xlsx')); % get list of all excel files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile));
댓글 수: 14
Jonas Freiheit
2021년 8월 31일
Hi Mathieu, I am getting this error reading
Unrecognized function or
variable 'second_row'.
Error in output (line 26)
writecell(second_row',fullfile(cd,outfile));
>>
What should I do?
Thanks
Mathieu NOE
2021년 8월 31일
hello Jonas
can you check if the for loop is working ok ? do you get an output from this line :
what is displayed in your workspace when you type raw ?
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
Jonas Freiheit
2021년 8월 31일
When I type raw, I get >> raw
Unrecognized function or
variable 'raw'.
yep my raw is exactly the same as yours.
Mathieu NOE
2021년 8월 31일
so this means this line of code could not be executed ....
if you can share a couple of xls files I would like to test on my side...
Jonas Freiheit
2021년 8월 31일
- 18136a AS ABS BC.CSV
- 18136a Deep Ocean control AS ABS BC(auto).CSV
- 18136a Deep Ocean control.CSV
- 18136b AS ABS BC.CSV
- 18136b Deep Ocean 2%.CSV
- 18136c AS ABS BC.CSV
- 18136c Deep Ccean 2.5%.CSV
- 18136d AS ABS BC.CSV
- 18136d Deep Ocean 3%.CSV
- 18136e AS ABS BC.CSV
- 18136e Deep Ocean 3.5%.CSV
- 18136f AS ABS BC.CSV
- 18136f Deep Ocean 4%.CSV
- 18136g AS ABS BC.CSV
- 18136g Deep Ocean 4.5%.CSV
- 18136h AS ABS BC.CSV
- 18136h Deep Ocean 5%.CSV
- 18136i AS ABS BC.CSV
- 18136i Deep Ocean 6%.CSV
- 18136j AS ABS BC.CSV
- 18136j Deep Ocean 8%.CSV
- 18136k AS ABS BC.CSV
- 18136k Deep Ocean 10%.CSV
- ARTEMIS_SET_control_10%.xlsx
Yeah sure, I'll send you a small folder of samples. I've tried using this folder and it returned an excel file called OUT that only had two zeroes in the excel file.
From the previous test I tried using a folder that contained 1000 excel files and Matlab somehow printed that error possibly because its too hard to process?
Also if this is the case I could create multiple smaller folders and then combine that into the Output matrix that was previously obtained and how would I do that?
Thanks
Mathieu NOE
2021년 9월 1일
hello again
I simply modified one of your data file (ARTEMIS_SET_control_10%.xlsx) from xlsx to CSV format so they all have the same format. If you need to dig with a mix of CSV and XLSX files , I could update my code.
After that , I had not much to do and this code works like a charm, at least for this batch of 24 files;
see also at the end of my code , if you wish to have the filenames also stored in the OUT file, attached FYI
slightly updated code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile)); % without filenames
% writecell([fileNames_sorted' second_row'],fullfile(cd,outfile)); % with filenames stored in column 1
(seems to me a lot of files had the same data inside , just the file names are different)
Jonas Freiheit
2021년 9월 1일
Sorry this is not working for me, all the files are different but its only printing out 401, 0 for every column.
Its supposed to do (2,:) copy everything in the 2nd row from each excel file and then every row thats copied to copy over into a single excel file or a matrix on matlab. Since there are 24 excel files in this batch there are supposed to be 24 rows and 936 columns for the OUT matrix
Jonas Freiheit
2021년 9월 1일
Since this is to be used for principal components analysis. I only am interested in the second row since its got the absorption of the spectra and the first row simply is the wavenumbers which is the same for all the excel files and is useless information
Mathieu NOE
2021년 9월 1일
ok I believe I guess what you really want- which is not what I understood so far
so you want the 2nd column (and not the second row ! ) of each data file
then ok the output size will be 24 rows and 936 columns
as all input files have same dimensions, I could make the code simpler and use writematrix instead of writecell
here code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
second_col= [];
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_col= [second_col raw(:,2)]; % extract the second column
end
% write all second columns lines into a matrix and store it in excel file
writematrix(second_col,fullfile(cd,outfile));
Jonas Freiheit
2021년 9월 1일
Sorry I have another question, I need to create a { } cell which contains group names within it.
I need to create one thats a 5046x1 cell. containing 1682 entries saying Spot1 then 1682 entries saying Spot 2 then 1682 entries saying Spot 3. This is to group off my spectra, Do you want me to repost this so you can get more votes?
Thanks
Mathieu NOE
2021년 9월 1일
Hi Jonas
it's not just a question of votes but yes indeed each question / topic should be addressed in a separate post
this way you can also get ore answers because it's not burried in the original post .
추가 답변 (1개)
Ive J
2021년 8월 31일
편집: Ive J
2021년 8월 31일
You can use readmatrix (assuming all values are numeric, otherwise use readtable) or fileDatastore to read those files. Something like this should work:
myfiles = ["file1.xlsx", "file2.xlsx"]; % file names: use dir to generate file names within the target directory
data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
댓글 수: 13
Jonas Freiheit
2021년 8월 31일
Hi Ive J, Thanks for that, I was wondering how can I generate all the file names into myfiles without writing all the files out?
Thanks
Ive J
2021년 8월 31일
This would work:
targetDir = pwd; % only if files are within the current directory
filenames = string({dir(fullfile(targetDir, '*.xlsx')).name}); % converted to string to be used in my example above
Jonas Freiheit
2021년 8월 31일
Hi Ive, should I plug filenames into myfiles as myfiles=[filenames]
Because doing that gives me only 'Output' from the code as the output?
Ive J
2021년 8월 31일
filenames and myfiles are the same. You can use my example as:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.xlsx')).name});data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
% data is the matrix you're trying to generate from your excel files.
Ive J
2021년 9월 1일
What's output? There is no variable called output in my snippet. I used some of your files and it works just fine:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});data = []; % you mentioned you have Excel files, which was wrong
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
myfiles =
"18136a AS ABS BC.CSV"
"18136c AS ABS BC.CSV"
"18136d Deep Ocean 3%.CSV"
"18136f AS ABS BC.CSV"
"18136g Deep Ocean 4.5%.CSV"
data =
401.1380 0
401.1380 0
401.1380 0
401.1380 0
401.1380 0
Jonas Freiheit
2021년 9월 1일
Sorry Ive J, That works but I can't write each file out individually since its 1000 files.
Would you know how to make it work without manually inputting the file names?
Cheers
Ive J
2021년 9월 1일
I didn't manually read files. The second line gets the names of all CSV files within the directory.
Jonas Freiheit
2021년 9월 1일
Sorry, I mean its only printing out a 2x5 matrix with 401.1380 and 0 like shown in the data.
Its supposed to be for example a 25x936 matrix if there are 25 excel files containing infrared spectra with 2 rows and 936 columns each. The 1st row only contains the wavenumber data which needs to be ignored and the 2nd row contains the absorption which is different for every excel file and needs to be extracted and then combined into the final matrix.
I'm using this for principal components analysis.
Ive J
2021년 9월 1일
It's 5X2 matrix because first, I only used 5 CSV files, and secondly, your sample CSV files contained only 2 columns, so it would be simply a 5X2 matrix.
In case you have 1000 files in the folder, myfiles would be a string array of 1000 CSV file names. If each of those CSV files have also 936 columns, then the resulting matrix would be of size 1000X936.
Note that this line
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});
extracts all CSV files within the target directory (path to the folder your CSV files are in), so doesn't matter if there are 5, 1000 or even more CSV files there, myfiles would still contain all those files which then will be looped over to extract the values exist in 2nd line.
Jonas Freiheit
2021년 9월 1일
Sorry there are 936 columns in each excel file it just doesn't work for some reason..
Ive J
2021년 9월 1일
Please attach some of these 1000 files (with 936 columns) you're trying to work with.
Also, please be more specific with ...doesn't work for some reason.. What exact error do you get in command window when running my snippet?
Jonas Freiheit
2021년 9월 1일
Sorry, the error was that it was printing out only 401 and 0. The problem has been solved now I really appreciate the help.
참고 항목
카테고리
Help Center 및 File Exchange에서 Spreadsheets에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!오류 발생
페이지가 변경되었기 때문에 동작을 완료할 수 없습니다. 업데이트된 상태를 보려면 페이지를 다시 불러오십시오.
웹사이트 선택
번역된 콘텐츠를 보고 지역별 이벤트와 혜택을 살펴보려면 웹사이트를 선택하십시오. 현재 계신 지역에 따라 다음 웹사이트를 권장합니다:
또한 다음 목록에서 웹사이트를 선택하실 수도 있습니다.
사이트 성능 최적화 방법
최고의 사이트 성능을 위해 중국 사이트(중국어 또는 영어)를 선택하십시오. 현재 계신 지역에서는 다른 국가의 MathWorks 사이트 방문이 최적화되지 않았습니다.
미주
- América Latina (Español)
- Canada (English)
- United States (English)
유럽
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
아시아 태평양
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)