How To Load Multiple Text Files (specific context)

Hello MATLAB community, I would like to load many text files (all with the same number of rows and columns) from a single folder and store the 2nd column of each in one matrix.
Here's an example: for 30 text files, the resulting matrix would have 30 columns and as many rows as the files contain (they all have 2048 rows).
But here's the catch: there's a multi-line header (about 8 lines) before the data, and the data is separated by semicolons ';'.
One of the text files is attached as an example.
Also, the names of the text files do NOT follow any pattern; they are quite random. I've already asked a very similar question here, but I wasn't considering the header. One helpful person wrote the script below, and I'd like to tweak it a little to pass the right parameters to textscan().
% Set input folder
input_folder = 'C:\Users\Cotet\Downloads';
% Read all *.txt files from input folder
% NOTE: This creates a MATLAB struct with a bunch of info about each text file
% in the folder you specified.
files = dir(fullfile(input_folder, '*.txt'));
% Get full path names for each text file
file_paths = fullfile({files.folder}, {files.name});
% Read data from files, keep second column
for i = 1 : numel(file_paths)
% Read data from ith file.
% NOTE: If your file has a text header, missing data, or
% uses non white-space delimiters, you should check out the
% documentation for textread to determine which options to use.
data = textscan(file_paths{i}, '');
% Save second data column to matrix
% NOTE: Your data files all need to have the same number of rows for this to work
A(:, i) = data(:, 2);
end
The part with which I'm concerned is this note :
% NOTE: If your file has a text header, missing data, or
% uses non white-space delimiters, you should check out the
% documentation for textread to determine which options to use.
I've tried many things, but was ultimately unsuccessful.
Thank you so much in advance.

Comments: 8

I’m not certain I understand what you’re doing. I also don’t understand the textread reference. You’re not calling textread in the code you posted. If you are using textread, see the param table in Description (link) and use the relevant name-value pair arguments to skip the header lines, define the delimiter, and anything else your files require.
Note that the textscan function with this line:
data = textscan(file_paths{i}, '');
is going to attempt to parse the file name you provided it, similar to the documentation section in Read Floating-Point Numbers (link).
If you want to read the data in the file, you first need to open it with the fopen function and assign it a file ID number. (Remember to close it with fclose after you read it.)
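The open/read/close sequence described above can be sketched as follows. This is a minimal sketch, not the original poster's code: it writes a small stand-in file (made-up values, laid out like the file excerpt in this thread) to a temporary location so that it runs on its own, and it assumes four numeric columns after 8 header lines.

```matlab
% Write a small stand-in file (hypothetical values) with 8 header lines.
file_path = [tempname '.txt'];
header = strjoin({'', 'Integration time [ms]: 0.030', 'Averaging Nr. [scans]: 1', ...
    'Smoothing Nr. [pixels]: 0', 'Data measured with spectrometer [name]: 1903395U1', ...
    'Wave ;Sample ;Dark ;Reference;Scope', '[nm] ;[counts] ;[counts] ;[counts]', ''}, newline);
rows = [189.95 424.6 0 0; 190.09 421.6 0 0; 190.24 427.6 0 0];
fid = fopen(file_path, 'w');
fprintf(fid, '%s\n', header);
fprintf(fid, '%.2f; %.3f; %.3f; %.3f\n', rows');   % one output line per row
fclose(fid);

% Open the file first: textscan needs a file ID, not a file name.
fid = fopen(file_path, 'r');
if fid == -1
    error('Could not open %s', file_path);
end
data = textscan(fid, '%f %f %f %f', 'HeaderLines', 8, 'Delimiter', ';');
fclose(fid);

% textscan returns a cell array with one cell per column.
sample = data{2};

delete(file_path);   % remove the stand-in file
```

With a real file, only the part from the second fopen onward is needed, with file_path pointing at that file.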
It might be clearer if I refer to the previous question I've asked on the forum :
Thanks for the comment.
dpb on 7 Jun 2019 (edited 7 Jun 2019)
What's the issue in following the other poster's sage advice? Here's the beginning of the attached file (I inserted the line numbers)...
1
2 Integration time [ms]: 0.030
3 Averaging Nr. [scans]: 1
4 Smoothing Nr. [pixels]: 0
5 Data measured with spectrometer [name]: 1903395U1
6 Wave ;Sample ;Dark ;Reference;Scope
7 [nm] ;[counts] ;[counts] ;[counts]
8
9 189.95; 383.425; 0.000; 0.000
10 190.09; 416.425; 0.000; 0.000
11 190.24; 439.425; 0.000; 0.000
....
from which it's pretty easy to see there are 8 headerlines. As noted the delimiter is a semicolon so what's the problem with
data = textscan(file_paths{i}, '','headerlines',8,'delimiter',';');
Seems pretty straightforward.
If the number of header lines varies between files, you have a harder problem, but detectImportOptions will easily parse a file as regular as this one, and you can then use readtable. importdata, with an even simpler interface, will likely have no issues either.
The question of how the files are named is something else entirely: you'll need either a wildcard string that matches the subset you want, a manually built list, or some other way to select files case by case. MATLAB is smart, but it's not prescient enough to discern automagically which files are wanted and which are not. As that other respondent noted, his solution works: move the wanted files into their own subdirectory.
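As a sketch of the wildcard approach: if the wanted files happen to share a suffix, the selection can be done in the dir call or by filtering the listing afterwards. The '.Raw8.txt' suffix is an assumption taken from the one file name shown in this thread, and the folder and file names here are made up so the sketch runs on its own.

```matlab
% Stand-in folder with two "wanted" files and one unrelated text file (hypothetical names).
input_folder = tempname;
mkdir(input_folder);
names = {'a_0001.Raw8.txt', 'notes.txt', 'zz_0002.Raw8.txt'};
for k = 1:numel(names)
    fid = fopen(fullfile(input_folder, names{k}), 'w');
    fclose(fid);
end

% Option 1: let the wildcard do the selection.
files = dir(fullfile(input_folder, '*.Raw8.txt'));

% Option 2: list every text file, then keep only the names ending in the suffix.
all_files = dir(fullfile(input_folder, '*.txt'));
is_wanted = endsWith({all_files.name}, '.Raw8.txt');
filtered = all_files(is_wanted);

wanted_names = {filtered.name};

rmdir(input_folder, 's');   % remove the stand-in folder
```

If no such common pattern exists, moving the wanted files into their own folder remains the simplest option.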
Thanks for the reply, really appreciated. Though, when I run the following code with your addition, I get the error below :
% Set input folder
input_folder = 'C:\Users\Cotet\Desktop\Calendrier de Travail\06 - Juin\4 juin\Thomas - 4200';
% Read all *.txt files from input folder
% NOTE: This creates a MATLAB struct with a bunch of info about each text file
% in the folder you specified.
files = dir(fullfile(input_folder, '*.txt'));
% Get full path names for each text file
file_paths = fullfile({files.folder}, {files.name});
% Read data from files, keep second column
for i = 1 : numel(file_paths)
% Read data from ith file.
% NOTE: If your file has a text header, missing data, or
% uses non white-space delimiters, you should check out the
% documentation for textread to determine which options to use.
data = textscan(file_paths{i}, '', 'headerlines', 8, 'delimiter', ';'));
% Save second data column to matrix
% NOTE: Your data files all need to have the same number of rows for this to work
A(:, i) = data(:, 2);
end
% Calculate the average of the rows (second dimension) of A:
avg = mean(A, 2);
Error: File: MatLab - Conseil.m Line: 18 Column: 31
Invalid expression. When calling a function or indexing a variable, use parentheses. Otherwise, check for mismatched delimiters.
What's the problem? I thought at first it might be the curly braces in "file_paths{i}", but I've tried both parentheses () and square brackets [], with the same error. There's still something wrong with how I'm calling textscan(). Also, all the text files are identical in format; only the values change.
I've noticed something though in the text file. We use the semicolon as the delimiter, but as for the 4th column, there's no semicolon at its end. Is it still ok?
Thanks!
Actually, I've found the problem: it was the doubled closing parenthesis "))" at the end of the textscan() call.
But, guess what? Another error appeared and this one puzzles me even more :
Error using textscan
Empty format character vector is not supported at the end of a file.
Error in MatLab - Conseil (line 17)
data = textscan(file_paths{i}, '', 'headerlines', 8, 'delimiter', ';');
Not having a semicolon at the end shouldn't be a problem. I suspect the issue is that there is no semicolon after the row number, so it is trying to treat '9 189.95' as a single number. I think you would be better off using a formatSpec instead of specifying a delimiter. Something like:
format = '%2.0d %5.2f; %5.2f; %5.2f';
data = textscan(file_paths{i}, format, 'headerlines', 8);
Thanks Bob for the response. Unfortunately, when I use the formatSpec and your code, it returns one row of empty cells (there should be 2048 rows and 4 columns). However, I think you're onto something. The for loop now runs to the end, which is a good thing, and by that I mean this part:
data = textscan(file_paths{i}, format, 'headerlines', 8);
% Save second data column to matrix
% NOTE: Your data files all need to have the same number of rows for this to work
A(:, i) = data(:, 2);
The data has 2048 rows and 4 columns, and I ask MATLAB to store only the 2nd column. The for loop then does the same with my other 30 files. So, because I have 31 files in total, I should end up with a matrix of 31 columns (the 2nd column of each file) and 2048 rows (all the values of each of those 2nd columns).
Now, I have 31 columns as desired, but only 1 row of empty values. How could we fix this?
Thomas Côté on 10 Jun 2019 (edited 10 Jun 2019)
Oh, also, haha! The number '9 189.95' is actually just 189.95 (on the 9th line). You see "9 189.95" because the commenter "dpb" copied my text file into a MATLAB script and inserted line numbers. The numbers should be read like this (text file attached):
189.95; 424.600; 0.000; 0.000
190.09; 421.600; 0.000; 0.000
190.24; 427.600; 0.000; 0.000
190.38; 450.600; 0.000; 0.000
190.53; 421.600; 0.000; 0.000
190.68; 398.600; 0.000; 0.000
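For data lines of this form, one way to make the textscan loop from the question work is sketched below. It is a sketch under assumptions, not a tested fix for the real files: the folder and its three tiny files are made up so the code runs on its own, and four numeric columns after 8 header lines are assumed. The two changes from the posted loop are that each file is opened with fopen before textscan is called, and that the second column is extracted with braces (data{2}), since data(:, 2) is still a cell.

```matlab
% Stand-in folder with three small files (hypothetical values), 8 header lines each.
input_folder = tempname;
mkdir(input_folder);
header = strjoin({'', 'Integration time [ms]: 0.030', 'Averaging Nr. [scans]: 1', ...
    'Smoothing Nr. [pixels]: 0', 'Data measured with spectrometer [name]: 1903395U1', ...
    'Wave ;Sample ;Dark ;Reference;Scope', '[nm] ;[counts] ;[counts] ;[counts]', ''}, newline);
wave = [189.95; 190.09; 190.24];
for k = 1:3
    fid = fopen(fullfile(input_folder, sprintf('file_%d.txt', k)), 'w');
    fprintf(fid, '%s\n', header);
    fprintf(fid, '%.2f; %.3f; %.3f; %.3f\n', [wave, 100*k + wave, zeros(3, 2)]');
    fclose(fid);
end

files = dir(fullfile(input_folder, '*.txt'));
file_paths = fullfile({files.folder}, {files.name});

n_rows = 3;                               % 2048 for the real files
A = zeros(n_rows, numel(file_paths));     % preallocate: one column per file
for i = 1:numel(file_paths)
    fid = fopen(file_paths{i}, 'r');      % textscan needs a file ID
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', 8, 'Delimiter', ';');
    fclose(fid);
    A(:, i) = data{2};                    % braces give the numeric column
end

% Average across files for each row.
avg = mean(A, 2);

rmdir(input_folder, 's');                 % remove the stand-in folder
```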


Accepted Answer

Guillaume on 10 Jun 2019 (1 vote)

As dpb suggested, use one of the modern file import functions such as readtable or readmatrix instead of the old textscan. These can figure out the format of your file on their own or, if they struggle, offer plenty of easy-to-understand options to help them along. They're also a lot more configurable, particularly if you use detectImportOptions.
For example, your text file is easily decoded with:
spectrum = readtable('1903395U1_04Jun19_154040_0001.Raw8.txt', 'HeaderLines', 8)
or for a neater table:
opts = detectImportOptions('1903395U1_04Jun19_154040_0001.Raw8.txt', 'ExpectedNumVariables', 4); %only needed once for all the files that follow the same format
spectrum = readtable('1903395U1_04Jun19_154040_0001.Raw8.txt', opts)
detectImportOptions automatically figures out that the header is 8 lines, that the delimiter is ; and that the column names are on the 6th line. I've told it that there are only 4 variables despite the header having 5 names (why is there a 'Scope'?).
You can easily wrap that in a loop over all the files. The detectImportOptions call is only needed once if all the files follow the same format. You can store the table from each file in a cell array, but if your aim is to run statistics across the files then you'd be better off storing it all as one flat table with an additional variable indicating which file the data comes from. After that you can use groupsummary or similar to compute your statistics all at once.
So the code would be something like:
%Get list of files. You haven't explained how these can be obtained.
filelist = dir('C:\somefolder\*.txt');
%Loop to read all files:
spectra = cell(size(filelist)); %stored in a cell array at first
opts = detectImportOptions(fullfile(filelist(1).folder, filelist(1).name), 'ExpectedNumVariables', 4);
for fileidx = 1:numel(filelist)
spectrum= readtable(fullfile(filelist(fileidx).folder, filelist(fileidx).name), opts); %read file
spectrum.Source = repmat({filelist(fileidx).name}, height(spectrum), 1); %add a variable indicating the source. Maybe you want to use only part of the filename
spectra{fileidx} = spectrum;
end
%flatten it all into one table
spectra = vertcat(spectra{:});
%compute some stats, e.g. mean and standard deviation of spectra at each wavelength across the files
groupsummary(spectra, 'Wave', {'mean', 'std'}, {'Sample', 'Dark', 'Reference'})
Code untested. There might be typos. Read the error messages carefully. Note that I'm using meaningful variable names instead of the utterly useless A.
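If all that is needed is the numeric matrix of second columns asked for in the question, readmatrix (available from R2019a) may be a shorter route. The following is a sketch under assumptions rather than part of the answer above: the folder and its three tiny files are made up so the code runs on its own, and the 'NumHeaderLines' and 'Delimiter' values are taken from the file layout described in this thread.

```matlab
% Stand-in folder with three small files (hypothetical values), 8 header lines each.
input_folder = tempname;
mkdir(input_folder);
header = strjoin({'', 'Integration time [ms]: 0.030', 'Averaging Nr. [scans]: 1', ...
    'Smoothing Nr. [pixels]: 0', 'Data measured with spectrometer [name]: 1903395U1', ...
    'Wave ;Sample ;Dark ;Reference;Scope', '[nm] ;[counts] ;[counts] ;[counts]', ''}, newline);
wave = [189.95; 190.09; 190.24];
for k = 1:3
    fid = fopen(fullfile(input_folder, sprintf('file_%d.txt', k)), 'w');
    fprintf(fid, '%s\n', header);
    fprintf(fid, '%.2f; %.3f; %.3f; %.3f\n', [wave, 100*k + wave, zeros(3, 2)]');
    fclose(fid);
end

filelist = dir(fullfile(input_folder, '*.txt'));
columns = cell(1, numel(filelist));       % one cell per file, joined after the loop
for fileidx = 1:numel(filelist)
    values = readmatrix(fullfile(filelist(fileidx).folder, filelist(fileidx).name), ...
        'NumHeaderLines', 8, 'Delimiter', ';');
    columns{fileidx} = values(:, 2);      % keep the Sample column only
end

% Join the columns side by side: rows are wavelengths, columns are files.
sample_counts = [columns{:}];
sample_mean = mean(sample_counts, 2);

rmdir(input_folder, 's');                 % remove the stand-in folder
```

This drops the column names and the per-file Source variable, so the table-based version is still the better fit when grouped statistics are wanted.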

Comments: 1

Thanks/Merci Guillaume, I really appreciate your help and it worked! I made a few small modifications, and here's the working script I'll use:
%Get list of files. You haven't explained how these can be obtained. God drops the files here!
filelist = dir('C:\Users\Cotet\Desktop\Calendrier de Travail\06 - Juin\4 juin\Thomas - 4200\*.txt');
%Loop to read all files:
spectra = cell(size(filelist)); %stored in a file array at first
opts = detectImportOptions(fullfile(filelist(1).folder, filelist(1).name), 'ExpectedNumVariables', 4);
for fileidx = 1:numel(filelist)
spectrum= readtable(fullfile(filelist(fileidx).folder, filelist(fileidx).name), opts); %read file
spectrum.Source = repmat({filelist(fileidx).name}, height(spectrum), 1); %add a variable indicating the source. Maybe you want to use only part of the filename
spectra{fileidx} = spectrum;
AllSpec(:, fileidx) = spectrum(:, 2);
end
utterly_useless_A = table2array(AllSpec);
% Calculate the average of the rows (second dimension) of utterly_useless_A:
avg = mean(utterly_useless_A, 2);
Spectrum_Avg = [table2array(spectrum(:,1)) avg];
I hope you don't mind the "utterly useless A". I really dig that name haha!
Have a great day!


Release: R2019a
Asked: 7 Jun 2019
Commented: 10 Jun 2019
