textscan of mixed data type data file

Ashraf Alfandi on 17 Feb 2022
Commented: Walter Roberson on 27 Oct 2022
I'm trying to import data from a column-based text file into a MATLAB matrix, so that each column of the matrix corresponds to one header and its data column. The file consists of a header line followed by columns of data, as you can see in the attachment. I need MATLAB to read the first line (the header line, string/char data), detect how many headers there are (which equals the number of variables in the file), and then read the data that follows (double data type) column by column.
  2 Comments
Walter Roberson on 17 Feb 2022
textscan(fid, '', 'HeaderLines', 1)
would tell textscan() to skip one line and then figure out by itself how many columns there are.
Do you need the variable names to be remembered, or were you just looking to figure out how many columns were there?
Ashraf Alfandi on 17 Feb 2022
I need both. The header names will help me retrieve each data column by its title.
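One way to get both the names and the numbers is to read the header line with fgetl and then let textscan detect the columns from the remaining lines. The sketch below is only an illustration: the sample file, its quoted header names, and the variable names (fname, names, data, pressure) are assumptions, since the attached file is not shown here.

```matlab
% Build a small hypothetical sample file: one line of quoted names, then numbers.
fname = [tempname '.txt'];
M = [0 10 5; 1 11 6; 2 12 7];
fid = fopen(fname, 'w');
fprintf(fid, '"Time" "Pressure" "Flow"\n');
fprintf(fid, '%g %g %g\n', M.');          % transpose so each file row is a row of M
fclose(fid);

fid = fopen(fname, 'r');
headerLine = fgetl(fid);                   % first line holds the variable names
tok = regexp(headerLine, '"([^"]*)"', 'tokens');  % text between double quotes
names = [tok{:}];                          % flatten to a 1-by-N cell array of char
cols = textscan(fid, '');                  % empty format: numeric columns are detected
fclose(fid);
delete(fname);

data = [cols{:}];                          % one numeric matrix, one column per header
pressure = data(:, strcmp(names, 'Pressure'));  % look up a column by its header name
```

Here numel(names) gives the number of variables, and strcmp turns a header name into a column index.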


Accepted Answer

Ashraf Alfandi on 17 Feb 2022
Edited: Ashraf Alfandi on 17 Feb 2022
Thanks Mathieu for your answer. It's definitely very comprehensive, but time-consuming for what I need to run. In the end, I came up with the following simple code, which takes ~0.0009 sec, whereas yours takes 0.5 seconds.
FileName = "DNP MM SYS test.dat";
test = importdata(FileName);
Data = test.data; % Extracting the data via importdata
N = length(Data(1,:)); % Detecting the number of columns
fid = fopen(FileName);
Head = textscan(fid,'%q', N+1,'HeaderLines',1); % use the N to tell textscan how many strings to expect
Head = [Head{:}]'; Head = Head(2:end);
fclose(fid);
  1 Comment
Mathieu NOE on 17 Feb 2022
Hello, no problem.
I got this for my code: Elapsed time is 0.027212 seconds.
And for your code: Elapsed time is 0.047455 seconds.
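Single tic/toc runs like the ones quoted in this thread can differ a lot between machines and between first and later calls. A more repeatable comparison could use timeit, which calls a function handle several times and reports a typical time. The sketch below is an assumption-laden illustration: the sample file and the names fname, readFile, and tMedian are made up for the example, and importdata stands in for whichever reader is being measured.

```matlab
% Hypothetical sample file standing in for the real data file.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '"Time" "Pressure" "Flow"\n');
fprintf(fid, '%g %g %g\n', rand(3, 1000));   % 1000 rows of 3 numeric columns
fclose(fid);

readFile = @() importdata(fname);   % wrap the code under test in a function handle
tMedian = timeit(readFile);         % repeated calls, typical run time in seconds
fprintf('importdata: %.6f s\n', tMedian);
delete(fname);
```

Timing each candidate reader this way on the same file gives numbers that are easier to compare than one-off measurements.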


More Answers (2)

Mathieu NOE on 17 Feb 2022
Hello, try this.
readclm is an old but still valuable function (I don't even remember where it came from).
The variable names are stored in the cell array var.
[DATA,HEAD] = readclm('DNP SYS test.txt');
var = split(HEAD,' "');
var = var(2:end);
var = strrep(var,'"',''); %get rid of double quotes
function [outdata,head] = readclm(filename,nclm,skip,formt)
% READCLM Reads numerical data from a text file into a matrix.
% Text file can begin with a header or comment block.
% [DATA,HEAD] = READCLM(FILENAME,NCLM,SKIP,FORMAT)
% Opens file FILENAME, skips first several lines specified
% by SKIP number or beginning with comment '%'.
% Then reads next several lines into a string matrix HEAD
% until the first line with numerical data is encountered
% (that is until first non-empty output of SSCANF).
% Then reads the rest of the file into a numerical matrix
% DATA in a format FORMAT with number of columns equal
% to number of columns of the text file or specified by
% number NCLM. If data does not match the size of the
% matrix DATA, it is padded with NaN at the end.
%
% READCLM(FILENAME) reads data from a text file FILENAME,
% skipping only commented lines. It determines number of
% columns by the length of the first data line and uses
% the floating point format '%g';
%
% READCLM uses FGETS to read the first lines and FSCANF
% for reading data.
% Defaults and parameters ..............................
formt_dflt = '%g'; % Default format for fscanf
addn = nan; % Number to fill the end if necessary
% Handle input ..........................................
if nargin<1, error(' File name is undefined'); end
if nargin<4, formt = formt_dflt; end
if nargin<3, skip = 0; end
if nargin<2, nclm = 0; end
if isempty(nclm), nclm = 0; end
if isempty(skip), skip = 0; end
% Open file ............................
[fid,msg] = fopen(filename);
if fid<0, disp(msg), return, end
% Find header and first data line ......................
is_head = 1;
jl = 0;
head = ' ';
while is_head % Add lines to header.....
    s = fgets(fid); % Get next line
    jl = jl+1;
    is_skip = jl<=skip || s(1)=='%'; % Skip requested lines and '%' comments
    out1 = sscanf(s,formt); % Try to read this line
    % If unreadable by SSCANF or skip, add to header
    is_head = isempty(out1) || is_skip;
    if is_head && ~is_skip
        head = char(head,s(1:length(s)-1)); % Append line without its newline (char replaces the older str2mat)
    end
end
head = head(2:size(head,1),:);
% Determine number of columns if not specified
out1 = out1(:)';
l1 = length(out1);
if ~nclm, nclm = l1; end
% Read the rest of the file ..............................
if l1~=nclm % First line format is different from ncolumns
    outdata = fscanf(fid,formt);
    lout = length(outdata)+l1;
    ncu = ceil(lout/nclm);
    lz = nclm*ncu-lout;
    outdata = [out1'; outdata(:); ones(lz,1)*addn];
    outdata = reshape(outdata,nclm,ncu)';
else % Regular case
    outdata = fscanf(fid,formt,[nclm inf]);
    outdata = [out1; outdata']; % Add the first line
end
fclose(fid); % Close file ..........
end

Wesser on 27 Oct 2022
I originally had the script below. It works perfectly when all the Obs_Node.out files have the same number of rows. But when the Obs_Node.out files have different numbers of rows, I can't compile the columns from each for-loop iteration. For example,
THETA_ObsNode(:,i) = theta_ObsNode(:);
will result in an error like:
"Unable to perform assignment because the size of the left side is 200000-by-1 and the size of the right side is
117648-by-1.
Error in MC_Data_Compile (line 67)
THETA_ObsNode(:,i) = theta_ObsNode(:); "
I am ultimately trying to compile each column from each loop iteration into one file for that respective column, if that makes sense. My question, then, is how do I compile the data when the lengths of the columns vary?
num_sim = 1000; % 1000 Monte Carlo simulations
Node_CONC=zeros(200000,num_sim); % 200000 is an arbitrarily large number of rows
%~~~~~~~~~~Coalesce data from Obs_Node.out files~~~~~~~~~~~~
for i=1:num_sim
    Obs_Node = fopen(["/Users/apple/Dropbox/My Mac (apple’s MacBook Pro)/Desktop/Simulations/MC_"+num2str(i)+'/Obs_Node.out']); % Open Monte Carlo output file in Path (i)
    skip_lines=11; % skip all the lines until the output data of interest
    for k=1:(skip_lines)
        x=fgetl(Obs_Node);
    end
    temp1 = fscanf(Obs_Node,'%f',[5,Inf]); % scan the matrix of data
    TEMP1 = temp1'; % transpose data
    theta_ObsNode = TEMP1(:,3); % Hydraulic Conductivity
    THETA_ObsNode(:,i) = theta_ObsNode(:); %%%% this line saves each iteration's data in a separate column
    flux_ObsNode = TEMP1(:,4); % Water Flux
    FLUX_ObsNode(:,i) = flux_ObsNode(:);
    Conc_ObsNode = TEMP1(:,5); % Concentration g/cm3
    CONC_ObsNode(:,i) = Conc_ObsNode(:);
    fclose(Obs_Node);
end
  1 Comment
Walter Roberson on 27 Oct 2022
Pad the arrays for the shorter data.
Here I use NaN to pad, as it is clear that NaN is not valid data. The code could be a bit shorter if it was acceptable to pad with zeros instead of some other value.
The below code does not assume that all files except the last are the same length: it dynamically grows the array any time it encounters a larger file, making sure to extend the padding for any existing data.
num_sim = 1000; % 1000 Monte Carlo simulations
Node_CONC=zeros(200000,num_sim); % 200000 is an arbitrarily large number of rows
%~~~~~~~~~~Coalesce data from Obs_Node.out files~~~~~~~~~~~~
for i=1:num_sim
    Obs_Node = fopen(["/Users/apple/Dropbox/My Mac (apple’s MacBook Pro)/Desktop/Simulations/MC_"+num2str(i)+'/Obs_Node.out']); % Open Monte Carlo output file in Path (i)
    skip_lines=11; % skip all the lines until the output data of interest
    for k=1:(skip_lines)
        x=fgetl(Obs_Node);
    end
    temp1 = fscanf(Obs_Node,'%f',[5,Inf]); % scan the matrix of data
    TEMP1 = temp1'; % transpose data
    theta_ObsNode = TEMP1(:,3); % Hydraulic Conductivity
    flux_ObsNode = TEMP1(:,4); % Water Flux
    conc_ObsNode = TEMP1(:,5); % Concentration g/cm3
    num_obs_here = length(theta_ObsNode);
    if i == 1
        % First file: allocate NaN-filled arrays sized to this file
        THETA_ObsNode = nan(num_obs_here,num_sim);
        FLUX_ObsNode = THETA_ObsNode;
        CONC_ObsNode = THETA_ObsNode;
        THETA_ObsNode(:,i) = theta_ObsNode;
        FLUX_ObsNode(:,i) = flux_ObsNode;
        CONC_ObsNode(:,i) = conc_ObsNode;
    elseif num_obs_here <= size(THETA_ObsNode,1)
        % Shorter or equal file: fill the top rows, the rest stays NaN
        THETA_ObsNode(1:num_obs_here,i) = theta_ObsNode;
        FLUX_ObsNode(1:num_obs_here,i) = flux_ObsNode;
        CONC_ObsNode(1:num_obs_here,i) = conc_ObsNode;
    else
        % Longer file: grow every column with NaN rows, then store
        THETA_ObsNode(end+1:num_obs_here,:) = NaN;
        FLUX_ObsNode(end+1:num_obs_here,:) = NaN;
        CONC_ObsNode(end+1:num_obs_here,:) = NaN;
        THETA_ObsNode(:,i) = theta_ObsNode;
        FLUX_ObsNode(:,i) = flux_ObsNode;
        CONC_ObsNode(:,i) = conc_ObsNode;
    end
    fclose(Obs_Node);
end
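The same NaN-padding idea can also be sketched in two passes: collect each file's column in a cell array first, then allocate the padded matrix once the longest column is known. This is only an illustration under assumptions: the cell array cols below holds made-up vectors standing in for the per-file columns, and the names cols, nmax, and padded are not from the code above.

```matlab
% Stand-in for the columns read from each file (lengths differ on purpose).
cols = {[1; 2; 3], [4; 5], [6; 7; 8; 9]};

nmax = max(cellfun(@numel, cols));    % length of the longest column
padded = nan(nmax, numel(cols));      % NaN marks "no data" in shorter columns
for i = 1:numel(cols)
    padded(1:numel(cols{i}), i) = cols{i};   % fill the top rows of column i
end
```

With NaN as the padding value, per-row statistics can skip the padding, for example mean(padded, 2, 'omitnan').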
