MATLAB Answers


Using Textscan on non-uniform data

Asked by Russell Nasrallah on 18 Jun 2019
Latest activity: edited by per isakson on 21 Jun 2019
Hello all,
I am currently trying to convert the output of a Fortran code into CSV using the textscan function in MATLAB. The output of the Fortran code is semi-uniform, but its layout changes depending on the number of nodes the user requests.
In this example, the user has specified 12 nodes, and the text file looks like the following:
Interpolated values at stations
168 12
867600.0000000000 % this is the timestep at which the values of the following 12 nodes are given.
1 0.3170054495E+03
2 0.2983347787E+03
3 0.2857833907E+03
4 0.2825696256E+03
5 0.2795692154E+03
6 0.2806315572E+03
7 0.2811630597E+03
8 0.2814156663E+03
9 0.2814718273E+03
10 0.2811785316E+03
11 0.2807765370E+03
12 0.2798665405E+03
871200.0000000000
1 0.3042805523E+03
2 0.3033600277E+03
3 0.2913505094E+03
4 0.2790455081E+03
5 0.2709832029E+03
6 0.2680434294E+03
7 0.2677295494E+03
8 0.2684905990E+03
9 0.2690373464E+03
10 0.2696588011E+03
11 0.2699294457E+03
12 0.2697688946E+03
Currently, I have textscan skip the first two lines. My final goal is output that looks something like the following:
timestep 1, node 1, node 2, ..., node 11, node 12
timestep 2, node 1, node 2, ..., node 11, node 12
I would also like the code to be smart enough to detect the number of nodes the user supplied (given on the second line of the text above) and to distinguish between the timestep lines and the node lines.
Any suggestions?
I've attached an example of one of my text files.

  1 Comment

Search Answers for "read text tag:block" using the search field in the upper right corner.


2 Answers

Answer by per isakson on 19 Jun 2019 (edited by per isakson on 19 Jun 2019)
 Accepted Answer

An exercise with fscanf()
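%%
ffs = "HS_full_18md_nam_outputs.txt";       % the attached example file
fid = fopen( ffs, 'r' );
[~] = fgetl( fid );                          % skip the title line
num = fscanf( fid, '%d%d', [2,1] );          % second line; num(2) is the number of nodes
% Read the time, then skip each node index (%*d) and keep its value (%f).
% Each timestep fills one column of buf.
buf = fscanf( fid, ['%f', repmat('%*d%f', 1,num(2) ) ], [num(2)+1,inf] );
[~] = fclose( fid );
out = permute( buf, [2,1] );                 % one row per timestep: time, node 1, ..., node N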
A peek at the result:
>> out(1:3,1:6)
ans =
8.676e+05 1.9204e-05 3.981e-05 5.6839e-05 7.3688e-05 9.2944e-05
8.712e+05 2.2396e-05 4.0073e-05 5.1601e-05 6.1428e-05 7.487e-05
8.748e+05 1.9849e-05 3.1175e-05 4.2591e-05 5.3355e-05 6.5603e-05
>>
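To see why the size argument is [num(2)+1,inf] and why the result is transposed, here is a small sketch with made-up values. It uses sscanf, which applies the same format and size rules to a character vector that fscanf applies to a file, so no file is needed. The string and the variable names are illustrative assumptions, not taken from the attached file.

```matlab
% Two timesteps with two nodes each, laid out like the question's file
% (a time value, then "index value" lines), held in memory as one string.
txt = sprintf('100.0\n1 1.5\n2 2.5\n200.0\n1 3.5\n2 4.5\n');
nNodes = 2;

% Same format as the answer: read the time, then skip each node index (%*d)
% and keep its value (%f). The format is recycled once per timestep.
fmt = ['%f', repmat('%*d%f', 1, nNodes)];    % '%f%*d%f%*d%f'

% The size [nNodes+1, inf] fills one COLUMN per timestep:
% buf is [100 200; 1.5 3.5; 2.5 4.5]
buf = sscanf(txt, fmt, [nNodes + 1, inf]);

% Transposing gives one ROW per timestep: [time, node 1, node 2].
% For a 2-D matrix, buf.' is the same as permute(buf, [2,1]).
out = buf.';
```

Because the node count only enters through repmat and the size argument, the same two lines adapt to any number of nodes read from the header.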

  6 Comments

Regarding the importance of "[~] =": Walter explains my intention well.
  • [~] = fgetl( fid ); I read that as: read one line and discard the result.
  • [~] = fclose( fid ); is a reminder that the validity of the file ID is not checked.
In code that will still be in use a month from now, fid = fopen( ffs, 'r' ); ought to be replaced by
[ fid, msg ] = fopen( ffs, 'r' );
assert( fid >= 3, 'abc:def:CannotOpenFile' ...
, 'Cannot open the file, "%s". Message: %s' ...
, ffs, msg )
Perhaps this check is less necessary in recent releases of MATLAB than it used to be.
Thanks again to both of you for the responses.
Per,
When you say "In the code that will be used one month from now..." what do you mean? Is there an update coming that is going to modify the fopen function?
"In the code that will be used one month from now..."
What I am trying to say is that there are two types of code with regard to error handling:
  • Small scripts/functions that you use yourself a few times over a short period. In this case it might be OK to skip error checking; MATLAB will show more or less relevant error messages, often several lines "too late".
  • Scripts/functions that will be used over a longer period. In this case, error handling with good messages helps you find the real cause of a problem quickly.
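For the second kind of code, one way to package the accepted answer is a function that combines its fscanf approach with the fopen check shown earlier. This is a sketch, not the answerer's code: the name readStationFile is hypothetical, onCleanup is used so the file is closed even if an error occurs, and the final count check assumes (as Walter Roberson's answer does) that the first number on line 2 is the number of timesteps, so it only warns.

```matlab
function out = readStationFile(ffs)
% readStationFile (hypothetical name): reads a file laid out like the
% question's example and returns one row per timestep:
%   [timestep, value at node 1, ..., value at node N]

[fid, msg] = fopen(ffs, 'r');
assert(fid >= 3, 'readStationFile:CannotOpenFile', ...
    'Cannot open the file, "%s". Message: %s', ffs, msg);
% Keep this object alive: it closes the file when the function returns or errors
cleaner = onCleanup(@() fclose(fid));

fgetl(fid);                                % skip the title line
num = fscanf(fid, '%d%d', [2, 1]);         % line 2: two integers
assert(numel(num) == 2, 'readStationFile:BadHeader', ...
    'Could not read two integers on line 2 of "%s".', ffs);
nNodes = num(2);

% Time value, then nNodes "index value" pairs; the index is skipped (%*d)
fmt = ['%f', repmat('%*d%f', 1, nNodes)];
buf = fscanf(fid, fmt, [nNodes + 1, inf]);
out = buf.';                               % one row per timestep

% Assumption: num(1) is the number of timesteps. Only warn if it differs.
if size(out, 1) ~= num(1)
    warning('readStationFile:CountMismatch', ...
        'Line 2 announces %d timesteps, but %d were read.', num(1), size(out, 1));
end
end
```

Usage, with the file from the question: out = readStationFile('HS_full_18md_nam_outputs.txt');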



Answer by Walter Roberson on 18 Jun 2019

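fid = fopen('HS_full_18md_nam_outputs.txt');
fgets(fid); %skip header
ctl = fscanf(fid, '%f%f', 2);      % second line: number of timesteps, number of nodes
Nt = ctl(1);
Ns = ctl(2);
data = zeros(Nt, Ns+1);            % one row per timestep: [timestep, node 1, ..., node Ns]
for ts = 1 : Nt
    timestep = fscanf(fid, '%f', 1);                   % the timestep line
    thisdata = cell2mat(textscan(fid, '%*f%f', Ns));   % Ns "index value" lines; keep the values
    data(ts, 1) = timestep;
    data(ts, 2:end) = thisdata;
end
fclose(fid);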
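Neither answer writes the CSV file that the question asks for. A minimal sketch, assuming an Nt-by-(Ns+1) matrix with the timestep in column 1 (such as data above); the two example rows use the first two node values from the question, and the file name stations.csv is hypothetical. It writes the header row from the question's target layout and then one line per timestep with plain fprintf.

```matlab
% Example input (assumption): timestep in column 1, node values after it.
% These rows hold the first two nodes of the question's sample only.
data = [867600 317.0054495 298.3347787
        871200 304.2805523 303.3600277];
csvName = 'stations.csv';   % hypothetical output file name

Ns = size(data, 2) - 1;     % number of node columns
[fid, msg] = fopen(csvName, 'w');
assert(fid >= 3, 'stations:CannotOpenFile', 'Cannot open "%s": %s', csvName, msg);

% Header row: timestep,node 1,node 2,...,node Ns
fprintf(fid, '%s\n', ['timestep', sprintf(',node %d', 1:Ns)]);

% fprintf consumes its argument column by column, so pass the transpose
% to write the rows of data in order, one CSV line per timestep.
rowFormat = ['%.10g', repmat(',%.10g', 1, Ns), '\n'];
fprintf(fid, rowFormat, data.');
fclose(fid);
```

The %.10g format keeps all ten significant digits of the Fortran output; adjust it if more or fewer digits are needed.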

  4 Comments

Walter,
I hope you don't mind me asking a question about the fscanf and textscan functions.
Am I right in thinking that when you specify a size
(i.e. ctl = fscanf(fid, '%f%f', 2);)
the fscanf function stops once that many values have been read, and the read position is then left at the end of that line?
Is the same true for textscan? Is this why you do not need to advance to the next line, but simply repeat the textscan call for each line?
Thanks for your input!
fscanf and textscan both stop when the size inputs have been satisfied, leaving the input buffer position immediately after the last character that was consumed. That might be in the middle of a line.
Neither function specifically processes the input line by line. Instead, unless you use uncommon options, both ignore leading whitespace, including line boundaries. For example, if you ask for 3 numbers, neither function cares whether the input is
1 2 3
or
1

2 3
(note the empty line in the input).
There is a difference between the two, though. For fscanf, the count you provide is the total number of values to read. For textscan, the count is the number of times to repeat the format. When a format describes an entire line, the count can typically be interpreted as the number of lines to read (not entirely accurate if the values are not in the expected format).
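Both points can be seen in a small sketch with made-up values in a temporary file: line breaks (including an empty line) do not matter, fscanf counts values, textscan counts repeats of its format, and each call resumes where the previous one stopped. The file contents and variable names are illustrative assumptions.

```matlab
% A temporary file with made-up values: a lone 1, an empty line,
% then pairs of values on later lines.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1\n\n2 3\n4 5\n6 7\n');
fclose(fid);

fid = fopen(fname, 'r');
% fscanf: the count is the total number of values. Line breaks and the
% empty line are skipped as whitespace, so a is [1; 2; 3], and the read
% position stops right after the character 3.
a = fscanf(fid, '%f', 3);
% textscan: the count is the number of times the format is applied.
% '%f%f' applied once reads 4 and 5, so c is {4, 5}.
c = textscan(fid, '%f%f', 1);
% Whatever is left: rest is [6; 7].
rest = fscanf(fid, '%f');
fclose(fid);
delete(fname);
```

This is the same mechanism that lets the loop in the answer alternate between fscanf (one timestep value) and textscan (Ns node lines) on one file identifier.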
Thank you for this clear and concise answer, Walter! I understand these tools much better now.



