MATLAB Answers

Extract numbers from mixed string

K E 19 Jul 2012
I have a file containing header lines like the following,
Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50
Operator Note: Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)
For a given parameter such as MaxDistance or Wind Speed, I would like to extract its numerical value. This is tricky because sometimes there is an equal sign, space, or units, and sometimes there is not, because different operators enter their notes differently (lesson: next time enforce consistency).
How would I extract the following: All numerical characters (ignoring spaces and equal signs but keeping decimal points) that appear after the string representing the parameter name. Stop when a letter or punctuation mark is reached. In the case of 'MaxDistance', I would obtain 60. In the case of Wind Speed, I would obtain 16.375.
Comments: 2
Jianming She 17 Jun 2020
This seems a more general way:
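function numArray = extractNumFromStr(str)
% Treat commas, semicolons and equals signs as separators
str1 = regexprep(str,'[,;=]', ' ');
% Keep only number-like characters, then collapse space-delimited non-digit runs
str2 = regexprep(regexprep(str1,'[^- 0-9.eE(,)/]',''), ' \D* ',' ');
% Intended to remove leftover periods and e/E characters next to whitespace
str3 = regexprep(str2, {'\.\s','\E\s','\e\s','\s\E','\s\e'},' ');
% Convert the remaining text to a numeric array (note: str2num uses eval)
numArray = str2num(str3);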
Example:
a = 'alpha=-3.5,beta=1e-2. but gamma = -34.1'
numArray = extractNumFromStr(a)
numArray =
-3.5000 0.0100 -34.1000


Accepted Answer

Jan 19 Jul 2012
Edited: Jan 19 Jul 2012
First import the file into a string, e.g. with fileread. Then you get something like this (if not, please explain all necessary details):
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
Now remove all equals signs:
Str(strfind(Str, '=')) = [];
Finally you can get the values:
Key = 'MaxDistance';
Index = strfind(Str, Key);
Value = sscanf(Str(Index(1) + length(Key):end), '%g', 1);
"Index(1)" handles multiple occurrences of the key by using the first one.
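As a rough sketch, the same steps can be repeated for several keys, with a guard for keys that do not appear in the string. The list `Keys`, the deliberately absent key 'Humidity', and the choice of NaN for missing values are assumptions made for illustration only.

```matlab
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
Str(strfind(Str, '=')) = [];                      % remove all equals signs
Keys = {'MaxDistance', 'Wind Speed', 'Humidity'}; % 'Humidity' is hypothetical and absent
Values = nan(size(Keys));                         % NaN marks "not found"
for k = 1:numel(Keys)
    Index = strfind(Str, Keys{k});
    if isempty(Index)
        continue                                  % key missing: keep NaN
    end
    % Read one number right after the first occurrence of the key
    Found = sscanf(Str(Index(1) + length(Keys{k}):end), '%g', 1);
    if ~isempty(Found)
        Values(k) = Found;
    end
end
```

With the example string this should give 60 for MaxDistance, 16.375 for Wind Speed, and NaN for the missing key.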
Comments: 3
Lorenzo 30 Oct 2013
This works great! Just a quick question, Jan: what if you want to find all the occurrences of a numeric value between two strings? For instance, let's say you want the numeric values that can be found between MaxDistance and Altitude in the original example (i.e. 60, 1.000, 50, etc.). How can you achieve that?
I tried this:
Key1 = 'MaxDistance';
Key2 = 'Altitude';
Index1 = strfind(file, Key1);
Index2 = strfind(file, Key2);
Value = sscanf(file(Index1:Index2), '%g',1);
but I still get nothing but the first value. Also, I don't know a priori how many numbers may be encountered between the two strings.
Thanks!
Lorenzo
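One possible way to do this, offered only as a sketch: sscanf stops at the first character that does not fit the format, so it may be easier to cut out the text between the two keys and let regexp collect every number in it. The variable names `Segment`, `Tokens` and `Values` are made up for this example, and the pattern only covers plain integers and decimals with an optional minus sign.

```matlab
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
Key1 = 'MaxDistance';
Key2 = 'Altitude';
Index1 = strfind(Str, Key1);
Index2 = strfind(Str, Key2);
% Text strictly between the end of Key1 and the start of Key2
Segment = Str(Index1(1) + length(Key1) : Index2(1) - 1);
% Every optionally signed integer or decimal in that segment
Tokens = regexp(Segment, '-?\d+\.?\d*', 'match');
Values = str2double(Tokens);   % str2double accepts a cell array of strings
```

For the example string, `Values` should contain 60, 1, 50, 20 and 16.375, however many numbers happen to lie between the keys.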


More Answers (5)

Stephan Koehler 7 Jun 2017
Here is a one-line answer:
str2num( regexprep( Str, {'\D*([\d\.]+\d)[^\d]*', '[^\d\.]*'}, {'$1 ', ' '} ) )
Comments: 2
Marco Andres Acevedo Zamora
Hi, good answer, but how can the - sign be included (if present)? Thanks.



Freddy 19 Jul 2012
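Maybe a little late, but I would also like to present my ("regexp training") solution. :)
% Named tokens: Keyword = up to two words, Value = an unsigned number after an optional '='
A = regexp(Str,'(?<Keyword>(?:\w+\s*\w+))\s*=?\s*(?<Value>\d+\.?\d*)','names');
s = struct();
% Store each value in a field named after its keyword (genvarname makes the name valid)
for i = A,
s.(genvarname(i.('Keyword'))) = str2double(i.('Value'));
end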
Comments: 1
Albert Yam 19 Jul 2012
It took me a long time to understand what you are doing. That's cool though.
How does it skip over 'Operator Note:' ?
Edit: Never mind I get it. It doesn't have anything for ':'. The '(?:\w' has nothing to do with a ':' in the string, it is grouping the token for 'up to two words'.



Albert Yam 19 Jul 2012
This is how I went about it, with all steps included, even the errors.
teststr = 'Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50 Operator Note: Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)';
regexp(teststr,[\d])
regexp(teststr,['\d'])
regexp(teststr,['\d'],'match')
regexp(teststr,['\d+'],'match')
regexp(teststr,['\d+.?'],'match')
regexp(teststr,['\d+\.?'],'match')
regexp(teststr,['\d+\.?\d?'],'match')
regexp(teststr,['\d+\.?\d+?'],'match')
regexp(teststr,['\d+\.?\d*?'],'match')
regexp(teststr,['\d+\.?\d?'],'match')
regexp(teststr,['\d+\.?\d*'],'match')
Comments: 5
G 7 Nov 2013
Better:
regexp(teststr,'\d+\.?\d*|-\d+\.?\d*|\.?\d+|-\.?\d+','match')
or
regexp(teststr,'-?\d+\.?\d*|-?\d*\.?\d+','match')
The -.34e-004 case remains unhandled!
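A possible extension for the exponent case, as a sketch only: add an optional exponent group after the mantissa. The test string below is invented for illustration, and the pattern treats any hyphen directly in front of a number as a sign, so a range such as "5-10" would be read as 5 and -10.

```matlab
% Hypothetical test string; names and values are made up for illustration
teststr = 'alpha=-.34e-004, beta 12, gamma=3.5E2, delta=-7.25';
% Optional sign, then digits with an optional fraction OR a bare fraction,
% then an optional exponent such as e-004 or E2
pattern = '[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?';
tokens = regexp(teststr, pattern, 'match');
values = str2double(tokens);
```

Here `tokens` should hold the four number strings, including '-.34e-004', and `values` their numeric equivalents.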



C.J. Harris 19 Jul 2012
In order to extract a certain value:
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
matchWord = 'Air Temperature';
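% Start (a) and end (b) indices of every number in Str
[a,b] = regexp(Str,'\d+(\.\d+)?');
% First number that starts after the keyword
strPos = find(a > strfind(Str,matchWord),1,'first');
% Convert that substring to a double
nValue = str2double(Str(a(strPos):b(strPos)));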

Dahai Xue 10 Mar 2016
Edited: KSSV 25 Jan 2021
C.J. Harris, I put your regexp into a function that extracts all numbers. I had a hard time finding an array operation that can use 'a' and 'b' without the loop; hopefully somebody has ideas. Of course, it is not difficult to add more parameters or options to find "certain" numbers with preceding or following landmark strings.
function nums = regExtractNums(str)
[a,b] = regexp(str, '\d+(\.\d+)?');
nums = zeros(length(a),1);
for k = 1:length(a)
nums(k) = str2double(str(a(k):b(k)));
end
end
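One loop-free variant, given here as a sketch rather than a drop-in replacement: ask regexp for the matched substrings directly with the 'match' option and pass the resulting cell array to str2double, which converts every element in one call. The sample string is an assumption taken from the question's header line.

```matlab
% Sample input borrowed from the question; any string would do
str = 'Wind Speed 16.375m/s, Altitude 5km';
% 'match' returns the matched substrings instead of start/end indices
tokens = regexp(str, '\d+(\.\d+)?', 'match');
% str2double converts a whole cell array at once
nums = str2double(tokens);
nums = nums(:);   % column vector, like the looped version
```

For this input `nums` should be the column vector [16.375; 5], so the index vectors 'a' and 'b' are not needed at all.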
