Extracting numbers from mixed string

Al_G on 7 Oct 2018
Edited: Stephen23 on 8 Oct 2018
I have filenames saved as strings such as '2001_06m'. Sometimes the files are inconsistently named as '2001_6m' (missing the zero before the 6) or '2001_06' (missing the m at the end). What code would I use to extract the integer after the underscore, without its leading zero, in all cases (i.e. output = 6)?
And separately, what code would I use to extract the number before the underscore (usually it is 4 digits long, but sometimes 3 digits, i.e. '001' instead of '2001')?

Accepted Answer

Jan on 7 Oct 2018
Edited: Jan on 8 Oct 2018
s = '2001_06m';
d = sscanf(s, '%d_%d')
d =
2001
6
Easier and faster than regexp.
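As a quick check of the format above against the naming variants mentioned in the question, here is a minimal sketch. The variable names `names` and `result` are illustrative and not part of the original answer. Note that `%d` returns numeric values, so the leading zeros of '001' are not preserved.

```matlab
% Illustrative check: apply the same sscanf format to each naming variant.
names = {'2001_06m', '2001_6m', '2001_06', '001_06m'};
result = zeros(numel(names), 2);
for k = 1:numel(names)
    % sscanf stops at the first character that does not fit the format
    % (the trailing 'm'), after both integers have already been read.
    result(k, :) = sscanf(names{k}, '%d_%d').';
end
% result is [2001 6; 2001 6; 2001 6; 1 6]
disp(result)
```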
[EDITED] If the input is a cell string:
C = {'2001_06m', '002_77q'};
S = sprintf('%s ', C{:});
S(S < '0' | S > '9') = ' '; % Mask all non-numbers
Num = sscanf(S, '%d %d ', [2, Inf]);
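The result of the cell-string version is a 2-by-N matrix with one column per filename. A short sketch of how the two requested values could be pulled out of it follows; the names `before` and `after` are assumptions added for illustration. This approach assumes each filename contains exactly two runs of digits, since any extra digits would shift the pairing.

```matlab
C = {'2001_06m', '002_77q'};
S = sprintf('%s ', C{:});            % join names, separated by spaces
S(S < '0' | S > '9') = ' ';          % mask every non-digit character
Num = sscanf(S, '%d %d ', [2, Inf]); % column k holds the pair from C{k}

before = Num(1, :);  % numbers before the underscore: [2001 2]
after  = Num(2, :);  % numbers after the underscore:  [6 77]
```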
2 Comments
dpb on 8 Oct 2018
Edited: dpb on 8 Oct 2018
That's true...except that sscanf isn't vectorized (and I'd forgotten it will return the second value even though it then hits the non-convertible character).
Guillaume on 8 Oct 2018
To be honest, none of the solutions are vectorised. Vectorising strsplit wouldn't be easy either. It wouldn't be too hard to vectorise the regexp solution, but sscanf is certainly more elegant.


More Answers (3)

Guillaume on 7 Oct 2018
Edited: Guillaume on 8 Oct 2018
A possible regexp version would be:
str2double(regexp(filename, '(\d+)_(\d+)', 'tokens', 'once'))
edit: following the discussion in Jan's answer, a vectorised version for when filenames is a cell array of char arrays or a string array:
tokens = regexp(filenames, '(\d+)_(\d+)', 'tokens', 'once');
str2double(vertcat(tokens{:}))
Note that the vertcat call will fail if a filename does not match the pattern.
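One way to guard against filenames that do not match the pattern is to convert only the matching entries and leave NaN for the rest. This is a sketch under assumptions, not part of the original answer: the names `matched` and `nums` and the sample entry 'notes.txt' are made up for illustration.

```matlab
% Hypothetical input containing one name that does not fit the pattern.
filenames = {'2001_06m', 'notes.txt', '001_6'};
tokens = regexp(filenames, '(\d+)_(\d+)', 'tokens', 'once');

matched = ~cellfun(@isempty, tokens);   % true where the pattern was found
nums = NaN(numel(filenames), 2);        % NaN rows mark non-matching names
nums(matched, :) = str2double(vertcat(tokens{matched}));
% nums is [2001 6; NaN NaN; 1 6]
```

Each row of `nums` then lines up with the corresponding entry of `filenames`, so a non-matching name can be detected with `isnan`.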

Stephen23 on 8 Oct 2018
Edited: Stephen23 on 8 Oct 2018
Fully vectorized, one line, and more efficient than regexp and/or str2double:
>> C = {'2001_06m','2001_7m','2001_08'};
>> sscanf(sprintf('%sm',C{:}),'%*d_%d%*[m]') % second number
ans =
6
7
8
>> sscanf(sprintf('%sm',C{:}),'%d_%*d%*[m]') % first number
ans =
2001
2001
2001
>> sscanf(sprintf('%sm',C{:}),'%d_%d%*[m]') % both numbers
ans =
2001
6
2001
7
2001
8
"...usually they are 4 digits long, but sometimes 3 digits, i.e. '001' instead of '2001'"
My answer works regardless of the number of digits in the numbers.
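When both numbers are read at once, the output is a single interleaved column. If a one-row-per-file layout is preferred, it could be reshaped as in this small sketch; the variable names `v` and `M` are illustrative additions.

```matlab
C = {'2001_06m','2001_7m','2001_08'};
% Append 'm' to every name so the format can always skip at least one 'm'.
v = sscanf(sprintf('%sm',C{:}),'%d_%d%*[m]');  % [2001;6;2001;7;2001;8]
M = reshape(v, 2, []).';                       % one row per filename
% M is [2001 6; 2001 7; 2001 8]
```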

dpb on 7 Oct 2018
v=str2double(strip(strsplit(s,'_'),'m')); % chuckles...
In reality, writing a regexp expression is the more general solution, but I have to spend too much time figuring out the syntax and am too impatient... :)
