Hi, I have a 100×1 cell array of strings like:
18WABO1-12345-0X
18WABO2-12345-0N
18WACE3-12345-00
18WACE4-12345-0R
18WAGUG-12345-0G
18WDUER-12345-0N
I would like to extract the substring that always sits between 18W and the first dash (-), so the result is:
ABO1
ABO2
ACE3
ACE4
AGUG
DUER etc
My code so far:
%
somestring(:)= eic_p;
underscore_indices= strfind(somestring,'18W');
underscore_indices=cell2mat(underscore_indices);
fs_indices = strfind(somestring,'-');
fs_indices=fs_indices';
your_number=cellfun(@(v)v(1),fs_indices);
somestring(:)= somestring';
for i=1:length(fs_indices)
yourNumber= somestring{i}(underscore_indices(i)+2:your_number(i)-1);
% How can I save every iteration? Thanks
end
In the final for loop I get strange output and cannot save all the results, so I cannot collect all 205 abbreviations in one variable (yourNumber).
Thanks a lot,

Accepted Answer

per isakson on 18 Oct 2017 (edited 18 Oct 2017)

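yourNumber is overwritten on every pass through the loop, so only the last value survives. The first step to fix your code is to preallocate a cell array and store each result in its own cell:
yourNumber = cell( length(fs_indices), 1 );
for i = 1 : length(fs_indices)
    % '18W' is three characters long, so the abbreviation starts 3 past the match, not 2
    yourNumber{i} = somestring{i}(underscore_indices(i)+3:your_number(i)-1);
end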
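For reference, here is a self-contained sketch of the corrected loop on a few rows of the sample data. The names codes, startIdx, dashIdx and abbrev are made up for this illustration, and it assumes every entry contains '18W' exactly once and at least one dash.

```matlab
% Sample rows taken from the question (the real array has more entries).
codes = {'18WABO1-12345-0X'; '18WABO2-12345-0N'; '18WACE3-12345-00'};

startIdx = cell2mat(strfind(codes, '18W'));          % where '18W' starts in each entry
dashIdx  = cellfun(@(v) v(1), strfind(codes, '-'));  % position of the first dash

abbrev = cell(numel(codes), 1);                      % one cell per entry, preallocated
for k = 1:numel(codes)
    % Skip the three characters of '18W' and stop just before the first dash.
    abbrev{k} = codes{k}(startIdx(k)+3 : dashIdx(k)-1);
end
```

Because each result goes into its own cell, abbrev holds every abbreviation after the loop instead of only the last one.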
There are other ways, e.g. with regular expressions
>> str = '18WABO1-12345-0X';
>> regexp( str, '(?<=18W)[^\-]+(?=\-)', 'match' )
ans =
'ABO1'
and
cac = {
'18WABO1-12345-0X'
'18WABO2-12345-0N'
'18WACE3-12345-00'
'18WACE4-12345-0R'
'18WAGUG-12345-0G'
'18WDUER-12345-0N' };
%
out = regexp( cac, '(?<=18W).+?(?=\-)', 'match' );
out = cat( 1, out{:} );
and
>> out
out =
'ABO1'
'ABO2'
'ACE3'
'ACE4'
'AGUG'
'DUER'
and with indexing (this relies on every abbreviation occupying columns 4 to 7)
>> str = char( cac );
>> str = str( :, 4:7 )
str =
ABO1
ABO2
ACE3
ACE4
AGUG
DUER
>>
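The column indexing above only works when every abbreviation is exactly four characters long. If that cannot be guaranteed, one possible alternative is sketched below using strtok, which was not mentioned in the thread. The third entry and the names head and out are hypothetical, added only to show a shorter abbreviation.

```matlab
% Third entry is a made-up code with a two-character abbreviation.
cac = {'18WABO1-12345-0X'; '18WDUER-12345-0N'; '18WXY-12345-0N'};

head = strtok(cac, '-');    % text in front of the first dash, e.g. '18WABO1'
% Drop the leading '18W' (three characters) from each token.
out  = cellfun(@(s) s(4:end), head, 'UniformOutput', false);
```

This sketch assumes every entry starts with '18W'; it does not check for the prefix.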

10 Comments

Cedric on 18 Oct 2017 (edited 18 Oct 2017)
There's no need for the look-ahead in your first pattern, and you can add the 'once' option to avoid the CAT. And if you need to debug it .. well, I'm not really sure .. yet ;)
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
Stephen23 on 18 Oct 2017 (edited 18 Oct 2017)
I tried this myself and came up with almost the same regular expression, just with ^ added to anchor the match at the start of the string:
regexp(C,'(?<=^18W)[^-]+','once','match')
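To see what the ^ anchor changes, here is a small sketch comparing the two patterns. The second entry of C is a made-up counterexample in which 18W does not sit at the start of the string; loose and anchored are illustrative names.

```matlab
% Second entry is hypothetical: '18W' appears, but not at position 1.
C = {'18WABO1-12345-0X'; 'X18WABO1-12345-0X'};

loose    = regexp(C, '(?<=18W)[^-]+',  'match', 'once');  % '18W' may occur anywhere
anchored = regexp(C, '(?<=^18W)[^-]+', 'match', 'once');  % '18W' must start the string

% The unanchored pattern matches both entries; the anchored one
% leaves the second cell empty because that string starts with 'X'.
hasMatch = ~cellfun(@isempty, anchored);
```

The anchored form is the stricter choice when malformed codes should be rejected rather than silently parsed.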
per isakson on 18 Oct 2017 (edited 18 Oct 2017)
Yes, the look-ahead is overkill and 'once' will save a microsecond. With long strings 'once' makes a significant difference.
However, regexp with or without 'once' returns a cell array of scalar cell arrays, which in turn contain the strings. cat "flattens" the cell array.
Cedric on 18 Oct 2017 (edited 18 Oct 2017)
There is always one more level without the 'once':
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match' )
out2 =
6×1 cell array
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
.. you probably forgot to copy one of the lines (call to CAT) when you copy-pasted your example from the command window.
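The extra nesting level can also be checked programmatically rather than by reading the display, which avoids depending on how a given release prints cell arrays. A minimal sketch on two of the sample rows, with nested and flat as illustrative names:

```matlab
cac = {'18WABO1-12345-0X'; '18WDUER-12345-0N'};

nested = regexp(cac, '(?<=18W)[^-]+', 'match');          % each cell holds a 1x1 cell
flat   = regexp(cac, '(?<=18W)[^-]+', 'match', 'once');  % each cell holds a char vector

innerIsCell = iscell(nested{1});   % true: one more level without 'once'
innerIsChar = ischar(flat{1});     % true: 'once' returns the text directly

% Concatenating the inner cells removes the extra level.
sameAfterCat = isequal(vertcat(nested{:}), flat);
```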
per isakson on 18 Oct 2017 (edited 18 Oct 2017)
"There is always one more level without the 'once':" Yes, that's correct. Now, I'll remember. A pity there isn't a strike-out feature.
One thing still puzzles me
out2 =
6×1 cell array
{'ABO1'}
why the braces around 'ABO1'? Here on R2016a I get
>> out2
out2 =
'ABO1'
Has MathWorks changed the display format?
Cedric on 18 Oct 2017 (edited 18 Oct 2017)
Wow, you're right, I had never realized, or already forgotten(!) My output is from 2017b, but I was on 2016b until very recently .. I'm wondering if I didn't pay attention or if the update was between 2016a/b (?)
sensation on 19 Oct 2017
Thanks a lot, guys, for your answers! One quick question: could you briefly elaborate on (?<=18W)[^-]+, or tell me where I can find out when I should use ?, ^ and/or +? Thanks!
Cedric on 19 Oct 2017 (edited 19 Oct 2017)
Look at my comment starting with "Not far" here for a brief summary.
Understanding this, you will understand that
  • (?<=..) is a look-behind and (?<=18W) imposes that what is matched (by the rest of the pattern) is preceded by 18W
  • [^..] defines a set of elements not to match, so [^-] matches all characters but the dash.
  • + is a quantifier that means one or more repetitions of the expression directly preceding it (which is [^-])
So the whole thing reads: match one or more characters that are not a dash (which translates into "read everything up to a dash"), preceded by the literal 18W.
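One way to see how the pieces combine is to build the pattern up step by step on a single string. This is only an illustrative sketch; the variable names are made up.

```matlab
str = '18WABO1-12345-0X';

% [^-]+ alone: every run of characters that are not a dash.
runs  = regexp(str, '[^-]+', 'match');                   % {'18WABO1','12345','0X'}

% 'once' keeps only the first run and returns it as plain text.
first = regexp(str, '[^-]+', 'match', 'once');           % '18WABO1'

% Adding the look-behind requires the match to be preceded by 18W,
% and 18W itself is not part of the result.
abbr  = regexp(str, '(?<=18W)[^-]+', 'match', 'once');   % 'ABO1'
```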
Stephen23 on 19 Oct 2017
"where I can find those expressions when I should use ? ^ or/and +."
By reading the documentation ten times:
And then read it another ten times. And practice lots.
Regular expressions are powerful and very useful, but they require practice and attention to detail. Study that page I linked to, and the other pages that it links to as well.
sensation on 19 Oct 2017
Thanks!

Asked: 18 Oct 2017

Commented: 19 Oct 2017
