finding string in between
Hi, I have a 100×1 cell array of strings like:
18WABO1-12345-0X
18WABO2-12345-0N
18WACE3-12345-00
18WACE4-12345-0R
18WAGUG-12345-0G
18WDUER-12345-0N
I would like to extract the substring that always sits between 18W and the first -, so the result is:
ABO1
ABO2
ACE3
ACE4
AGUG
DUER etc
my example of a code:
%
somestring(:)= eic_p;
underscore_indices= strfind(somestring,'18W');
underscore_indices=cell2mat(underscore_indices);
fs_indices = strfind(somestring,'-');
fs_indices=fs_indices';
your_number=cellfun(@(v)v(1),fs_indices);
somestring(:)= somestring';
for i=1:length(fs_indices)
yourNumber= somestring{i}(underscore_indices(i)+2:your_number(i)-1);
%HOW i can save every iteration? thanks
end
In the last for loop I get strange output and cannot save all the results; I want all 205 abbreviations in one variable (yourNumber).
Thanks a lot,
Accepted Answer
per isakson
18 Oct 2017
Edited: per isakson
18 Oct 2017
yourNumber is overwritten in the loop and only the last value is saved. The first step to fix your code is
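yourNumber = cell( length(fs_indices), 1 );   % preallocate one cell per input string
for i = 1 : length(fs_indices)
% +3 skips the three characters of '18W'; with the original +2 the 'W' stays in the result
yourNumber{i} = somestring{i}(underscore_indices(i)+3:your_number(i)-1);
end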
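For reference, the loop-based fix can be written as a self-contained sketch. The sample data and the variable names startIdx and dashIdx are assumptions made for illustration; they are not part of the original code.

```matlab
% Sample input (assumed for illustration; the real data has more rows)
somestring = { '18WABO1-12345-0X'
               '18WABO2-12345-0N'
               '18WDUER-12345-0N' };

n = numel( somestring );
yourNumber = cell( n, 1 );              % preallocate so each iteration has its own slot

for i = 1 : n
    s = somestring{i};
    startIdx = strfind( s, '18W' ) + 3; % first character after the '18W' prefix
    dashIdx  = strfind( s, '-' );       % positions of all dashes
    yourNumber{i} = s( startIdx(1) : dashIdx(1)-1 );
end

disp( yourNumber )
```

Indexing with yourNumber{i} rather than assigning to yourNumber is what keeps every iteration's result.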
There are other ways, e.g. with regular expressions
>> str = '18WABO1-12345-0X';
>> regexp( str, '(?<=18W)[^\-]+(?=\-)', 'match' )
ans =
'ABO1'
and
cac = {
'18WABO1-12345-0X'
'18WABO2-12345-0N'
'18WACE3-12345-00'
'18WACE4-12345-0R'
'18WAGUG-12345-0G'
'18WDUER-12345-0N' };
%
out = regexp( cac, '(?<=18W).+?(?=\-)', 'match' );
out = cat( 1, out{:} );
and
>> out
out =
'ABO1'
'ABO2'
'ACE3'
'ACE4'
'AGUG'
'DUER'
and with indexing
>> str = char( cac );
>> str = str( :, 4:7 )
str =
ABO1
ABO2
ACE3
ACE4
AGUG
DUER
>>
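Note that the fixed column range 4:7 only works when every code is exactly four characters long. A minimal sketch, assuming the codes might vary in length, cuts each string at the first dash with strtok and then drops the three-character prefix. The third sample string and the names head and codes are made up for illustration.

```matlab
% Sample input; the last entry is a hypothetical longer code
cac = { '18WABO1-12345-0X'
        '18WACE3-12345-00'
        '18WLONGER-12345-0N' };

head  = strtok( cac, '-' );   % text before the first dash, e.g. '18WABO1'
codes = cellfun( @(s) s(4:end), head, 'UniformOutput', false );  % drop '18W'

disp( codes )
```

This assumes every string really starts with 18W; the regular expression with the look-behind checks that explicitly.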
10 Comments
No need for the look forward/around with your first pattern, and you can add the option 'once' to avoid the CAT. And if you need to debug it .. well I'm not really sure .. yet ;)
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
I tried this myself and came up with almost the same regular expression, just with ^ to anchor the match at the start of the string:
regexp(C,'(?<=^18W)[^-]+','once','match')
per isakson
18 Oct 2017
Edited: per isakson
18 Oct 2017
Yes, the look-ahead is overkill and 'once' will save a microsecond. With long strings 'once' makes a significant difference.
However, regexp with or without 'once' returns a cell array of scalar cell arrays, which in turn contain the strings. cat "flattens" the cell array.
There is always one more level without the 'once':
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match' )
out2 =
6×1 cell array
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
.. you probably forgot to copy one of the lines (call to CAT) when you copy-pasted your example from the command window.
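The nesting difference can be checked directly rather than read off the display. A small sketch, using a shortened copy of cac and the hypothetical names nested, flat and flattened:

```matlab
cac = { '18WABO1-12345-0X'
        '18WABO2-12345-0N' };

nested = regexp( cac, '(?<=18W)[^-]+', 'match' );          % one cell per input, each holding a cell
flat   = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' );  % one char vector per input

disp( class( nested{1} ) )   % cell
disp( class( flat{1} ) )     % char

% CAT removes the extra level, giving the same result as 'once'
flattened = cat( 1, nested{:} );
disp( isequal( flattened, flat ) )
```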
per isakson
18 Oct 2017
Edited: per isakson
18 Oct 2017
"There is always one more level without the 'once':" Yes, that's correct. Now, I'll remember. A pity there isn't a strike-out feature.
One thing still puzzles me
out2 =
6×1 cell array
{'ABO1'}
why the braces around 'ABO1'? Here on R2016a I get
>> out2
out2 =
'ABO1'
Have The MathWorks changed the display format?
Wow, you're right, I had never realized, or already forgotten(!) My output is from 2017b, but I was on 2016b until very recently .. I'm wondering if I didn't pay attention or if the update was between 2016a/b (?)
sensation
19 Oct 2017
Thanks a lot guys for your answers! One quick question: can you briefly elaborate on (?<=18W)[^-]+, or tell me where I can find out when I should use ?, ^ and/or +? Thanks!
Understanding this, you will understand that
- (?<=..) is a look-behind and (?<=18W) imposes that what is matched (by the rest of the pattern) is preceded by 18W
- [^..] defines a set of elements not to match, so [^-] matches all characters but the dash.
- + is a quantifier that means one or more times the expression that precedes directly (which is [^-])
So the whole thing reads: match one or more characters that are not a dash (which translates into "read everything up to a dash"), preceded by the literal 18W.
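The effect of each piece can be seen by building the pattern up step by step. This is only an illustrative sketch; the variable names are made up, and the 'tokens' variant at the end is an alternative that was not used in this thread.

```matlab
str = '18WABO1-12345-0X';

% [^-]+ alone: the first run of non-dash characters, prefix included
noLookBehind = regexp( str, '[^-]+', 'match', 'once' );            % '18WABO1'

% Adding the look-behind: the match must be preceded by 18W,
% but 18W itself is not part of the match
withLookBehind = regexp( str, '(?<=18W)[^-]+', 'match', 'once' );  % 'ABO1'

% Without 'once', [^-]+ returns every run of non-dash characters
allRuns = regexp( str, '[^-]+', 'match' );                         % {'18WABO1','12345','0X'}

% Alternative: anchor at the start and capture the part of interest as a token
tok = regexp( str, '^18W([^-]+)', 'tokens', 'once' );              % {'ABO1'}

disp( withLookBehind )
```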
Stephen23
19 Oct 2017
"where I can find those expressions when I should use ? ^ or/and +."
By reading the documentation ten times:
And then read it another ten times. And practice lots.
Regular expressions are powerful and very useful, but they require practice and attention to detail. Study that page I linked to, and the other pages that it links to as well.
sensation
19 Oct 2017
Thanks!
