How to set regexp so that it stops at the first instance?
Hi all,
I need to extract the URLs from the following HTML code and I am using regexp.
a='<option value="http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html">2004-2007</option><option value="http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html" selected>2008-2012</option></select></form></td></tr>';
urls=regexp(a,'(?<=option value.*)http.*html','match');
and the result is:
http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html">2004-2007</option><option value="http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html
As you can see, the extracted string respects the pattern, but it contains two different URLs. I need the following two results:
http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html
http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html
How may I fix this problem?
Thanks
Pietro
Accepted Answer
Stephen23
14 Jun 2017
Edited: Stephen23
14 Jun 2017
>> urls = regexp(a,'(?<=option value.*)http.*?\.html','match');
>> urls{:}
ans =
http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html
ans =
http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html
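The change that matters is the quantifier: .* is greedy and consumes as much as it can, so the match runs to the last html in the string, whereas .*? is lazy and stops at the first place where the rest of the pattern can match. A minimal sketch of the difference, assuming a shortened, made-up input (the variable s and its URLs are illustrative only and do not come from the question):

```matlab
% Hypothetical short input containing two quoted .html targets
s = '<option value="http://x.org/a.html">A</option><option value="http://x.org/b.html">B</option>';

% Greedy: .* extends to the LAST 'html' in the string, so both URLs
% end up inside a single match
greedyMatch = regexp(s, 'http.*html', 'match');

% Lazy: .*? stops at the FIRST '.html' after each 'http', so each URL
% is returned as its own match
lazyMatch = regexp(s, 'http.*?\.html', 'match');

fprintf('greedy: %d match(es)\n', numel(greedyMatch));
fprintf('lazy:   %d match(es)\n', numel(lazyMatch));
fprintf('%s\n', lazyMatch{:});
```

Escaping the dot as \. also matters: an unescaped . matches any character, not only a literal period.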
A more robust method is to exclude " characters from the match, so that it cannot run past the closing quote of the attribute:
>> urls = regexp(a,'(?<=option value=")[^"]+\.html','match');
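Applied to the string from the question, the negated character class [^"]+ confines each match to a single attribute value. The sketch below also shows an equivalent formulation with a capture group and the 'tokens' option instead of a lookbehind; that variant is an alternative for illustration and is not part of the answer above, and the names tok and urlsFromTokens are hypothetical.

```matlab
a = '<option value="http://www.tractordata.com/farm-tractors/003/9/0/3906-massey-ferguson-7465-transmission.html">2004-2007</option><option value="http://www.tractordata.com/farm-tractors/006/7/0/6706-massey-ferguson-7465-transmission.html" selected>2008-2012</option></select></form></td></tr>';

% Lookbehind version: [^"]+ cannot cross the closing quote
urls = regexp(a, '(?<=option value=")[^"]+\.html', 'match');

% Alternative: capture the URL in a group; 'tokens' returns a cell array
% with one 1x1 cell per match, which is flattened here
tok = regexp(a, 'option value="([^"]+\.html)"', 'tokens');
urlsFromTokens = cellfun(@(c) c{1}, tok, 'UniformOutput', false);

% Print one URL per line
fprintf('%s\n', urls{:});
```

Both variants return the same 1x2 cell array for this input. For arbitrary HTML a dedicated parser is generally safer than a regular expression, but for a fixed, known fragment like this one the pattern is sufficient.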
If you want to experiment with regular expressions then you might like to try my Interactive Regular Expression Tool, which shows the outputs of regexp as you type the parse and match strings. You can download it here: