How do I read the text between href tags and return the results in a cell array?

Question

StuartG, 13 June 2016


https://kr.mathworks.com/matlabcentral/answers/289687-how-do-i-read-the-text-between-href-tags-and-return-the-results-in-a-cell-array

Commented: Ana Alonso, 17 December 2019

Currently, I have an html webpage saved in a text format. Below is an example of the portion of the text I am interested in:

I want to search the text document for every occurrence of the "<a href='\some\ " pattern and extract the text between the tokens, i.e.

/some/1056-text-stuff

MATLAB has regexp, match and tags, but I am struggling to pick out the string cleanly. Ideally, I would like to search the document and return a cell array of strings listing all of the matches. Here is my current code:

str= fileread('C:\Users\Me\Documents\MATLAB\trial.txt');  %read in text file
urls = regexp(str, 'href=(\S+)(\s*)$', 'tokens', 'lineAnchors');    %find urls


Answer 1

Julian, 17 June 2016


https://kr.mathworks.com/matlabcentral/answers/289687-how-do-i-read-the-text-between-href-tags-and-return-the-results-in-a-cell-array#answer_225865

You can try something like the following, where html holds the page text (str in your code):

>> RE='<a[\s]+href="(?<target>.*?)"[^>]*>(?<text>.*?)</a>';
>> list=regexp(html, RE, 'names')

I can also recommend this tool for building and testing regular expressions: https://www.regexbuddy.com/
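
To get from the struct array returned by 'names' to the cell array the question asks for, something like the sketch below may help. The sample html string and the variable names urls and labels are made up for illustration; in practice html would come from fileread. It also assumes the href values are wrapped in double quotes, as the pattern above expects.

```matlab
% Hypothetical sample input (assumption); in practice: html = fileread('trial.txt');
html = ['<p><a href="/some/1056-text-stuff">First</a> and ', ...
        '<a href="/some/2071-more-stuff" class="x">Second</a></p>'];

% Pattern from the answer above, with named tokens "target" and "text"
RE = '<a[\s]+href="(?<target>.*?)"[^>]*>(?<text>.*?)</a>';

% With 'names', regexp returns a struct array with one element per match
list = regexp(html, RE, 'names');

% Gather each field into a 1-by-N cell array of character vectors
urls   = {list.target};   % href values
labels = {list.text};     % visible link text
```

For this sample input, urls should hold the two href values and labels the two link texts. If the page uses single quotes around the href value, the quote characters in the pattern would need adjusting.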

Comments: 2

StuartG, 21 June 2016

Thank you very much, the regex command was giving me a lot of grief. I tailored your expression a little bit and it worked perfectly.

Ana Alonso, 17 December 2019

Hi there,

What do the (?<target>.*?) and (?<text>.*?) expressions correspond to?

I've never worked with html before and I'm just trying to scrape urls from the html code.

Thanks!
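
On the question above: in MATLAB regular expressions, (?<name>expr) is a named token. It captures whatever expr matches and, with the 'names' option, stores it in a struct field called name. So (?<target>.*?) captures the href value (the URL) and (?<text>.*?) captures the visible link text between <a ...> and </a>. A minimal sketch, using a made-up one-link string as input:

```matlab
% Hypothetical one-link input, used only for illustration
html = '<a href="/some/1056-text-stuff">Text stuff</a>';

% Named tokens: "target" is the href value, "text" is the link label
RE = '<a[\s]+href="(?<target>.*?)"[^>]*>(?<text>.*?)</a>';
s = regexp(html, RE, 'names');
disp(s.target)   % the URL part
disp(s.text)     % the visible link text

% Unnamed equivalent: plain parentheses with the 'tokens' option.
% tok is a cell array with one cell per match; tok{1} is {target, text}.
tok = regexp(html, '<a[\s]+href="(.*?)"[^>]*>(.*?)</a>', 'tokens');
url = tok{1}{1};
```

If only the URLs are needed, the text token can be ignored and just the target field (or the first token) collected.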
