Parsing or regexp HTML output from urlread

I need to extract the PubMed IDs from the below HTML, but I am not too fluent in the use of regexp.
Can anyone help with how I would extract the IDs from the below HTML, and store them in a vector?
I'm guessing there is some way to say: what is between '<Id>' and '</Id>' store in...

 채택된 답변

Tom
Tom 2013년 6월 24일

1 개 추천

str = 'version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult880<IdList> Id>16123227</Id Id>9561342</Id Id>8429296</Id Id>1408722</Id Id>2152845</Id Id>2894889</Id Id>2860133</Id Id>6145799</Id /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> Fields</Field Count>33249</Count Explode>N</Explode /TermSet TermSet Term"Clonidine"[All Fields]</Term> Fields</Field Count>16458</Count Explode>N</Explode /TermSet OP>AND</OP /TranslationStack"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</eSearchResult>';
%isolate the ID list string
IDList = regexp(str,'(?<=IdList>).*(?=/IdList)','match');
disp(IDList{1})
%get the ID numbers from the string
IDno = textscan(IDList{1},'Id>%d</Id');
disp(IDno{1})

추가 답변 (1개)

Sean de Wolski
Sean de Wolski 2013년 6월 24일

0 개 추천

댓글 수: 1

Philip  Spratt
Philip Spratt 2013년 6월 24일
Must apologise, the output was HTML, hence the xml2struct didn't work.

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Characters and Strings에 대해 자세히 알아보기

질문:

2013년 6월 24일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by