Parsing or regexp HTML output from urlread

조회 수: 4 (최근 30일)
Philip  Spratt
Philip Spratt 2013년 6월 24일
I need to extract the PubMed IDs from the below HTML, but I am not too fluent in the use of regexp.
Can anyone help with how I would extract the IDs from the below HTML, and store them in a vector?
I'm guessing there is some way to say: what is between '<Id>' and '</Id>' store in...

채택된 답변

Tom
Tom 2013년 6월 24일
str = 'version="1.0" ? eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd" eSearchResult880<IdList> Id>16123227</Id Id>9561342</Id Id>8429296</Id Id>1408722</Id Id>2152845</Id Id>2894889</Id Id>2860133</Id Id>6145799</Id /IdList<TranslationSet/><TranslationStack> TermSet Term"ulcerative colitis"[All Fields]</Term> Fields</Field Count>33249</Count Explode>N</Explode /TermSet TermSet Term"Clonidine"[All Fields]</Term> Fields</Field Count>16458</Count Explode>N</Explode /TermSet OP>AND</OP /TranslationStack"ulcerative colitis"[All Fields] AND "Clonidine"[All Fields]</eSearchResult>';
%isolate the ID list string
IDList = regexp(str,'(?<=IdList>).*(?=/IdList)','match');
disp(IDList{1})
%get the ID numbers from the string
IDno = textscan(IDList{1},'Id>%d</Id');
disp(IDno{1})

추가 답변 (1개)

Sean de Wolski
Sean de Wolski 2013년 6월 24일
  댓글 수: 1
Philip  Spratt
Philip Spratt 2013년 6월 24일
Must apologise, the output was HTML, hence the xml2struct didn't work.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Characters and Strings에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by