How to extract data from a table format HTML?

Question

Hi,

I want to access an HTML page and extract some information. However, when I use webread and then htmlTree, part of the HTML data is missing and I don't know why.

Example:

Using this url

url = http://www.knapsackfamily.com/knapsack_core/information.php?word=C00000152

I would like to get the rows or columns for the SMILES and InChI fields. However, when I use the code below, I can't see this information. I have tried different selectors, but I don't know whether the data is dynamically generated.

url = "http://www.knapsackfamily.com/knapsack_core/information.php?word=C00000152";
html = webread(url);
tree = htmlTree(html);
selector = "td";
subtrees = findElement(tree, selector);
str = extractHTMLText(subtrees);
table_data = str(1:end);

Thank you,

Alan

Answer 1

Jonas, 2 February 2023

Without digging deeper into the HTML, we can use a plain text search:

d=webread('http://www.knapsackfamily.com/knapsack_core/information.php?word=C00000152',weboptions('Timeout',15));
SMILESfirstTry=extractBetween(d,'<th class="inf">SMILES</th>','</td>','Boundaries','exclusive');
SMILESsecondTry=extractAfter(SMILESfirstTry{1},'<td colspan="4">')
SMILESsecondTry = 'c1c(ccc(c1)/C=C/C(=O)O)O'

The same could be done for the other fields.
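
As a rough sketch of how the text search might be repeated for the other fields: the loop below runs the same extractBetween/extractAfter steps for each field name. The HTML fragment is a hypothetical stand-in for the page source, and it assumes every field uses the same `<th class="inf">` and `<td colspan="4">` markup that the answer shows for SMILES. Replace `d` with the output of webread to try it on the real page.

```matlab
% Hypothetical fragment modeled on the markup quoted in the answer;
% the real page may lay out the other fields differently.
d = ['<tr><th class="inf">InChIKey</th><td colspan="4">NGSWKAQJJWESNS-ZZXKWVIFSA-N</td></tr>' ...
     '<tr><th class="inf">SMILES</th><td colspan="4">c1c(ccc(c1)/C=C/C(=O)O)O</td></tr>'];

fields = {'InChIKey', 'SMILES'};
values = cell(size(fields));

for k = 1:numel(fields)
    % Text between the header cell of this field and the next closing </td>
    startTag = ['<th class="inf">' fields{k} '</th>'];
    chunk = extractBetween(d, startTag, '</td>');
    if isempty(chunk)
        values{k} = '';          % field not present in the page source
        continue
    end
    % Drop the opening <td ...> tag, keeping only the cell content
    values{k} = strtrim(extractAfter(chunk{1}, '<td colspan="4">'));
end

disp(values)
```

Fields that are missing from the page come back as empty character vectors, which makes it easy to spot a failed lookup.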

Similarly, using a bit more of the HTML tooling:

tree = htmlTree(d);
selector = "tr";
subtrees = findElement(tree, selector);
str = extractHTMLText(subtrees);
searchTags = {'InChIKey' 'InChICode' 'SMILES'};
location = contains(str, searchTags);
rawEntries = str(location)
rawEntries = 3×1 string array
    "InChIKey  NGSWKAQJJWESNS-ZZXKWVIFSA-N"
    "InChICode  InChI=1S/C9H8O3/c10-8-4-1-7(2-5-8)3-6-9(11)12/h1-6,10H,(H,11,12)/b6-3+"
    "SMILES  c1c(ccc(c1)/C=C/C(=O)O)O"
extractAfter(rawEntries,'  ')
ans = 3×1 string array
    "NGSWKAQJJWESNS-ZZXKWVIFSA-N"
    "InChI=1S/C9H8O3/c10-8-4-1-7(2-5-8)3-6-9(11)12/h1-6,10H,(H,11,12)/b6-3+"
    "c1c(ccc(c1)/C=C/C(=O)O)O"

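A possible variation on the htmlTree approach, given only as a sketch: instead of splitting the row text on two spaces, read the header cell and the data cell of each row separately. The HTML fragment is a hypothetical stand-in built from the values shown above, and the assumption that each row holds one `th` followed by one `td` comes from the markup quoted in the first method, not from inspecting the whole page. htmlTree, findElement and extractHTMLText require Text Analytics Toolbox.

```matlab
% Hypothetical fragment; replace with the output of webread for real use.
html = ['<html><body><table>' ...
    '<tr><th class="inf">InChIKey</th><td colspan="4">NGSWKAQJJWESNS-ZZXKWVIFSA-N</td></tr>' ...
    '<tr><th class="inf">SMILES</th><td colspan="4">c1c(ccc(c1)/C=C/C(=O)O)O</td></tr>' ...
    '</table></body></html>'];

tree = htmlTree(html);
rows = findElement(tree, "tr");
wanted = ["InChIKey" "InChICode" "SMILES"];

% Map from field label to cell content
result = containers.Map('KeyType', 'char', 'ValueType', 'char');

for k = 1:numel(rows)
    th = findElement(rows(k), "th");
    td = findElement(rows(k), "td");
    if isempty(th) || isempty(td)
        continue                 % skip rows without a label/value pair
    end
    label = strtrim(extractHTMLText(th(1)));
    if any(label == wanted)
        result(char(label)) = char(strtrim(extractHTMLText(td(1))));
    end
end

disp(keys(result))
disp(values(result))
```

Reading the cells separately avoids relying on the exact whitespace that extractHTMLText puts between the label and the value.
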
Comments: 2

Alan Cesar Pilon Miro, 3 February 2023

Hi Jonas,

Thank you! The first method worked very well.

Just to mention: I had some difficulties with the second approach. I could not find the objects.

Jonas, 6 February 2023

Thanks for your reply. Make sure that the data returned from webread is not empty. The website seems to be quite slow, and sometimes the returned data is empty. Increasing the timeout limit further may help here.
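
One way to act on this advice, as a minimal sketch: retry webread with progressively longer timeouts and stop as soon as non-empty data comes back. The timeout values (15, 30 and 60 seconds) and the variable names are illustrative assumptions, not something tested against this site.

```matlab
url = "http://www.knapsackfamily.com/knapsack_core/information.php?word=C00000152";
timeouts = [15 30 60];   % seconds; illustrative values only
html = '';

for t = timeouts
    try
        html = webread(url, weboptions('Timeout', t));
    catch err
        % A timeout or connection problem raises an error; report and retry
        fprintf('webread failed with Timeout = %d s: %s\n', t, err.message);
    end
    if ~isempty(html)
        break                % got some data, stop retrying
    end
end

gotData = ~isempty(html);
if ~gotData
    disp('No data returned; the site may be down or still too slow.');
end
```

Checking `gotData` before calling htmlTree avoids searching an empty document, which would explain why no elements are found.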
