How do you extract from a website table?
I'm trying to extract data from the table on this page (http://www.newyorkschools.com/districts/nyc-district-11.html).
I've tried to use webread, but it isn't quite working for me. I'm attempting to extract the school names and grade levels and then place them into an Excel file. (I'm helping a friend who is starting a STEM program.)
How should I approach this? Here is what I have so far:
url ='http://www.newyorkschools.com/districts/nyc-district-7.html';
data = webread(url)
tree=htmlTree(url)
selector = 'School Name'
subtrees = findElement(tree,selector)
subtrees(:)
Accepted Answer
Christopher Creutzig
June 7, 2022
The problem with this page is that it does not use an HTML <table> for the data you are looking for. If it did, you could simply use readtable(url) or maybe readtable(url,TableIndex=2).
Also, the selector passed to findElement needs to match what is actually in the HTML source (element names, classes, IDs), not the text shown on the page, and this particular page does not make that easy. What goes in the selector is determined by the page's markup, not by MATLAB.
Here's something to get you started:
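url = 'http://www.newyorkschools.com/districts/nyc-district-7.html';
data = webread(url);        % download the page source as text
tree = htmlTree(data);      % parse the HTML text (not the URL) into a tree
% Select the tab panes directly under the element with id "myTabContent"
tabs = findElement(tree,"#myTabContent > div");
schools = tabs(1);          % the first pane holds the list of schools
% Elements with class "p_div" (not used below, but handy for per-row processing)
rows = findElement(schools,".p_div");
% Elements with class "pp-col-40" hold the school names
schoolnames = findElement(schools,".pp-col-40");
extractHTMLText(schoolnames)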
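To get from there to an Excel file with names and grade levels, one option is to walk the rows one at a time so each name stays paired with its grade. The following is only a sketch on a small inline HTML sample, not the real page: the class "grade-col", the sample school names, and the output file name are hypothetical, and I am assuming each ".p_div" element corresponds to one school. Check the actual page source for the class that holds the grade level and substitute it.

```matlab
% Hypothetical sample markup standing in for the real page.
% ".p_div" and ".pp-col-40" come from the answer above; "grade-col" is made up.
html = join([ ...
    "<div id=""myTabContent""><div>", ...
    "<div class=""p_div""><div class=""pp-col-40"">Example School A</div>", ...
    "<div class=""grade-col"">K-5</div></div>", ...
    "<div class=""p_div""><div class=""pp-col-40"">Example School B</div>", ...
    "<div class=""grade-col"">6-8</div></div>", ...
    "</div></div>"], "");

tree = htmlTree(html);
rows = findElement(tree, ".p_div");      % assumed: one element per school

names  = strings(numel(rows), 1);
grades = strings(numel(rows), 1);
for k = 1:numel(rows)
    nameNode  = findElement(rows(k), ".pp-col-40");
    gradeNode = findElement(rows(k), ".grade-col");   % hypothetical class
    if ~isempty(nameNode)
        names(k) = strtrim(extractHTMLText(nameNode(1)));
    end
    if ~isempty(gradeNode)
        grades(k) = strtrim(extractHTMLText(gradeNode(1)));
    end
end

% Collect the results in a table and write it to an Excel file
T = table(names, grades, 'VariableNames', ["School", "Grades"]);
outFile = fullfile(tempdir, "schools.xlsx");          % hypothetical output path
writetable(T, outFile);
```

Searching inside each row, rather than collecting all names and all grades separately, avoids misaligned columns if a row happens to be missing one of the two fields.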
More Answers (2)
Toshiaki Takeuchi
October 24, 2023
You can use readtable (https://www.mathworks.com/help/matlab/ref/readtable.html), which can read an HTML <table> directly from a URL:
url = "https://www.mathworks.com/help/matlab/text-files.html";
T = readtable(url,TableSelector="//TABLE[contains(.,'readtable')]", ...
ReadVariableNames=false)
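If you want to try the TableIndex and TableSelector options without depending on a live page, a small local file works too. This is a sketch under assumptions: the file name and table contents are made up, and it needs a MATLAB release in which readtable supports HTML files.

```matlab
% Hypothetical sample page with two <table> elements, written to a temp file
html = join([ ...
    "<html><body>", ...
    "<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>", ...
    "<table><tr><th>School</th><th>Grades</th></tr>", ...
    "<tr><td>Example School A</td><td>K-5</td></tr>", ...
    "<tr><td>Example School B</td><td>6-8</td></tr></table>", ...
    "</body></html>"], "");
file = fullfile(tempdir, "sample_tables.html");
fid = fopen(file, "w");
fprintf(fid, "%s", html);
fclose(fid);

% Pick the second table on the page by position ...
T1 = readtable(file, TableIndex=2);

% ... or pick it with an XPath expression that matches on its contents
T2 = readtable(file, TableSelector="//TABLE[contains(.,'Grades')]");
```

Selecting by content with TableSelector tends to survive layout changes better than a fixed TableIndex, since the position of a table on a page can shift.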