How to make sure a web page is entirely loaded before using webread
Hello,
While scraping data from a website, I can retrieve most of the content without any issue, except for a data table, even though its elements are visible when I inspect the page's HTML in the browser.
My assumption is that the web page needs some time to load fully before webread can be used properly.
Neither a timeout parameter nor a simple loop that retries until success fixes the issue. I also don't mind getting the data as text instead of a table; that is not the real problem here.
So, all in all, I'm looking for a way to open and fully load a web page first, and then pass the result to webread in a second step, unless webread or weboptions has a specific parameter that addresses the issue.
Thanks for your help!
Comments: 11
Image Analyst
1 Jan 2023
I'm not sure how RESTful requests work. When you send the web server the request, do you think it sends back some signal that it's "done" and that all the data has been sent? Or do you think the client just waits a while, hears nothing more after a certain period, and then "times out" and declares that it's done because no more data is being delivered?
tom3w
1 Jan 2023
tom3w
1 Jan 2023
Image Analyst
1 Jan 2023
Does the web page load if you use a web browser? If so, you might have to call tech support.
DGM
2 Jan 2023
Are you sure that the table isn't dynamic content that's simply unavailable without the ability to execute the necessary scripts? That's an utterly common thing these days.
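One way to test this diagnosis, as a minimal sketch: request the page as plain text and check whether the table markup, or a value you can see in the browser, is present in what the server actually returns. The URL, the search value, and the stand-in HTML below are placeholders, not taken from this thread; the real webread call is left commented out so the sketch runs offline.

```matlab
% Placeholder values (assumptions): replace with the real page and a
% value that is visible in the table when the page is viewed in a browser.
pageUrl    = "https://example.com/page-with-table";
knownValue = "some cell text";

% Request the raw response body as text. webread does not execute any
% JavaScript, so this is the HTML exactly as the server sent it.
opts = weboptions("ContentType", "text", "Timeout", 30);
% html = webread(pageUrl, opts);   % uncomment once pageUrl is real

% Stand-in for a page whose table is filled in later by a script (made up).
html = '<html><body><div id="stats"></div><script src="load.js"></script></body></html>';

hasTableTag = contains(html, "<table", "IgnoreCase", true);
hasValue    = contains(html, knownValue);

if hasValue
    verdict = "Value found in the static HTML; it can be parsed from 'html'.";
elseif hasTableTag
    verdict = "Table tag found but not the value; rows are probably filled in by a script.";
else
    verdict = "No table in the static HTML; the content is probably loaded dynamically.";
end
disp(verdict)
```

If the value is missing from the static HTML, a longer timeout or a retry loop will not help, because the server has already sent everything it is going to send for that request.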
Rik
2 Jan 2023
That dynamic loading is also my working diagnosis.
You can use the network tab of your browser's developer tools to try to find the source address of the table. The easiest way to open that window is to right-click on the page and select "Inspect element".
Rik
2 Jan 2023
No, I meant that you can easily open the network activity by using the inspect element option. If a page loads an object dynamically (e.g. with a script that runs when loading the initial page), then it will show up as a different object/source in the network view when you reload the page. Often that will be some sort of JSON object that is parsed by the script on the page.
This is generally done to reduce loading time. On this website, the statistics shown when you click on a username are loaded from an external JSON file. That means the page only has to load the script that tells your browser how to interpret the user stats, instead of having to contain the same hidden object every time a username appears on the page. It also allows a central update of stats for each user: just update the JSON file and all pages will update accordingly.
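Once the address of that JSON source is known, it can be requested directly instead of the page itself. The following is only a sketch under assumptions: the endpoint URL and the sample payload are made up for illustration, and the real response may have a different shape or need extra request headers. The webread call is left commented out so the sketch runs offline.

```matlab
% Hypothetical endpoint, as it might appear in the browser's network view
% (assumption: replace with the address actually found there).
dataUrl = "https://example.com/api/table.json";

% With ContentType "json", webread decodes the response body the same way
% jsondecode does and returns a struct or struct array.
opts = weboptions("ContentType", "json", "Timeout", 30);
% data = webread(dataUrl, opts);   % uncomment once dataUrl is real

% Stand-in payload so the sketch runs without network access (made-up values).
sample = '[{"name":"A","value":1},{"name":"B","value":2}]';
data = jsondecode(sample);

% An array of objects with identical fields decodes to a struct array,
% which struct2table converts to one table row per object.
T = struct2table(data);
disp(T)
```

If the real payload nests the rows inside another field, or the objects do not all share the same fields, the result of the decoding step will be a nested struct or a cell array and will need to be unpacked before calling struct2table.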
tom3w
2 Jan 2023
Answers (0)