Image extraction from webpage
I have a series of serially numbered webpages (some numbers in the sequence don't exist). Each page has an image of interest at one particular location in the HTML:
<h4 id="COMPANY">COMPANY</h4>
<p><img class="image" border="0" src="/resources/companyName_company.jpg"/></p>
The companyName is different in each numbered webpage.
However, urlwrite saves only the HTML pages, without these images, and the images are missing when the saved pages are opened in a browser. I need these images and none of the other content on the page, so this defeats the whole purpose. How can this be resolved? Is there a way to download only these images and nothing else from each webpage?
Accepted Answer
Rik
27 Apr 2020
The HTML file doesn't contain the image. It contains a relative path to the image. Because you don't have the image file in the location the HTML file specifies, the image doesn't show up. You need to use the three-step process below to get the image file.
- download the HTML file
- extract the image path ('/resources/companyName_company.jpg') from the HTML
- download the image from website.com/resources/companyName_company.jpg
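The three steps can be sketched as follows. This is a minimal sketch in Python so that it runs standalone, not code from the thread. It assumes the markup matches the snippet in the question exactly, and that a missing serial number makes the server return an HTTP error such as 404. In MATLAB the same steps map onto webread (step 1), regexp (step 2) and websave (step 3).

```python
import os
import re
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlparse

# Matches the <img> that directly follows the <h4 id="COMPANY"> heading,
# assuming the markup looks exactly like the snippet in the question.
COMPANY_IMG = re.compile(
    r'<h4 id="COMPANY">[^<]*</h4>\s*<p>\s*<img[^>]*\bsrc="([^"]+)"'
)


def find_company_image(html, page_url):
    """Step 2: return the absolute URL of the COMPANY image, or None."""
    match = COMPANY_IMG.search(html)
    if match is None:
        return None
    # The src is site-relative ('/resources/...'), so resolve it
    # against the URL of the page it came from.
    return urljoin(page_url, match.group(1))


def fetch(url):
    """Return the body at url, or None on an HTTP error such as 404
    (assumed to happen for serial numbers that do not exist)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError:
        return None


def download_company_image(page_url, out_dir="."):
    """Run all three steps; return the saved file path, or None."""
    page = fetch(page_url)  # Step 1: download the HTML
    if page is None:
        return None
    img_url = find_company_image(page.decode("utf-8", errors="replace"), page_url)
    if img_url is None:
        return None
    data = fetch(img_url)  # Step 3: download the image itself
    if data is None:
        return None
    # Save under the image's own file name, e.g. companyName_company.jpg
    path = os.path.join(out_dir, os.path.basename(urlparse(img_url).path))
    with open(path, "wb") as f:
        f.write(data)
    return path
```

For example, download_company_image("https://website.com/page/123") would save companyName_company.jpg in the current folder. That page URL pattern is hypothetical, since the question does not give the real one. Calling the function in a loop over the serial numbers and skipping the None results covers the pages that don't exist.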
Rik
29 Apr 2020
Glad to be of help.
Since you indicated that you are bound by an NDA and cannot provide more details, I don't see what adding "(subject to testing)" is trying to accomplish. Obviously it works on a recent release of Matlab for this example; otherwise I wouldn't have posted it. The only thing it currently accomplishes is sounding condescending.