webbot

Version 1.0.0.0 (6 KB) by Laurent Cavin
A Java-based "web browser" that extracts all links from a web page and displays them.
Downloads: 8.6K
Updated: 2003/10/15


WEBBOT Java-based browser with download support and Perl-style regular expressions. The function extracts all links from a web page and displays them. The linked documents can then be downloaded.

WEBBOT(URL)
URL is a string giving the base page address; it must point to an HTML file. The function lists all links in that file. URL can also be a cell vector of URL strings.
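
As an illustration of what "listing all links" can look like, here is a minimal sketch that pulls href targets out of an HTML string with a regular expression. It is not WEBBOT's actual code: the sample HTML and the variable names are assumptions made for this example, and it uses regexp options from current MATLAB releases rather than R13.

```matlab
% Sketch only: extract href targets from an HTML string.
% The sample markup below is made up for illustration.
html = '<a href="c01.html">one</a> <A HREF=''pics/01.gif''>two</A>';

% Capture the quoted value of every href attribute (single or double quotes),
% ignoring the case of the attribute name.
tokens = regexp(html, 'href\s*=\s*["'']([^"'']+)["'']', 'tokens', 'ignorecase');

% regexp returns one 1x1 cell per match; flatten to a cell array of strings.
links = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);

disp(links(:));
```

A regular expression like this misses unquoted attribute values and does not resolve relative addresses against the base URL; a real crawler would need to handle both.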

WEBBOT(URL, WHAT)
displays only specific links. WHAT is a string:
'all_links': displays all links (default).
'page_links': displays all links to an html web page*.
'local_links': displays all local links on the server*.
'external_links': displays all links to external websites.
'image_links': displays all links to an image file**.
'image_tags': displays all image tags <img src="xxx">.
'.xxx.yyyy.zz': displays all links to files with any of the listed extensions; case is ignored ('zip' will find 'ZiP'); e.g. '.zip.gz.gzip.tar.Z'.
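
The extension selector can be read as a dot-separated list of file types. The sketch below shows one way such a selector could be turned into a case-insensitive filter; the link list and variable names are hypothetical, and this is an assumed interpretation rather than the function's own implementation.

```matlab
% Sketch only: filter links by a '.xxx.yyyy.zz' style selector.
what  = '.zip.gz.gzip.tar.Z';
links = {'a/data.ZiP', 'index.html', 'src.tar', 'logo.png', 'old.tar.gz'};

% Split the selector on dots and drop the empty piece before the first dot.
exts = regexp(what, '\.', 'split');
exts = exts(~cellfun('isempty', exts));

% Build one anchored pattern, here \.(zip|gz|gzip|tar|Z)$
pattern = ['\.(' strjoin(exts, '|') ')$'];

% regexpi ignores case, so 'zip' also matches 'ZiP'.
isMatch  = ~cellfun('isempty', regexpi(links, pattern, 'once'));
selected = links(isMatch);

disp(selected(:));
```

Because the pattern is anchored at the end of the string, links that carry a query string after the file name would not match without extra handling.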

WEBBOT(URL, WHAT, ACT)
performs an action on found links. ACT is a string:
'noaction': just display links (default)
'download': downloads all found links to the local machine.
'cartoons': downloads all image tags found on linked pages. This is useful for cartoon websites where each cartoon (e.g. "01.gif") is on its own HTML page (e.g. "c01.html").
'follow.x': follows links to HTML pages and recursively performs the same action on the resulting pages. 'x' is an integer indicating the recursion depth (0 is equivalent to 'noaction').
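
To make the depth parameter of 'follow.x' concrete, the following sketch parses the depth from the action string and walks a small in-memory link map level by level. The site map, page names, and the breadth-first traversal are all assumptions for illustration; the real function fetches pages over the network and may traverse them differently.

```matlab
% Sketch only: depth-limited link following over a made-up site map.
act = 'follow.2';

% Parse the integer after 'follow.'.
tok   = regexp(act, '^follow\.(\d+)$', 'tokens', 'once');
depth = str2double(tok{1});

% Hypothetical site: each key is a page, each value the links found on it.
site = containers.Map();
site('index.html') = {'c01.html', 'c02.html'};
site('c01.html')   = {'c01a.html'};
site('c01a.html')  = {'deep.html'};

frontier = {'index.html'};   % pages to expand at the current level
visited  = frontier;         % every page reached so far
for level = 1:depth
    next = {};
    for k = 1:numel(frontier)
        page = frontier{k};
        if ~isKey(site, page)
            continue;        % no known links on this page
        end
        found   = site(page);
        isNew   = ~ismember(found, visited);   % skip pages already seen
        next    = [next, found(isNew)];
        visited = [visited, found(isNew)];
    end
    frontier = next;
end

disp(visited(:));
```

With a depth of 2 the walk reaches 'c01a.html' but stops before 'deep.html'; with a depth of 0 the loop body never runs and only the base page is considered, which matches the note that 0 behaves like 'noaction'.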

lks = WEBBOT(URL, ...)
returns a cell array with the links of URL{end}, i.e. of the last URL when URL is a cell vector.

Notes: * Links explicitly pointing to a .htm or .html URL.
** Image links are recognized by the following file types:
.jpg .jpeg .gif .pict .bmp .tif .tiff .ras .png (.giff)

Try it with:
webbot('http://www.unitedmedia.com/comics/dilbert/archive/', ...
'local_links', 'cartoons');

Written by L.Cavin, 28.09.2003, (c) CSE
This code is free to use and modify for non-commercial purposes.
Web address: http://ltcmail.ethz.ch/cavin/CSEDBLib.html#WEBBOT

Cite As

Laurent Cavin (2026). webbot (https://kr.mathworks.com/matlabcentral/fileexchange/4023-webbot), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R13
Compatible with any release
Platform Compatibility
Windows macOS Linux
Categories
Find more on Call Web Services from MATLAB Using HTTP in Help Center and MATLAB Answers
Version Published Release Notes
1.0.0.0

Major update:
Much, much faster downloads with the MathWorks object "com.mathworks.mlwidgets.io.InterruptibleStreamCopier".
The old code using "java.net.URL" is still included for demonstration purposes.