Question
I heard it is possible to capture webpages using PHP (maybe 6.0 or above) on a Windows server.
I got some sample code and tested it, but none of it worked correctly.
If you know a right way to capture a webpage and save it as an image file from a web application, please teach me.
Answer 1:
You could use the Browsershots API: http://browsershots.org/
With its XML-RPC interface you can access it from almost any language:
http://api.browsershots.org/xmlrpc/
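As a rough illustration, here is a minimal PHP sketch that talks to that XML-RPC endpoint. It assumes PHP's xmlrpc extension is installed, and it only calls the standard system.listMethods introspection method to discover what the service offers; the actual screenshot-request methods (and whether introspection is enabled) are documented at the URL above.

<?php
// Hedged sketch: POST an XML-RPC request to the Browsershots endpoint
// using PHP's xmlrpc extension and cURL. "system.listMethods" is the
// standard XML-RPC introspection call; the real screenshot methods are
// described in the API documentation linked above.
$request = xmlrpc_encode_request('system.listMethods', array());

$ch = curl_init('http://api.browsershots.org/xmlrpc/');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $request);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
if ($response === false) {
    die(curl_error($ch));
}
curl_close($ch);

// Decode the XML-RPC response into a native PHP value and show it.
print_r(xmlrpc_decode($response));
?>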
Answer 2:
Though you asked for a PHP solution, I would like to share another solution using Perl. WWW::Mechanize, together with LWP::UserAgent and HTML::Parser, can help with screen scraping.
Some documents for reference:
- Web scraping with WWW::Mechanize
- Screen-scraping with WWW::Mechanize
Answer 3:
Downloading the HTML of a web page is commonly known as screen scraping. This can be useful if you want a program to extract data from a given page. The easiest way to request HTTP resources is to use a tool called cURL. cURL comes as a standalone Unix tool, but there are libraries for it in just about every programming language. To capture this page from the Unix command line, type:
curl http://stackoverflow.com/questions/1077970/in-any-languages-can-i-capture-a-webpageno-install-no-activex-if-i-can-plz
In PHP, you can do the same:
<?php
// Initialize a cURL session for the target URL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://stackoverflow.com/questions/1077970/in-any-languages-can-i-capture-a-webpageno-install-no-activex-if-i-can-plz");
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($ch);
if ($data === false) {
    // curl_error() needs the handle to report what went wrong.
    die(curl_error($ch));
}
curl_close($ch);

echo $data;
?>
Now, before copying an entire website, you should check its robots.txt file to see whether it allows robots to spider the site, and you may want to check whether an API is available that lets you retrieve the data without scraping the HTML.
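For example, a very crude politeness check might look like the sketch below. The plain strpos() test is an assumption made for brevity: it only spots a "Disallow: /" prefix and is no substitute for a real robots.txt parser that honors per-user-agent rules and path patterns.

<?php
// Minimal sketch: fetch robots.txt and look for an obvious disallow rule.
$ch = curl_init('http://stackoverflow.com/robots.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$robots = curl_exec($ch);
curl_close($ch);

if ($robots !== false && strpos($robots, 'Disallow: /') !== false) {
    echo "The site disallows crawling of at least some paths.\n";
} else {
    echo "No blanket disallow rule found; still review the file yourself.\n";
}
?>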
Source: https://stackoverflow.com/questions/1077970/in-any-languages-can-i-capture-a-webpage-and-save-it-image-file-no-install-n