Using cURL to download a site's HTML source, but getting different file than intended

Submitted by 荒凉一梦 on 2019-12-07 07:56:44

Question


I'm trying to use cURL and PHP to download the HTML source (as it appears in the browser) of the page at the URL shown in the code below. Instead of the actual source, the following is returned (which looks to me like a meta refresh set to 0):

<html>
    <head><title>Object moved</title></head>
    <body>
        <h2>Object moved to <a href="https://login.live.com/login.srf?wa=wsignin1.0&amp;rpsnv=11&amp;checkda=1&amp;ct=1321044850&amp;rver=6.1.6195.0&amp;wp=MBI&amp;wreply=http:%2F%2Fwww.windowsphone.com%2Fen-US%2Fapps%2Fea39f002-ac30-e011-854c-00237de2db9e&amp;lc=1033&amp;id=268289">here</a>.
        </h2>
    </body>
</html>

I'm trying to spoof the Referer header so that it points to the site itself, but it seems I'm doing it wrong. The code is below. Any suggestions? Thanks.

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e");

$html = curl_exec($ch);
curl_close($ch);

Answer 1:


$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');
// Include the response headers in the returned string.
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Read cookies from, and write them back to, the same file.
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com");
$html = curl_exec($ch);
curl_close($ch);
echo $html;
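Because the code above sets CURLOPT_HEADER to true, $html contains the response headers followed by the body. The following is a minimal sketch of one way to separate the two; split_response() is a hypothetical helper name, and it assumes you read the header size with curl_getinfo($ch, CURLINFO_HEADER_SIZE) before calling curl_close($ch).

```php
<?php
// Hypothetical helper (not part of the answer above): splits a raw cURL
// response into its header block and its body. $headerSize is the value
// reported by curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}
```

Inspecting the header part is a quick way to check whether the server answered with a redirect status and a Location header.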



Answer 2:


Add the curl option to follow redirects:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

If it is a meta refresh rather than an HTTP redirect (a Location header), cURL will not follow it, because it does not parse the HTML; see: PHP: Can CURL follow meta redirects
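For the meta refresh case, the target URL has to be pulled out of the HTML by hand and requested separately. Below is a rough sketch under stated assumptions: find_meta_refresh_url() is a hypothetical helper, and it uses simple regular expressions rather than a full HTML parser, so it will miss unusual markup. Note that the "Object moved" page in the question appears to be the body of an ordinary HTTP redirect, in which case CURLOPT_FOLLOWLOCATION is the relevant option and this helper would return null.

```php
<?php
// Hypothetical helper: returns the target URL of the first
// <meta http-equiv="refresh"> tag found in $html, or null if there is none.
function find_meta_refresh_url(string $html): ?string
{
    // Collect every <meta ...> tag in the document.
    if (!preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        // Skip meta tags that are not refresh directives.
        if (!preg_match('/http-equiv\s*=\s*["\']?refresh\b/i', $tag)) {
            continue;
        }
        // The content attribute looks like: content="0; url=http://example.com/"
        if (preg_match('/url\s*=\s*[\'"]?([^\'"\s>]+)/i', $tag, $m)) {
            // Attribute values are HTML-encoded, so decode &amp; and friends.
            return html_entity_decode($m[1]);
        }
    }
    return null;
}
```

If the function returns a URL, you can pass it to a second cURL request; if it returns null, the page contains no meta refresh.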

As mentioned by flesk, you may also need to store the cookies.




Answer 3:


The problem isn't the referrer; you need to enable cookies for this to work.

Try something like this:

curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");

You have to query the page twice. First allow redirects to get the cookie from login.live.com, then query again with the cookie set.
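The two-pass approach described above might be sketched as follows. This is an illustration, not a tested solution for this particular site: fetch_page() and fetch_twice() are hypothetical helper names, the redirect cap of 10 is an assumed value, and the User-Agent and Referer options from the question are left out for brevity.

```php
<?php
// Hypothetical helper: fetches $url, following HTTP redirects and keeping
// cookies in $cookieFile across requests.
function fetch_page(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP 3xx redirects
        CURLOPT_MAXREDIRS      => 10,          // assumed cap to avoid redirect loops
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies stored earlier
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is closed
        CURLOPT_CONNECTTIMEOUT => 30,
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $html;
}

// Hypothetical helper: queries the page twice, as the answer suggests.
function fetch_twice(string $url, string $cookieFile): string
{
    fetch_page($url, $cookieFile);        // first pass: collect the cookies
    return fetch_page($url, $cookieFile); // second pass: send them back
}
```

A call such as fetch_twice('http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e', 'cookie.txt') would then return the body of the second response. Whether the login server hands out usable cookies to an anonymous client is something you would need to verify.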



Source: https://stackoverflow.com/questions/8099919/using-curl-to-download-a-sites-html-source-but-getting-different-file-than-int
