PHP: crawl a website that is using Cloudflare

最后都变了 - submitted on 2019-12-19 04:14:53

Question


I want to crawl some specific values (e.g. news text) from a website (which is not my own).

file_get_contents() is not working, probably because it is blocked in php.ini.
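Whether php.ini is really the cause can be checked directly. The following is a minimal sketch, assuming the relevant setting is allow_url_fopen (the directive that controls URL access for file_get_contents()); the helper name urlFopenEnabled is made up for illustration.

```php
<?php
// Hypothetical helper: reports whether PHP may open URLs through
// file_get_contents()/fopen(), based on the allow_url_fopen directive.
function urlFopenEnabled(): bool
{
    // ini_get() returns a string such as "1", "0" or "" (or false if unknown);
    // FILTER_VALIDATE_BOOLEAN normalises all of these to a real boolean.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}
```

Calling var_dump(urlFopenEnabled()); shows the current state. Even when it reports true, file_get_contents() would most likely receive the same Cloudflare interstitial page as curl does.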

So I tried to do it with curl. The problem is:
All I get is the redirection text from Cloudflare.
My crawler should do something like:
go to page -> wait for the 5-second Cloudflare redirect -> curl the page.

Any ideas how to crawl the page after the Cloudflare waiting time? (in PHP)

Edit: I have tried a lot of things, but the problem is still the same.
More specifically: it only crawls the Cloudflare redirect page. (I get a page that redirects to the host, with Cloudflare in front. When I run curl on localhost, the redirect is resolved against localhost, so the redirect obviously does not work.) Is there no way to start saving the returned data after 5 seconds of "curling"?


Answer 1:


"go to page -> wait the 5secs cloudflare redirect -> curl the page."

The 5 second interstitial page actually requires that JavaScript and cookies are enabled before a visitor can pass the check, which probably won't work if you're using a crawler or bot to access the site.
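A crawler can at least recognise that it was handed the interstitial rather than the real page. The sketch below is a heuristic only: the status codes (503, 403) and the marker strings are assumptions about what the challenge page typically looks like, and the function name is hypothetical. Cloudflare can change the page at any time.

```php
<?php
// Hypothetical heuristic: guess whether a response is a Cloudflare
// challenge page instead of the real content.
function looksLikeCloudflareChallenge(int $status, string $body): bool
{
    // Assumption: the interstitial is served with an error status, not 200.
    if ($status !== 503 && $status !== 403) {
        return false;
    }

    // Assumption: these strings appear in the challenge page HTML.
    $markers = ['Just a moment', 'Checking your browser', 'cf-browser-verification'];

    foreach ($markers as $marker) {
        // stripos() is case-insensitive and returns false when not found.
        if (stripos($body, $marker) !== false) {
            return true;
        }
    }

    return false;
}
```

If this returns true for a curl response, waiting longer will not help, because curl never runs the JavaScript that the check depends on.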




Answer 2:


First, check how a normal browser behaves on this site: which redirects it follows and which cookies it receives.

Then set up a curl script that collects all cookies in a "cookie jar" and follows redirects automatically.
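A cookie jar plus automatic redirects could be set up roughly as follows. This is a sketch, not a tested crawler: the function names, the redirect limit of 10 and the 30-second timeout are illustrative choices, and the cookie file path and user-agent string are supplied by the caller.

```php
<?php
// Hypothetical helper: curl options for a cookie-aware, redirect-following request.
function buildCurlOptions(string $cookieJar, string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow HTTP redirects automatically
        CURLOPT_MAXREDIRS      => 10,         // illustrative cap to avoid redirect loops
        CURLOPT_COOKIEJAR      => $cookieJar, // write received cookies to this file
        CURLOPT_COOKIEFILE     => $cookieJar, // send cookies stored in the same file
        CURLOPT_USERAGENT      => $userAgent, // present a browser-like user agent
        CURLOPT_TIMEOUT        => 30,         // illustrative overall timeout in seconds
    ];
}

// Hypothetical helper: fetch one URL and return status, body and curl error text.
function fetchPage(string $url, string $cookieJar, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, buildCurlOptions($cookieJar, $userAgent));

    $body   = curl_exec($ch);                               // false on failure
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if no response
    $error  = curl_error($ch);                              // '' when no error

    return [
        'status' => $status,
        'body'   => $body === false ? '' : $body,
        'error'  => $error,
    ];
}
```

A call might look like fetchPage('https://example.com/news', sys_get_temp_dir() . '/cookies.txt', 'Mozilla/5.0'), where the URL and user agent are placeholders. Note that this only handles HTTP redirects and cookies; it does not execute the JavaScript that Cloudflare's 5-second check requires.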

Then run some tests and compare the result with what the browser shows.

Hope this helps.

Note:

  • Cloudflare has good infrastructure to block people like you. It could present a CAPTCHA challenge or something similar.

  • Also, a good system administrator will sooner or later find out what you are doing and block your IP address or your user agent.




Answer 3:


You should use PhantomJS, called from PHP:

echo shell_exec('phantomjs example.js');

example.js

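var page = require('webpage').create();
// Placeholder URL from the original answer; replace it with the page to fetch.
var url = 'http://www.google/';
page.open(url, function (status) {
  if (status === 'success') {
    console.log(page.content); // print the rendered HTML for PHP to capture
  } else {
    console.log('Failed to load ' + url);
  }
  phantom.exit();
});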
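On the PHP side, the shell_exec() call can be wrapped so that the script path is escaped and a failed run is detected. This is a sketch under the assumption that a phantomjs binary is available on the PATH; both function names are hypothetical.

```php
<?php
// Hypothetical helper: build the shell command that runs a PhantomJS script.
function buildPhantomCommand(string $script, string $binary = 'phantomjs'): string
{
    // Escape both parts so that unusual characters cannot alter the command.
    return escapeshellcmd($binary) . ' ' . escapeshellarg($script);
}

// Hypothetical helper: run the script and return its output, or null on failure.
function renderWithPhantom(string $script): ?string
{
    $output = shell_exec(buildPhantomCommand($script));

    // shell_exec() yields null or false when the command fails or prints nothing.
    return (is_string($output) && $output !== '') ? $output : null;
}
```

With these helpers, $html = renderWithPhantom('example.js'); gives the page content printed by the script, or null if PhantomJS could not be run.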


Source: https://stackoverflow.com/questions/31182100/php-crawl-a-website-which-is-using-cloudflare
