CURLOPT_FOLLOWLOCATION not working

[愿得一人] 2021-01-21 05:31

I'm trying to scrape the data at this link: http://www.treasurydirect.gov/NP/BPDLogin?application=np

which contains a meta refresh.

I'm using curl_exec with CU

2 answers
  • 2021-01-21 05:42

    Meta refreshes are instructions for a browser, and curl doesn't process them. CURLOPT_FOLLOWLOCATION only follows HTTP redirects, that is, 3xx responses that carry a Location header.
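
    If you do need to follow a meta refresh, you have to read the target out of the returned HTML yourself and request it in a second call. A minimal sketch, assuming the page uses the usual `content="N; url=..."` form; `findMetaRefreshUrl` is a hypothetical helper name, not part of PHP or curl, and it does not resolve relative URLs:

    ```php
    <?php
    // Hypothetical helper: returns the target URL of the first
    // <meta http-equiv="refresh"> tag in $html, or null if there is none.
    function findMetaRefreshUrl(string $html): ?string
    {
        if ($html === '') {
            return null;
        }
        $doc = new DOMDocument();
        // Suppress parser warnings caused by sloppy real-world markup.
        @$doc->loadHTML($html);
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            if (strcasecmp($meta->getAttribute('http-equiv'), 'refresh') !== 0) {
                continue;
            }
            // The content attribute typically looks like "0; url=http://example.com/next".
            if (preg_match('/url\s*=\s*[\'"]?([^\'"]+)/i', $meta->getAttribute('content'), $m)) {
                return trim($m[1]);
            }
        }
        return null;
    }
    ```

    You would pass the string returned by curl_exec (with CURLOPT_RETURNTRANSFER enabled) to this function and, if it returns a URL, fetch that URL with a new request.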

  • 2021-01-21 05:44

    The problem is not the meta refresh tag (which, by the way, will never be followed by the CURLOPT_FOLLOWLOCATION option) but the HTTP User-Agent header. The web site checks the User-Agent header field against a list of accepted user agents. You could solve this by adding the following line when setting options for $ch:

    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
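
    For context, here is roughly how that line might sit in a complete request. This is only a sketch: the question's own setup code is cut off, so the other options and the helper name `fetchWithUserAgent` are assumptions, not the asker's actual code.

    ```php
    <?php
    // Hypothetical helper: fetch a URL while sending a browser-like User-Agent.
    function fetchWithUserAgent(string $url, string $userAgent): string
    {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,   // follow HTTP Location redirects only
            CURLOPT_MAXREDIRS      => 5,      // assumed cap to avoid redirect loops
            CURLOPT_USERAGENT      => $userAgent,
        ]);
        $body = curl_exec($ch);
        if ($body === false) {
            // curl_error() describes the last failure on this handle.
            throw new RuntimeException('curl failed: ' . curl_error($ch));
        }
        return $body;
    }
    ```

    It could be called as `fetchWithUserAgent('http://www.treasurydirect.gov/NP/BPDLogin?application=np', 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)')`, using the URL from the question and the user agent string above.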
    