Question
I am trying to access a specific site using Python, and no matter which library I use, I can't get the response I expect.
I have tried Selenium with PhantomJS, and I have tried requests and urllib.
Whenever I open the site in a browser I get a JSON file, but whenever I request it from a Python script I get an HTML file (which contains a huge minified script).
I suspect the site detects that the request does not come from a real browser and blocks it, but I can't figure out how it does that.
The site address is: http://www.yesplanet.co.il/presentationsJSON
I would very much appreciate it if anyone could point me in the right direction. Thanks!
EDIT: Here's my Selenium code:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://www.yesplanet.co.il/presentationsJSON")
source = driver.page_source
At this point I print the source and see it is not what I expected.
Here is a requests implementation that also does not work:
import requests
res = requests.get("http://www.yesplanet.co.il/presentationsJSON")
source = res.content
The same thing happens here.
Answer 1:
It works for me if I set a bunch of headers including sending a cookie.
curl -H "Cookie:rbzid=d29SMXE1Rktrdm5kS2x0YW5EdVZwUzNpYVhWdUlJSndlVzEvUU9vWG5OU2dRSVNnWTc3TWYwaHN4V2REVGJyNFBMSFl1bXErMGFLNXNtUGxVb0ZwS3dVRDRhajEwczFMMmE3cUc1blBmaTEzeFZFWGhrbHgrUXhNeHRhZnhWNjBib1pTenM5bjFvOUhVRVoxOTNGRHBYQXQwVzVsYXdSSXliME5LeUZjU0Rhb2tHa09ycUNVYmJyOUVjMERJN3daaUlFUGhwUHpvT0dDblcwU0wwMEM3NlJZRGw1K1pXZ2NKNkJRTWhvNUtaZz1AQEAxOTVAQEAtNjY2NjY2NjYwNjA-" -H "Accept-Language: en-US,en;q=0.8,ja;q=0.6" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" http://www.yesplanet.co.il/presentationsJSON
I'm not sure which of the other headers are important.
I found out which headers Chrome was sending by checking the Network panel in the dev tools.
From that I can also see that Chrome made two requests.
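For reference, the curl command above could be translated to Python roughly as follows. This is only a sketch using the standard library's urllib.request. The helper names build_request and fetch_json are made up for illustration, the Accept value is sent under an explicit Accept header name, and the rbzid value is assumed to come from your own browser's dev tools (it may well be tied to a session and expire, which I have not verified).

```python
import json
import urllib.request

URL = "http://www.yesplanet.co.il/presentationsJSON"

# Header values copied from the curl command in this answer
# (originally taken from Chrome's network panel).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/56.0.2924.87 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8,ja;q=0.6",
}


def build_request(url, rbzid):
    """Build a GET request with browser-like headers and the rbzid cookie.

    rbzid is the cookie value copied from a real browser session
    (hypothetical parameter name, chosen to match the cookie).
    """
    headers = dict(BROWSER_HEADERS)
    headers["Cookie"] = "rbzid=" + rbzid
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, rbzid, timeout=10):
    """Send the request and parse the response body as JSON.

    If the site still answers with the HTML page, json.loads raises
    json.JSONDecodeError (a subclass of ValueError).
    """
    with urllib.request.urlopen(build_request(url, rbzid), timeout=timeout) as resp:
        return json.loads(resp.read())
```

You would call it as data = fetch_json(URL, "<rbzid value from your browser>"). If that raises ValueError, the site returned something other than JSON, most likely the HTML page with the script again, and the cookie or one of the headers probably needs updating.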
Source: https://stackoverflow.com/questions/42450434/a-specific-site-is-returning-a-different-response-on-python-and-in-chrome