Question
Using HTTParty, I can read some pages fine, but others on the same site strangely return a 404 unless I send the proper session headers. So I'm trying to get and set them via HTTParty.
This works:
HTTParty.get 'https://www.instagram.com/explore/locations/24993086/pfriem-family-brewers/'
This gives a 404:
HTTParty.get 'https://www.instagram.com/explore/locations/295648950/trio-salon-spa/'
Curl also gives a 404 for that:
curl -I https://www.instagram.com/explore/locations/295648950/trio-salon-spa/
...unless I set all the headers that Chrome sends:
curl -I 'https://www.instagram.com/explore/locations/295648950/trio-salon-spa/' -H 'pragma: no-cache' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: en-US,en;q=0.9,da;q=0.8,fr;q=0.7' -H 'upgrade-insecure-requests: 1' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'cache-control: no-cache' -H 'authority: www.instagram.com' -H 'cookie: shbid=15682; csrftoken=7FQ5Z0SZikfkYiS02bcTRotjAYEvdooD; mid=Wk7bTgAEAAHaOKTW39cyGMXo8vLj; ds_user_id=2055054912; mcd=3; sessionid=IGSC07d4796576bd227dda2a5353ee0365bbd6a6f6b2da7567d57a83cde58c0ae870%3AOvd5knFMpZUQaZpRyr0QkeMitWBFnGDP%3A%7B%22_auth_user_id%22%3A2055054912%2C%22_auth_user_backend%22%3A%22accounts.backends.CaseInsensitiveModelBackend%22%2C%22_auth_user_hash%22%3A%22%22%2C%22_platform%22%3A4%2C%22_token_ver%22%3A2%2C%22_token%22%3A%222055054912%3AhdEI59s33u2BM3M2f8p2ZkSkZ9HeZR5Y%3A7359e774fd121f9726db15a24e660d43a0f464b4733bbbaa31f619aec3f433ba%22%2C%22last_refreshed%22%3A1524728608.4418663979%7D; rur=FTW; fbm_124024574287414=base_domain=.instagram.com; fbsr_124024574287414=9pBDQeojfCbPmhlXZHwx_OGhduHFlQusvBdewwiZDY4.eyJhbGdvcml0aG0iOiJITUFDLVNIQTI1NiIsImNvZGUiOiJBUUJVTEZmWHBfYS1mdVYyMmhKZEhnV3Bvb3dKVXppV3oxVVhxcDBGelFnTmpnNmpRZjNDY1ZZV2xybWZYSE5JV0dxeVUtNE9UaDBFUUdMbFJFWVVwZzZPYngxYmdxbUxHLV9pVGl5U3hGa1JxbGRJRExITHV5V09WVEVsbjlHUFhsTmRCQUNheUdYMmh4ZWcxajJEcERwczZ2X282aWRUcWd6UmYxaHE2VjVObEFGX0w5bFM1RXI5bHg3b2c1bWk0ak9OdmVCcVpLTG5nY1llM0NnWHVtOWdWQWJTVi1SdWpQU2J1UmhHNFdaS0xweWtPVEYzdmhsUlVaT2FLZ3FXcFo2TXFXY2xqSTM0T3JWZjR3dzFyY3J0S0RkdVh0Qk5zellKRG9weU1IdG1kVERUVGVrZmwyUHhpRzZsSmRkUVpSbWc0MTNMWERHQ0ZORDRVRS13OUhvMyIsImlzc3VlZF9hdCI6MTUyNDc1ODM3NCwidXNlcl9pZCI6IjExNTU2MzM2MjkifQ; urlgen="{\"time\": 1524728608\054 \"65.157.26.82\": 209}:1fBjJI:P27GLkp5R5uijWSi-SEkHW-Mo0c"' --compressed
So I'm trying to do a basic GET with HTTParty, read the session cookies from the response, and make another request with them, but it's not working properly:
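require 'httparty'
url = 'https://www.instagram.com/explore/locations/295648950/trio-salon-spa/'
# First request: only used to collect whatever cookies the server sets.
get_response = HTTParty.get(url)
# CookieHash lives in the HTTParty namespace.
cookie_hash = HTTParty::CookieHash.new
# get_fields returns nil when there is no Set-Cookie header, hence Array().
Array(get_response.get_fields('Set-Cookie')).each { |c| cookie_hash.add_cookies(c) }
# Second request: send the collected cookies back in a Cookie header.
# (The earlier call to parse_cookie is dropped: no such method is defined here.)
second_response = HTTParty.get(url, headers: { 'Cookie' => cookie_hash.to_cookie_string })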
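To make the goal concrete, here is a minimal, stdlib-only sketch of the Cookie header I think the second request needs to carry. `cookie_header_from` is a hypothetical helper of my own, not part of HTTParty, and the sample cookie values are made up. It assumes each Set-Cookie line starts with the `name=value` pair and that everything after the first `;` is an attribute for the client only. Whether the site accepts cookies obtained this way is a separate question.

```ruby
# Build a Cookie request header from a list of Set-Cookie response lines.
# Hypothetical helper for illustration; not an HTTParty API.
def cookie_header_from(set_cookie_lines)
  pairs = Array(set_cookie_lines).map do |line|
    # Keep only the leading "name=value"; Path, Domain, Expires, etc.
    # are attributes and are not sent back to the server.
    line.split(';').first.to_s.strip
  end
  pairs.reject(&:empty?).join('; ')
end

# Made-up sample values, shaped like the lines a server might send.
sample = [
  'csrftoken=abc123; Path=/; Secure',
  'mid=XYZ; Domain=.instagram.com; HttpOnly'
]
puts cookie_header_from(sample)
```

The resulting string is what would go into `headers: { 'Cookie' => ... }` on the follow-up request.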
How can I get this to work?
Source: https://stackoverflow.com/questions/50048446/getting-setting-session-with-httparty