Is there a way to scrape Amazon Product Listing page using Python?

Submitted by 霸气de小男生 on 2019-12-08 06:45:53

Question


I'm trying to scrape product listing pages that display the vendors and prices of particular products, but urllib.urlopen isn't working. It works on all other Amazon pages, so I'm wondering whether Amazon's bot detection prevents scraping on product listing pages. Can anyone verify this? In Chrome I can still view the page source.

Here's an example of a product listing page I would want to scrape: http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new


Answer 1:


Trying 'curl -I' on that URL returns 405 MethodNotAllowed:

$ curl -I 'http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new' 
HTTP/1.1 405 MethodNotAllowed
Date: Wed, 13 Feb 2013 16:41:08 GMT
Server: Server
x-amz-id-1: 1WKZG9N0SE87E3KFG6YV
allow: POST, GET
x-amz-id-2: Apluv2QBzzrmXlRWjlClRGsQQ1TbwsxObe2hxfdrGhO/OQziI/aIT3vkVjCPn+qz
Vary: Accept-Encoding,User-Agent
Content-Type: text/html; charset=ISO-8859-1

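Adding a User-Agent string with the '-A' switch didn't affect that return value. Keep in mind that 'curl -I' sends a HEAD request and the response lists 'allow: POST, GET', so the 405 by itself only shows that HEAD is rejected, not how a GET is handled.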

You might experiment with different HTTP headers to see if you can find a combination that passes. But it's pretty obvious that Amazon wouldn't want you to screen-scrape prices from their product pages. A little googling brings up this page:

http://www.distil.it/amazon-cracks-down-on-price-scraping/#.URvBFo4ry0s

With no fanfare or warning, Amazon in June began enforcing a long-standing policy prohibiting screen-scraping tools from harvesting listing information directly from its marketplace, a favorite tool for providers of repricing services for merchants, according to a third-party developer.

Note also that Amazon has an API for its affiliates. There are some related questions about using that API from Python in the "Related" question links in the right column of the original Stack Overflow page.
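
The header experiment suggested above could be sketched in Python 3 roughly as follows. This is only a minimal sketch: the header values in CANDIDATE_HEADERS and the helper names build_request and probe are assumptions made for illustration, not anything Amazon documents, and nothing here is known to get past Amazon's checks. It sends an explicit GET (unlike 'curl -I', which sends HEAD) and reports the status code.

```python
import urllib.error
import urllib.request

URL = ("http://www.amazon.com/gp/offer-listing/B007E84H96/"
       "ref=dp_olp_new?ie=UTF8&condition=new")

# Assumed, browser-like header values chosen purely for illustration.
CANDIDATE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.8",
}


def build_request(url, headers):
    """Build a GET request carrying the given headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=headers, method="GET")


def probe(url, headers, timeout=10):
    """Send the request and return the HTTP status code (hypothetical helper)."""
    req = build_request(url, headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        return err.code


# Example call (performs a real network request, so it is left commented out):
# print(probe(URL, CANDIDATE_HEADERS))
```

Changing one header at a time and comparing the returned status codes is one way to see which header, if any, makes a difference.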




Answer 2:


Have you heard of BeautifulSoup? You might get some mileage out of that...

http://www.crummy.com/software/BeautifulSoup/


More details: BeautifulSoup Grab Visible Webpage Text
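
As a rough idea of what grabbing visible text with BeautifulSoup can look like, here is a minimal sketch. It assumes the bs4 package is installed, and SAMPLE_HTML is a made-up fragment, not Amazon's real markup; visible_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only; real listing pages will differ.
SAMPLE_HTML = """
<html><body>
<script>var tracking = 1;</script>
<style>p { color: red; }</style>
<p>Vendor A</p>
<p>$10.00</p>
</body></html>
"""


def visible_text(html):
    """Return the non-empty text strings left after dropping script/style tags."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are never rendered, so remove them first.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    return list(soup.stripped_strings)


print(visible_text(SAMPLE_HTML))
```

BeautifulSoup only parses HTML you already have; it does not change whether the server is willing to return the page in the first place.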



Source: https://stackoverflow.com/questions/14844032/is-there-a-way-to-scrape-amazon-product-listing-page-using-python
