Scrape Amazon all-deals page with PHP cURL?


I want to scrape Amazon's all-deals page:

http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1

So I am using PHP cURL:

$request         
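Roughly like this (a simplified sketch of my request):

    <?php
    $url = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the HTML instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    $html = curl_exec($ch);
    curl_close($ch);

    echo $html;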


        
1 Answer
  • 2021-01-16 14:56

    Based on my quick research, you might query the XHRs that Amazon makes to request the deals.

    Dynamic websites like this get their data through Ajax JSON calls. You can find out where the data is downloaded from (using the browser's developer tools or a web sniffer) and then query those URLs for the data yourself.

    See the screenshot of the request headers. If you query those URLs with PHP cURL, you should imitate the HTTP headers of that particular request (including the cookies).
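    As a minimal sketch, attaching imitated headers in PHP cURL looks like this (the URL and header values here are placeholders; copy the real ones from the request you captured):

        <?php
        // Placeholder: the XHR URL found with dev tools / a web sniffer
        $xhrUrl = 'http://www.example.com/path/to/xhr-endpoint';

        $ch = curl_init($xhrUrl);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            // Copy these values from the captured request headers
            'Accept: application/json, text/javascript, */*; q=0.01',
            'Referer: http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1',
            'User-Agent: Mozilla/5.0',
            'X-Requested-With: XMLHttpRequest',
        ));
        $response = curl_exec($ch);
        curl_close($ch);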

    Update

    Based on your new curl request...

    1. The Amazon page (its JS logic) makes XHRs to its server for the product items. The XHRs look like this: http://www.amazon.com/xa/dealcontent/v2/GetDealMetadata?nocache=1434445645152, not http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1, which is only the referer.

    2. The request for product items is a POST, not a GET.

    3. You probably took a cookie from your browser and inserted it into the PHP cURL header. That is wrong: those cookies belong to your browser session, not to the session of the PHP script that will make the XHRs. Use a cookie jar for this instead (see the sketch below).
    4. The POST's payload is an object and has to follow a known structure. Form data: {"requestMetadata":{"marketplaceID":"ATVPDKIKX0DER","sessionID":"175-4567874-0146849","clientID":"goldbox"},"widgetContext":{"pageType":"GoldBox","subPageType":"AllDeals","deviceType":"pc","refRID":"1VFVJBKEYZT3DGWSANXQ","widgetID":"1969939662","slotName":"center-6"},"page":1,"dealsPerPage":8,"itemResponseSize":"NONE","queryProfile":{"featuredOnly":false,"dealTypes":["LIGHTNING_DEAL","BEST_DEAL"],"includedCategories":["283155","599858","154606011"],"excludedExtendedFilters":{"MARKETING_ID":["restrictedcontent"]}}}

    See the developer tools picture: note the points bordered in red.
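    Putting points 1-4 together, here is a rough, untested sketch in PHP cURL. The cookie-jar file name and the "YOUR-SESSION-ID" placeholder are assumptions, the payload is abridged, and whether Amazon expects a raw JSON body or url-encoded form data is something to verify against the captured request:

        <?php
        $cookieJar = __DIR__ . '/amazon_cookies.txt';     // hypothetical cookie-jar file
        $referer   = 'http://www.amazon.com/gp/goldbox/all-deals/ref=sv_gb_1';

        // Step 1: GET the deals page so the cookie jar holds this script's own session cookies (point 3)
        $ch = curl_init($referer);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);  // write received cookies here
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar); // and send them back on later requests
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
        curl_exec($ch);
        curl_close($ch);

        // Step 2: POST (point 2) to the XHR endpoint (point 1) with the payload structure from point 4.
        // sessionID must match this script's own session, not a value copied from the browser.
        $payload = json_encode(array(
            'requestMetadata' => array(
                'marketplaceID' => 'ATVPDKIKX0DER',
                'sessionID'     => 'YOUR-SESSION-ID',
                'clientID'      => 'goldbox',
            ),
            'widgetContext' => array(
                'pageType'    => 'GoldBox',
                'subPageType' => 'AllDeals',
                'deviceType'  => 'pc',
            ),
            'page'             => 1,
            'dealsPerPage'     => 8,
            'itemResponseSize' => 'NONE',
            // ...plus the remaining fields from the captured request (refRID, widgetID, queryProfile, ...)
        ));

        // nocache is just a cache-busting timestamp (the browser uses milliseconds)
        $ch = curl_init('http://www.amazon.com/xa/dealcontent/v2/GetDealMetadata?nocache=' . time());
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);   // raw JSON body; use http_build_query() instead if the capture shows form encoding
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            'Content-Type: application/json',
            'Referer: ' . $referer,
        ));
        $deals = curl_exec($ch);
        curl_close($ch);

        print_r(json_decode($deals, true));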

    P.S. As Michael - sqlbot mentioned, you are trying to do something that violates Amazon's Terms of Use. But for the sake of the scraping technique, I am still updating my answer.