Get all images from a board from a Pinterest web address

遥遥无期 2021-02-09 05:50

This question sounds easy, but it is not as simple as it looks.

Brief summary of what's wrong

For example, use this board: http://pinterest

5 answers
  •  醉话见心
    2021-02-09 06:37

    A couple of people have suggested using JavaScript to emulate scrolling.

    I don't think you need to emulate scrolling at all. You just need to find out the format of the URIs that are called via AJAX whenever scrolling occurs; then you can request each "page" of results sequentially. A little reverse engineering is required.
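
    The "request each page sequentially" idea can be sketched generically. This is only a sketch under assumptions: `fetch_page`, the cursor strings, and the fake pages are hypothetical stand-ins, and nothing here reflects Pinterest's actual response format.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: given a cursor (None for the first page), return
# that page's items plus the cursor for the next page (None when finished).
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iter_all_items(fetch_page: FetchPage, max_pages: int = 1000) -> Iterator[str]:
    """Yield items page by page until no further cursor is reported."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return


# Stand-in for the real HTTP call so the loop can be exercised offline.
_FAKE_PAGES = {
    None: (["img-1", "img-2"], "cursor-a"),
    "cursor-a": (["img-3"], "cursor-b"),
    "cursor-b": (["img-4", "img-5"], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Tuple[List[str], Optional[str]]:
    return _FAKE_PAGES[cursor]


all_items = list(iter_all_items(fake_fetch_page))
```

    In a real scraper, `fake_fetch_page` would be replaced by a function that issues the AJAX request and pulls the next cursor out of whatever the server returns.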

    Using the Network tab of Chrome's inspector, I can see that once I scroll a certain distance down the page, this URI is called:

    http://pinterest.com/resource/BoardFeedResource/get/?source_url=%2Fdodo%2Fweb-designui-and-mobile%2F&data=%7B%22options%22%3A%7B%22board_id%22%3A%22158400180582875562%22%2C%22access%22%3A%5B%5D%2C%22bookmarks%22%3A%5B%22LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw%3D%3D%22%5D%7D%2C%22context%22%3A%7B%22app_version%22%3A%22fb43cdb%22%7D%2C%22module%22%3A%7B%22name%22%3A%22GridItems%22%2C%22options%22%3A%7B%22scrollable%22%3Atrue%2C%22show_grid_footer%22%3Atrue%2C%22centered%22%3Atrue%2C%22reflow_all%22%3Atrue%2C%22virtualize%22%3Atrue%2C%22item_options%22%3A%7B%22show_rich_title%22%3Afalse%2C%22squish_giraffe_pins%22%3Afalse%2C%22show_board%22%3Afalse%2C%22show_via%22%3Afalse%2C%22show_pinner%22%3Afalse%2C%22show_pinned_from%22%3Atrue%7D%2C%22layout%22%3A%22variable_height%22%7D%7D%2C%22append%22%3Atrue%2C%22error_strategy%22%3A1%7D&_=1377092055381

    If we decode that, we see that it's mostly JSON:

    http://pinterest.com/resource/BoardFeedResource/get/?source_url=/dodo/web-designui-and-mobile/&data=
    {
    "options": {
        "board_id": "158400180582875562",
        "access": [],
        "bookmarks": [
            "LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw=="
        ]
    },
    "context": {
        "app_version": "fb43cdb"
    },
    "module": {
        "name": "GridItems",
        "options": {
            "scrollable": true,
            "show_grid_footer": true,
            "centered": true,
            "reflow_all": true,
            "virtualize": true,
            "item_options": {
                "show_rich_title": false,
                "squish_giraffe_pins": false,
                "show_board": false,
                "show_via": false,
                "show_pinner": false,
                "show_pinned_from": true
            },
            "layout": "variable_height"
        }
    },
    "append": true,
    "error_strategy": 1
    }
    &_=1377091719636
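
    The decoding step can be reproduced with the Python standard library. A minimal sketch: the helper names `decode_resource_url` and `encode_resource_url` are my own, and `short_url` is a shortened version of the request above with only `board_id` kept in `data`, so it is illustrative rather than a request the site is known to accept.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://pinterest.com/resource/BoardFeedResource/get/"


def decode_resource_url(url: str) -> dict:
    """Split the query string and parse the JSON held in the `data` parameter."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes values
    return {
        "source_url": query["source_url"][0],
        "data": json.loads(query["data"][0]),
        "_": query["_"][0],
    }


def encode_resource_url(source_url: str, data: dict, underscore: str) -> str:
    """Inverse of decode_resource_url: compact JSON, then percent-encode."""
    params = {
        "source_url": source_url,
        "data": json.dumps(data, separators=(",", ":")),
        "_": underscore,
    }
    return BASE + "?" + urlencode(params)


# Shortened request: same shape as the captured one, most of `data` omitted.
short_url = (
    "http://pinterest.com/resource/BoardFeedResource/get/"
    "?source_url=%2Fdodo%2Fweb-designui-and-mobile%2F"
    "&data=%7B%22options%22%3A%7B%22board_id%22%3A%22158400180582875562%22%7D%7D"
    "&_=1377092055381"
)

decoded = decode_resource_url(short_url)
rebuilt = encode_resource_url(decoded["source_url"], decoded["data"], decoded["_"])
```

    Being able to rebuild the URL from the decoded parts is what makes it possible to change one field and issue a new request by hand.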
    

    Scroll down until a second request fires, and we see this:

    http://pinterest.com/resource/BoardFeedResource/get/?source_url=/dodo/web-designui-and-mobile/&data=
    {
        "options": {
            "board_id": "158400180582875562",
            "access": [],
            "bookmarks": [
                "LT4xNTg0MDAxMTE4NjcwNTk1ODQ6NDl8ODFlMDUwYzVlYWQxNzVmYzdkMzI0YTJiOWJkYzUwOWFhZGFkM2M1MzhiNzA0ZDliZDIzYzE3NjkzNTg1ZTEyOQ=="
            ]
        },
        "context": {
            "app_version": "fb43cdb"
        },
        "module": {
            "name": "GridItems",
            "options": {
                "scrollable": true,
                "show_grid_footer": true,
                "centered": true,
                "reflow_all": true,
                "virtualize": true,
                "item_options": {
                    "show_rich_title": false,
                    "squish_giraffe_pins": false,
                    "show_board": false,
                    "show_via": false,
                    "show_pinner": false,
                    "show_pinned_from": true
                },
                "layout": "variable_height"
            }
        },
        "append": true,
        "error_strategy": 2
    }
    &_=1377092231234
    

    As you can see, not much has changed. The board_id is the same. error_strategy is now 2, the bookmarks entry has a different value, and the &_ at the end is different.
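
    Rather than comparing the two payloads by eye, they can be diffed mechanically. A small sketch: `diff_paths` is a helper I made up for illustration, and the "module" section is cut down to its name because its options are identical in both requests.

```python
import copy


def diff_paths(a, b, path=""):
    """Return the dotted paths at which two JSON-like values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        changed = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else key
            if key not in a or key not in b:
                changed.append(child)  # key present on one side only
            else:
                changed.extend(diff_paths(a[key], b[key], child))
        return changed
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        changed = []
        for i, (x, y) in enumerate(zip(a, b)):
            changed.extend(diff_paths(x, y, f"{path}[{i}]"))
        return changed
    return [] if a == b else [path]


first = {
    "options": {
        "board_id": "158400180582875562",
        "access": [],
        "bookmarks": [
            "LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3"
            "MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw=="
        ],
    },
    "context": {"app_version": "fb43cdb"},
    "module": {"name": "GridItems"},  # options omitted: same in both requests
    "append": True,
    "error_strategy": 1,
}

second = copy.deepcopy(first)
second["options"]["bookmarks"] = [
    "LT4xNTg0MDAxMTE4NjcwNTk1ODQ6NDl8ODFlMDUwYzVlYWQxNzVmYzdkMzI0YTJiOWJk"
    "YzUwOWFhZGFkM2M1MzhiNzA0ZDliZDIzYzE3NjkzNTg1ZTEyOQ=="
]
second["error_strategy"] = 2

changed = diff_paths(first, second)
print(changed)
```

    This only covers the JSON in `data`; the `_` query parameter sits outside it and has to be compared separately.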

    The &_ parameter looks key here. I would bet that it tells the server where to begin the next set of photos, although the bookmarks value also changes between the two requests and could play that role instead. I can't find a reference to &_ in either of the responses or in the original page HTML, but it has to be in there somewhere, or be generated by JavaScript on the client side. Either way, the browser has to know what to ask for next, so this information is something you should be able to get at.
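
    One way to test that guess is to look more closely at the values that change. The sketch below base64-decodes the two bookmarks strings and reads the two `_` values as millisecond Unix timestamps. Both readings are assumptions on my part: the trailing `==` suggests base64, and the `_` values have the right magnitude for timestamps, which would be consistent with a cache-busting parameter rather than a cursor. What the decoded text means to the server is not something these two samples can prove.

```python
import base64
from datetime import datetime, timezone

# The two bookmarks values and the two `_` values from the requests above.
BOOKMARKS = [
    "LT4xNTg0MDAxMTE4NjcxMTM2ODk6MjV8ZWJjODJjOWI4NTQ4NjU4ZDMyNzhmN2U3"
    "MGQyZGJhYTJhZjY2ODUzNTI4YTZhY2NlNmY0M2I1ODYwYjExZmQ3Yw==",
    "LT4xNTg0MDAxMTE4NjcwNTk1ODQ6NDl8ODFlMDUwYzVlYWQxNzVmYzdkMzI0YTJiOWJk"
    "YzUwOWFhZGFkM2M1MzhiNzA0ZDliZDIzYzE3NjkzNTg1ZTEyOQ==",
]
UNDERSCORE_VALUES = ["1377092055381", "1377092231234"]


def decode_bookmark(token: str) -> str:
    """Base64-decode a bookmarks entry; undecodable bytes are replaced."""
    return base64.b64decode(token).decode("utf-8", errors="replace")


def as_utc_datetime(ms_text: str) -> datetime:
    """Interpret a string of digits as milliseconds since the Unix epoch."""
    return datetime.fromtimestamp(int(ms_text) / 1000, tz=timezone.utc)


decoded_bookmarks = [decode_bookmark(b) for b in BOOKMARKS]
timestamps = [as_utc_datetime(v) for v in UNDERSCORE_VALUES]

for text in decoded_bookmarks:
    print(text)
for ts in timestamps:
    print(ts.isoformat())
```

    If the decoded bookmarks turn out to contain a counter that grows from one request to the next, that makes them a better candidate for the "where to start" marker, and the next thing to look for is the same string in the previous response.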
