How do you find the “main” picture of a website, given the URL?

眼角桃花 2021-02-05 13:59

Let's say you're given http://nytimes.com. How would you pull out the "main" image?

I'm asking because Flipboard is able to grab the main image from a

4 answers
  •  暗喜 (original poster)
    2021-02-05 14:05

    There are several strategies for determining the "main" image of a URL:

    • many websites now declare their main image explicitly, for Facebook (the Open Graph og:image tag) or for Twitter (the Twitter Cards twitter:image tag)
    • sometimes the image can be inferred from the URL itself or fetched with an API call (especially for image-hosting sites such as Instagram)
    • the main image can also be found by analyzing the page with content-extraction techniques (such as Readability). You may want to filter out "noise" such as tracking pixels and ads.
    • if all of these techniques fail, you can download every image on the page and assume that the largest one is the most interesting.
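    The first strategy is usually the cheapest to try. Below is a minimal sketch using only Python's standard library, assuming the page HTML has already been downloaded. The class and function names are illustrative, not part of any library; it prefers og:image over twitter:image and resolves relative URLs against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaImageParser(HTMLParser):
    """Collects Open Graph and Twitter Card image URLs from <meta> tags.

    Illustrative helper, not a library API.
    """

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.twitter_image = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        content = attrs.get("content")
        if not content:
            return
        # Open Graph uses property="og:image"; Twitter Cards use name="twitter:image".
        # Accept either attribute for both, since pages are inconsistent.
        key = (attrs.get("property") or attrs.get("name") or "").lower()
        if key in ("og:image", "og:image:url") and self.og_image is None:
            self.og_image = content
        elif key == "twitter:image" and self.twitter_image is None:
            self.twitter_image = content


def declared_main_image(html, page_url):
    """Return the declared main image as an absolute URL, or None."""
    parser = MetaImageParser()
    parser.feed(html)
    parser.close()
    found = parser.og_image or parser.twitter_image
    # Some pages declare relative URLs; resolve them against the page URL.
    return urljoin(page_url, found) if found else None
```

    For example, declared_main_image('<meta property="og:image" content="/img/lead.jpg">', "https://www.example.com/story") returns "https://www.example.com/img/lead.jpg". If it returns None, fall through to the heuristic strategies.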
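    The last-resort "largest image" strategy, combined with a simple noise filter, can be sketched as follows. This is an assumption-laden sketch: it expects the images to be already downloaded into a dict of URL to bytes, it only decodes PNG and GIF headers (JPEG and WebP need more parsing, or a library such as Pillow, whose Image.open(...).size gives the dimensions), the 50-pixel threshold is an arbitrary guess, and blocked_hosts is a hypothetical ad/tracker blocklist you would supply yourself.

```python
import struct
from urllib.parse import urlparse


def image_size(data):
    """Return (width, height) for PNG or GIF bytes, or None if unrecognised."""
    # PNG: 8-byte signature, then the IHDR chunk; width and height are
    # big-endian 32-bit integers at byte offsets 16 and 20.
    if data[:8] == b"\x89PNG\r\n\x1a\n" and len(data) >= 24:
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte header ("GIF87a"/"GIF89a"), then little-endian 16-bit
    # logical screen width and height.
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        return struct.unpack("<HH", data[6:10])
    return None


def largest_image(images, min_side=50, blocked_hosts=frozenset()):
    """Pick the URL of the largest decodable image by pixel area.

    images: mapping of image URL -> downloaded bytes.
    min_side: images narrower or shorter than this are treated as noise
        (e.g. 1x1 tracking pixels); 50 is an arbitrary default.
    blocked_hosts: hypothetical set of ad/tracker hostnames to skip.
    """
    best_url, best_area = None, 0
    for url, data in images.items():
        if urlparse(url).hostname in blocked_hosts:
            continue
        size = image_size(data)
        if size is None:
            continue  # unsupported format in this sketch
        width, height = size
        if width < min_side or height < min_side:
            continue
        if width * height > best_area:
            best_url, best_area = url, width * height
    return best_url
```

    Only the first few bytes of each file are needed to read the dimensions, so in practice you could avoid downloading full images, for instance with an HTTP Range request if the server supports it.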

    I've created a JavaScript library, ImageResolver, that uses most of these techniques to determine the "main" picture of a URL.
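    The code below does not show ImageResolver's API. As a rough illustration of the same idea (try reliable declared signals first, then fall back to heuristics), here is a hypothetical Python resolver that runs an ordered list of strategies over one parse of the HTML and returns the first hit. The "content area" step just takes the first image inside <article> or <main>; it is a crude stand-in for Readability, not Readability's actual scoring. The URL-guessing/API step is omitted because it is site-specific, and MIN_SIDE is an arbitrary noise threshold.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

MIN_SIDE = 50  # arbitrary: images declared smaller than this are treated as noise


def _dim(value):
    """Parse a width/height attribute like "640"; ignore "100%", "auto", etc."""
    return int(value) if value and value.isdigit() else None


class PageImageParser(HTMLParser):
    """One pass over the HTML, collecting the signals each strategy needs."""

    def __init__(self):
        super().__init__()
        self.meta = {}             # "og:image" / "twitter:image" -> URL
        self.article_images = []   # <img> srcs inside <article> or <main>
        self.sized_images = []     # (area, src) for <img> with numeric width/height
        self._depth = 0            # > 0 while inside <article>/<main>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("article", "main"):
            self._depth += 1
        elif tag == "meta":
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            if key in ("og:image", "twitter:image") and attrs.get("content"):
                self.meta.setdefault(key, attrs["content"])
        elif tag == "img" and attrs.get("src"):
            w, h = _dim(attrs.get("width")), _dim(attrs.get("height"))
            # Drop declared-tiny images (tracking pixels, spacers, icons).
            if (w is not None and w < MIN_SIDE) or (h is not None and h < MIN_SIDE):
                return
            if self._depth:
                self.article_images.append(attrs["src"])
            if w and h:
                self.sized_images.append((w * h, attrs["src"]))

    def handle_endtag(self, tag):
        if tag in ("article", "main") and self._depth:
            self._depth -= 1


def from_meta(p):
    return p.meta.get("og:image") or p.meta.get("twitter:image")


def from_content_area(p):
    return p.article_images[0] if p.article_images else None


def from_largest_declared(p):
    return max(p.sized_images)[1] if p.sized_images else None


# Ordered from most to least reliable; the first non-empty answer wins.
STRATEGIES = (from_meta, from_content_area, from_largest_declared)


def resolve_main_image(html, page_url):
    parser = PageImageParser()
    parser.feed(html)
    parser.close()
    for strategy in STRATEGIES:
        src = strategy(parser)
        if src:
            return urljoin(page_url, src)
    return None
```

    Keeping each strategy as a separate function makes it easy to reorder them or to insert a site-specific rule (such as the URL/API step) at the front of STRATEGIES.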
