There are several strategies for determining the "main" image of a URL:
- many websites now declare their main image in the page metadata (Facebook Open Graph or Twitter Cards)
- sometimes, the image can be guessed from the URL or by doing an API call (especially true for image hosting websites such as Instagram)
- the main image can also be determined by analyzing the webpage with content extraction techniques (such as Readability). You might want to filter out "noise" such as tracking pixels or ads.
- if all these techniques fail, you can download all the images and assume that the largest images are the most interesting.
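As a rough illustration of the first and last strategies above, here is a minimal JavaScript sketch. The function names (`findDeclaredImage`, `pickLargestImage`), the 50-pixel `minSide` threshold, and the `{url, width, height}` object shape are all assumptions made for this example; they are not the API of any existing library. The meta tag parsing is regex-based for brevity, so a real implementation would probably use a proper HTML parser.

```javascript
// Hypothetical helper (illustration only): return the image a page declares
// through Open Graph or Twitter Cards meta tags, or null if there is none.
function findDeclaredImage(html) {
  const wanted = ['og:image', 'twitter:image']; // in order of preference
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  const found = {};

  for (const tag of metaTags) {
    // Collect the quoted attributes of this <meta> tag into a plain object.
    const attrs = {};
    const attrPattern = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let m;
    while ((m = attrPattern.exec(tag)) !== null) {
      attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
    }

    // Open Graph uses property="og:image"; Twitter Cards use name="twitter:image".
    const key = (attrs.property || attrs.name || '').toLowerCase();
    if (wanted.includes(key) && attrs.content && !(key in found)) {
      found[key] = attrs.content;
    }
  }

  for (const key of wanted) {
    if (found[key]) return found[key];
  }
  return null;
}

// Hypothetical fallback (illustration only): given images whose dimensions are
// already known, skip tiny ones (likely tracking pixels) and return the URL of
// the one with the largest pixel area, or null if none qualifies.
function pickLargestImage(images, minSide = 50) {
  let best = null;
  for (const img of images) {
    if (img.width < minSide || img.height < minSide) continue;
    if (best === null || img.width * img.height > best.width * best.height) {
      best = img;
    }
  }
  return best ? best.url : null;
}
```

One way to combine them is to call `findDeclaredImage(html)` first and only fall back to `pickLargestImage(...)` when it returns `null`, since downloading every image just to measure it is much more expensive than reading a meta tag.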
I've created a JavaScript library that uses most of these techniques to determine the "main" picture of a URL: ImageResolver.