Question:
I want my website to be able to pull up information about a web page when the user pastes a link into the post box, similar to Facebook.
I was wondering how sites like Google, Reddit and Facebook are able to retrieve thumbnails, titles and descriptions with just a URL.
Anyone know how they do this?
Answer 1:
The basic algorithm is rather simple: fetch the page, analyze the content, extract the text, images, title and whatever else you need, then build the preview. However, there are a lot of difficulties in particular cases. Menus, banners and ads, text structure - plenty of details require very scrupulous processing. AFAIK there is no algorithm that solves this task in 100% of cases (yes, Google's and others' algorithms aren't perfect either).
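To illustrate the fetch-and-extract steps, here is a minimal Python sketch that reads Open Graph tags and falls back to plain HTML equivalents. It assumes the `requests` and `beautifulsoup4` packages are available, and it deliberately skips all the hard heuristics mentioned above (this is not what Facebook or Google actually run):

```python
import requests
from bs4 import BeautifulSoup

def fetch_preview(url):
    """Fetch a page and pull title, description and thumbnail from its meta tags."""
    resp = requests.get(url, timeout=10,
                        headers={"User-Agent": "preview-bot/0.1"})  # hypothetical UA
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    def og(prop):
        # Open Graph tags look like <meta property="og:title" content="...">
        tag = soup.find("meta", property=f"og:{prop}")
        return tag["content"] if tag and tag.has_attr("content") else None

    # Prefer Open Graph tags; fall back to standard HTML where one exists.
    title = og("title") or (soup.title.string if soup.title else None)
    desc_tag = soup.find("meta", attrs={"name": "description"})
    description = og("description") or (
        desc_tag["content"] if desc_tag and desc_tag.has_attr("content") else None
    )
    image = og("image")  # no simple HTML fallback; a real scraper would
                         # score <img> candidates on the page instead

    return {"title": title, "description": description, "image": image}

if __name__ == "__main__":
    print(fetch_preview("https://stackoverflow.com"))
```

When the page has no Open Graph tags, this returns whatever the plain `<title>` and description meta tag give you, which is exactly where the scrupulous content-analysis part of the algorithm has to take over.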
About Reddit: since it's open source, you can see exactly how they do it. Here is the code you're looking for: https://github.com/reddit/reddit/blob/master/r2/r2/lib/scraper.py
Yandex also provides an API that does the same.
Source: https://stackoverflow.com/questions/16750127/how-to-read-open-graph-and-meta-tags-from-a-webpage-with-a-url