How to read Open Graph and meta tags from a webpage with a URL

Submitted by 自古美人都是妖i on 2019-12-08 09:50:16

Question


I want my website to be able to pull up information about a web page when the user pastes a link into the post box, similar to Facebook.

I was wondering how sites like Google, Reddit and Facebook are able to retrieve thumbnails, titles and descriptions with just a URL.

Does anyone know how they do this?


Answer 1:


The basic algorithm is rather simple: fetch the page, analyze its content, extract the text, images, title and so on, then build the preview. However, particular use cases bring many difficulties: menus, banners, ads and text structure all need careful handling. AFAIK no algorithm solves this task in 100% of cases (yes, Google's and others' algorithms aren't perfect either).
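For pages that publish Open Graph tags, the "extract" step can be fairly small. The following is only a minimal sketch using the Python standard library, not how Facebook, Google or Reddit actually do it. The names `PreviewParser`, `fetch_html` and `build_preview`, the User-Agent string, and the fallback order (Open Graph first, then `<title>` and the plain description meta tag) are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PreviewParser(HTMLParser):
    """Collects og:* meta tags, the description meta tag and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.description = None
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attrs.get("content")
            if content is None:
                return
            prop = attrs.get("property") or ""
            if prop.startswith("og:"):
                # Keep the first value seen for each Open Graph property.
                self.og.setdefault(prop[3:], content)
            elif (attrs.get("name") or "").lower() == "description":
                self.description = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def fetch_html(url, timeout=10):
    """Download a page and decode it (hypothetical helper, no retries)."""
    request = Request(url, headers={"User-Agent": "preview-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def build_preview(html):
    """Return title, description and image, preferring Open Graph values."""
    parser = PreviewParser()
    parser.feed(html)
    title = parser.og.get("title") or (parser.title or "").strip() or None
    return {
        "title": title,
        "description": parser.og.get("description") or parser.description,
        "image": parser.og.get("image"),
    }
```

Calling `build_preview(fetch_html(url))` would give a dictionary to render as the preview card. Pages without Open Graph tags leave `image` as `None`, which is where the harder content analysis described above comes in.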

About Reddit: since it's open source, you can see exactly how they do it. Here is the code you're looking for: https://github.com/reddit/reddit/blob/master/r2/r2/lib/scraper.py

Yandex has an API that lets you do the same. You can find more here and here.



Source: https://stackoverflow.com/questions/16750127/how-to-read-open-graph-and-meta-tags-from-a-webpage-with-a-url
