News clustering

深忆病人 2021-01-30 18:34

How do Google News and Techmeme cluster similar news items? Are there any well-known algorithms used to achieve this?

I'd appreciate your help.

Thanks

3 Answers

梦如初夏 (OP)

2021-01-30 19:19

The algorithmic basis is agglomerative clustering or something similar, but a number of heuristics are layered on top of it. For example, the vector space is almost certainly built from words and phrases (word n-grams). Restricting the search to a strict time window is also very important. Identifying named entities and giving extra weight to the title and paragraph headings are also key parts.
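A minimal sketch of the kind of pipeline described above, using only the Python standard library. It is not how Google News or Techmeme actually work; it just shows average-linkage agglomerative clustering over bag-of-n-gram vectors with two of the heuristics mentioned: title terms are up-weighted, and articles published too far apart are never linked. The names `TITLE_WEIGHT`, `TIME_WINDOW` and `SIM_THRESHOLD` and their values are hypothetical tuning knobs, and named-entity recognition is left out for brevity.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Article:
    title: str
    body: str
    published: datetime


# Hypothetical tuning knobs, not values used by any real news aggregator.
TITLE_WEIGHT = 3.0                 # a title term counts 3x as much as a body term
TIME_WINDOW = timedelta(hours=48)  # articles further apart are never compared
SIM_THRESHOLD = 0.3                # stop merging below this average similarity


def ngrams(text, n_max=2):
    """Lowercased word unigrams plus word n-grams up to n_max."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = list(words)
    for n in range(2, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


def vectorize(article):
    """Sparse term-weight vector; title n-grams get TITLE_WEIGHT, body n-grams get 1."""
    vec = Counter()
    for g in ngrams(article.title):
        vec[g] += TITLE_WEIGHT
    for g in ngrams(article.body):
        vec[g] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(articles):
    """Average-linkage agglomerative clustering; returns lists of article indices."""
    vecs = [vectorize(a) for a in articles]
    n = len(articles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Time-window heuristic: pairs outside the window keep similarity 0.
            if abs(articles[i].published - articles[j].published) <= TIME_WINDOW:
                sim[i][j] = sim[j][i] = cosine(vecs[i], vecs[j])
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average pairwise similarity.
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if avg > best:
                    best, pair = avg, (x, y)
        if best < SIM_THRESHOLD:
            break
        x, y = pair  # x < y, so popping y leaves index x valid
        clusters[x] += clusters.pop(y)
    return clusters


t0 = datetime(2021, 1, 30, 12, 0)
docs = [
    Article("Apple unveils new iPhone", "Apple today announced a new iPhone model with a faster chip.", t0),
    Article("New iPhone announced by Apple", "The new iPhone from Apple ships with a faster chip.", t0 + timedelta(hours=3)),
    Article("Storm hits coast", "A major storm made landfall on the coast overnight.", t0 + timedelta(hours=1)),
    Article("Apple unveils new iPhone", "Apple today announced a new iPhone model with a faster chip.", t0 + timedelta(days=10)),
]
print(cluster(docs))  # [[0, 1], [2], [3]]
```

The last article repeats the first one word for word but lands in its own cluster because it falls outside the time window. The naive merge loop is roughly cubic in the number of articles, which is fine for a demonstration; a real system would more likely use an optimized linkage implementation and an inverted index so that only articles sharing terms are ever compared.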

On a tangentially related note: if you are interested in finding near-duplicate articles, there are a number of easier-to-implement approaches, such as the one described here
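The link in the answer did not survive, so the sketch below is not necessarily the approach it pointed to. It shows one common, easy-to-implement technique for near-duplicate detection: split each article into word shingles (contiguous k-word sequences), compare shingle sets with Jaccard similarity, and, for large collections, approximate that similarity with MinHash signatures. The shingle size `k=3`, the 64 hash functions and the salted-blake2b seeding are illustrative assumptions.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or nothing).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=64):
    """For each of num_hashes salted hash functions, keep the minimum hash value."""
    if not shingle_set:
        raise ValueError("cannot sign an empty shingle set")
    signature = []
    for seed in range(num_hashes):
        # Assumption: a per-seed blake2b salt stands in for independent hash functions.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "little")
            for s in shingle_set
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature slots that agree; approximates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = "the quick brown fox jumps over the lazy dog"
b = "the quick brown fox jumped over the lazy dog"
sa, sb = shingles(a), shingles(b)
print(jaccard(sa, sb))  # 0.4: 4 shared shingles out of 10 distinct ones
print(estimated_jaccard(minhash_signature(sa), minhash_signature(sb)))
```

On a nine-word sentence a single changed word already removes three of the seven shingles; on full-length articles the same edit barely moves the score. MinHash signatures are usually paired with locality-sensitive hashing (banding) so that candidate pairs can be found without comparing every article against every other one.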
