Efficient way of calculating likeness scores of strings when sample size is large?

轻奢々 2020-12-25 15:10

Let's say that you have a list of 10,000 email addresses, and you'd like to find what some of the closest "neighbors" in this list are - defined as email addresses that

8 Answers

囚心锁ツ (OP)

2020-12-25 16:02

10,000 email addresses doesn't sound like much: comparing every pair is roughly 50 million comparisons. For similarity search in a larger space, you can use shingling and min-hashing. This approach is a bit more complicated to implement, but it is much more efficient on a large space.
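A minimal sketch of shingling plus min-hashing in Python, with locality-sensitive hashing (LSH) banding added so that only likely-similar addresses are ever compared. The choices here are assumptions, not from the answer: character 3-gram shingles, Jaccard similarity as the likeness measure, seeded BLAKE2b hashes standing in for random permutations, and illustrative parameters (`num_perm=64`, `bands=16`). All function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(s, k=3):
    """Character k-grams of s (lowercased). Strings shorter than k become one shingle."""
    s = s.lower()
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b)


def _hash(shingle, seed):
    # Assumption: seeded BLAKE2b hashes stand in for random permutations.
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_perm=64):
    """For each of num_perm hash functions, keep the minimum hash over the set."""
    return [min(_hash(sh, seed) for sh in shingle_set) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(items, num_perm=64, bands=16, k=3):
    """LSH banding: split each signature into `bands` slices; items that agree
    on every value of at least one slice become a candidate pair."""
    assert num_perm % bands == 0, "num_perm must be divisible by bands"
    rows = num_perm // bands
    sigs = {item: minhash_signature(shingles(item, k), num_perm) for item in items}
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for item, sig in sigs.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(item)
        for bucket in buckets.values():
            # Only items that share a bucket are ever paired up.
            candidates.update(combinations(sorted(bucket), 2))
    return candidates, sigs


# Hypothetical sample data for illustration.
emails = [
    "john.smith@example.com",
    "john.smyth@example.com",
    "alice.jones@example.org",
    "alice.jones@example.net",
    "bob@company.io",
]
pairs, sigs = candidate_pairs(emails)
for x, y in sorted(pairs):
    # Print the MinHash estimate next to the exact Jaccard similarity.
    print(x, y, round(estimated_jaccard(sigs[x], sigs[y]), 2),
          round(jaccard(shingles(x), shingles(y)), 2))
```

With `r` rows per band and `b` bands, a pair with Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s^r)^b`, so more bands (fewer rows each) catch less-similar pairs at the cost of more candidates. For only 10,000 addresses, calling `jaccard` directly on all pairs is also an option.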
