ALGORITHM - String similarity score/hash


Is there a method to calculate something like a general "similarity score" of a string? That is, I am not comparing two strings with each other, but rather get some number/score


8 answers

-上瘾入骨i

2021-02-01 10:38
You can always use the Levenshtein distance; there is also a ready-made implementation for it: http://code.google.com/p/pylevenshtein/

But, for simplicity, you can use the built-in difflib module:
```
>>> import difflib
>>> l
{'Hello Earth', 'Hello World!', 'Foo Bar!', 'Foo world!', 'Foo bar', 'Hello World', 'FooBarbar'}
>>> difflib.get_close_matches("Foo World", l)
['Foo world!', 'Hello World', 'Hello World!']
```
http://docs.python.org/library/difflib.html#difflib.get_close_matches
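
If you would rather not install anything, the Levenshtein distance mentioned above is short enough to write by hand. This is only a minimal sketch, not the pylevenshtein implementation; the names `levenshtein` and `similarity` are made up for illustration, and dividing by the longer length is just one common way to normalise the distance into a 0..1 score.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical helper: 1.0 for identical strings, 0.0 when every
    character has to change. Normalisation choice is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Note that this, like difflib, still compares two strings with each other; it does not give a standalone number per string.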
既然无缘

2021-02-01 10:51

Have a look at locality-sensitive hashing.

The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).

There's a very good explanation available here together with some sample code.
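
To make the idea concrete, here is a rough sketch of one LSH scheme, MinHash with banding over character trigrams. It is not the sample code referred to above; the function names, the choice of `blake2b` as the hash family, and the parameters (32 hashes, 16 bands, 3-character shingles) are all assumptions picked for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Set of overlapping k-character substrings, lowercased."""
    text = text.lower()
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed, token):
    # One 64-bit hash function per seed, built by salting blake2b.
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=32, k=3):
    """Per-string signature: for each hash function, the minimum hash
    over the string's shingles. Computed without any second string."""
    grams = shingles(text, k)
    return tuple(min(_hash(seed, g) for g in grams)
                 for seed in range(num_hashes))


def lsh_buckets(signature, bands=16):
    """Split the signature into bands; each band is one bucket key.
    Two strings become candidates if they share at least one bucket."""
    rows = len(signature) // bands
    return {(b, signature[b * rows:(b + 1) * rows]) for b in range(bands)}


def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing positions, an estimate of the Jaccard
    similarity of the two shingle sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Each string gets its signature on its own, which is the "number per string" the question asks for; you only compare signatures (or look up shared buckets) later, when you want to know whether two strings are similar. More bands with fewer rows each make collisions more likely for moderately similar strings, so those parameters need tuning for your data.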
