What is the best algorithm for matching two strings containing fewer than 10 words in Latin script?


刺人心 2021-02-04 09:35

I'm comparing song titles, mostly in Latin script (although not always). My aim is an algorithm that gives a high score if the two song titles seem to be the same title and a

5 answers

说谎 (original poster)

2021-02-04 10:25

You likely need to solve a string-to-string correction problem. The Levenshtein distance algorithm is implemented in many languages. Before running it, I'd remove all spaces from the strings, because they carry no meaningful information but can inflate the difference between the two strings. For string search, prefix trees (tries) are also useful, so you can look in that direction as well. For example here or here. This was already discussed on SO. If spaces are significant in your case, just assign a greater weight to them.
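
To make the suggestion concrete, here is a minimal Python sketch of a Levenshtein distance where edits involving a space can be given their own weight, plus a score normalized to the range 0 to 1. The function names `weighted_levenshtein` and `title_similarity`, the `space_cost` and `ignore_spaces` parameters, and the case-folding step are my own assumptions for illustration, not part of any particular library.

```python
def weighted_levenshtein(a: str, b: str, space_cost: float = 1.0) -> float:
    """Edit distance in which any edit touching a space costs `space_cost`.

    With space_cost=1.0 this is the ordinary Levenshtein distance.
    (Hypothetical helper written for this answer.)
    """
    def cost(ch: str) -> float:
        return space_cost if ch == " " else 1.0

    # prev[j] = cost of turning the empty prefix of `a` into b[:j]
    prev = [0.0]
    for ch in b:
        prev.append(prev[-1] + cost(ch))

    for ca in a:
        curr = [prev[0] + cost(ca)]  # delete everything seen so far in `a`
        for j, cb in enumerate(b, start=1):
            substitute = 0.0 if ca == cb else max(cost(ca), cost(cb))
            curr.append(min(
                prev[j] + cost(ca),         # delete ca
                curr[j - 1] + cost(cb),     # insert cb
                prev[j - 1] + substitute,   # match or substitute
            ))
        prev = curr
    return prev[-1]


def title_similarity(a: str, b: str, ignore_spaces: bool = True) -> float:
    """Score in [0, 1]; 1.0 means identical after normalization.

    Case folding is an extra assumption; spaces are dropped by default,
    as suggested above.
    """
    a, b = a.casefold(), b.casefold()
    if ignore_spaces:
        a, b = a.replace(" ", ""), b.replace(" ", "")
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty titles are treated as identical
    return 1.0 - weighted_levenshtein(a, b) / longest
```

For instance, `title_similarity("Hey Jude", "heyjude")` returns 1.0 because the two titles differ only in case and spacing. If spaces do matter for your data, call `weighted_levenshtein` directly with a larger `space_cost` instead of stripping them.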
