What is the best algorithm for matching two string containing less than 10 words in latin script

前端 未结 5 1823
刺人心
刺人心 2021-02-04 09:35

I\'m comparing song titles, using Latin script (although not always), my aim is an algorithm that gives a high score if the two song titles seem to be the same same title and a

5条回答
  •  说谎
    说谎 (楼主)
    2021-02-04 10:25

    You are likely need to solve a string-to-string correction problem. Levenshtein distance algorithm is implemented in many languages. Before running it I'd remove all spaces from string, because they don't contain any sensitive information, but may influence two strings difference. For string search prefix trees are also useful, you can have a look in this direction as well. For example here or here. Was already discussed on SO. If spaces are so much significant in your case, just assign a greater weight to them.

提交回复
热议问题