How do I use kd-trees for determining string similarity?

问题

I am trying to utilize k-nearest neighbors for the string similarity problem i.e. given a string and a knowledge base, I want to output k strings that are similar to my given string. Are there any tutorials that explain how to utilize kd-trees to efficiently do this k-nearest neighbor lookup for strings? The string length will not exceed more than 20 characters.

回答1:

Probably one of the hottest blog posts I had read a year or so ago: Levenstein Automata. Take a look at that article. It provides not only a description of the algorithm but also code to follow. Technically, it's not a kd-tree but it's quite related to the string matching and dictionary correction algorithms one might encounter/use in the real world.

He also has another blog post about BK-trees which are much better at the fuzzy matching for strings and string look ups where there are mispellings. Here is another resource containing source code for a BK-tree (this one I can't verify the accuracy or proper implementation.)

来源：https://stackoverflow.com/questions/5696859/how-do-i-use-kd-trees-for-determining-string-similarity

标签

python

string

algorithm

kdtree

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!