how to do fuzzy search in big data

梦如初夏 2021-01-31 20:05

I'm new to this area and I'm mostly wondering what the state of the art is and where I can read about it.

Let's assume that I just have a key/value store and I have some

3 Answers
  •  夕颜 2021-01-31 20:29

    There is no (fast) generic solution; each application will need a different approach.

    Neither of the two examples actually does traditional nearest-neighbor search. AcoustID (I'm the author) only looks for exact matches, but it searches over a very large number of hashes in the hope that some of them will match. The phonetic search example uses Metaphone to convert words to their phonetic representation and likewise only looks for exact matches.
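    To make the phonetic idea concrete, here is a rough Python sketch of that pattern: reduce every word to a phonetic key and then do nothing but exact hash lookups. It uses the third-party jellyfish package's metaphone() as a stand-in encoder; the index layout and names are only illustrative and are not taken from either linked example.

```python
# Sketch: "fuzzy" phonetic search built entirely on exact lookups.
# Assumes the third-party `jellyfish` package for a Metaphone encoding;
# all names here are illustrative, not from the linked example.
from collections import defaultdict
import jellyfish

words = ["smith", "smyth", "schmidt", "johnson", "jonson"]

# Inverted index from phonetic key to the original words.
phonetic_index = defaultdict(list)
for w in words:
    phonetic_index[jellyfish.metaphone(w)].append(w)

def search(query):
    # The "fuzzy" part lives entirely in the key function;
    # the lookup itself is an ordinary exact hash-table hit.
    return phonetic_index.get(jellyfish.metaphone(query), [])

print(search("smythe"))   # likely ['smith', 'smyth'] -- same phonetic key
```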

    You will find that if you have a lot of data, exact search using huge hash tables is the only thing you can realistically do. The problem then becomes how to convert your fuzzy matching to exact search.

    A common approach is to use locality-sensitive hashing (LSH) with a smart hashing method, but as you can see in your two examples, sometimes you can get away with an even simpler approach.
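    As a rough illustration of the LSH idea (not the method from either example), here is a toy MinHash-with-banding sketch: similar sets tend to produce identical signature bands, so candidate retrieval again boils down to exact hash-table lookups. All parameters and names are made up for the example.

```python
# Toy MinHash + banding LSH: similar sets are likely to collide in at least
# one band, so finding candidates is a plain exact hash-table lookup.
# Parameters and helper names are illustrative only.
import random
from collections import defaultdict

NUM_HASHES = 32     # signature length
BANDS = 8           # 8 bands of 4 rows each
ROWS = NUM_HASHES // BANDS

random.seed(0)
P = (1 << 61) - 1   # large prime for the hash functions h(x) = (a*x + b) % P
HASHES = [(random.randrange(1, P), random.randrange(0, P)) for _ in range(NUM_HASHES)]

def minhash(items):
    """Signature = per-hash-function minimum over the set's (hashed) elements."""
    elems = [hash(x) & 0xFFFFFFFF for x in items]
    return tuple(min((a * e + b) % P for e in elems) for a, b in HASHES)

buckets = defaultdict(set)   # (band index, band slice of signature) -> ids

def index(doc_id, items):
    sig = minhash(items)
    for band in range(BANDS):
        buckets[(band, sig[band * ROWS:(band + 1) * ROWS])].add(doc_id)

def candidates(items):
    sig = minhash(items)
    found = set()
    for band in range(BANDS):
        found |= buckets.get((band, sig[band * ROWS:(band + 1) * ROWS]), set())
    return found

index("a", {"red", "green", "blue", "yellow"})
index("b", {"red", "green", "blue", "purple"})
index("c", {"cat", "dog", "bird", "fish"})
print(candidates({"red", "green", "blue", "yellow", "black"}))  # very likely includes "a"
```

    Tuning the number of bands and rows trades recall against the number of false candidates you have to filter afterwards.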

    Btw, if you are looking specifically for text search, the simplest way to do it is to split your input into N-grams and index those. Depending on how your distance function is defined, that might give you the right candidate matches without too much work.
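    For instance, a minimal n-gram candidate search might look like the sketch below: index character trigrams of each key in an inverted index and rank candidates by how many trigrams they share with the query. The trigram size, padding, and ranking are arbitrary choices for illustration; the real distance function would depend on the application.

```python
# Minimal character-trigram index: fuzzy candidate retrieval via exact
# lookups on n-grams. All names and the choice of n=3 are illustrative.
from collections import defaultdict, Counter

def ngrams(text, n=3):
    text = f"  {text.lower()} "          # pad so short words still yield grams
    return {text[i:i + n] for i in range(len(text) - n + 1)}

index = defaultdict(set)                 # trigram -> set of keys
store = {}                               # key -> original value

def add(key, value):
    store[key] = value
    for g in ngrams(key):
        index[g].add(key)

def candidates(query, limit=5):
    counts = Counter()
    for g in ngrams(query):
        for key in index.get(g, ()):
            counts[key] += 1             # score = number of shared trigrams
    return [k for k, _ in counts.most_common(limit)]

add("fuzzy search", "doc1")
add("fuzzy match", "doc2")
add("exact lookup", "doc3")
print(candidates("fuzy searh"))          # misspelled query still ranks "fuzzy search" first
```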
