Lightweight fuzzy search library

鱼传尺愫 2020-12-14 03:05

Can you suggest a lightweight fuzzy text search library?

What I want to do is to allow users to find correct data for search terms with typos.

I could us

8 answers
  • 2020-12-14 03:30

    @aku - links to working soundex libraries are right there at the bottom of the page.

    As for Levenshtein distance, the Wikipedia article on that also has implementations listed at the bottom.
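For readers who would rather not pull in a library at all, here is a minimal sketch of Levenshtein distance in plain Python, using the standard two-row dynamic programming formulation. The `closest` helper and its `max_distance` cutoff are illustrative assumptions, not part of any library mentioned in this thread.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest(query, candidates, max_distance=2):
    """Hypothetical helper: candidates within max_distance edits, best first."""
    scored = [(levenshtein(query.lower(), c.lower()), c) for c in candidates]
    return [c for distance, c in sorted(scored) if distance <= max_distance]
```

A linear scan like `closest("lucine", ["Lucene", "Sphinx", "PostgreSQL"])` is fine for a few thousand short strings; beyond that, an index of some kind is usually needed.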

  • 2020-12-14 03:31

    I'm not sure how well suited Lucene is to fuzzy searching; a custom library may be a better choice. For example, this search is written in Java and runs quite fast, but it is custom-made for that task: http://www.softcorporation.com/products/people/

  • 2020-12-14 03:31

    A powerful, lightweight solution is Sphinx.

    It's smaller than Lucene and it supports disambiguation.

    It's written in C++, it's fast and battle-tested, has libraries for most environments, and is used by large companies such as craigslist.org.

  • 2020-12-14 03:33

    Soundex is very 'English' in its encoding - Daitch-Mokotoff works better for many names, especially European (Germanic) and Jewish names. In my UK-centric world, it's what I use.

    Wiki here.
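To make the point about Soundex concrete, here is a rough sketch of classic American Soundex in Python. It is not Daitch-Mokotoff, which uses a larger rule table and is not reproduced here, and the function name `soundex` is only an illustrative choice.

```python
import string

# Consonant groups for American Soundex; vowels, y, h and w carry no code.
_CODES = {}
for _digit, _letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name):
    """Return the 4-character American Soundex code, or "" if no letters."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

With this sketch, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, which is the behaviour Soundex was designed for. Two spellings of the same Germanic surname fare worse: `soundex("Schwarz")` gives `S620` while `soundex("Shvarts")` gives `S163`, so they would never match each other.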

  • 2020-12-14 03:46

    If you can choose to use a database, I recommend using PostgreSQL and its fuzzy string matching functions.

    If you can use Ruby, I suggest looking into the amatch library.

  • 2020-12-14 03:47

    Lucene is very scalable, which means it's good for small applications too. You can create an index in memory very quickly if that's all you need.

    For fuzzy searching, you really need to decide what algorithm you'd like to use. With information retrieval, I use an n-gram technique with Lucene successfully. But that's a special indexing technique, not a "library" in itself.

    Without knowing more about your application, it won't be easy to recommend a suitable library. How much data are you searching? What format is the data? How often is the data updated?
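The n-gram idea mentioned above does not depend on Lucene. As a rough illustration only (this is not how Lucene implements it), the sketch below indexes character trigrams in memory and ranks candidates by Jaccard similarity of their trigram sets. The class name `NGramIndex`, the `$` padding character, and the `threshold` default are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded at both ends."""
    padded = "$" * (n - 1) + text.lower() + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class NGramIndex:
    """Hypothetical in-memory inverted index from n-grams to entries."""

    def __init__(self, n=3):
        self.n = n
        self.entries = []                 # entry id -> original text
        self.postings = defaultdict(set)  # n-gram -> ids of entries containing it

    def add(self, text):
        entry_id = len(self.entries)
        self.entries.append(text)
        for gram in ngrams(text, self.n):
            self.postings[gram].add(entry_id)

    def search(self, query, threshold=0.3):
        """Return entries whose Jaccard similarity to query is >= threshold."""
        query_grams = ngrams(query, self.n)
        shared = defaultdict(int)  # entry id -> number of n-grams in common
        for gram in query_grams:
            for entry_id in self.postings.get(gram, ()):
                shared[entry_id] += 1
        results = []
        for entry_id, overlap in shared.items():
            entry_grams = ngrams(self.entries[entry_id], self.n)
            score = overlap / len(query_grams | entry_grams)
            if score >= threshold:
                results.append((score, self.entries[entry_id]))
        # Best score first; ties broken alphabetically for stable output.
        return [text for _, text in sorted(results, key=lambda r: (-r[0], r[1]))]
```

After adding "Lucene", "Sphinx" and "PostgreSQL", a query such as `index.search("Lucine")` only has to score entries that share at least one trigram with the query, which is what keeps this approach cheap as the data grows.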
