How to create simple fuzzy search with Postgresql only?

逝去的感伤 2020-12-07 11:04

I have a little problem with the search functionality on my RoR-based site. I have many Products with some CODEs. This code can be any string like "AB-123-lHdfj". Now I use ILI

2 Answers
  • 2020-12-07 11:42

    Postgres provides a module with several string comparison functions such as soundex and metaphone. But you will want to use the levenshtein edit distance function.

    Example:
    
    test=# SELECT levenshtein('GUMBO', 'GAMBOL');
     levenshtein
    -------------
               2
    (1 row)
    

    The 2 is the edit distance between the two words. When you apply this to a number of words and sort by the resulting edit distance, you get the type of fuzzy matches that you're looking for.

    Try this query sample: (with your own object names and data of course)

    SELECT * 
    FROM some_table
    WHERE levenshtein(code, 'AB123-lHdfj') <= 3
    ORDER BY levenshtein(code, 'AB123-lHdfj')
    LIMIT 10
    

    This query says:

    Give me the 10 closest matches from some_table where the edit distance between the code value and the input 'AB123-lHdfj' is at most 3. You will get back rows whose code value is within 3 single-character edits (insertions, deletions or substitutions) of 'AB123-lHdfj'.

    Note: if you get an error like:

    function levenshtein(character varying, unknown) does not exist
    

    Install the fuzzystrmatch extension using:

    test=# CREATE EXTENSION fuzzystrmatch;
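
    The query above evaluates levenshtein() twice per row. A minimal sketch of one way to avoid that, assuming the same illustrative names (some_table, code) and that your fuzzystrmatch version ships levenshtein_less_equal(), which can stop early once the distance exceeds the given maximum:

    ```sql
    -- Assumes: fuzzystrmatch is installed and some_table(code text) exists.
    -- "dist" is an illustrative alias, not part of the original query.
    SELECT *
    FROM  (
       SELECT *, levenshtein_less_equal(code, 'AB123-lHdfj', 3) AS dist
       FROM   some_table
       ) AS sub
    WHERE  dist <= 3      -- same cutoff as the query above
    ORDER  BY dist        -- closest matches first
    LIMIT  10;
    ```

    This still scans every row, so it only reduces the per-row cost.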
    
  • 2020-12-07 11:49

    Paul told you about levenshtein(). That's a very useful tool, but it is also very slow with big tables: it has to calculate the Levenshtein distance from the search term for every single row, which is expensive.

    First off, if your requirements are as simple as the example indicates, you can still use LIKE. Just replace any - in your search term with % to create the WHERE clause:

    WHERE code LIKE '%AB%123%lHdfj%'
    

    instead of

    WHERE code LIKE '%AB-123-lHdfj%'
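
    The replacement can also be done inside the query rather than in application code. A small sketch, assuming the illustrative table some_table with a text column code; in practice the literal search term would be a bound parameter:

    ```sql
    -- replace() turns 'AB-123-lHdfj' into 'AB%123%lHdfj';
    -- the surrounding '%' allow a match anywhere in code.
    -- Use ILIKE instead of LIKE for a case-insensitive match.
    SELECT *
    FROM   some_table
    WHERE  code LIKE '%' || replace('AB-123-lHdfj', '-', '%') || '%';
    ```

    Note that any % or _ already present in the search term also acts as a wildcard here.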
    

    If your real problem is more complex and you need something faster then - depending on your requirements - there are several options.

    • There is full text search, of course. But this may be overkill in your case.

    • A more likely candidate is pg_trgm. Note that you can combine that with LIKE in PostgreSQL 9.1 or later. See this blog post by Depesz.
      Also very interesting in this context: the similarity() function or % operator of that module. More:

      • PostgreSQL LIKE query performance variations
    • Last but not least you can implement a hand-knit solution with a function to normalize the strings to be searched. For instance, you could transform AB1-23-lHdfj -> ab123lhdfj, save it in an additional column and search it with search terms that have been transformed the same way.

      Or use an index on an expression instead of the redundant column. (Involved functions must be IMMUTABLE.) And possibly combine that with pg_trgm from above.
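
    The pg_trgm and normalization options might look roughly like this. It is a sketch under assumptions, not a tuned setup: some_table and code are illustrative names, the index names and the normalize_code() helper are hypothetical, and the normalization rule (lowercase, drop everything that is not alphanumeric) is just one possible choice.

    ```sql
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Trigram GIN index: lets LIKE / ILIKE patterns with a leading wildcard
    -- and the % similarity operator use an index.
    CREATE INDEX some_table_code_trgm_idx
        ON some_table USING gin (code gin_trgm_ops);

    -- Pattern search that can use the trigram index.
    SELECT * FROM some_table WHERE code ILIKE '%AB%123%lHdfj%';

    -- Similarity search: % is true when similarity() is greater than the
    -- current similarity threshold.
    SELECT code, similarity(code, 'AB123-lHdfj') AS sml
    FROM   some_table
    WHERE  code % 'AB123-lHdfj'
    ORDER  BY sml DESC
    LIMIT  10;

    -- Hand-knit alternative: normalize both sides the same way.
    -- The function must be IMMUTABLE to be usable in an index expression.
    CREATE FUNCTION normalize_code(text) RETURNS text
        LANGUAGE sql IMMUTABLE AS
    $$ SELECT lower(regexp_replace($1, '[^[:alnum:]]', '', 'g')) $$;

    -- Expression index instead of a redundant column.
    CREATE INDEX some_table_code_norm_idx
        ON some_table (normalize_code(code));

    -- 'AB1-23-lHdfj' and 'AB-123-lHdfj' both normalize to 'ab123lhdfj'.
    SELECT *
    FROM   some_table
    WHERE  normalize_code(code) = normalize_code('AB1-23-lHdfj');
    ```

    To combine the two ideas, the trigram index could be built on normalize_code(code) instead of code, with queries filtering on the same expression.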

    Overview of pattern-matching techniques:

    • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL