Postgres accent insensitive LIKE search in Rails 3.1 on Heroku

无人及你 2021-02-05 11:11

How can I modify a where/LIKE condition on a search query in Rails so that it matches regardless of accents:

find(:all, :conditions => ["lower(name) LIKE ?", "%#{search.downcase}%"])

5 Answers
  • 2021-02-05 11:40

    There are 2 questions related to your search on StackExchange. The first is on Server Fault: https://serverfault.com/questions/266373/postgresql-accent-diacritic-insensitive-search

    But as you are on Heroku, I doubt this is a good match (unless you have a dedicated database plan).

    There is also this one on SO: Removing accents/diacritics from string while preserving other special chars.

    But this assumes that your data is stored without any accent.

    I hope it will point you in the right direction.

  • 2021-02-05 11:46

    First of all, you install postgresql-contrib. Then you connect to your DB and execute:

    CREATE EXTENSION unaccent;
    

    to enable the extension for your DB.

    Depending on your language, you might need to create a new rule file (in my case greek.rules, located in /usr/share/postgresql/9.1/tsearch_data), or just append to the existing unaccent.rules (quite straightforward).

    In case you create your own .rules file, you need to make it default:

    ALTER TEXT SEARCH DICTIONARY unaccent (RULES='greek');
    

    This change is persistent, so you need not redo it.

    The next step is to use this function from a model. One simple solution is to define a class method on the model, for instance:

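    class Model < ActiveRecord::Base
        [...]
        # Accent-insensitive substring search on the given column.
        def self.unaccent(column, value)
            # A bind parameter cannot stand in for a column name, so quote the
            # column as an identifier and bind only the search value.
            quoted = connection.quote_column_name(column)
            where("unaccent(#{quoted}) LIKE ?", "%#{value}%")
        end
        [...]
    end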
    

    Then, I can simply invoke:

    Model.unaccent("name","text")
    

    Invoking the same command without the model definition would be as plain as:

    Model.where('unaccent(name) LIKE ?', "%text%")
    

    Note: The above example has been tested and works with Postgres 9.1, Rails 4.0, and Ruby 2.0.

    UPDATE INFO
    Fixed potential SQLi backdoor thanks to @Henrik N's feedback
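
One detail the method above leaves open is what happens when the search value itself contains % or _, which LIKE treats as wildcards. Here is a minimal sketch of escaping them before wrapping the value, assuming PostgreSQL's default backslash escape character for LIKE. like_contains is a hypothetical helper name, not part of Rails or of the answer above:

```ruby
# Hypothetical helper (not from the original answer): build a "contains"
# pattern for LIKE, escaping the wildcard characters % and _ as well as the
# backslash itself, so that user input is matched literally.
# Assumes the default LIKE escape character (backslash) in PostgreSQL.
def like_contains(term)
  escaped = term.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
  "%#{escaped}%"
end

puts like_contains("text")
puts like_contains("50%_off")
```

The result can then be passed as the bound value, for example Model.where('unaccent(name) LIKE ?', like_contains(user_input)), where user_input stands for whatever string the application receives.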

  • 2021-02-05 11:50

    For those who, like me, are having trouble adding the unaccent extension for PostgreSQL and getting it to work with a Rails application, here is the migration you need to create:

    class AddUnaccentExtension < ActiveRecord::Migration
      def up
        execute "create extension unaccent"
      end
    
      def down
        execute "drop extension unaccent"
      end
    end
    

    And, of course, after rake db:migrate you will be able to use the unaccent function in your queries: unaccent(column) similar to ... or unaccent(lower(column)) ...
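
The column side is only half of the match: if the search term may itself contain accents, it usually needs the same treatment. Below is a sketch, assuming the migration above has been run, of building a conditions array that applies unaccent and lower to both sides. unaccent_conditions is a hypothetical helper, and the column-name check is an illustrative safeguard rather than a Rails API:

```ruby
# Hypothetical helper: returns a conditions array that can be passed to
# `where` (or to `:conditions` in older Rails versions).
def unaccent_conditions(column, term)
  # Column names cannot be bound as parameters, so only accept plain
  # identifiers to keep the interpolation below safe.
  unless column.to_s =~ /\A[a-z_][a-z0-9_]*\z/
    raise ArgumentError, "unexpected column name: #{column.inspect}"
  end

  ["unaccent(lower(#{column})) LIKE unaccent(lower(?))", "%#{term}%"]
end

p unaccent_conditions(:name, "Métro")
```

With a model in place this would be used as Model.where(unaccent_conditions(:name, "Métro")).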

  • 2021-02-05 11:53

    Assuming Foo is the model you are searching against and name is the column, you can combine Postgres's translate with ActiveSupport's transliterate and do something like:

    Foo.where(
      "translate(
        LOWER(name),
        'âãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ',
        'aaaaaaaaaaaaaaaeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiooooooooooooooouuuuuuuuuuuuuuuu'
      )
      LIKE ?", "%#{ActiveSupport::Inflector.transliterate("qué").downcase}%"
    )
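
For readers without ActiveSupport at hand, the effect of transliterate on the search term can be approximated in plain Ruby. The sketch below is a stand-in, not ActiveSupport's implementation: it decomposes characters with String#unicode_normalize (Ruby 2.2+) and drops the combining marks. strip_accents is a hypothetical name, and letters without a canonical decomposition (such as ø or ł) pass through unchanged.

```ruby
# Hypothetical helper: approximate accent stripping in plain Ruby.
# NFD splits "é" into "e" plus a combining acute accent; the regex then
# removes every nonspacing mark (Unicode category Mn).
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Build the LIKE pattern for the search term, as in the query above.
term = strip_accents("qué").downcase
puts "%#{term}%"
```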
    
  • 2021-02-05 11:55

    Poor man's solution

    If you are able to create a function, you can use this one. I compiled the list starting here and added to it over time. It is pretty complete. You may even want to remove some characters:

    CREATE OR REPLACE FUNCTION lower_unaccent(text)
      RETURNS text AS
    $func$
    SELECT lower(translate($1
         , '¹²³áàâãäåāăąÀÁÂÃÄÅĀĂĄÆćčç©ĆČÇĐÐèéêёëēĕėęěÈÊËЁĒĔĖĘĚ€ğĞıìíîïìĩīĭÌÍÎÏЇÌĨĪĬłŁńňñŃŇÑòóôõöōŏőøÒÓÔÕÖŌŎŐØŒř®ŘšşșߊŞȘùúûüũūŭůÙÚÛÜŨŪŬŮýÿÝŸžżźŽŻŹ'
         , '123aaaaaaaaaaaaaaaaaaacccccccddeeeeeeeeeeeeeeeeeeeeggiiiiiiiiiiiiiiiiiillnnnnnnooooooooooooooooooorrrsssssssuuuuuuuuuuuuuuuuyyyyzzzzzz'
         ));
    $func$ LANGUAGE sql IMMUTABLE;
    

    Your query should then work like this:

    find(:all, :conditions => ["lower_unaccent(name) LIKE ?", "%#{search.downcase}%"])
    

    For left-anchored searches, you can utilize an index on the function for very fast results:

    CREATE INDEX tbl_name_lower_unaccent_idx
      ON tbl (lower_unaccent(name) text_pattern_ops);
    

    For queries like:

    SELECT * FROM tbl WHERE (lower_unaccent(name)) ~~ 'bob%'
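
To see what lower_unaccent computes without touching the database, the same translate-and-lowercase idea can be mimicked in Ruby with String#tr. This is only an illustration: the character table is a small, arbitrary subset of the one above, the sketch lowercases first to keep the table short (the SQL function lowercases last), and the Unicode-aware String#downcase assumes Ruby 2.4 or later.

```ruby
# Illustrative subset of the mapping; both strings must have equal length,
# because character N of FROM is replaced by character N of TO.
FROM = "áàâäéèêëíìîïóòôöúùûü"
TO   = "aaaaeeeeiiiioooouuuu"

# Hypothetical Ruby counterpart of the SQL function.
def lower_unaccent(text)
  text.downcase.tr(FROM, TO)
end

names = ["Édouard", "Eduardo", "Bob"]
# Left-anchored match, the Ruby analogue of LIKE 'ed%'.
p names.select { |n| lower_unaccent(n).start_with?("ed") }
```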
    

    Proper solution

    In PostgreSQL 9.1+, with the necessary privileges, you can just:

    CREATE EXTENSION unaccent;
    

    which provides a function unaccent() that does what you need (except for lowercasing; combine it with lower() if needed). Read the manual about this extension.
    The extension is also available for PostgreSQL 9.0, but the CREATE EXTENSION syntax is new in 9.1.

    More about unaccent and indexes:

    • Does PostgreSQL support "accent insensitive" collations?