Creating a Gin Index with Trigram (gin_trgm_ops) in Django model

轻奢々 2021-02-02 01:29

The new TrigramSimilarity feature in django.contrib.postgres was great for a problem I had. I use it in a search bar to find hard-to-spell Latin names. The problem is that

5 answers
  •  梦毁少年i
    2021-02-02 01:49

    I had a similar problem, trying to use the pg_trgm extension to support efficient contains and icontains Django field lookups.

    There may be a more elegant way, but defining a new index type like this worked for me:

    from django.contrib.postgres.indexes import GinIndex
    
    class TrigramIndex(GinIndex):
        def get_sql_create_template_values(self, model, schema_editor, using):
            fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
            tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
            quote_name = schema_editor.quote_name
            columns = [
                ('%s %s' % (quote_name(field.column), order)).strip() + ' gin_trgm_ops'
                for field, (field_name, order) in zip(fields, self.fields_orders)
            ]
            return {
                'table': quote_name(model._meta.db_table),
                'name': quote_name(self.name),
                'columns': ', '.join(columns),
                'using': using,
                'extra': tablespace_sql,
            }
    

    The method get_sql_create_template_values is copied from Index.get_sql_create_template_values(), with just one modification: the addition of + ' gin_trgm_ops'.

    For your use case, you would then define the index on name_txt using this TrigramIndex instead of a GinIndex. Then run makemigrations, which will produce a migration that generates the required CREATE INDEX SQL.
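
For reference, the statement the migration has to end up running should have roughly the shape below. This is only an illustrative string builder, not part of Django: the helper names and the index name are made up for the example, while the table and column names are taken from the query shown later in this answer. It also assumes the pg_trgm extension is already installed in the database, since gin_trgm_ops comes from that extension.

```python
def quote_ident(identifier):
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def trigram_index_sql(table, column, index_name):
    """Return a CREATE INDEX statement for a GIN index using gin_trgm_ops.

    Hypothetical helper for illustration only; Django builds the
    equivalent SQL itself when the migration runs.
    """
    return "CREATE INDEX {name} ON {table} USING gin ({column} gin_trgm_ops)".format(
        name=quote_ident(index_name),
        table=quote_ident(table),
        column=quote_ident(column),
    )


if __name__ == "__main__":
    # The operator class is provided by the pg_trgm extension.
    print("CREATE EXTENSION IF NOT EXISTS pg_trgm;")
    # "ncbi_names_name_txt_trgm" is an invented index name.
    print(trigram_index_sql("NCBI_names", "name_txt", "ncbi_names_name_txt_trgm") + ";")
```

Running sqlmigrate on the generated migration is a quick way to compare against this shape. On Django 2.2 or later, GinIndex accepts an opclasses argument, so GinIndex(fields=['name_txt'], name='...', opclasses=['gin_trgm_ops']) should give the same result without a custom subclass.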

    UPDATE:

    I see you're also doing a query using icontains:

    result.exclude(name_txt__icontains='sp.')
    

    The PostgreSQL backend will turn that into something like this:

    UPPER("NCBI_names"."name_txt"::text) LIKE UPPER('%sp.%')
    

    and then the trigram index won't be used because of the UPPER().

    I had the same problem, and ended up subclassing the database backend to work around it:

    from django.db.backends.postgresql import base, operations
    
    class DatabaseFeatures(base.DatabaseFeatures):
        pass
    
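    class DatabaseOperations(operations.DatabaseOperations):
        def lookup_cast(self, lookup_type, internal_type=None):
            # Same as the stock implementation, except that the left-hand side
            # is no longer wrapped in UPPER() for icontains/istartswith/iendswith.
            lookup = '%s'
    
            # Cast text lookups to text to allow things like filter(x__contains=4)
            if lookup_type in ('iexact', 'contains', 'icontains', 'startswith',
                               'istartswith', 'endswith', 'iendswith', 'regex', 'iregex'):
                if internal_type in ('IPAddressField', 'GenericIPAddressField'):
                    lookup = "HOST(%s)"
                else:
                    lookup = "%s::text"
    
            # iexact still uses the stock '= UPPER(%s)' operator, so keep UPPER() here.
            if lookup_type == 'iexact':
                lookup = 'UPPER(%s)' % lookup
    
            return lookup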
    
    
    class DatabaseWrapper(base.DatabaseWrapper):
        """
        Override the defaults where needed to allow use of a trigram index.
        """
        ops_class = DatabaseOperations
    
        # Build new dicts instead of calling update() on the inherited ones,
        # so the stock PostgreSQL backend's class-level dicts are not mutated.
        operators = dict(
            base.DatabaseWrapper.operators,
            icontains='ILIKE %s',
            istartswith='ILIKE %s',
            iendswith='ILIKE %s',
        )
        pattern_ops = dict(
            base.DatabaseWrapper.pattern_ops,
            icontains="ILIKE '%%' || {} || '%%'",
            istartswith="ILIKE {} || '%%'",
            iendswith="ILIKE '%%' || {}",
        )
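
To switch a project over to a backend like this, Django needs the code above to live in a module named base.py inside a package, with the ENGINE setting pointing at that package. A minimal sketch of the settings follows; the package path myproject.db_backends.postgresql_trgm and all connection values are placeholders invented for the example, not something from the original answer.

```python
# settings.py (fragment)
# Assumes the backend code lives in
# myproject/db_backends/postgresql_trgm/base.py (hypothetical path).
DATABASES = {
    "default": {
        # Dotted path to the package that contains base.py,
        # not to base.py itself.
        "ENGINE": "myproject.db_backends.postgresql_trgm",
        # Placeholder connection details; replace with real values.
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```

With the custom backend active, the icontains lookup from earlier should compile to something along the lines of "NCBI_names"."name_txt"::text ILIKE '%sp.%', without the UPPER() call that kept the trigram index from being used.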
    
