Creating a GIN index with trigram (gin_trgm_ops) in a Django model

轻奢々 2021-02-02 01:29

The new TrigramSimilarity feature of django.contrib.postgres was great for a problem I had. I use it for a search bar to find hard-to-spell Latin names. The problem is that

5 answers

梦毁少年i (original poster)

2021-02-02 01:49

I had a similar problem, trying to use the pg_trgm extension to support efficient contains and icontains Django field lookups.
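
For context, pg_trgm works by splitting text into three-character sequences (trigrams) and indexing those. The snippet below is only a rough Python approximation of that splitting, assuming ASCII input. The function name `trigrams` is made up for illustration and is not part of Django or PostgreSQL.

```python
import re


def trigrams(text):
    """Approximate the trigram set pg_trgm derives from a string.

    Assumption: ASCII-only input; real pg_trgm behaviour depends on the
    database locale and encoding.
    """
    result = set()
    # pg_trgm lower-cases the text and treats non-alphanumerics as separators
    for word in re.findall(r'[a-z0-9]+', text.lower()):
        # each word is padded with two leading spaces and one trailing space
        padded = '  ' + word + ' '
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


print(sorted(trigrams('cat')))
```

Because the splitting lower-cases its input, the trigrams themselves do not depend on letter case.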

There may be a more elegant way, but defining a new index type like this worked for me:

from django.contrib.postgres.indexes import GinIndex

class TrigramIndex(GinIndex):
    """GIN index whose columns use the pg_trgm operator class gin_trgm_ops."""

    def get_sql_create_template_values(self, model, schema_editor, using):
        # Resolve the model fields named in the index definition
        fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
        tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
        quote_name = schema_editor.quote_name
        # Build each column expression and append the operator class to it
        columns = [
            ('%s %s' % (quote_name(field.column), order)).strip() + ' gin_trgm_ops'
            for field, (field_name, order) in zip(fields, self.fields_orders)
        ]
        return {
            'table': quote_name(model._meta.db_table),
            'name': quote_name(self.name),
            'columns': ', '.join(columns),
            'using': using,
            'extra': tablespace_sql,
        }

The method get_sql_create_template_values is copied from Index.get_sql_create_template_values(), with just one modification: the addition of + ' gin_trgm_ops'.

For your use case, you would then define the index on name_txt using this TrigramIndex instead of a GinIndex. Then run makemigrations, which will produce a migration that generates the required CREATE INDEX SQL.
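
To give an idea of what that migration should end up running, here is a small standalone sketch that builds a comparable statement by hand. It is not Django's code: the helper `trigram_index_sql` and the index name are hypothetical, and the exact SQL Django emits may differ between versions.

```python
def trigram_index_sql(table, name, columns):
    """Build an approximate CREATE INDEX statement for a trigram GIN index.

    Hypothetical helper for illustration only; Django's schema editor
    produces the real statement.
    """
    def quote(identifier):
        # minimal stand-in for schema_editor.quote_name
        return '"%s"' % identifier.replace('"', '""')

    # every column gets the gin_trgm_ops operator class appended
    cols = ', '.join('%s gin_trgm_ops' % quote(c) for c in columns)
    return 'CREATE INDEX %s ON %s USING gin (%s)' % (quote(name), quote(table), cols)


# The index name below is an assumed example, not taken from the question.
print(trigram_index_sql('NCBI_names', 'ncbi_names_name_txt_trgm', ['name_txt']))
```

The pg_trgm extension also has to be enabled in the database (CREATE EXTENSION pg_trgm) before such an index can be created. On Django 2.2 and later, GinIndex accepts an opclasses argument, which may make the custom subclass unnecessary.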

UPDATE:

I see you're also doing a query using icontains:

result.exclude(name_txt__icontains='sp.')

The PostgreSQL backend will turn that into something like this:

UPPER("NCBI_names"."name_txt"::text) LIKE UPPER('%sp.%')

and then the trigram index won't be used, because the index is built on the column itself rather than on the UPPER() expression.
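
The difference can be sketched with two string builders: one mimics the predicate shape shown above, the other the ILIKE form that pg_trgm indexes are documented to support. Both function names are invented for this illustration, and the output is only an approximation of what the backend generates.

```python
def stock_icontains(column):
    """Roughly the stock shape: both sides wrapped in UPPER()."""
    lhs = 'UPPER(%s::text)' % column
    # the trailing %s is the query parameter placeholder, left as-is
    return lhs + ' LIKE UPPER(%s)'


def ilike_icontains(column):
    """Alternative shape: the bare column compared with ILIKE."""
    lhs = '%s::text' % column
    return lhs + ' ILIKE %s'


column = '"NCBI_names"."name_txt"'
params = ['%sp.%']  # icontains wraps the search term in % wildcards
print(stock_icontains(column), params)
print(ilike_icontains(column), params)
```

In the second form the left-hand side is the column itself, so the planner has a chance to match it against an index defined on that column.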

I had the same problem, and ended up subclassing the database backend to work around it:

from django.db.backends.postgresql import base, operations

class DatabaseFeatures(base.DatabaseFeatures):
    pass

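class DatabaseOperations(operations.DatabaseOperations):
    def lookup_cast(self, lookup_type, internal_type=None):
        # Based on the stock implementation, but the case-insensitive
        # pattern lookups are no longer wrapped in UPPER(), so an index
        # on the plain column stays usable.
        lookup = '%s'

        # Cast text lookups to text to allow things like filter(x__contains=4)
        if lookup_type in ('iexact', 'contains', 'icontains', 'startswith',
                           'istartswith', 'endswith', 'iendswith', 'regex', 'iregex'):
            if internal_type in ('IPAddressField', 'GenericIPAddressField'):
                lookup = "HOST(%s)"
            else:
                lookup = "%s::text"

        # iexact still uses the stock '= UPPER(%s)' operator, so its
        # left-hand side must keep the UPPER() wrapper to match correctly.
        if lookup_type == 'iexact':
            lookup = 'UPPER(%s)' % lookup

        return lookup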


class DatabaseWrapper(base.DatabaseWrapper):
    """
        Override the defaults where needed to allow use of trigram index
    """
    ops_class = DatabaseOperations

    # Build copies instead of calling .update() in __init__, which would
    # also modify the dicts shared with the stock PostgreSQL backend.
    operators = dict(
        base.DatabaseWrapper.operators,
        icontains='ILIKE %s',
        istartswith='ILIKE %s',
        iendswith='ILIKE %s',
    )
    pattern_ops = dict(
        base.DatabaseWrapper.pattern_ops,
        icontains="ILIKE '%%' || {} || '%%'",
        istartswith="ILIKE {} || '%%'",
        iendswith="ILIKE '%%' || {}",
    )
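
To use a backend like this, Django has to be pointed at it in the settings. A minimal sketch follows, assuming the classes above are saved as base.py inside a package of your own. The package path and all connection values are placeholders, not something from the original question.

```python
# settings.py (sketch)
# Assumption: the classes above live in
# myproject/db_backends/postgresql_trgm/base.py (hypothetical path).
DATABASES = {
    'default': {
        # Django imports "<ENGINE>.base" and uses its DatabaseWrapper class
        'ENGINE': 'myproject.db_backends.postgresql_trgm',
        'NAME': 'mydb',        # placeholder
        'USER': 'myuser',      # placeholder
        'PASSWORD': '',        # placeholder
        'HOST': 'localhost',   # placeholder
        'PORT': '5432',        # placeholder
    }
}
```

The package also needs an __init__.py so that it can be imported.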
