Creating a Gin Index with Trigram (gin_trgm_ops) in Django model

后端 未结 5 1667
轻奢々
轻奢々 2021-02-02 01:29

The new TrigramSimilarity feature of the django.contrib.postgres was great for a problem I had. I use it for a search bar to find hard to spell latin names. The problem is that

相关标签:
5条回答
  • 2021-02-02 01:49

    I had a similar problem, trying to use the pg_tgrm extension to support efficient contains and icontains Django field lookups.

    There may be a more elegant way, but defining a new index type like this worked for me:

    from django.contrib.postgres.indexes import GinIndex
    
    class TrigramIndex(GinIndex):
        def get_sql_create_template_values(self, model, schema_editor, using):
            fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
            tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
            quote_name = schema_editor.quote_name
            columns = [
                ('%s %s' % (quote_name(field.column), order)).strip() + ' gin_trgm_ops'
                for field, (field_name, order) in zip(fields, self.fields_orders)
            ]
            return {
                'table': quote_name(model._meta.db_table),
                'name': quote_name(self.name),
                'columns': ', '.join(columns),
                'using': using,
                'extra': tablespace_sql,
            }
    

    The method get_sql_create_template_values is copied from Index.get_sql_create_template_values(), with just one modification: the addition of + ' gin_trgm_ops'.

    For your use case, you would then define the index on name_txt using this TrigramIndex instead of a GinIndex. Then run makemigrations, which will produce a migration that generates the required CREATE INDEX SQL.

    UPDATE:

    I see you're also doing a query using icontains:

    result.exclude(name_txt__icontains = 'sp.')
    

    The Postgresql backend will turn that into something like this:

    UPPER("NCBI_names"."name_txt"::text) LIKE UPPER('sp.')
    

    and then the trigram index won't be used because of the UPPER().

    I had the same problem, and ended up subclassing the database backend to work around it:

    from django.db.backends.postgresql import base, operations
    
    class DatabaseFeatures(base.DatabaseFeatures):
        pass
    
    class DatabaseOperations(operations.DatabaseOperations):
        def lookup_cast(self, lookup_type, internal_type=None):
            lookup = '%s'
    
            # Cast text lookups to text to allow things like filter(x__contains=4)
            if lookup_type in ('iexact', 'contains', 'icontains', 'startswith',
                               'istartswith', 'endswith', 'iendswith', 'regex', 'iregex'):
                if internal_type in ('IPAddressField', 'GenericIPAddressField'):
                    lookup = "HOST(%s)"
                else:
                    lookup = "%s::text"
    
            return lookup
    
    
    class DatabaseWrapper(base.DatabaseWrapper):
        """
            Override the defaults where needed to allow use of trigram index
        """
        ops_class = DatabaseOperations
    
        def __init__(self, *args, **kwargs):
            self.operators.update({
                'icontains': 'ILIKE %s',
                'istartswith': 'ILIKE %s',
                'iendswith': 'ILIKE %s',
            })
            self.pattern_ops.update({
                'icontains': "ILIKE '%%' || {} || '%%'",
                'istartswith': "ILIKE {} || '%%'",
                'iendswith': "ILIKE '%%' || {}",
            })
            super(DatabaseWrapper, self).__init__(*args, **kwargs)
    
    0 讨论(0)
  • I found a 12/2020 article that uses the newest version of Django ORM as such:

    class Author(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
    
        class Meta:
            indexes = [
                GinIndex(
                    name='review_author_ln_gin_idx', 
                    fields=['last_name'], 
                    opclasses=['gin_trgm_ops'],
                )
            ]
    

    Inspired from an old article on this subject, I landed to a current one which gives the following solution for a GistIndex:

    Update: From Django-1.11 things seem to be simpler, as this answer and django docs sugest:

    from django.contrib.postgres.indexes import GinIndex
    
    class MyModel(models.Model):
        the_field = models.CharField(max_length=512, db_index=True)
    
        class Meta:
            indexes = [GinIndex(fields=['the_field'])]
    

    From Django-2.2, an attribute opclasses will be available in class Index(fields=(), name=None, db_tablespace=None, opclasses=()) for this purpose.


    from django.contrib.postgres.indexes import GistIndex
    
    class GistIndexTrgrmOps(GistIndex):
        def create_sql(self, model, schema_editor):
            # - this Statement is instantiated by the _create_index_sql()
            #   method of django.db.backends.base.schema.BaseDatabaseSchemaEditor.
            #   using sql_create_index template from
            #   django.db.backends.postgresql.schema.DatabaseSchemaEditor
            # - the template has original value:
            #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s)%(extra)s"
            statement = super().create_sql(model, schema_editor)
            # - however, we want to use a GIST index to accelerate trigram
            #   matching, so we want to add the gist_trgm_ops index operator
            #   class
            # - so we replace the template with:
            #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgrm_ops)%(extra)s"
            statement.template =\
                "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
    
            return statement
    

    Which you can then use in your model class like this:

    class YourModel(models.Model):
        some_field = models.TextField(...)
    
        class Meta:
            indexes = [
                GistIndexTrgrmOps(fields=['some_field'])
            ]
    
    0 讨论(0)
  • 2021-02-02 01:58

    This already has an answer, but in Django 2.2 you can do this much easier:

    class MyModel(models.Model):
       name = models.TextField()
       class Meta:
           indexes = [GistIndex(name="gist_trgm_idx", fields=("name",), opclasses=("gist_trgm_ops",))]
    
    

    Alternatively you can use GinIndex.

    0 讨论(0)
  • 2021-02-02 01:59

    To make Django 2.2 use the index for icontains and similar searches:

    Subclass GinIndex to make an case insensitive index (uppercasing all field values):

    from django.contrib.postgres.indexes import GinIndex
    
    class UpperGinIndex(GinIndex):
    
        def create_sql(self, model, schema_editor, using=''):
            statement = super().create_sql(model, schema_editor, using=using)
            quote_name = statement.parts['columns'].quote_name
    
            def upper_quoted(column):
                return f'UPPER({quote_name(column)})'
            statement.parts['columns'].quote_name = upper_quoted
            return statement
    

    Add the index to your model like this, including kwarg name which is required when using opclasses:

    class MyModel(Model):
        name = TextField(...)
    
        class Meta:
            indexes = [
                UpperGinIndex(fields=['name'], name='mymodel_name_gintrgm', opclasses=['gin_trgm_ops'])
            ]
    

    Generate the migration and edit the generated file:

    # Generated by Django 2.2.3 on 2019-07-15 10:46
    from django.contrib.postgres.operations import TrigramExtension  # <<< add this
    from django.db import migrations
    import myapp.models
    
    
    class Migration(migrations.Migration):
    
        operations = [
            TrigramExtension(),   # <<< add this
            migrations.AddIndex(
                model_name='mymodel',
                index=myapp.models.UpperGinIndex(fields=['name'], name='mymodel_name_gintrgm', opclasses=['gin_trgm_ops']),
            ),
        ]
    
    0 讨论(0)
  • 2021-02-02 02:02

    In case someone want to have index on multiple columns joined (concatenated) with space you can use my modicitaion of built-in index.

    Creates index like gin (("column1" || ' ' || "column2" || ' ' || ...) gin_trgm_ops)

    class GinSpaceConcatIndex(GinIndex):
    
        def get_sql_create_template_values(self, model, schema_editor, using):
    
            fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
            tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
            quote_name = schema_editor.quote_name
            columns = [
                ('%s %s' % (quote_name(field.column), order)).strip()
                for field, (field_name, order) in zip(fields, self.fields_orders)
            ]
            return {
                'table': quote_name(model._meta.db_table),
                'name': quote_name(self.name),
                'columns': "({}) gin_trgm_ops".format(" || ' ' || ".join(columns)),
                'using': using,
                'extra': tablespace_sql,
            }
    
    0 讨论(0)
提交回复
热议问题