Best way to use PostgreSQL full text search ranking

后端 未结 3 1419
旧时难觅i
旧时难觅i 2021-02-01 23:33

Following on from this answer I want to know what the best way to use PostgreSQL\'s built-in full text search is if I want to sort by rank, and limit to only matching q

3条回答
  •  感情败类
    2021-02-02 00:33

    The use of the @@ operator will utilize the full text GIN index, while the test for score > 0 would not.

    I created a table as in the Question, but added a column named title_tsv:

    CREATE TABLE test_pictures (
      id        BIGSERIAL,
      title     text,
      title_tsv tsvector
    );
    
    CREATE INDEX ix_pictures_title_tsv ON test_pictures 
        USING gin(title_tsv);
    

    I populated the table with some test data:

    INSERT INTO test_pictures(title, title_tsv)
    SELECT T.data, to_tsvector(T.data) 
    FROM   some_table T;
    

    Then I ran the previously accepted answer with explain analyze:

    EXPLAIN ANALYZE 
    SELECT  score, id, title
    FROM (
        SELECT ts_rank_cd(P.title_tsv, to_tsquery('address & shipping')) AS score
            ,P.id        
            ,P.title
        FROM test_pictures as P
    ) S
    WHERE score > 0
    ORDER BY score DESC;
    

    And got the following. Note the execution time of 5,015 ms

    QUERY PLAN                                                                                                                                    |
    ----------------------------------------------------------------------------------------------------------------------------------------------|
    Gather Merge  (cost=274895.48..323298.03 rows=414850 width=60) (actual time=5010.844..5011.330 rows=1477 loops=1)                             |
      Workers Planned: 2                                                                                                                          |
      Workers Launched: 2                                                                                                                         |
      ->  Sort  (cost=273895.46..274414.02 rows=207425 width=60) (actual time=4994.539..4994.555 rows=492 loops=3)                                |
            Sort Key: (ts_rank_cd(p.title_tsv, to_tsquery('address & shipping'::text))) DESC                                                      |
            Sort Method: quicksort  Memory: 131kB                                                                                                 |
            ->  Parallel Seq Scan on test_pictures p  (cost=0.00..247776.02 rows=207425 width=60) (actual time=17.672..4993.997 rows=492 loops=3) |
                  Filter: (ts_rank_cd(title_tsv, to_tsquery('address & shipping'::text)) > '0'::double precision)                                 |
                  Rows Removed by Filter: 497296                                                                                                  |
    Planning time: 0.159 ms                                                                                                                       |
    Execution time: 5015.664 ms                                                                                                                   |
    

    Now compare that with the @@ operator:

    EXPLAIN ANALYZE
    SELECT ts_rank_cd(to_tsvector(P.title), to_tsquery('address & shipping')) AS score
        ,P.id
        ,P.title
    FROM    test_pictures as P
    WHERE P.title_tsv @@ to_tsquery('address & shipping')
    ORDER BY score DESC;
    

    And the results coming in with an execution time of about 29 ms:

    QUERY PLAN                                                                                                                                       |
    -------------------------------------------------------------------------------------------------------------------------------------------------|
    Gather Merge  (cost=13884.42..14288.35 rows=3462 width=60) (actual time=26.472..26.942 rows=1477 loops=1)                                        |
      Workers Planned: 2                                                                                                                             |
      Workers Launched: 2                                                                                                                            |
      ->  Sort  (cost=12884.40..12888.73 rows=1731 width=60) (actual time=17.507..17.524 rows=492 loops=3)                                           |
            Sort Key: (ts_rank_cd(to_tsvector(title), to_tsquery('address & shipping'::text))) DESC                                                  |
            Sort Method: quicksort  Memory: 171kB                                                                                                    |
            ->  Parallel Bitmap Heap Scan on test_pictures p  (cost=72.45..12791.29 rows=1731 width=60) (actual time=1.781..17.268 rows=492 loops=3) |
                  Recheck Cond: (title_tsv @@ to_tsquery('address & shipping'::text))                                                                |
                  Heap Blocks: exact=625                                                                                                             |
                  ->  Bitmap Index Scan on ix_pictures_title_tsv  (cost=0.00..71.41 rows=4155 width=0) (actual time=3.765..3.765 rows=1477 loops=1)  |
                        Index Cond: (title_tsv @@ to_tsquery('address & shipping'::text))                                                            |
    Planning time: 0.214 ms                                                                                                                          |
    Execution time: 28.995 ms                                                                                                                        |
    

    As you can see in the execution plan, the index ix_pictures_title_tsv was used in the second query, but not in the first one, making the query with the @@ operator a whopping 172 times faster!

提交回复
热议问题