Best way to use PostgreSQL full text search ranking

后端 未结 3 1418
旧时难觅i
旧时难觅i 2021-02-01 23:33

Following on from this answer I want to know what the best way to use PostgreSQL\'s built-in full text search is if I want to sort by rank, and limit to only matching q

相关标签:
3条回答
  • 2021-02-02 00:09

    If I use the version with repeated to_tsvector(...) will it call that twice, or is it smart enough to cache the results somehow?

    The best way to notice these things is to do a simple explain, although those can be hard to read.

    Long story short, yes, PostgreSQL is smart enough to reuse computed results.

    Is there a way to do it without repeating the to_ts... function calls?

    What I usually do is add a tsv column which is the text search vector. If you make this auto update using triggers it immediately gives you the vector easily accessible but it also allows you to selectively update the search index by making the trigger selective.

    Is there a way to use score in the WHERE clause at all?

    Yes, but not with that name. Alternatively you could create a sub-query, but I would personally just repeat it.

    If there is, would it be better to filter by score > 0 or use the @@ thing?

    The simplest version I can think of is this:

    SELECT *
    FROM pictures
    WHERE 'small dog' @@ text_search_vector
    

    The text_search_vector could obviously be replaced with something like to_tsvector('english', pictures.title)

    0 讨论(0)
  • 2021-02-02 00:28
    select *
    from (
        SELECT
            pictures.id,
            ts_rank_cd(to_tsvector('english', pictures.title), 
            to_tsquery('small dog')) AS score
        FROM pictures
    ) s
    WHERE score > 0
    ORDER BY score DESC
    
    0 讨论(0)
  • 2021-02-02 00:33

    The use of the @@ operator will utilize the full text GIN index, while the test for score > 0 would not.

    I created a table as in the Question, but added a column named title_tsv:

    CREATE TABLE test_pictures (
      id        BIGSERIAL,
      title     text,
      title_tsv tsvector
    );
    
    CREATE INDEX ix_pictures_title_tsv ON test_pictures 
        USING gin(title_tsv);
    

    I populated the table with some test data:

    INSERT INTO test_pictures(title, title_tsv)
    SELECT T.data, to_tsvector(T.data) 
    FROM   some_table T;
    

    Then I ran the previously accepted answer with explain analyze:

    EXPLAIN ANALYZE 
    SELECT  score, id, title
    FROM (
        SELECT ts_rank_cd(P.title_tsv, to_tsquery('address & shipping')) AS score
            ,P.id        
            ,P.title
        FROM test_pictures as P
    ) S
    WHERE score > 0
    ORDER BY score DESC;
    

    And got the following. Note the execution time of 5,015 ms

    QUERY PLAN                                                                                                                                    |
    ----------------------------------------------------------------------------------------------------------------------------------------------|
    Gather Merge  (cost=274895.48..323298.03 rows=414850 width=60) (actual time=5010.844..5011.330 rows=1477 loops=1)                             |
      Workers Planned: 2                                                                                                                          |
      Workers Launched: 2                                                                                                                         |
      ->  Sort  (cost=273895.46..274414.02 rows=207425 width=60) (actual time=4994.539..4994.555 rows=492 loops=3)                                |
            Sort Key: (ts_rank_cd(p.title_tsv, to_tsquery('address & shipping'::text))) DESC                                                      |
            Sort Method: quicksort  Memory: 131kB                                                                                                 |
            ->  Parallel Seq Scan on test_pictures p  (cost=0.00..247776.02 rows=207425 width=60) (actual time=17.672..4993.997 rows=492 loops=3) |
                  Filter: (ts_rank_cd(title_tsv, to_tsquery('address & shipping'::text)) > '0'::double precision)                                 |
                  Rows Removed by Filter: 497296                                                                                                  |
    Planning time: 0.159 ms                                                                                                                       |
    Execution time: 5015.664 ms                                                                                                                   |
    

    Now compare that with the @@ operator:

    EXPLAIN ANALYZE
    SELECT ts_rank_cd(to_tsvector(P.title), to_tsquery('address & shipping')) AS score
        ,P.id
        ,P.title
    FROM    test_pictures as P
    WHERE P.title_tsv @@ to_tsquery('address & shipping')
    ORDER BY score DESC;
    

    And the results coming in with an execution time of about 29 ms:

    QUERY PLAN                                                                                                                                       |
    -------------------------------------------------------------------------------------------------------------------------------------------------|
    Gather Merge  (cost=13884.42..14288.35 rows=3462 width=60) (actual time=26.472..26.942 rows=1477 loops=1)                                        |
      Workers Planned: 2                                                                                                                             |
      Workers Launched: 2                                                                                                                            |
      ->  Sort  (cost=12884.40..12888.73 rows=1731 width=60) (actual time=17.507..17.524 rows=492 loops=3)                                           |
            Sort Key: (ts_rank_cd(to_tsvector(title), to_tsquery('address & shipping'::text))) DESC                                                  |
            Sort Method: quicksort  Memory: 171kB                                                                                                    |
            ->  Parallel Bitmap Heap Scan on test_pictures p  (cost=72.45..12791.29 rows=1731 width=60) (actual time=1.781..17.268 rows=492 loops=3) |
                  Recheck Cond: (title_tsv @@ to_tsquery('address & shipping'::text))                                                                |
                  Heap Blocks: exact=625                                                                                                             |
                  ->  Bitmap Index Scan on ix_pictures_title_tsv  (cost=0.00..71.41 rows=4155 width=0) (actual time=3.765..3.765 rows=1477 loops=1)  |
                        Index Cond: (title_tsv @@ to_tsquery('address & shipping'::text))                                                            |
    Planning time: 0.214 ms                                                                                                                          |
    Execution time: 28.995 ms                                                                                                                        |
    

    As you can see in the execution plan, the index ix_pictures_title_tsv was used in the second query, but not in the first one, making the query with the @@ operator a whopping 172 times faster!

    0 讨论(0)
提交回复
热议问题