Match a phrase ending in a prefix with full text search

北恋 2020-12-31 14:36

I'm looking for a way to emulate something like SELECT * FROM table WHERE attr LIKE '%text%' using a tsvector in PostgreSQL.

I've created a tsvector

4 Answers
  • 2020-12-31 14:58
    SELECT title
    FROM   tbl
    WHERE  title_tsv @@ to_tsquery('zend')
    AND    title_tsv @@ to_tsquery('fram:*');

    is equivalent to:

    SELECT title
    FROM   tbl
    WHERE  title_tsv @@ to_tsquery('zend & fram:*');
    

    but of course that finds "Zend has no framework" as well.

    You could add a regular expression match against title on top of the tsquery match, but you would have to check with EXPLAIN ANALYZE that the regex is applied after the tsquery match rather than before it.
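
    A minimal sketch of that combination, assuming a table tbl with a text column title, a tsvector column title_tsv, and a GIN index on title_tsv (all of these names are placeholders). The regex is only an illustration of an "adjacent words" check, not the only way to write it:

    ```sql
    -- Assumes standard_conforming_strings = on (the default), so backslashes
    -- in the pattern are passed to the regex engine unchanged.
    EXPLAIN ANALYZE
    SELECT title
    FROM   tbl
    WHERE  title_tsv @@ to_tsquery('zend & fram:*')  -- candidate rows via full text search
    AND    title ~* '\mzend\s+fram';                  -- \m matches only at the start of a word
    ```

    If the planner uses the index, the tsquery condition would normally show up in the index scan node and the regex as a filter on the rows that scan returns.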

  • 2020-12-31 15:03

    Not a pretty solution, but it should do the job:

    psql=# SELECT regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g') ;
       regexp_replace    
    ---------------------
     'zend':* & 'fram':*
    (1 row)
    

    It can be used like:

    psql=# SELECT title FROM tbl WHERE title_tsv @@ to_tsquery(regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g'));
    

    How this works:

    1. casts the plain tsquery to a string: cast(plainto_tsquery('Zend Fram') as text)
    2. uses a regex to append the :* prefix matcher to each search term: regexp_replace(..., E'(\'\\w+\')', E'\\1:*', 'g')
    3. converts the string back to a tsquery: to_tsquery(...)
    4. uses it in the search expression: SELECT title FROM tbl WHERE title_tsv @@ ...
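
    The steps above could be wrapped in a small helper so the expression does not have to be repeated in every query. This is only a sketch: the function name prefix_tsquery and the table tbl with a tsvector column title_tsv are hypothetical, and it assumes standard_conforming_strings = on, so the pattern is written without the E'' escapes:

    ```sql
    -- Hypothetical helper: turns free text into a tsquery in which every
    -- term is a prefix match, e.g. 'Zend Fram' -> 'zend':* & 'fram':*
    CREATE OR REPLACE FUNCTION prefix_tsquery(search text)
    RETURNS tsquery
    LANGUAGE sql STABLE AS
    $$
      SELECT to_tsquery(
               regexp_replace(
                 plainto_tsquery(search)::text,  -- step 1: tsquery to string
                 '(''\w+'')',                    -- step 2: each quoted lexeme ...
                 '\1:*',                         --         ... gets :* appended
                 'g'));                          -- step 3: to_tsquery parses it again
    $$;

    -- Step 4: use it in the search expression.
    SELECT title
    FROM   tbl
    WHERE  title_tsv @@ prefix_tsquery('Zend Fram');
    ```

    Note that the terms are combined with &, so this matches rows containing both prefixes in any order and at any distance.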
  • 2020-12-31 15:03

    There's a way to do it in Postgres using trigrams (the pg_trgm module) and GIN/GiST indexes. There's a simple example, but with some rough edges, in this article by Kristo Kaiv: Substring Search.
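
    A rough sketch of the trigram approach, assuming the pg_trgm extension is available and a table tbl with a text column title (the table and index names are placeholders, and this is not taken from the linked article):

    ```sql
    -- pg_trgm ships with PostgreSQL as an additional module.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Trigram GIN index on the raw text column.
    CREATE INDEX tbl_title_trgm_idx ON tbl USING GIN (title gin_trgm_ops);

    -- An unanchored pattern like this can then be answered with the index.
    SELECT title
    FROM   tbl
    WHERE  title ILIKE '%zend fram%';
    ```

    Unlike the tsvector approach, this matches arbitrary substrings, so it is closer to the LIKE '%text%' behavior asked about in the question.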

  • 2020-12-31 15:14

    Postgres 9.6 introduces phrase search capabilities for full text search. So this works now:

    SELECT title
    FROM  tbl
    WHERE title_tsv @@ to_tsquery('zend <-> fram:*');

    <-> being the FOLLOWED BY operator.

    It finds 'foo Zend framework bar' or 'Zend frames', but not 'foo Zend has no framework bar'.

    Quoting the release notes for Postgres 9.6:

    A phrase-search query can be specified in tsquery input using the new operators <-> and <N>. The former means that the lexemes before and after it must appear adjacent to each other in that order. The latter means they must be exactly N lexemes apart.
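
    To see the difference between the two operators, the queries can be tried against a literal string. This sketch uses the 'simple' configuration so that no stemming or stop-word removal is involved; the sample text is made up:

    ```sql
    -- 'zend php framework' has one word between the two search terms,
    -- so "fram..." sits at distance 2 from "zend".
    SELECT to_tsvector('simple', 'zend php framework')
             @@ to_tsquery('simple', 'zend <-> fram:*') AS adjacent,
           to_tsvector('simple', 'zend php framework')
             @@ to_tsquery('simple', 'zend <2> fram:*') AS two_apart;
    ```

    Since <-> is shorthand for <1>, only the second expression should match this sample text.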

    For best performance, support the query with a GIN index:

    CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (title_tsv);
    

    Or don't store title_tsv in the table at all (bloating it and complicating writes). You can use an expression index instead:

    CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (to_tsvector('english', title));
    

    You need to specify the text search configuration (often language-specific) to make the expression immutable. And adapt the query accordingly:

    ...
    WHERE to_tsvector('english', title) @@ to_tsquery('english', 'zend <-> fram:*');
    