Single character text search alternative

Question

Requirement: a case-insensitive single-character text search over compound columns must be processed as efficiently as possible, including sorting by relevance weight.
Given a table create table test_search (id int primary key, full_name varchar(300) not null, short_name varchar(30) not null); with 3 million rows, a suggester API call sends queries to the database starting from the first input character, and the first 20 results ordered by relevance should be returned.

Options/disadvantages:

like lower() / ilike with '%c%': slow on a big dataset, no relevance;
pg_trgm with trigram-based like/ilike search + compound gin/gist index: a single character cannot be split into trigrams, so the search is done via a full table scan, no relevance;
full-text search via setweight(to_tsvector(lower())) with a gin/gist index: relevance-based output, but fewer results because the tokens exclude single characters;

Are there other options for improving single-character search? How can the approaches above be improved or combined to get the best result? How can full-text search be forced to skip the stop list and create all possible lexemes, as is possible in SQL Server?

Answer 1:

Full-text search won't help you at all with this, because only whole words are indexed, and you cannot search for substrings.
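
To illustrate the point about whole words, here is a small sketch. It assumes the built-in 'simple' text search configuration, which is my choice for the example (it has no stop words and does no stemming); the sample strings are made up. A prefix query finds a word that starts with the character, but not a character in the middle of a word.

```sql
-- 'simple' is a built-in configuration without stop words or stemming
-- (an assumption for this illustration, not part of the question).
SELECT to_tsvector('simple', 'Acme Corp') @@ to_tsquery('simple', 'c:*') AS starts_a_word,
       to_tsvector('simple', 'Acme')      @@ to_tsquery('simple', 'c:*') AS inside_a_word;
```

The first column should be true because the lexeme 'corp' starts with 'c'; the second should be false because the 'c' in 'acme' is not at the start of a lexeme.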

The best you can probably do is use this function:

CREATE FUNCTION get_chars(text) RETURNS char(1)[]
   LANGUAGE sql IMMUTABLE AS
$$SELECT array_agg(DISTINCT x)::char(1)[] FROM regexp_split_to_table($1, '') AS x$$;
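
As a quick sanity check of what the function returns, something like the following can be run (the input strings are just examples). Each distinct character appears once in the array, and the comparison is case-sensitive, so for a case-insensitive search you would probably want to wrap the input in lower() both in the index and in the query.

```sql
-- Four distinct characters: the repeated 'l' is collapsed by DISTINCT.
SELECT get_chars('hello');

-- 'H' and 'h' are different elements unless the input is lowercased first.
SELECT get_chars('Hello') AS case_sensitive,
       get_chars(lower('Hello')) AS lowercased;
```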

Then index

CREATE INDEX ON test_search USING gin (get_chars(full_name || short_name));

and search like

SELECT * FROM test_search
WHERE get_chars(full_name || short_name) @> ARRAY['c']::char(1)[];

For frequent characters, this query should still use a sequential scan, since that is the best access method when most rows match. But for rare characters, the index scan may be faster.
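
A possible way to combine this with the question's requirements (case-insensitive, first 20 rows by relevance) is sketched below. The index name, the search character 'q', and the ordering rules are my assumptions, not something the question specifies; the ORDER BY is only a crude stand-in for a relevance weight. EXPLAIN shows whether the planner actually picks the index for a given character.

```sql
-- Hypothetical case-insensitive variant: the index expression and the
-- WHERE clause must use exactly the same expression.
CREATE INDEX test_search_chars_ci_idx ON test_search
   USING gin (get_chars(lower(full_name || short_name)));

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, full_name, short_name
FROM test_search
WHERE get_chars(lower(full_name || short_name)) @> ARRAY['q']::char(1)[]
ORDER BY
   (lower(short_name) LIKE 'q%') DESC,            -- short name starts with the character
   (lower(full_name) LIKE 'q%') DESC,             -- then full name starts with it
   strpos(lower(full_name || short_name), 'q'),   -- then earliest position
   id
LIMIT 20;
```

Since the sort only sees rows that already passed the filter, it stays cheap for rare characters; for frequent ones it has to rank most of the table, which is one more reason a sequential scan is expected there.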

Source: https://stackoverflow.com/questions/59389553/single-character-text-search-alternative

Tags

postgresql

postgresql-10