I need to retrieve rows from a table based on the values in a specific column, named columnX in the example:
select *
from t
I agree with @Quassnoi, a GIN index is fastest and simplest - unless write performance or disk space are issues, because it occupies a lot of space and eats quite a bit of performance for INSERT, UPDATE and DELETE.
My additional answer is triggered by your statement:
I can't find a better approach than using similar to.
If that is what you found, then your search isn't over yet. SIMILAR TO is a complete waste of time. Literally. PostgreSQL only features it to comply with the (weird) SQL standard. Inspect the output of EXPLAIN ANALYZE for your query and you will find that SIMILAR TO has been replaced by a regular expression.
Internally, every SIMILAR TO expression is rewritten to a regular expression. Consequently, for each and every SIMILAR TO expression there is at least one regular expression match that is a bit faster. Let EXPLAIN ANALYZE translate it for you if you are not sure. You won't find this in the manual; PostgreSQL does not promise to do it this way, but I have yet to see an exception.
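You can check this yourself. A minimal sketch (the demo table is made up for illustration, and the exact wording of the plan's filter line varies by PostgreSQL version):

```sql
-- hypothetical table, just for the demonstration
CREATE TABLE demo (columnx text);

-- Depending on version, the Filter line shows the pattern already
-- translated into a regular expression match with ~, not a SIMILAR TO node:
EXPLAIN ANALYZE
SELECT * FROM demo WHERE columnx SIMILAR TO '%(A|B|C)%';

-- The equivalent regular expression match, typically a bit faster:
EXPLAIN ANALYZE
SELECT * FROM demo WHERE columnx ~ '(A|B|C)';
```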
More details in this related answer on dba.SE.
This strikes me as a data modelling issue. You appear to be using a text field as a set, storing single-character codes to identify values present in the set. If so, I'd want to remodel this table to use one of the following approaches:
Standard relational normalization. Drop columnX, and replace it with a new table with a foreign key reference to tableName(id) and a charcode column that contains one character from the old columnX per row, like:
CREATE TABLE tablename_columnx_set (
    tablename_id integer NOT NULL REFERENCES tablename(id),
    charcode "char",
    PRIMARY KEY (tablename_id, charcode)
);
You can then fairly efficiently search for keys in columnX using normal SQL subqueries, joins, etc. If your application can't cope with that change, you could always keep columnX and maintain the side table using triggers.
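With the side table from the normalization sketch above in place (table and column names as assumed there), an "any of these codes" search might look like:

```sql
-- find rows whose set contains any of A, B or C
SELECT t.*
FROM   tablename t
WHERE  EXISTS (
    SELECT 1
    FROM   tablename_columnx_set s
    WHERE  s.tablename_id = t.id
    AND    s.charcode IN ('A', 'B', 'C')
);
```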
Convert columnX to a hstore of keys with a dummy value. You can then use hstore operators like columnX ?| ARRAY['A','B','C']. A GiST index on the hstore of columnX should provide fairly solid performance for those operations.
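A rough sketch of the hstore variant (the column name columnx_h is made up, and I'm assuming the codes become keys with empty dummy values):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- hypothetical hstore column holding each code as a key
ALTER TABLE tablename ADD COLUMN columnx_h hstore;

-- split columnX into single characters and use them as keys
UPDATE tablename
SET    columnx_h = hstore(regexp_split_to_array(columnx, ''),
                          array_fill(''::text, ARRAY[length(columnx)]));

CREATE INDEX ix_tablename_columnx_h ON tablename USING GIST (columnx_h);

-- "contains any of these keys":
SELECT * FROM tablename WHERE columnx_h ?| ARRAY['A', 'B', 'C'];
```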
Split into an array, as recommended by Quassnoi, if your table's change rate is low and you can pay the costs of the GIN index.
Convert columnX to an array of integers, use intarray and the intarray GiST index. Have a mapping table of codes to integers, or convert in the application.
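A sketch of the intarray variant (columnx_ids is a made-up column name; I'm assuming the application maps each character code to an integer before storing):

```sql
CREATE EXTENSION IF NOT EXISTS intarray;

-- hypothetical integer-array column; codes mapped to ints elsewhere
ALTER TABLE tablename ADD COLUMN columnx_ids integer[];

-- gist__int_ops is the intarray GiST operator class
CREATE INDEX ix_tablename_columnx_ids
    ON tablename USING GIST (columnx_ids gist__int_ops);

-- overlap search: is any of the mapped codes present?
SELECT * FROM tablename WHERE columnx_ids && ARRAY[1, 2, 3];
```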
Time permitting I'll follow up with demos of each. Making up the dummy data is a pain, so it'll depend on what else is going on.
If you are only going to search lists of one-character values, then split each string into an array of characters and index the array:
CREATE INDEX
ix_tablename_columnxlist
ON tableName
USING GIN((REGEXP_SPLIT_TO_ARRAY(columnX, '')))
then search against the index:
SELECT *
FROM tableName
WHERE REGEXP_SPLIT_TO_ARRAY(columnX, '') && ARRAY['A', 'B', 'C', '1', '2', '3']
I'll post this as an answer because it may guide other people in the future: why not have 6 boolean columns, haveA through have3, and do a 6-part OR query? Or use a bitmask?
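The bitmask idea could be sketched like this (the column name attrs and the bit assignments are made up for illustration):

```sql
-- hypothetical encoding: A=1, B=2, C=4, 1=8, 2=16, 3=32
ALTER TABLE tablename ADD COLUMN attrs integer NOT NULL DEFAULT 0;

-- "has A, B or C" = any of the three low bits set
SELECT * FROM tablename WHERE (attrs & (1 | 2 | 4)) <> 0;
```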
If there are too many attributes to assign a column each, I might try creating an "attribute" table, with rows like (fkey, attr) VALUES (1, 'A'), (1, 'B'), (2, '3'), and let the DBMS worry about the optimization.