Question
I have a bunch of text rows in a PostgreSQL table and I am trying to find common strings.
For example, let's say I have a basic table like:
CREATE TABLE a (id serial, value text);
INSERT INTO a (value) VALUES
('I go to the movie theater'),
('New movie theater releases'),
('Coming out this week at your local movie theater'),
('New exposition about learning disabilities at the children museum'),
('The genius found in learning disabilities')
;
I am trying to locate popular strings like movie theater and learning disabilities across all the rows (the goal is to show a list of "trending" strings, kind of like Twitter "Trends").
I use full text search and I have tried to use ts_stat combined with ts_headline, but the results are quite disappointing.
Any thoughts? thanks!
Answer 1:
There is no ready-to-use Postgres text search feature for finding the most popular phrases. For two-word phrases you can use ts_stat() to find the most popular words, eliminate particles, prepositions, etc., and cross join these words to find the most popular pairs.
For actual data you would want to change the values marked as --> parameter.
The query may be quite expensive on a larger dataset.
with popular_words as (
    select word
    from ts_stat('select value::tsvector from a')
    where nentry > 1                                      --> parameter
    and word not in ('to', 'the', 'at', 'in', 'a')        --> parameter
)
select concat_ws(' ', a1.word, a2.word) phrase, count(*)
from popular_words as a1
cross join popular_words as a2
cross join a
where value ilike format('%%%s %s%%', a1.word, a2.word)
group by 1
having count(*) > 1                                       --> parameter
order by 2 desc;
        phrase         | count
-----------------------+-------
 movie theater         |     3
 learning disabilities |     2
(2 rows)
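A possible variant, not part of the original answer: instead of cross joining candidate words and matching with ilike, split each row into words and pair every word with the one directly after it. This is only a sketch under a few assumptions: PostgreSQL 9.4 or later (for WITH ORDINALITY), splitting on whitespace only, and the same stop-word list and threshold as above, which remain parameters to tune. The CTE names tokens and bigrams are made up for illustration.

```sql
with tokens as (
    -- one row per word, keeping its position within the source row
    select a.id, t.pos, lower(t.word) as word
    from a
    cross join lateral regexp_split_to_table(a.value, '\s+')
         with ordinality as t(word, pos)
),
bigrams as (
    -- adjacent word pairs; distinct so each phrase counts once per row
    select distinct t1.id, t1.word || ' ' || t2.word as phrase
    from tokens t1
    join tokens t2 on t2.id = t1.id and t2.pos = t1.pos + 1
    where t1.word not in ('to', 'the', 'at', 'in', 'a')   --> parameter
    and t2.word not in ('to', 'the', 'at', 'in', 'a')     --> parameter
)
select phrase, count(*)
from bigrams
group by phrase
having count(*) > 1                                       --> parameter
order by count(*) desc;
```

On the five sample rows this should return the same two phrases. It avoids the cross join over all word pairs, so it may behave better on larger tables, but punctuation stays attached to words unless the split pattern is adjusted.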
Answer 2:
How about something like:
SELECT * FROM a WHERE value LIKE '%movie theater%';
This would find rows which match the pattern 'movie theater' somewhere in the value column (and could include any number of characters before or after it).
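If the candidate phrases are already known, the same pattern match can be turned into a count per phrase. A minimal sketch, assuming the two example phrases from the question as a hypothetical candidate list (the alias p and the column name row_count are illustrative); it does not discover new phrases on its own.

```sql
select p.phrase, count(*) as row_count
from (values ('movie theater'), ('learning disabilities')) as p(phrase)
-- ilike makes the match case-insensitive; % matches any surrounding text
join a on a.value ilike '%' || p.phrase || '%'
group by p.phrase
order by row_count desc;
```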
Source: https://stackoverflow.com/questions/42702888/locate-popular-strings-with-postgresql