I'm preparing to deploy a Rails app on Heroku that requires full-text search. Up to now I've been running it on a VPS using MySQL with Sphinx.
However, if I want to us
PostgreSQL's FTS feature is mature and fairly fast at lookups. It's worth a look for sure.
Since I just went through the effort of comparing Elasticsearch (1.9) against Postgres FTS, I figured I'd share my results; they're somewhat more current than the ones @gustavodiazjaimes cites.
My main concern with Postgres was that it doesn't have faceting built in, but that's trivial to build yourself. Here's my example (in Django):
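from django.db.models import Count

# Rows whose SearchVectorField matches the full-text query
results = YourModel.objects.filter(vector_search=query)
# Facet counts: number of matching rows per book, ordered by book
facets = (results
          .values('book')
          .annotate(total=Count('book'))
          .order_by('book'))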
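For anyone not on Django, the same faceting idea can be sketched in plain SQL. This is a minimal, hypothetical example using Python's built-in sqlite3 module so it runs anywhere: the `passage` table, its columns and the sample rows are invented for illustration, and a `LIKE` filter stands in for the Postgres full-text match. The facet counts come from an ordinary `GROUP BY`, which is roughly what the `values()` / `annotate()` chain asks the database to do.

```python
import sqlite3

# Hypothetical schema: one row per searchable passage, tagged with its book.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passage (id INTEGER PRIMARY KEY, book TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO passage (book, body) VALUES (?, ?)",
    [
        ("Book A", "the white whale surfaced"),
        ("Book A", "a whale of a time"),
        ("Book B", "whale songs at dawn"),
        ("Book B", "nothing to see here"),
        ("Book C", "an unrelated passage"),
    ],
)


def facet_counts(conn, term):
    """Return (book, total) pairs for passages matching `term`, ordered by book."""
    # LIKE is only a stand-in. In Postgres this WHERE clause would be a
    # full-text match on a (hypothetical) tsvector column, e.g.
    #   search_vector @@ plainto_tsquery('whale')
    sql = (
        "SELECT book, COUNT(book) AS total "
        "FROM passage WHERE body LIKE ? "
        "GROUP BY book ORDER BY book"
    )
    return conn.execute(sql, (f"%{term}%",)).fetchall()


print(facet_counts(conn, "whale"))  # [('Book A', 2), ('Book B', 1)]
```

Swapping the `LIKE` for a real tsvector match changes which rows qualify, not how the facets are counted.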
I'm using Postgres 9.6 and Elasticsearch 1.9 (through Haystack on Django). Here's a comparison of Elasticsearch and Postgres across 16 different types of queries.
query  es_times  pg_times  es_times_faceted  pg_times_faceted
0      0.065972  0.000543  0.015538          0.037876
1      0.000292  0.000233  0.005865          0.007130
2      0.000257  0.000229  0.005203          0.002168
3      0.000247  0.000161  0.003052          0.001299
4      0.000276  0.000150  0.002647          0.001167
5      0.000245  0.000151  0.005098          0.001512
6      0.000251  0.000155  0.005317          0.002550
7      0.000331  0.000163  0.005635          0.002202
8      0.000268  0.000168  0.006469          0.002408
9      0.000290  0.000236  0.006167          0.002398
10     0.000364  0.000224  0.005755          0.001846
11     0.000264  0.000182  0.005153          0.001667
12     0.000287  0.000153  0.010218          0.001769
13     0.000264  0.000231  0.005309          0.001586
14     0.000257  0.000195  0.004813          0.001562
15     0.000248  0.000174  0.032146          0.002246
                  count  mean      std       min       25%       50%       75%       max
es_times          16     0.004382  0.016424  0.000245  0.000255  0.000266  0.000291  0.065972
pg_times          16     0.000209  0.000095  0.000150  0.000160  0.000178  0.000229  0.000543
es_times_faceted  16     0.007774  0.007150  0.002647  0.005139  0.005476  0.006242  0.032146
pg_times_faceted  16     0.004462  0.009015  0.001167  0.001580  0.002007  0.002400  0.037876
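The summary table looks like pandas' `describe()` output. As a sketch (not the poster's actual script), the same figures can be reproduced with Python's standard `statistics` module: `quantiles(..., method="inclusive")` uses the same linear interpolation as pandas' default percentiles, and `stdev` is the sample standard deviation that pandas reports. Only the two unfaceted columns are included to keep it short.

```python
import statistics

# Unfaceted timings copied from the per-query table above (queries 0-15).
es_times = [0.065972, 0.000292, 0.000257, 0.000247, 0.000276, 0.000245,
            0.000251, 0.000331, 0.000268, 0.000290, 0.000364, 0.000264,
            0.000287, 0.000264, 0.000257, 0.000248]
pg_times = [0.000543, 0.000233, 0.000229, 0.000161, 0.000150, 0.000151,
            0.000155, 0.000163, 0.000168, 0.000236, 0.000224, 0.000182,
            0.000153, 0.000231, 0.000195, 0.000174]


def summarize(times):
    """Return describe()-style summary statistics for a list of timings."""
    q1, q2, q3 = statistics.quantiles(times, n=4, method="inclusive")
    return {
        "count": len(times),
        "mean": statistics.mean(times),
        "std": statistics.stdev(times),  # sample standard deviation (n - 1)
        "min": min(times),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(times),
    }


for name, times in (("es_times", es_times), ("pg_times", pg_times)):
    stats = summarize(times)
    print(name, {key: round(value, 6) for key, value in stats.items()})
```

Note how the single slow first Elasticsearch query (0.065972) dominates its mean and std; the medians (0.000266 vs 0.000178) are probably a fairer read of typical unfaceted latency.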
To get Postgres to these speeds for faceted searches I had to put a GIN index on a SearchVectorField. That field type is Django-specific, but I'm sure other frameworks have a similar vector type.
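A GIN (Generalized Inverted Index) index is, at heart, an inverted index: for each lexeme it records which rows contain it, so a query only touches the entries for its own terms instead of scanning every row the way an unindexed `LIKE '%term%'` predicate does. The toy sketch below illustrates that idea in plain Python; `to_lexemes` is a hypothetical and much cruder stand-in for `to_tsvector` (no stemming, no stop words), and this is not how Postgres actually lays out GIN pages.

```python
import re
from collections import defaultdict


def to_lexemes(text):
    """Crude stand-in for to_tsvector: lowercase alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each lexeme to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for lexeme in to_lexemes(text):
            index[lexeme].add(doc_id)
    return index


def search(index, query):
    """AND-search (like 'a & b' in a tsquery): intersect the posting sets."""
    terms = to_lexemes(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Postgres full text search",
    2: "Elasticsearch full text engine",
    3: "MySQL with Sphinx",
}
index = build_inverted_index(docs)
print(search(index, "full text"))  # {1, 2}
```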
One other consideration is that pg 9.6 now supports phrase matching, which is huge.
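Phrase matching works because a tsvector stores word positions, not just words: PostgreSQL 9.6 added the `<->` (FOLLOWED BY) tsquery operator and `phraseto_tsquery()`, which require the lexemes to appear at consecutive positions. The following is a simplified Python illustration of that position check, not Postgres's implementation; the tokenizer is a hypothetical stand-in with no stemming or stop-word handling.

```python
import re
from collections import defaultdict


def positional_vector(text):
    """Map each token to its 1-based positions, loosely like a tsvector."""
    positions = defaultdict(list)
    for pos, token in enumerate(re.findall(r"[a-z0-9]+", text.lower()), start=1):
        positions[token].append(pos)
    return dict(positions)


def phrase_match(vector, phrase):
    """True if the phrase's tokens occur at consecutive positions (like a <-> b)."""
    terms = re.findall(r"[a-z0-9]+", phrase.lower())
    if not terms:
        return False
    for start in vector.get(terms[0], []):
        if all(start + offset in vector.get(term, [])
               for offset, term in enumerate(terms[1:], start=1)):
            return True
    return False


vector = positional_vector("Full text search in Postgres")
print(phrase_match(vector, "text search"))  # True
print(phrase_match(vector, "search text"))  # False: both words present, wrong order
```

A plain AND query would match both phrases above; the position check is what makes word order matter.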
My takeaway is that Postgres is, in most cases, going to be preferable, as it offers:
I found this amazing comparison and want to share it:
Full Text Search In PostgreSQL
Metric               LIKE predicate  PostgreSQL / GIN  Sphinx Search  Apache Lucene  Inverted index
Time to build index  none            40 min            6 min          9 min          high
Index storage        none            532 MB            533 MB         1071 MB        101 MB
Query speed          90+ seconds     20 ms             8 ms           80 ms          40 ms
Some fresher results for synthetic customer data (10 million records).
If you're interested in Postgres vs. Lucene, why not both? Check out the ZomboDB extension for Postgres, which integrates Elasticsearch as a first-class index type. Still a fairly early project but it looks really promising to me.
(Technically not available on Heroku, but still worth looking at.)
Disclosure: I'm a cofounder of the Websolr and Bonsai Heroku add-ons, so my perspective is a bit biased toward Lucene.
My read on Postgres full-text search is that it is pretty solid for straightforward use cases, but there are a number of reasons why Lucene (and thus Solr and Elasticsearch) is superior in terms of both performance and functionality.
For starters, jpountz provides a truly excellent technical answer to the question "Why is Solr so much faster than Postgres?" It's worth a couple of read-throughs to really digest.
I also commented on a recent RailsCast episode comparing relative advantages and disadvantages of Postgres full-text search versus Solr. Let me recap that here:
LIKE operator.
Off the top of my head, in no particular order…
Clearly I think a dedicated search engine based on Lucene is the better option here. Basically, you can think of Lucene as the de facto open source repository of search expertise.
But if your only other option is the LIKE operator, then Postgres full-text search is a definite win.
Postgres's full-text search has amazing capabilities in the areas of stemming, ranking/boosting, synonym handling and fuzzy search, among others, but no built-in support for faceted search.
So if Postgres is already in your stack and you don't need faceting, try it out first to get the HUGE benefit of easily keeping indices in sync and maintaining a lean stack, before looking at Lucene-based solutions, at least if your app isn't built entirely around search.
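On the fuzzy-search point: core Postgres full-text search matches normalized lexemes rather than near-misses, and fuzzy matching is commonly added with the pg_trgm extension, which compares strings by their shared three-character sequences. Below is a simplified Python sketch of that trigram similarity: shared trigrams divided by all distinct trigrams of both strings, with each word padded by two leading spaces and one trailing space as the pg_trgm documentation describes. It ignores details such as non-ASCII handling and is only meant to show the idea, not to replicate the extension exactly.

```python
import re


def trigrams(text):
    """Set of trigrams using pg_trgm's documented padding rule (simplified)."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams / distinct trigrams across both strings (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(round(similarity("word", "two words"), 4))   # 0.3636
print(round(similarity("postgres", "postgers"), 4))  # 0.3846 (a typo still scores)
```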