Comparison of full-text search engines - Lucene, Sphinx, PostgreSQL, MySQL?

遥遥无期 2020-11-22 14:20

I'm building a Django site and I am looking for a search engine.

A few candidates:

  • Lucene/Lucene with Compass/Solr

  • Sphinx

9 answers
  •  难免孤独
    2020-11-22 15:03

    Good to see someone's chimed in about Lucene - because I've no idea about that.

    Sphinx, on the other hand, I know quite well, so let's see if I can be of some help.

    • Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.
    • Indexing speed is super-fast, because it talks directly to the database. Any slowness will come from complex SQL queries and un-indexed foreign keys and other such problems. I've never noticed any slowness in searching either.
    • I'm a Rails guy, so I've no idea how easy it is to implement with Django. There is a Python API that comes with the Sphinx source though.
    • The search service daemon (searchd) is pretty low on memory usage - and you can set limits on how much memory the indexer process uses too.
    • Scalability is where my knowledge is sketchier - but it's easy enough to copy index files to multiple machines and run several searchd daemons. The general impression I get from others, though, is that it holds up very well under high load, so scaling out across multiple machines is rarely something you need to deal with.
    • There's no support for 'did-you-mean', etc - although these can be done with other tools easily enough. Sphinx does stem words though using dictionaries, so 'driving' and 'drive' (for example) would be considered the same in searches.
    • Sphinx doesn't allow partial index updates for field data. The common approach is to maintain a delta index holding all the recent changes, and to re-index it after every change. Because the delta holds only a small amount of data, re-indexing it takes a matter of seconds, so new results show up in searches almost straight away. You will still need to re-index the main dataset regularly (how regularly depends on the volatility of your data - every day? every hour?). The fast indexing speeds keep this all pretty painless though.
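
To make the main + delta idea from the last point concrete, here is a rough sketch in plain Python. It is a toy in-memory model only, not Sphinx's API or configuration: `build_index`, `MainPlusDelta` and its methods are hypothetical names made up for illustration, and the whitespace tokenising is an assumption. It just shows the shape of the scheme: a large main index that is rebuilt rarely, a small delta index that is rebuilt on every change, and searches that consult both.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class MainPlusDelta:
    """Toy model of the main + delta scheme (hypothetical, not Sphinx's API)."""

    def __init__(self, docs):
        self.main_docs = dict(docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)
        self.delta = build_index(self.delta_docs)

    def change(self, doc_id, text):
        # Only the small delta set is re-indexed after a change,
        # which is why this step stays cheap.
        self.delta_docs[doc_id] = text
        self.delta = build_index(self.delta_docs)

    def merge(self):
        # Periodic full rebuild: fold the delta into main and empty it.
        self.main_docs.update(self.delta_docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)
        self.delta = build_index(self.delta_docs)

    def search(self, term):
        term = term.lower()
        # Documents present in the delta override their stale main copies.
        stale = set(self.delta_docs)
        return (self.main.get(term, set()) - stale) | self.delta.get(term, set())
```

For example, after `idx = MainPlusDelta({1: "fast search", 2: "slow query"})` and `idx.change(2, "fast search daemon")`, a search for "search" finds both documents while a search for "slow" finds none, without the main index having been rebuilt. Calling `idx.merge()` plays the role of the regular full re-index.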

    I've no idea how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a port of Lucene for Ruby) and Solr), running some benchmarks. Could be useful, I guess.

    I've not plumbed the depths of MySQL's full-text search, but I know it doesn't compete speed-wise nor feature-wise with Sphinx, Lucene or Solr.
