I\'m building a Django site and I am looking for a search engine.
A few candidates:
Lucene/Lucene with Compass/Solr
Sphinx
Good to see someone's chimed in about Lucene - because I've no idea about that.
Sphinx, on the other hand, I know quite well, so let's see if I can be of some help.
I've no idea how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a port of Lucene for Ruby) and Solr), running some benchmarks. Could be useful, I guess.
I've not plumbed the depths of MySQL's full-text search, but I know it doesn't compete speed-wise nor feature-wise with Sphinx, Lucene or Solr.
SearchTools-Avi said "MySQL text search, which doesn't even index words of three letters or fewer."
FYIs, The MySQL fulltext min word length is adjustable since at least MySQL 5.0. Google 'mysql fulltext min length' for simple instructions.
That said, MySQL fulltext has limitations: for one, it gets slow to update once you reach a million records or so, ...
I don't know Sphinx, but as for Lucene vs a database full-text search, I think that Lucene performance is unmatched. You should be able to do almost any search in less than 10 ms, no matter how many records you have to search, provided that you have set up your Lucene index correctly.
Here comes the biggest hurdle though: personally, I think integrating Lucene in your project is not easy. Sure, it is not too hard to set it up so you can do some basic search, but if you want to get the most out of it, with optimal performance, then you definitely need a good book about Lucene.
As for CPU & RAM requirements, performing a search in Lucene doesn't task your CPU too much, though indexing your data is, although you don't do that too often (maybe once or twice a day), so that isn't much of a hurdle.
It doesn't answer all of your questions but in short, if you have a lot of data to search, and you want great performance, then I think Lucene is definitely the way to go. If you're not going to have that much data to search, then you might as well go for a database full-text search. Setting up a MySQL full-text search is definitely easier in my book.
I'm looking at PostgreSQL full-text search right now, and it has all the right features of a modern search engine, really good extended character and multilingual support, nice tight integration with text fields in the database.
But it doesn't have user-friendly search operators like + or AND (uses & | !) and I'm not thrilled with how it works on their documentation site. While it has bolding of match terms in the results snippets, the default algorithm for which match terms is not great. Also, if you want to index rtf, PDF, MS Office, you have to find and integrate a file format converter.
OTOH, it's way better than the MySQL text search, which doesn't even index words of three letters or fewer. It's the default for the MediaWiki search, and I really think it's no good for end-users: http://www.searchtools.com/analysis/mediawiki-search/
In all cases I've seen, Lucene/Solr and Sphinx are really great. They're solid code and have evolved with significant improvements in usability, so the tools are all there to make search that satisfies almost everyone.
for SHAILI - SOLR includes the Lucene search code library and has the components to be a nice stand-alone search engine.
Just my two cents to this very old question. I would highly recommend taking a look at ElasticSearch.
Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
The advantages over other FTS (full text search) Engines are:
We are using this search engine at our project and very happy with it.
I have used Solr, in my project and it is the best so far.