Lucene - is it the right answer for huge index?

清歌不尽 · 2021-02-08 22:31

Is Lucene capable of indexing 500M text documents of 50 KB each?

What search performance can be expected from such an index, both for a single-term query and for a 10-term query?

Shoul

1 Answer
  • 2021-02-08 22:55

    Yes, Lucene should be able to handle this, according to the following article: http://www.lucidimagination.com/content/scaling-lucene-and-solr

    Here's a quote:

    Depending on a multitude of factors, a single machine can easily host a Lucene/Solr index of 5 – 80+ million documents, while a distributed solution can provide subsecond search response times across billions of documents.

    The article goes into great depth about scaling Lucene/Solr across multiple servers, so you can start small and scale out if needed.
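
    To make that concrete, here is a minimal sketch of indexing and querying with Lucene's Java API. It assumes a recent Lucene release (7+); the index path, field names, sample text, and RAM buffer size are illustrative assumptions, not recommendations from the article. The single-term query and the OR-of-terms BooleanQuery correspond to the two search patterns the question asks about.

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.StringField;
        import org.apache.lucene.document.TextField;
        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.BooleanClause;
        import org.apache.lucene.search.BooleanQuery;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.TermQuery;
        import org.apache.lucene.search.TopDocs;
        import org.apache.lucene.store.FSDirectory;

        import java.nio.file.Paths;

        public class LuceneSketch {
            public static void main(String[] args) throws Exception {
                // Open (or create) an on-disk index; "/tmp/huge-index" is a placeholder path.
                FSDirectory dir = FSDirectory.open(Paths.get("/tmp/huge-index"));

                IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
                cfg.setRAMBufferSizeMB(256);  // larger buffer = fewer flushes during bulk loads

                try (IndexWriter writer = new IndexWriter(dir, cfg)) {
                    Document doc = new Document();
                    doc.add(new StringField("id", "doc-1", Field.Store.YES));   // exact-match key
                    doc.add(new TextField("body", "full text of the document", Field.Store.NO));
                    writer.addDocument(doc);
                    writer.commit();
                }

                try (DirectoryReader reader = DirectoryReader.open(dir)) {
                    IndexSearcher searcher = new IndexSearcher(reader);

                    // Single-term search.
                    Query single = new TermQuery(new Term("body", "document"));
                    TopDocs hits = searcher.search(single, 10);
                    System.out.println("single-term hits: " + hits.totalHits);

                    // Multi-term search: OR several terms together with a BooleanQuery.
                    BooleanQuery.Builder b = new BooleanQuery.Builder();
                    for (String t : new String[]{"full", "text", "document"}) {
                        b.add(new TermQuery(new Term("body", t)), BooleanClause.Occur.SHOULD);
                    }
                    TopDocs multi = searcher.search(b.build(), 10);
                    System.out.println("multi-term hits: " + multi.totalHits);
                }
            }
        }

    At the scale in the question, the interesting knobs are the RAM buffer, segment merging, and how the corpus is sharded across machines; those trade-offs are what the linked article covers.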

    A great resource about Lucene's performance is the blog of Mike McCandless, who is actively involved in the development of Lucene: http://blog.mikemccandless.com/ He often uses Wikipedia's content (25 GB) as test input for Lucene.

    Also, it might be interesting that Twitter's real-time search is now implemented with Lucene (see http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html).

    However, I am wondering if the numbers you provided are correct: 500 million documents × 50 KB ≈ 25 TB (about 23 TiB) -- do you really have that much data?
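
    As a back-of-envelope check, using only the numbers already on this page (500M documents, 50 KB each, and the article's 5 - 80M documents per machine), a quick sketch:

        public class SizingSketch {
            public static void main(String[] args) {
                long docs = 500_000_000L;     // from the question
                long bytesPerDoc = 50_000L;   // 50 KB per document
                double totalTB = docs * bytesPerDoc / 1e12;
                System.out.printf("raw text: %.0f TB%n", totalTB);   // ~25 TB

                // Machine count, using the article's 5-80M docs/machine range:
                long machinesLow  = docs / 80_000_000L;   // ~6 machines, optimistic end
                long machinesHigh = docs / 5_000_000L;    // 100 machines, pessimistic end
                System.out.println("machines: " + machinesLow + " to " + machinesHigh);
            }
        }

    The actual machine count would depend on document size, query load, and latency targets; the article's range is per-machine document counts, not per-machine bytes.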
