Lucene performance

一生所求 2021-01-05 13:39

Could you please suggest steps for improving Lucene performance, especially with a large data set (around 1 TB of PDF files to be indexed)?

2 answers
  • 2021-01-05 14:19

    Please check the tips in the question "Optimizing Lucene Performance". Since you are working with a large amount of data, you also need to watch index creation performance. Tips on improving both indexing performance and search performance are available on the Lucene Wiki.

  • 2021-01-05 14:23
    1. Read Scaling Lucene and Solr.
    2. Define your needs from Lucene (for example: you are indexing PDFs - do you need to store the full text, only make it searchable, or neither?)
    3. Run a small-scale experiment - index a few documents and see whether retrieval is good enough.
    4. Try indexing the whole corpus (applying the paper's tips for fast indexing and for indexing for retrieval speed) - Is retrieval good enough? Is performance good enough?
    5. Iterate.
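
One way to make steps 3 and 4 concrete is to time the small-scale run and extrapolate to the full corpus before committing to it. The sketch below is plain Java with no Lucene dependency. The class name, method names, and all pilot figures (1 GiB sample, 5 minutes, 200 MiB index) are hypothetical, and it assumes throughput and the index-to-source size ratio scale linearly, which may not hold once segment merges and disk I/O start to dominate.

```java
import java.util.concurrent.TimeUnit;

/** Extrapolates full-corpus indexing cost from a small pilot run (hypothetical helper). */
final class IndexingEstimate {

    /** Source bytes indexed per second during the pilot run. */
    public static double throughputBytesPerSec(long pilotBytes, long pilotNanos) {
        if (pilotBytes <= 0 || pilotNanos <= 0) {
            throw new IllegalArgumentException("pilot size and duration must be positive");
        }
        return pilotBytes / (pilotNanos / 1e9);
    }

    /** Hours needed to index totalBytes at the given rate, assuming linear scaling. */
    public static double estimateHours(long totalBytes, double bytesPerSec) {
        return totalBytes / bytesPerSec / 3600.0;
    }

    /** Projected index size, assuming the pilot's index-to-source ratio holds. */
    public static long estimateIndexBytes(long totalBytes, long pilotBytes, long pilotIndexBytes) {
        return (long) ((double) totalBytes * pilotIndexBytes / pilotBytes);
    }

    public static void main(String[] args) {
        // Hypothetical pilot measurements; replace with numbers from your own run.
        long pilotBytes = 1L << 30;                      // 1 GiB of sample PDFs
        long pilotNanos = TimeUnit.MINUTES.toNanos(5);   // pilot took 5 minutes
        long pilotIndexBytes = 200L << 20;               // resulting index: 200 MiB
        long totalBytes = 1L << 40;                      // full corpus: 1 TiB

        double rate = throughputBytesPerSec(pilotBytes, pilotNanos);
        double hours = estimateHours(totalBytes, rate);
        long indexBytes = estimateIndexBytes(totalBytes, pilotBytes, pilotIndexBytes);

        System.out.printf("Estimated indexing time: %.1f hours%n", hours);
        System.out.printf("Estimated index size: %d GiB%n", indexBytes >> 30);
    }
}
```

With these made-up pilot numbers the projection is roughly 85 hours and a 200 GiB index. The figures matter less than the comparison: rerun the pilot after each change (for example, not storing the full text) and see whether the projection moves enough to justify it.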