Lucene IndexWriter slow to add documents

时光说笑 2021-02-09 09:15

I wrote a small loop that adds 10,000 documents to an IndexWriter, and it took forever to run.

Is there another way to index large volumes of documents?


  •  旧时难觅i
    2021-02-09 10:12

    Just checking, but you haven't got the debugger attached when you're running it, have you?

    This severely affects performance when adding documents.

    On my machine (Lucene

    Built with platform target x86:

    • No debugger - 5.2 seconds

    • Debugger attached - 113.8 seconds

    Built with platform target x64:

    • No debugger - 6.0 seconds

    • Debugger attached - 171.4 seconds
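
    To reproduce this kind of comparison yourself, something like the sketch below may help. It only uses System.Diagnostics (Debugger.IsAttached and Stopwatch); the IndexTiming class, the Measure helper and the placeholder workload are names made up for illustration, not part of Lucene, so swap your own indexing loop into the lambda.

    ```csharp
    using System;
    using System.Diagnostics;

    static class IndexTiming
    {
        // Hypothetical helper: runs the supplied work once and returns the elapsed wall-clock time.
        static TimeSpan Measure(Action work)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            work();
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }

        static void Main()
        {
            // Report the conditions of the run so timings can be compared fairly.
            Console.WriteLine("Debugger attached: {0}", Debugger.IsAttached);
            Console.WriteLine("64-bit process: {0}", IntPtr.Size == 8);

            TimeSpan elapsed = Measure(() =>
            {
                // Placeholder workload (an assumption); replace with the indexing loop.
                long sum = 0;
                for (int i = 0; i < 10 * 1000; i++)
                {
                    sum += i;
                }
            });

            Console.WriteLine("Elapsed: {0:F1} seconds", elapsed.TotalSeconds);
        }
    }
    ```

    Run it once with the debugger and once without (Ctrl+F5 in Visual Studio) and compare the two elapsed times.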

    Rough example of saving and loading an index to and from a RAMDirectory:

    using System;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    const int DocumentCount = 10 * 1000;
    const string IndexFilePath = @"X:\Temp\tmp.idx";
    Analyzer analyzer = new StandardAnalyzer();
    Directory ramDirectory = new RAMDirectory();
    // Build the index in memory first
    IndexWriter indexWriter = new IndexWriter(ramDirectory, analyzer, true);
    for (int i = 0; i < DocumentCount; i++)
    {
        Document doc = new Document();
        string text = "Value" + i;
        doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
        indexWriter.AddDocument(doc);
    }
    // Close the writer so every document is flushed to the RAMDirectory
    indexWriter.Close();
    //Save index
    FSDirectory fileDirectory = FSDirectory.GetDirectory(IndexFilePath, true);
    IndexWriter fileIndexWriter = new IndexWriter(fileDirectory, analyzer, true);
    fileIndexWriter.AddIndexes(new[] { ramDirectory });
    fileIndexWriter.Close();
    //Load index
    FSDirectory newFileDirectory = FSDirectory.GetDirectory(IndexFilePath, false);
    Directory newRamDirectory = new RAMDirectory();
    IndexWriter newIndexWriter = new IndexWriter(newRamDirectory, analyzer, true);
    newIndexWriter.AddIndexes(new[] { newFileDirectory });
    Console.WriteLine("New index writer document count:{0}.", newIndexWriter.DocCount());
    newIndexWriter.Close();
