Lucene IndexWriter slow to add documents

时光说笑 2021-02-09 09:15

I wrote a small loop that adds 10,000 documents to an IndexWriter, and it took forever to run.

Is there another way to index large volumes of documents?


2 answers
  •  旧时难觅i
    2021-02-09 10:12

    Just checking, but you haven't got the debugger attached when you're running it, have you?

    This severely affects performance when adding documents.

    On my machine (Lucene 2.0.0.4):

    Built with platform target x86:

    • No debugger - 5.2 seconds

    • Debugger attached - 113.8 seconds

    Built with platform target x64:

    • No debugger - 6.0 seconds

    • Debugger attached - 171.4 seconds
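
    To rule the debugger out before tuning anything else, a timing wrapper along these lines may help. It is only a minimal sketch: `IndexTiming` and `Measure` are hypothetical names that are not part of Lucene, and it relies solely on the standard .NET `Debugger.IsAttached` property and `Stopwatch` class.

    ```csharp
    using System;
    using System.Diagnostics;

    // Hypothetical helper (not part of Lucene): times a block of indexing work
    // and warns when a debugger is attached, since that inflates the timings.
    public static class IndexTiming
    {
        public static TimeSpan Measure(Action indexingWork)
        {
            if (Debugger.IsAttached)
            {
                Console.WriteLine("Warning: debugger attached; timings will be inflated.");
            }

            Stopwatch stopwatch = Stopwatch.StartNew();
            indexingWork();
            stopwatch.Stop();

            Console.WriteLine("Elapsed: {0:F1} seconds", stopwatch.Elapsed.TotalSeconds);
            return stopwatch.Elapsed;
        }
    }
    ```

    The loop that calls `AddDocument` would go inside the delegate, for example `IndexTiming.Measure(() => { /* add documents here */ });`. Running the same build once with and once without the debugger (Ctrl+F5 in Visual Studio) should show whether the debugger accounts for the slowdown.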

    Rough example of saving and loading an index to and from a RAMDirectory:

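    // Namespaces this snippet needs at the top of the file
    using System;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    const int DocumentCount = 10 * 1000;
    // Despite the name, FSDirectory treats this path as a directory
    const string IndexFilePath = @"X:\Temp\tmp.idx";
    
    Analyzer analyzer = new StandardAnalyzer();
    Directory ramDirectory = new RAMDirectory();
    
    // Build the index in memory (third argument: true = create a new index)
    IndexWriter indexWriter = new IndexWriter(ramDirectory, analyzer, true);
    
    for (int i = 0; i < DocumentCount; i++)
    {
        Document doc = new Document();
        string text = "Value" + i;
        doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
        indexWriter.AddDocument(doc);
    }
    
    indexWriter.Close();
    
    // Save index: copy the in-memory index into a new on-disk index
    FSDirectory fileDirectory = FSDirectory.GetDirectory(IndexFilePath, true);
    IndexWriter fileIndexWriter = new IndexWriter(fileDirectory, analyzer, true);
    fileIndexWriter.AddIndexes(new[] { ramDirectory });
    fileIndexWriter.Close();
    
    // Load index: open the existing on-disk index and copy it back into memory
    FSDirectory newFileDirectory = FSDirectory.GetDirectory(IndexFilePath, false);
    Directory newRamDirectory = new RAMDirectory();
    IndexWriter newIndexWriter = new IndexWriter(newRamDirectory, analyzer, true);
    newIndexWriter.AddIndexes(new[] { newFileDirectory });
    
    Console.WriteLine("New index writer document count:{0}.", newIndexWriter.DocCount());
    
    // Close the last writer as well so its resources are released
    newIndexWriter.Close();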
    
