I wrote a small loop that added 10,000 documents to the IndexWriter, and it took forever.
Is there another way to index large volumes of documents?
Just checking, but you haven't got the debugger attached when you're running it, have you?
This severely affects performance when adding documents.
On my machine (Lucene.Net 2.0.0.4):

Built with platform target x86:
  No debugger - 5.2 seconds
  Debugger attached - 113.8 seconds

Built with platform target x64:
  No debugger - 6.0 seconds
  Debugger attached - 171.4 seconds
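
If you want to rule the debugger out programmatically, System.Diagnostics.Debugger.IsAttached (a standard .NET API) reports whether one is attached; a quick guard at the top of the benchmark might look like this:

    // Warn when a debugger is attached, since it skews indexing timings badly.
    if (System.Diagnostics.Debugger.IsAttached)
    {
        Console.WriteLine("Debugger attached - timings will not be representative.");
    }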
Rough example of saving and loading an index to and from a RAMDirectory:
const int DocumentCount = 10 * 1000;
const string IndexFilePath = @"X:\Temp\tmp.idx";

Analyzer analyzer = new StandardAnalyzer();

// Build the index entirely in memory.
Directory ramDirectory = new RAMDirectory();
IndexWriter indexWriter = new IndexWriter(ramDirectory, analyzer, true);

for (int i = 0; i < DocumentCount; i++)
{
    Document doc = new Document();
    string text = "Value" + i;
    doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
    indexWriter.AddDocument(doc);
}

indexWriter.Close();

// Save the in-memory index to disk.
FSDirectory fileDirectory = FSDirectory.GetDirectory(IndexFilePath, true);
IndexWriter fileIndexWriter = new IndexWriter(fileDirectory, analyzer, true);
fileIndexWriter.AddIndexes(new[] { ramDirectory });
fileIndexWriter.Close();

// Load the index from disk back into a fresh RAMDirectory.
FSDirectory newFileDirectory = FSDirectory.GetDirectory(IndexFilePath, false);
Directory newRamDirectory = new RAMDirectory();
IndexWriter newIndexWriter = new IndexWriter(newRamDirectory, analyzer, true);
newIndexWriter.AddIndexes(new[] { newFileDirectory });

// Report the count before closing; Close() flushes and releases the writer.
Console.WriteLine("New index writer document count: {0}.", newIndexWriter.DocCount());
newIndexWriter.Close();