I'm really puzzled about why it keeps dying with java.lang.OutOfMemoryError during indexing, even though the JVM has a few GB of memory available.
Is there a fundamental reason why it needs so much memory?
A wild guess: the documents you are indexing are very large.
By default, Lucene only indexes the first 10,000 terms of a document to avoid OutOfMemoryErrors. You can raise this limit; see IndexWriter.setMaxFieldLength().
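As a rough sketch against the pre-4.0 IndexWriter API (the index path and class name here are placeholders, not from the question):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class IndexerSetup {
        public static IndexWriter openWriter() throws Exception {
            Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
            // MaxFieldLength.UNLIMITED lifts the default 10,000-term cap
            // (an explicit cap can also be set later via writer.setMaxFieldLength(int));
            // note that memory use during indexing grows with it.
            return new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);
        }
    }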
Also, call optimize() and close() on the IndexWriter as soon as you are done adding documents.
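A minimal sketch of that pattern, again assuming the pre-4.0 API where optimize() still exists (the class and method names are hypothetical):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class IndexingRun {
        public static void indexAll(IndexWriter writer, Iterable<Document> docs) throws Exception {
            try {
                for (Document doc : docs) {
                    writer.addDocument(doc);
                }
                writer.optimize(); // merge segments down; removed in Lucene 4.0
            } finally {
                writer.close();    // flush remaining buffers and release file handles
            }
        }
    }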
A sure way to find out is to profile the application and see where the memory actually goes. =]