I'm really puzzled why it keeps dying with java.lang.OutOfMemoryError during indexing, even though it has a few GBs of memory.
Is there a fundamental reason why it needs more memory than that?
Looking at the stack trace, it appears you are performing a search sorted by a field. To sort by a field, Lucene must load all of that field's term values into memory (its FieldCache), across the entire index. If the field holds a lot of distinct data, it is quite possible to run out of memory.
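To make that concrete, here is a minimal sketch against the Lucene 2.x/3.x API of that era; the index path and the field names "body" and "title" are just placeholders:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        // Placeholder index location.
        IndexSearcher searcher = new IndexSearcher(
                IndexReader.open(FSDirectory.open(new File("/path/to/index"))));
        Query query = new TermQuery(new Term("body", "lucene"));

        // The Sort is what triggers the memory use: Lucene populates its
        // FieldCache with every distinct value of "title" in the whole
        // index, not just the values of the matching documents.
        Sort sort = new Sort(new SortField("title", SortField.STRING));
        TopDocs hits = searcher.search(query, null, 100, sort);

        System.out.println("total hits: " + hits.totalHits);
        searcher.close();
    }
}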
A wild guess: the documents you are indexing are very large. By default, Lucene only indexes the first 10,000 terms of a document to avoid OutOfMemory errors; you can raise this limit via setMaxFieldLength.
Also, call optimize() and close() on the IndexWriter as soon as you are done indexing; a sketch of both tips follows.
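A minimal sketch, again against the older Lucene 2.x/3.x API; the index path and document contents are placeholders:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BuildIndex {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/path/to/index")),  // placeholder path
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);  // lift the 10,000-term default
        try {
            // Placeholder document; in practice, loop over your data here.
            Document doc = new Document();
            doc.add(new Field("body", "hello lucene",
                    Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
        } finally {
            writer.optimize();  // merge segments once, at the end
            writer.close();     // flush buffers and release the write lock
        }
    }
}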
The definitive way, though, is to profile and find the bottleneck =]
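If you go that route, a cheap first step on a Sun/OpenJDK 6 JVM is to have it dump the heap when it dies and then browse the dump with jhat; the -Xmx value, jar name, and paths below are just placeholders:

$ java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar start.jar
$ jhat /tmp/java_pid12345.hprof    # shows which objects actually fill the heap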
Are you using post.jar to index the data? That jar has a bug in Solr 1.2/1.3, I think (I don't know the details). Our company has fixed it internally, and it should also be fixed in the latest trunk, Solr 1.4/1.5.
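For reference, this is the stock invocation I mean, using the path from the Solr 1.x example distribution (adjust to your layout):

$ cd example/exampledocs
$ java -jar post.jar *.xml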
I was using this Java:
$ java -version
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)
That JVM kept running out of heap space, so I upgraded to this one:
$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
And now it works fine, on a huge dataset, with lots of term facets.
For me it worked after restarting the Tomcat server.