Hi
I have a Lucene index that is frequently updated with new records. It holds 5,000,000 records, and I'm caching one of my numeric fields using FieldCache, but af
The FieldCache uses weak references to index readers as keys for its cache (obtained by calling IndexReader.GetCacheKey, which has been un-obsoleted). A standard call to IndexReader.Open with an FSDirectory will use a pool of readers, one for every segment.
You should always pass the innermost reader to the FieldCache. Check out ReaderUtil for helpers that retrieve the individual reader a document is contained in. Document ids won't change within a segment; when they are described as unpredictable or volatile, what is meant is that they can change between two index commits, for example because deleted documents have been pruned or segments have been merged.
A pooled segment reader goes away only after a commit removes its segment from disk (because it was merged or optimized away). From then on new readers won't include that pooled segment reader, and the garbage collector will reclaim it as soon as all older readers are closed.
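To make the point about weakly keyed, per-reader caching more concrete, here is a toy model in C#. It is only a sketch: FakeSegmentReader and PerReaderCache are hypothetical names invented for illustration, and ConditionalWeakTable merely stands in for whatever FieldCache does internally. It assumes single-threaded use.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a per-segment reader; not a Lucene.NET type.
public sealed class FakeSegmentReader
{
    public string Name { get; }
    public FakeSegmentReader(string name) { Name = name; }
}

// Hypothetical cache keyed weakly by reader object. An entry can be
// collected once nothing else references its key.
public sealed class PerReaderCache
{
    private readonly ConditionalWeakTable<object, int[]> _entries =
        new ConditionalWeakTable<object, int[]>();

    // Counts how many times values had to be loaded (single-threaded use).
    public int Loads { get; private set; }

    // Returns the cached array for this key, loading it on first use only.
    public int[] GetInts(object readerKey, Func<int[]> load)
    {
        return _entries.GetValue(readerKey, _ => { Loads++; return load(); });
    }
}
```

In this model, asking twice with the same segment reader loads once, while every new key triggers a fresh load. That is why passing a freshly reopened top-level reader would rebuild everything, whereas passing the pooled segment readers only loads the segments that are actually new.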
Never, ever, call FieldCache.PurgeAllCaches(). It's meant for testing, not production use.
Added 2011-04-03; example code using subreaders.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

var directory = FSDirectory.Open(new DirectoryInfo("index"));
var reader = IndexReader.Open(directory, readOnly: true);
var documentId = 1337;

// Grab all subreaders.
var subReaders = new List<IndexReader>();
ReaderUtil.GatherSubReaders(subReaders, reader);

// Loop through all subreaders. Ids within a subreader run from 0 to
// MaxDoc() - 1, so while subReaderId is at or past MaxDoc(), subtract
// MaxDoc() and go to the next subreader.
var subReaderId = documentId;
var subReader = subReaders.First(sub => {
    if (sub.MaxDoc() <= subReaderId) {
        subReaderId -= sub.MaxDoc();
        return false;
    }
    return true;
});

// Read the cached values from the segment-level reader, indexed by the
// segment-local id.
var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "newsdate");
var value = values[subReaderId];
Here's one way I've solved this problem. You'll need to create a background thread that constructs IndexSearcher instances, one at a time, on some interval. Continue using your current IndexSearcher instance until a new one from the background thread is ready, then swap in the new one as your current one. Each instance acts as a snapshot of the index from the time it was first opened. Note that the memory overhead for FieldCache doubles because you need two instances in memory at once. You can safely write to IndexWriter while this is happening.
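The swap itself can be sketched roughly as follows. This is a minimal illustration, not code from my project: SnapshotHolder is a hypothetical helper, and the generic T stands in for IndexSearcher so the sketch has no Lucene.NET dependency.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Lucene.NET): holds the current snapshot
// and publishes a replacement only after it has been fully built.
public sealed class SnapshotHolder<T> where T : class
{
    private T _current;

    public SnapshotHolder(T initial)
    {
        _current = initial ?? throw new ArgumentNullException(nameof(initial));
    }

    // Searches read the current snapshot and never wait on a rebuild.
    public T Current => Volatile.Read(ref _current);

    // Builds the replacement first (the slow part, such as opening a new
    // searcher and warming its caches), then swaps it in atomically.
    // Returns the previous snapshot so the caller can close it later.
    public T Refresh(Func<T> buildNext)
    {
        T next = buildNext();
        return Interlocked.Exchange(ref _current, next);
    }
}
```

The background thread would call Refresh on its interval, passing a delegate that opens the new searcher. The returned old instance should only be closed once no in-flight search is still using it, which this sketch leaves to the caller.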
If you need to, you can take this a step further by making index changes immediately available for search, although it can get tricky. You'll need to associate a RAMDirectory with each snapshot instance above to keep the changes in memory, then create a second IndexWriter that points to that RAMDirectory. For each index write, you'll need to write to both IndexWriter instances. For searches, you'll use a MultiSearcher across the RAMDirectory and your normal index on disk. The RAMDirectory can be thrown away once the IndexSearcher it was coupled with is no longer used. I'm glossing over some details here, but that's the general idea.
Hope this helps.