FieldCache with frequently updating index

独厮守ぢ 2021-01-21 14:12

Hi
I have a Lucene index that is frequently updated with new records. The index holds 5,000,000 records, and I'm caching one of my numeric fields using FieldCache. but af

2 Answers
  • 2021-01-21 15:01

    The FieldCache uses weak references to index readers as the keys of its cache (obtained by calling IndexReader.GetCacheKey, which has been un-obsoleted). A standard call to IndexReader.Open with an FSDirectory will use a pool of readers, one for every segment.
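
    To illustrate why keying on the per-segment reader matters, here is a minimal sketch of a weak-keyed cache. It is not the FieldCache implementation: FakeSegmentReader and PerReaderCache are hypothetical names, and ConditionalWeakTable is only a stand-in for the weak-reference idea. An entry lives exactly as long as its reader object, so an unchanged segment keeps its cached values across reopens, while a merged-away segment's values become collectable along with its reader.

    ```csharp
    using System;
    using System.Runtime.CompilerServices;

    // Hypothetical stand-in for a segment reader; not a Lucene.Net type.
    public sealed class FakeSegmentReader
    {
        public FakeSegmentReader(string name) { Name = name; }
        public string Name { get; }
    }

    // Hypothetical cache keyed weakly by the reader object (assumption:
    // this only models the idea, it is not how FieldCache is written).
    public sealed class PerReaderCache
    {
        private readonly ConditionalWeakTable<object, int[]> cache =
            new ConditionalWeakTable<object, int[]>();

        // Counts how often values had to be loaded (not thread-safe;
        // kept only to make cache hits visible in this sketch).
        public int Loads { get; private set; }

        public int[] GetInts(object readerKey, Func<int[]> load)
        {
            // The loader runs only when no entry exists for this key.
            return cache.GetValue(readerKey, _ =>
            {
                Loads++;
                return load();
            });
        }
    }
    ```

    Asking twice with the same segment reader returns the same array and loads once; asking with the top-level reader instead would create a separate entry that is thrown away on every reopen.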

    You should always pass the innermost reader to the FieldCache. Check out ReaderUtil for helpers that retrieve the individual reader a document is contained in. Document ids won't change within a segment; what is meant by describing them as unpredictable/volatile is that they can change between two index commits: deleted documents could have been pruned, segments could have been merged, and so on.

    A commit needs to remove the segment from disk (merged/optimized away), which means that new readers won't have the pooled segment reader, and garbage collection will remove it as soon as all older readers are closed.

    Never, ever, call FieldCache.PurgeAllCaches(). It's meant for testing, not production use.

    Added 2011-04-03; example code using subreaders.

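    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Lucene.Net.Util;

    var directory = FSDirectory.Open(new DirectoryInfo("index"));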
    var reader = IndexReader.Open(directory, readOnly: true);
    var documentId = 1337;
    
    // Grab all subreaders.
    var subReaders = new List<IndexReader>();
    ReaderUtil.GatherSubReaders(subReaders, reader);
    
    // Loop through all subreaders. Ids inside a subreader run from 0 to
    // MaxDoc() - 1, so while subReaderId is at or beyond MaxDoc(),
    // subtract MaxDoc() and go to the next subreader.
    var subReaderId = documentId;
    var subReader = subReaders.First(sub => {
        if (sub.MaxDoc() <= subReaderId) {
            subReaderId -= sub.MaxDoc();
            return false;
        }
    
        return true;
    });
    
    var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "newsdate");
    var value = values[subReaderId];
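
    The id arithmetic can be checked without an index at hand. The following is a small sketch under the assumption that you already know each segment's MaxDoc in order; DocIdMapper and Locate are hypothetical names, not Lucene.Net APIs.

    ```csharp
    using System;

    // Hypothetical helper; models only the global-to-local id arithmetic.
    public static class DocIdMapper
    {
        // Returns the index of the segment that contains globalDocId and
        // sets localDocId to the document's id inside that segment.
        public static int Locate(int[] segmentMaxDocs, int globalDocId, out int localDocId)
        {
            if (globalDocId < 0)
                throw new ArgumentOutOfRangeException(nameof(globalDocId));

            int remaining = globalDocId;
            for (int i = 0; i < segmentMaxDocs.Length; i++)
            {
                // Valid local ids are 0 .. MaxDoc - 1.
                if (remaining < segmentMaxDocs[i])
                {
                    localDocId = remaining;
                    return i;
                }
                remaining -= segmentMaxDocs[i];
            }

            // The id lies beyond the last segment.
            throw new ArgumentOutOfRangeException(nameof(globalDocId));
        }
    }
    ```

    With segment sizes of 1000, 500 and 200, document 1337 lands in the second segment (index 1) with local id 337, and document 1000 is the first document of that same segment.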
    
  • 2021-01-21 15:14

    Here's one way I've solved this problem. You'll need to create a background thread that constructs IndexSearcher instances, one at a time, on some interval. Keep using your current IndexSearcher instance until a new one from the background thread is ready, then swap in the new one as your current one. Each instance acts as a snapshot of the index from the time it was first opened. Note that the memory overhead for FieldCache doubles, because you need two instances in memory at once. You can safely write to IndexWriter while this is happening.
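
    The swap itself can be sketched roughly like this. SnapshotHolder is a hypothetical helper, not part of Lucene.Net; the open delegate stands for whatever builds and warms your next IndexSearcher, and this sketch assumes only one thread calls Refresh at a time.

    ```csharp
    using System;
    using System.Threading;

    // Hypothetical helper; T would be your searcher type.
    public sealed class SnapshotHolder<T> where T : class
    {
        private readonly Func<T> open;
        private T current;

        public SnapshotHolder(Func<T> open)
        {
            this.open = open;
            current = open();
        }

        // Searches read the current snapshot; they never wait for a refresh.
        public T Current
        {
            get { return Volatile.Read(ref current); }
        }

        // Called from the background thread. The replacement is fully
        // built before the swap. The previous snapshot is returned so the
        // caller can close it once searches still using it have finished.
        public T Refresh()
        {
            T next = open();
            return Interlocked.Exchange(ref current, next);
        }
    }
    ```

    Closing the returned snapshot too early would break searches that are still running against it, so some form of reference counting or a grace period is needed; that part is left out here.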

    If you need to, you can take this a step further and make index changes immediately available for search, although it can get tricky. You'll need to associate a RAMDirectory with each snapshot instance above to keep the changes in memory. Then create a second IndexWriter that points to that RAMDirectory. For each index write, you'll need to write to both IndexWriter instances. For searches, you'll use a MultiSearcher across the RAMDirectory and your normal index on disk. The RAMDirectory can be thrown away once the IndexSearcher it was coupled with is no longer used. I'm glossing over some details here, but that's the general idea.

    Hope this helps.
