incremental indexing lucene

名媛妹妹 2020-12-20 08:30

I'm building a Java application with Lucene 3.6 and want to index incrementally. I have already created the index, and I read that what you have to do is open the existi

1 answer
  • 2020-12-20 09:08

    In Lucene's data model, you store documents in the index. Each document holds fields: "analyzed" fields, which are tokenized for full-text search, and non-analyzed fields, which are indexed verbatim and are a good place for a timestamp or any other information you might need later.

    I suspect you are confusing files with documents: your first post talks about documents, but now you are calling IndexFileNames.isDocStoreFile(file.getName()), which only tells you whether a file name belongs to one of Lucene's own index files. It says nothing about the documents you have indexed.

    Once you understand Lucene's object model, the code you need takes about three minutes to write:

    • Check whether the document already exists in the index by querying Lucene, for example on a non-analyzed field that stores a unique identifier.
    • If the query returns 0 documents, add the new document to the index.
    • If the query returns 1 document, read its "timestamp" field and compare it with the timestamp of the document you are trying to store. If the indexed copy is older, delete it (for example by its docId) and add the new one.
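
    The three steps above can be sketched as a small decision function. This is only an illustration under stated assumptions: the class IncrementalIndexSketch, the Action enum, and the decide method are hypothetical names, not Lucene API, and a plain Map stands in for the lookup against the index.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the add / replace / skip decision; it does not call Lucene. */
class IncrementalIndexSketch {

    enum Action { ADD, REPLACE, SKIP }

    /**
     * @param storedTimestamp timestamp read from the indexed document, or null
     *                        if the lookup by unique id returned no hit
     * @param newTimestamp    timestamp of the document about to be indexed
     */
    static Action decide(Long storedTimestamp, long newTimestamp) {
        if (storedTimestamp == null) {
            return Action.ADD;       // the query returned 0 documents
        }
        if (newTimestamp > storedTimestamp) {
            return Action.REPLACE;   // the indexed copy is stale
        }
        return Action.SKIP;          // the indexed copy is current
    }

    public static void main(String[] args) {
        // Hypothetical snapshot of the index: unique id -> stored timestamp.
        Map<String, Long> indexed = new HashMap<String, Long>();
        indexed.put("report.txt", 1000L);

        System.out.println(decide(indexed.get("notes.txt"), 500L));   // ADD
        System.out.println(decide(indexed.get("report.txt"), 2000L)); // REPLACE
        System.out.println(decide(indexed.get("report.txt"), 1000L)); // SKIP
    }
}
```

    In a real application, ADD would map to IndexWriter.addDocument(doc) and REPLACE to IndexWriter.updateDocument(term, doc), with the stored timestamp read from the hit returned by the query on the identifier field.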

    If, on the other hand, you always want to replace the previous value, see this snippet from Lucene in Action:

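    // getWriter() and getHitCount() are helpers defined elsewhere in the test class (not shown).
    public void testUpdate() throws IOException {
        // Before the update, one document has city "Amsterdam".
        assertEquals(1, getHitCount("city", "Amsterdam"));
        IndexWriter writer = getWriter();
        Document doc = new Document();
        // Unique identifier: stored and indexed as a single, non-analyzed term.
        doc.add(new Field("id", "1",
                          Field.Store.YES,
                          Field.Index.NOT_ANALYZED));
        // Stored only; not searchable.
        doc.add(new Field("country", "Netherlands",
                          Field.Store.YES,
                          Field.Index.NO));
        // Searchable full text; not stored.
        doc.add(new Field("contents",
                          "Den Haag has a lot of museums",
                          Field.Store.NO,
                          Field.Index.ANALYZED));
        doc.add(new Field("city", "Den Haag",
                          Field.Store.YES,
                          Field.Index.ANALYZED));
        // Replace whatever document currently has id "1" with the new one.
        writer.updateDocument(new Term("id", "1"),
                              doc);
        writer.close();
        assertEquals(0, getHitCount("city", "Amsterdam"));
        assertEquals(1, getHitCount("city", "Den Haag"));
    }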
    

    As you can see, the snippet uses a non-analyzed id field, as I suggested, as a simple queryable attribute, and calls updateDocument, which first deletes the matching document and then adds the new one.
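
    To make the delete-then-add behaviour concrete, here is a toy in-memory model of it. It is a sketch only: UpdateSemanticsSketch and its methods are hypothetical names that mimic the idea of updating by term, and none of this is Lucene code. It also shows that when no document matches the term, the update simply adds the new document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model of "delete by term, then add"; it does not call Lucene. */
class UpdateSemanticsSketch {

    private final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Removes every document whose field equals value, then adds the new one. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        Iterator<Map<String, String>> it = docs.iterator();
        while (it.hasNext()) {
            if (value.equals(it.next().get(field))) {
                it.remove();
            }
        }
        docs.add(doc);
    }

    /** Counts documents whose field equals value exactly. */
    int count(String field, String value) {
        int hits = 0;
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                hits++;
            }
        }
        return hits;
    }

    /** Builds a two-field document for the example. */
    static Map<String, String> doc(String id, String city) {
        Map<String, String> d = new HashMap<String, String>();
        d.put("id", id);
        d.put("city", city);
        return d;
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch index = new UpdateSemanticsSketch();
        index.addDocument(doc("1", "Amsterdam"));
        index.updateDocument("id", "1", doc("1", "Den Haag"));
        System.out.println(index.count("city", "Amsterdam")); // 0
        System.out.println(index.count("city", "Den Haag"));  // 1
    }
}
```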

    You may also want to check the javadoc directly:

    http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexWriter.html#updateDocument(org.apache.lucene.index.Term,org.apache.lucene.document.Document)
