Lucene IndexWriter slow to add documents

Backend · unresolved · 2 answers · 980 views
时光说笑 2021-02-09 09:15

I wrote a small loop that added 10,000 documents to the IndexWriter, and it took forever to finish.

Is there another way to index large volumes of documents?


2 answers
  •  Happy的楠姐
    2021-02-09 09:52

    Do it this way to get the best performance; on my machine I index about 1,000 documents per second.

    1) Reuse the Document and Field instances instead of creating new ones every time you add a document, like this:

    private static void IndexingThread(object contextObj)
    {
         Range range = (Range)contextObj;

         // Create the Document and its Fields once, outside the loop,
         // and reuse them for every document in this thread's range.
         Document newDoc = new Document();
         newDoc.Add(new Field("title", "", Field.Store.NO, Field.Index.ANALYZED));
         newDoc.Add(new Field("body", "", Field.Store.NO, Field.Index.ANALYZED));
         newDoc.Add(new Field("newsdate", "", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
         newDoc.Add(new Field("id", "", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
    
         for (int counter = range.Start; counter <= range.End; counter++)
         {
             // Only the field values change between documents.
             newDoc.GetField("title").SetValue(Entities[counter].Title);
             newDoc.GetField("body").SetValue(Entities[counter].Body);
             newDoc.GetField("newsdate").SetValue(Entities[counter].NewsDate);
             newDoc.GetField("id").SetValue(Entities[counter].ID.ToString());
    
             writer.AddDocument(newDoc);
         }
    }
    

    2) After that, you can use threading: break your large collection into smaller ranges and run the code above on each one. For example, with 10,000 documents you can create 10 threads via the ThreadPool and feed each range to one thread for indexing.

    That will give you the best performance.
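    The threading advice above can be sketched as a small driver. This is a minimal sketch, not the answerer's actual code: the `Range` type and the `IndexInParallel` name are hypothetical, and it assumes the static `Entities` list, `writer` field, and `IndexingThread` method from the snippet above. Lucene's `IndexWriter` is documented as safe to share across threads, which is why every worker can add to the same writer.

    ```csharp
    using System;
    using System.Threading;

    // Hypothetical Range type matching the cast in IndexingThread above.
    private class Range
    {
        public int Start;   // first index into Entities, inclusive
        public int End;     // last index into Entities, inclusive
    }

    private static void IndexInParallel(int totalDocs, int threadCount)
    {
        using (var done = new CountdownEvent(threadCount))
        {
            // Split [0, totalDocs) into threadCount contiguous ranges.
            int chunk = (totalDocs + threadCount - 1) / threadCount;
            for (int i = 0; i < threadCount; i++)
            {
                var range = new Range
                {
                    Start = i * chunk,
                    End = Math.Min((i + 1) * chunk, totalDocs) - 1
                };
                // IndexWriter.AddDocument is thread-safe, so each worker
                // indexes its own range against the shared writer.
                ThreadPool.QueueUserWorkItem(state =>
                {
                    IndexingThread(state);
                    done.Signal();
                }, range);
            }
            done.Wait();   // block until every range has been indexed
        }
        writer.Commit();   // flush once at the end, not per document
    }
    ```

    Committing once after all threads finish, rather than per document, avoids repeated flushes to disk.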
