How might I index PDF files using Lucene.Net?

忘掉有多难 2021-02-03 15:55

I'm looking for sample code that demonstrates how to index PDF documents using Lucene.Net and C#. Google turned up a few examples, but none that I found helpful.

2 answers
  •  生来不讨喜
    2021-02-03 16:29

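    using System.IO;
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // pdfPath is a placeholder for the location of the .pdf file.
    byte[] pdfBytes = File.ReadAllBytes(pdfPath);
    StringBuilder stringBuilder = new StringBuilder();
    PdfReader pdfReader = new PdfReader(pdfBytes);

    // iTextSharp page numbers start at 1; collect the text of every page.
    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
    {
        stringBuilder.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page)).Append(' ');
    }
    pdfReader.Close();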
    

    (using iTextSharp)

    The rest, feeding the extracted text into a Lucene.Net index, isn't as easy to show succinctly.

    The product demo on my site includes code that shows how to use Lucene.Net, but it is a little long to post here.

    Here is the code as it pertains to my product: https://svn.arachnode.net/svn/arachnodenet/trunk/Plugins/CrawlActions/ManageLuceneDotNetIndexes.cs (username/password: Public)
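
For readers who only need the indexing step, here is a minimal sketch of how the extracted text might be added to an index. It assumes the Lucene.Net 3.0.3 API (later releases such as 4.8 use a different writer configuration), and the class name `PdfIndexer`, the method name `IndexPdfText`, and the field names `path` and `content` are illustrative choices, not taken from the linked code.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical helper, assuming Lucene.Net 3.0.3.
public static class PdfIndexer
{
    // Adds one document to the index held in `directory`.
    // pdfText is the text already extracted from the PDF (for example with iTextSharp).
    public static void IndexPdfText(Directory directory, string pdfPath, string pdfText)
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);

        // This constructor creates the index if it does not exist yet, otherwise it appends.
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var document = new Document();

            // Store the path verbatim so a search hit can be mapped back to the file.
            document.Add(new Field("path", pdfPath, Field.Store.YES, Field.Index.NOT_ANALYZED));

            // Tokenize the text for full-text search, but do not store it in the index.
            document.Add(new Field("content", pdfText, Field.Store.NO, Field.Index.ANALYZED));

            writer.AddDocument(document);
            writer.Commit();
        }
    }
}
```

With the snippet from the answer, the call would be `PdfIndexer.IndexPdfText(directory, pdfPath, stringBuilder.ToString())`, where `directory` could be an `FSDirectory` for an on-disk index or a `RAMDirectory` for experiments. Opening a new `IndexWriter` per file keeps the sketch short; when indexing many PDFs it is usually better to reuse one writer.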
