How might I index PDF files using Lucene.Net?

忘掉有多难 2021-02-03 15:55

I'm looking for some sample code demonstrating how to index PDF documents using Lucene.Net and C#. Google turned up a few examples, but none that I found helpful.

2 answers

生来不讨喜 (original poster)

2021-02-03 16:29
```
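// Requires iTextSharp 5.x.
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// pdfBytes is a byte[] holding the contents of the .pdf file.
StringBuilder stringBuilder = new StringBuilder();
PdfReader pdfReader = new PdfReader(pdfBytes);

// iTextSharp page numbers start at 1.
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
    stringBuilder.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page)).Append(' ');
}
pdfReader.Close();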
```
(using iTextSharp)

That covers text extraction. The indexing step is harder to illustrate as succinctly.

The product demo on my site includes code that shows how to use Lucene.Net, but it is too long to post here.

Here is the relevant code from my product: https://svn.arachnode.net/svn/arachnodenet/trunk/Plugins/CrawlActions/ManageLuceneDotNetIndexes.cs (Username/Password: Public)
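For readers who only need the general shape of the indexing step, here is a minimal sketch. It is not the code from the link above. It assumes the Lucene.Net 3.0.3 API (later releases such as 4.8 use different classes), and the names `PdfIndexer`, `IndexPdfText`, `indexPath`, `pdfPath`, `pdfText`, and the field names `path` and `content` are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class PdfIndexer
{
    // Adds one document to the index stored at indexPath (assumed Lucene.Net 3.0.3 API).
    // pdfPath and pdfText are hypothetical inputs: the file's location and the text
    // already extracted from it, e.g. stringBuilder.ToString() from the snippet above.
    public static void IndexPdfText(string indexPath, string pdfPath, string pdfText)
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        using (var directory = FSDirectory.Open(new DirectoryInfo(indexPath)))
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var document = new Document();

            // Store the path verbatim so a search hit can be mapped back to the file.
            document.Add(new Field("path", pdfPath, Field.Store.YES, Field.Index.NOT_ANALYZED));

            // Tokenize the extracted text for full-text search; it is not stored in the index.
            document.Add(new Field("content", pdfText, Field.Store.NO, Field.Index.ANALYZED));

            writer.AddDocument(document);
            writer.Commit();
        }
    }
}
```

Calling `PdfIndexer.IndexPdfText(indexPath, pdfPath, stringBuilder.ToString())` once per PDF would add each file's text to the same index. Whether to store the full text in the index as well (`Field.Store.YES`) depends on whether search results need to display snippets.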