I'm looking for some sample code demonstrating how to index PDF documents using Lucene.Net and C#. Google turned up a few examples, but none that I found helpful.
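// Requires: using System.Text; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
StringBuilder stringBuilder = new StringBuilder();
// pdfBytes is a byte[] holding the contents of the .pdf file
PdfReader pdfReader = new PdfReader(pdfBytes);
// iTextSharp page numbers are 1-based
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
    stringBuilder.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page)).Append(" ");
}
pdfReader.Close();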
(using iTextSharp)
The rest isn't as succinctly illustrated.
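That said, the indexing step can be roughly sketched as follows. This is only a minimal sketch, assuming the Lucene.Net 3.0.3 API; the class name `PdfIndexer`, the method `IndexPdfText`, and the field names "path" and "content" are made up for illustration and are not taken from any of the linked code.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
// Aliases avoid clashes with System.IO.Directory and System.Version.
using LuceneDirectory = Lucene.Net.Store.Directory;
using LuceneVersion = Lucene.Net.Util.Version;

public static class PdfIndexer
{
    // Hypothetical helper: adds the extracted text of one PDF to a Lucene.Net index.
    // Assumes Lucene.Net 3.0.3; other versions use different constructors and field types.
    public static void IndexPdfText(LuceneDirectory directory, string pdfPath, string pdfText)
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);

        // This constructor creates the index if it does not exist, otherwise appends to it.
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var document = new Document();

            // Store the path verbatim so a search hit can be mapped back to the file.
            document.Add(new Field("path", pdfPath, Field.Store.YES, Field.Index.NOT_ANALYZED));

            // Tokenize the extracted text for full-text search; it is not stored in the index.
            document.Add(new Field("content", pdfText, Field.Store.NO, Field.Index.ANALYZED));

            writer.AddDocument(document);
            writer.Commit();
        }
    }
}
```

It could be called as `PdfIndexer.IndexPdfText(FSDirectory.Open(new DirectoryInfo(indexPath)), pdfPath, stringBuilder.ToString());`, where `indexPath` and `pdfPath` are placeholders for your own index folder and file, and `stringBuilder` holds the text extracted with iTextSharp above.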
The product demo on my site includes code that shows how to use Lucene.Net, but it is a little long to post here.
Here is the relevant code from my product: https://svn.arachnode.net/svn/arachnodenet/trunk/Plugins/CrawlActions/ManageLuceneDotNetIndexes.cs Username/Password: Public