I wrote a blog post about parsing PDFs. Here is the link:
http://devpinoy.org/blogs/marl/archive/2008/03/04/pdf-to-text-using-open-source-library-pdfbox-another-sample-for-grade-1-pupils.aspx
Edit: The link no longer works. The post is quoted below from the archived copy at http://web.archive.org/web/20130507084207/http://devpinoy.org/blogs/marl/archive/2008/03/04/pdf-to-text-using-open-source-library-pdfbox-another-sample-for-grade-1-pupils.aspx
The following is based on popular examples available on the web.
It "reads" a PDF file and displays its text in the RichTextBox
control on the form. The PDFBox for .NET library can be
downloaded from SourceForge.
You need to add references to IKVM.GNU.Classpath and PDFBox-0.7.3.
Also, FontBox-0.1.0-dev.dll and PDFBox-0.7.3.dll need to be copied
into the bin folder of your application. For some reason I can't
recall (maybe it came from one of the tutorials), I also copied
IKVM.GNU.Classpath.dll into the bin folder.
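Missing DLLs in the bin folder are an easy setup mistake, so a quick check at startup can save some head-scratching. The sketch below is a hypothetical helper (the class name `PdfBoxDependencyCheck` and method `FindMissing` are made up for illustration, not part of PDFBox); it only tests whether the three file names mentioned above exist in a given directory and assumes the 0.7.3 / 0.1.0-dev release names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of PDFBox or the original post:
// reports which of the DLLs that must sit next to the executable are missing.
public static class PdfBoxDependencyCheck
{
    // File names taken from the post; adjust them for a different PDFBox release.
    public static readonly string[] RequiredFiles =
    {
        "PDFBox-0.7.3.dll",
        "FontBox-0.1.0-dev.dll",
        "IKVM.GNU.Classpath.dll"
    };

    // Returns the required file names that do not exist in binDirectory.
    public static List<string> FindMissing(string binDirectory)
    {
        var missing = new List<string>();
        foreach (string name in RequiredFiles)
        {
            if (!File.Exists(Path.Combine(binDirectory, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

For example, in the form's constructor you could call `PdfBoxDependencyCheck.FindMissing(AppDomain.CurrentDomain.BaseDirectory)` and show a `MessageBox` listing any names it returns.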
On a side note, I just got my copy of "Head First C#" (on Keith's
suggestion) from Amazon. The book is cool! It is really written for
beginners. This edition covers VS2008 and .NET Framework 3.5.
Here you go...
/* Marlon Ribunal
 * Convert PDF To Text
 * *******************/
using System;
using System.Windows.Forms;
using org.pdfbox.pdmodel;   // PDDocument (PDFBox-0.7.3.dll)
using org.pdfbox.util;      // PDFTextStripper

namespace MarlonRibunal.iPdfToText
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();
        }

        void Button1Click(object sender, EventArgs e)
        {
            // PDFBox is a Java library compiled for .NET with IKVM,
            // so its methods keep their Java names (load, getText, close).
            PDDocument doc = PDDocument.load("C:\\pdftoText\\myPdfTest.pdf");
            try
            {
                // Extract the text of every page and show it in the RichTextBox.
                PDFTextStripper stripper = new PDFTextStripper();
                richTextBox1.Text = stripper.getText(doc);
            }
            finally
            {
                // Release the file handle held by the loaded document.
                doc.close();
            }
        }
    }
}