I'm currently designing a full-text search system where users perform text queries against MS Office and PDF documents, and the result will return a list of documents that
A bit late to the party, but this may help someone :)
I had a similar problem, and some research led me to FSCrawler. Its description:
This crawler helps to index binary documents such as PDF, Open Office, MS Office.
Main features: