I am writing a custom Lucene.NET indexer to enable indexing of MS Word documents. The indexer must be capable of handling last three releases of MS Word: 2010, 2007 and 2003.
You could you use the IFilter plugins to let you retrieve the contents of the documents and then index them. The interface is originally part of Microsoft Index Service but is generally available for indexing documents.
I looked into the technology a couple of years ago and seem to remember that either the filters for Office documents were built into Windows or could be installed separately from the complete Office package but I may be wrong here.
More about the IFilter technology at IFilter at Wikipedia and IFilter at MSDN. You will have to look into P/Invoke and might get some inspiration IFilter at pinvoke.net.
A sample in C# can be found at MSDN Code Gallery.