You have two options: write your own parser for the .doc format from Microsoft's published specification, or use an existing library built for the purpose (e.g., Aspose.Words from Aspose). Unless you have a couple of spare years to spend on the task, the latter is clearly the better choice.
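
To give a rough sense of what the first route involves: a binary .doc file is stored inside an OLE2 compound file container, so a hand-written parser has to deal with that container before it ever reaches Word's own structures. Here is a minimal sketch in Python (my choice of language; the function names are made up for illustration) that does nothing more than check the container signature. Everything after those first 8 bytes is where the real work starts.

```python
# Signature found at the start of every OLE2 compound file,
# the container that binary .doc files are stored in.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the data begins with the OLE2 compound file signature."""
    return data[:8] == OLE2_SIGNATURE


def file_looks_like_ole2(path: str) -> bool:
    """Hypothetical helper: read the first 8 bytes of a file and check them."""
    with open(path, "rb") as f:
        return looks_like_ole2(f.read(8))
```

Note that passing this check only tells you the file is an OLE2 container, not that it is a Word document; other legacy Office formats use the same container.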