I\'m working on a code to make a MS Word to HTML system. After googleing for about half a minute, I was able to find the code which does exactly what I need. Now.. It works
The Interop library is not a "working" library in itself, it is only a wrapper around winword.exe for .NET programs, so using this library does not make any sense if you don't install or use Microsoft Word.
Instead you will need to find a library that allows for manipulating Word Documents. If you can constrain the documents to be in the new format (docx), then it will be quite an easy task, e.g. using the OOXML SDK (as proposed by Stilgar, too). But there are libraries for the old format, too.
Update: I have to admit, although I was convinced I searched and found some libraries for the old doc format before, I do not manage to find those anymore, probably because the result lists is "spoiled" by the many offers for docx. To be clear:
If you can afford to stick to docx (2007 or later) format, you should do that. Office Open XML is a (more or less) open standard based on ZIP and XML, and many tools already exist and will be developed in the future. The old format is much less supported nowadays.
If you have to go for the old format, too, then Aspose (as proposed by Uwe) is the only library I found.
I think the OOXML SDK may contain something but it will only work with docx and not with the old doc.
As for the old formats I am also interested in a cheap and easy way to support them without the need to use the Automation APIs
you can use Code7248.word_reader.dll
below is the sample code on how to use Code7248.word_reader.dll
add reference to this DLL in your project and copy below code.
using System;
using System.Collections.Generic;
using System.Text;
//add extra namespaces
using Code7248.word_reader;
namespace testWordRead
{
class Program
{
private void readFileContent(string path)
{
TextExtractor extractor = new TextExtractor(path);
string text = extractor.ExtractText();
Console.WriteLine(text);
}
static void Main(string[] args)
{
Program cs = new Program();
string path = "D:\Test\testdoc1.docx";
cs.readFileContent(path);
Console.ReadLine();
}
}
}