Is there an open-source Java library for reading Word documents (both .docx and the older .doc format)?
Read-only access is sufficient; I do not need to modify the W
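import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.xmlbeans.XmlException;

public class XParseTest
{
    public static void main(String[] args) throws XmlException, OpenXML4JException, IOException
    {
        File file = new File("e:\\testing\\new.docx");
        // Open the .docx as an OPC package; try-with-resources closes the stream and extractor
        try (FileInputStream fs = new FileInputStream(file);
             XWPFWordExtractor xw = new XWPFWordExtractor(OPCPackage.open(fs)))
        {
            System.out.println(xw.getText());
        }
    }
}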
This extracts the plain text from a .docx file.
Use Apache POI: HWPF for .doc files and XWPF for .docx files.
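Since the two formats need different POI components, it can help to check which one a file really is before choosing between HWPF and XWPF. The following is a minimal sketch using only the JDK (Java 11+), not part of POI: the class name WordFormatSniffer and its detect methods are made up for illustration. It relies on the fact that legacy .doc files are OLE2 compound documents and .docx files are ZIP packages. Note that these signatures identify the container only, so an .xls or an arbitrary .zip would match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the Word format from the leading bytes.
final class WordFormatSniffer {
    // OLE2 compound file signature, the container used by legacy .doc files
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local file header signature ("PK\3\4"), the container used by .docx files
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Returns "doc", "docx", or "unknown" based on the leading bytes. */
    static String detect(byte[] header) {
        if (startsWith(header, OLE2)) return "doc";   // candidate for HWPF
        if (startsWith(header, ZIP)) return "docx";   // candidate for XWPF
        return "unknown";
    }

    /** Reads up to the first 8 bytes of a file and classifies them. */
    static String detect(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return detect(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller could then pass "docx" files to XWPFWordExtractor as in the answer above, and "doc" files to the HWPF text extractor.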
There is an Apache project that does this: http://poi.apache.org/