I would like to convert doc/docx documents to semantic HTML.
Some wishes/requirements:
Semantic HTML such that headers in the document are
There's a tool called upCast which is able to convert Word documents into XML.
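
For the heading requirement, here is a minimal sketch of what the mapping could look like when done by hand for .docx files (not the legacy .doc format). A .docx file is a ZIP archive whose main text lives in `word/document.xml`. The sketch assumes headings use the built-in style IDs `Heading1` to `Heading6`, which may differ in localized or customized templates. The function names are made up for illustration, and lists, tables, images and inline formatting are ignored.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_to_html(p):
    """Render one <w:p> element as <h1>..<h6> or <p>."""
    # Join the text of every run in the paragraph.
    text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
    style = p.find("w:pPr/w:pStyle", NS)
    style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
    # Assumption: heading paragraphs carry style IDs "Heading1".."Heading6".
    m = re.fullmatch(r"Heading([1-6])", style_id)
    tag = f"h{m.group(1)}" if m else "p"
    return f"<{tag}>{escape(text)}</{tag}>"


def document_xml_to_html(xml_bytes):
    """Convert the raw bytes of word/document.xml to an HTML fragment."""
    root = ET.fromstring(xml_bytes)
    body = root.find("w:body", NS)
    return "\n".join(paragraph_to_html(p) for p in body.findall("w:p", NS))


def docx_to_html(path):
    """Open a .docx archive and convert its main document part."""
    with zipfile.ZipFile(path) as z:
        return document_xml_to_html(z.read("word/document.xml"))
```

Calling `docx_to_html("report.docx")` (file name hypothetical) would return a string of `<h1>`...`<h6>` and `<p>` elements in document order. Only top-level paragraphs are visited, so text inside tables is skipped.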