I would like to convert doc/docx documents to semantic HTML.
Some wishes/requirements:
Semantic HTML such that headers in the document are
docx4j (for docx only, not doc) writes clean HTML output. You'd need to change things a bit if you wanted
, but it's open source, so you can do that.
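To show what "semantic" output means for a .docx, here is a minimal hand-rolled sketch. It is not docx4j's exporter and uses only the JDK's DOM parser. It assumes you have already extracted `word/document.xml` from the .docx (a zip archive). It also assumes headings use Word's built-in English style ids `Heading1` to `Heading6`. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: maps WordprocessingML paragraphs to <h1>..<h6> or <p>.
class SemanticHtmlSketch {
    // WordprocessingML main namespace (ECMA-376 transitional)
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Converts the paragraphs of a word/document.xml string into minimal semantic HTML. */
    static String toHtml(String documentXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed so the w: namespace lookups below work
            Document doc = f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

            StringBuilder html = new StringBuilder();
            NodeList paras = doc.getElementsByTagNameNS(W, "p");
            for (int i = 0; i < paras.getLength(); i++) {
                Element p = (Element) paras.item(i);
                String tag = tagFor(p);
                html.append('<').append(tag).append('>')
                    .append(escape(textOf(p)))
                    .append("</").append(tag).append(">\n");
            }
            return html.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document.xml", e);
        }
    }

    // ASSUMPTION: heading style ids are "Heading1".."Heading6"; localized or
    // custom heading styles would need their own mapping.
    static String tagFor(Element p) {
        NodeList styles = p.getElementsByTagNameNS(W, "pStyle");
        if (styles.getLength() > 0) {
            String id = ((Element) styles.item(0)).getAttributeNS(W, "val");
            if (id.matches("Heading[1-6]")) {
                return "h" + id.charAt(id.length() - 1);
            }
        }
        return "p";
    }

    // Concatenates the text of every w:t run inside the paragraph.
    static String textOf(Element p) {
        StringBuilder sb = new StringBuilder();
        NodeList texts = p.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getTextContent());
        }
        return sb.toString();
    }

    // Minimal HTML escaping for element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

This sketch ignores lists, tables, and inline formatting. Paragraphs nested inside text boxes would likely be emitted twice. A real converter such as docx4j handles far more of the format, so this is only meant to illustrate the style-to-element mapping.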