Is there a program or workflow to convert .doc or .docx files to Markdown or similar text?
PS: Ideally, I would welcome the option that a spec
You can use Word to Markdown (Ruby Gem) to convert it in one step. Conversion can be as simple as:
$ gem install word-to-markdown
$ w2m path/to/document.docx
It routes the document through LibreOffice, but also does its best to infer semantic headings based on their relative font size.
There's also a hosted version which would be as simple as drag-and-drop to convert.
Word to Markdown might be worth a shot. Alternatively, there is the procedure described here, which uses Calibre and Pandoc via HTMLZ. Here's the bash script they use:
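#!/bin/bash
# Convert a Word document to Markdown via Calibre (HTMLZ) and Pandoc.
set -e
mkdir temp
cp "$1" temp
cd temp
# Use the basename, since the copy now sits directly inside temp/.
ebook-convert "$(basename "$1")" output.htmlz
unzip output.htmlz
cd ..
pandoc -f html -t markdown -o output.md temp/index.html
rm -R temp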
Pandoc supports conversion from docx to markdown directly:
pandoc -f docx -t markdown foo.docx -o foo.markdown
Several markdown formats are supported:
-t gfm (GitHub-Flavored Markdown)
-t markdown_mmd (MultiMarkdown)
-t markdown (pandoc’s extended Markdown)
-t markdown_strict (original unextended Markdown)
-t markdown_phpextra (PHP Markdown Extra)
-t commonmark (CommonMark Markdown)
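If you have a whole folder of documents, the direct Pandoc conversion above can be wrapped in a small loop. This is only a sketch: the function names `md_name` and `convert_all` are made up for illustration, and it assumes `pandoc` is on your PATH and that GitHub-Flavored Markdown is the format you want by default.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name by swapping .docx for .md.
md_name() {
  printf '%s\n' "${1%.docx}.md"
}

# Hypothetical helper: convert every .docx in a directory.
# $1 = directory (default: current), $2 = pandoc output format (default: gfm).
convert_all() {
  dir="${1:-.}"
  format="${2:-gfm}"
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
    pandoc -f docx -t "$format" "$f" -o "$(md_name "$f")"
  done
}
```

After sourcing the file, something like `convert_all ./reports commonmark` should write a .md file next to each .docx; swap the second argument for any of the `-t` values listed above.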
For bulleted lists, you can paste a list into Sublime Text and use multi-select (tested) or find and replace (not tested) to replace, e.g., the proprietary MS Word bullet characters with -, --, etc.
This doesn't work with headings but it may be possible to use a similar technique with other elements.
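The same find-and-replace idea can be done outside an editor with `sed`. This is a rough sketch, not something tested against real Word output: the function name `fix_bullets` is made up, and the set of bullet characters (•, ·, ▪) is an assumption about what ends up in the pasted text. It also flattens every level to a single `-`, so nested lists would need extra handling.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and turn leading bullet
# characters into Markdown "- " list markers.
# The characters matched here are an assumption; adjust to your paste.
fix_bullets() {
  sed -E 's/^[[:space:]]*(•|·|▪)[[:space:]]*/- /'
}
```

Usage would be along the lines of `fix_bullets < pasted.txt > list.md`, where `pasted.txt` is a plain-text file holding the list copied out of Word.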
From here:
unoconv -f html test.docx
pandoc -f html -t markdown -o test.md test.html