How can doc/docx files be converted to markdown or structured text?

难免孤独 2021-01-29 21:45

Is there a program or workflow to convert .doc or .docx files to Markdown or similar text?

PS: Ideally, I would welcome the option that a spec

11 answers
  •  抹茶落季
    2021-01-29 22:29

    Mammoth is best known as a Word-to-HTML converter, but it now also includes a Markdown writer. When I last checked, Markdown support was still at an early stage, so some features may be unsupported. As usual, check the website for the latest details.

    Install

    To use the JavaScript version, install Node.js and then install Mammoth:

    npm install -g mammoth
    

    Command line

    To convert a Word document to Markdown from the command line:

    mammoth document.docx --output-format=markdown
    

    API

    To convert to Markdown through the Node.js API:

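    var mammoth = require("mammoth");
    // convertToMarkdown returns a promise; result.value holds the Markdown text
    mammoth.convertToMarkdown({path: "path/to/document.docx"})
        .then(function(result) { console.log(result.value); });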
    

    Features:

    Mammoth Markdown writer currently supports:

    • Lists (numbered and bulleted)
    • Links
    • Font styles such as bold, italic
    • Images

    The Mammoth command line tools and API have been ported to several languages:

    Without Markdown support (as of May 2016):

    • .NET
    • Java/JVM
    • WordPress

    With Markdown support:

    • JavaScript
    • Python
