HTML to Markdown with Java

臣服心动 2020-12-04 06:52

Is there an easy way to convert HTML into Markdown with Java?

I am currently using the Java MarkdownJ library to convert Markdown to HTML.

5 answers
  • 2020-12-04 07:16

    If you are using the WMD editor and want to receive the Markdown source on the server side, set these options before loading the wmd.js script:

    wmd_options = {
        // format sent to the server; can also be "HTML"
        output: "Markdown",

        // line wrapping length for lists, blockquotes, etc.
        lineLength: 40,

        // toolbar buttons; Undo and Redo get appended automatically
        buttons: "bold italic | link blockquote code image | ol ul heading hr",

        // option to automatically add WMD to the first textarea found
        autostart: true
    };
    
  • 2020-12-04 07:19

    I came across Remark for converting HTML to Markdown; see http://remark.overzealous.com/manual/index.html. It depends on jsoup, a powerful Java library for working with real-world HTML.

    Edit from the creator: please note that Atlassian has lost my repo, and I no longer support this library or have it available publicly.

  • 2020-12-04 07:27

    There is a great JavaScript library called Turndown, which you can try online. It handles HTML that makes the accepted answer throw errors.

    I needed it for Java (as in the question), so I ported it. The Java library is called CopyDown. It uses the same test suite as Turndown, and I have tried it on real-world examples for which the accepted answer threw errors.

    To install with Gradle:

    dependencies {
        // 'implementation' replaces the 'compile' configuration, which was removed in Gradle 7
        implementation 'io.github.furstenheim:copy_down:1.0'
    }
    

    Then to use it:

    import io.github.furstenheim.CopyDown;

    CopyDown converter = new CopyDown();
    String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
    String markdown = converter.convert(myHtml);
    System.out.println(markdown);
    // Prints: Some title\n==========\n\nSome html\n\nAnother paragraph\n
    

    P.S. It is released under the MIT license.

  • 2020-12-04 07:29

    I am working on the same issue and experimenting with a couple of different techniques.

    The XSLT answer could work. You could use the JTidy library to do the initial cleanup and convert the HTML to XHTML, then apply the XSLT stylesheet linked in that answer.

    Unfortunately, there is no library that has a one-stop function to do this in Java. You could try running the Python script html2text under Jython, but I haven't tried this yet.
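
    The cleanup step matters because an XSLT processor reads its input with an XML parser, and ordinary HTML (an unclosed <br>, for example) is not well-formed XML. As a rough sketch using only the JDK, the check below reports whether a piece of markup would already be accepted. The class and method names are made up for illustration, and this only detects the problem; the actual repair would still be done by a tool such as JTidy.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library mentioned in this thread.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: it needs cleanup before XSLT can read it.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("<p>Line one<br>Line two</p>"));  // false
        System.out.println(isWellFormedXml("<p>Line one<br/>Line two</p>")); // true
    }
}
```

    If the check returns false, the HTML needs to go through the cleanup step before any stylesheet is applied.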

  • 2020-12-04 07:31

    Use this XSLT.

    If you need help using XSLT from Java, here's a code snippet:

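    import java.io.File;
    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public static void main(String[] args) throws Exception {
        // The input must be well-formed XML (XHTML), or the transform will fail.
        String theHTML = "<html><body><h1>Title</h1><p>Some text</p></body></html>";
        File xsltFile = new File("markdownXSLT.xslt");

        Source xmlSource = new StreamSource(new StringReader(theHTML));
        Source xsltSource = new StreamSource(xsltFile);

        TransformerFactory transFact = TransformerFactory.newInstance();
        Transformer trans = transFact.newTransformer(xsltSource);

        StringWriter result = new StringWriter();
        trans.transform(xmlSource, new StreamResult(result));
        System.out.println(result); // the generated Markdown
    }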
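
    To try the snippet above without downloading a stylesheet, here is a self-contained sketch that embeds a toy stylesheet as a string. The stylesheet is my own illustration and covers only h1, p, em and strong; it is not the XSLT linked in this answer, and the class and method names are hypothetical. It also assumes the input elements are in no namespace; real XHTML usually declares the XHTML namespace, in which case the match patterns would need a namespace prefix.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the stylesheet is a toy, not the one linked in the answer.
class XsltMarkdownSketch {

    // Handles only <h1>, <p>, <em> and <strong>; other elements fall through
    // to the built-in XSLT rules, which output just their text content.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='h1'># <xsl:apply-templates/><xsl:text>&#10;&#10;</xsl:text></xsl:template>"
        + "<xsl:template match='p'><xsl:apply-templates/><xsl:text>&#10;&#10;</xsl:text></xsl:template>"
        + "<xsl:template match='em'>*<xsl:apply-templates/>*</xsl:template>"
        + "<xsl:template match='strong'>**<xsl:apply-templates/>**</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the embedded stylesheet to well-formed XHTML and returns the text output.
    static String toMarkdown(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Typically means the input was not well-formed XML.
            throw new IllegalArgumentException("Could not transform input", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><h1>Some title</h1><p>Some <em>html</em></p></body></html>";
        // Prints "# Some title", a blank line, then "Some *html*".
        System.out.print(toMarkdown(xhtml));
    }
}
```

    A real stylesheet needs many more templates (links, lists, code, escaping of Markdown characters), which is why a ready-made XSLT or a dedicated library is the more practical route.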
    