How to handle special characters when converting from HTML to DocX

死守一世寂寞 2021-01-15 19:00

I have an application that converts HTML files to DocX using DocX4J. I'm having problems with special characters like ç, á, é, í, ã, etc. My text font in the HTML files is Arial b…
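The truncated question does not say whether the accented characters are garbled or only rendered in the wrong font. One common cause of garbling that is worth ruling out is reading the HTML with the platform's default charset. Below is a minimal standard-library sketch. It assumes the files are saved as UTF-8 and reads them explicitly as UTF-8. As an optional extra step, it replaces every non-ASCII character with an XML numeric character reference (`ç` becomes `&#231;`), so the markup no longer depends on any charset. `XhtmlCharsets`, `readUtf8` and `toNumericCharRefs` are hypothetical names, not part of DocX4J.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of docx4j.
public final class XhtmlCharsets {

    private XhtmlCharsets() {}

    /** Reads the file as UTF-8 explicitly instead of using the platform default charset. */
    public static String readUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    /**
     * Replaces every non-ASCII code point with an XML numeric character
     * reference, e.g. 'ç' -> "&#231;". An XML parser turns the references
     * back into the original characters in text and attribute values.
     */
    public static String toNumericCharRefs(String xhtml) {
        StringBuilder out = new StringBuilder(xhtml.length());
        // codePoints() keeps surrogate pairs together, so characters outside
        // the BMP become a single reference rather than two broken halves.
        xhtml.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "<p>A&#231;&#227;o &#233; f&#225;cil</p>"
        System.out.println(toNumericCharRefs("<p>Ação é fácil</p>"));
    }
}
```

The escaping assumes tag and attribute names are plain ASCII and that the text is not inside comments or CDATA sections, where references are not decoded.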

1 Answer
  • Following the tip given by JasonPlutext, I found an example of how to map a font to the XHTMLImporter on the DocX4J forum (http://www.docx4java.org/forums/docx-java-f6/docx-to-html-and-back-to-docx-t1913.html).

    Now my code is working! See the final version below.


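    import java.util.List;

    import org.docx4j.convert.in.xhtml.XHTMLImporter;
    import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
    import org.docx4j.jaxb.Context;
    import org.docx4j.openpackaging.exceptions.Docx4JException;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.wml.RFonts;

    public WordprocessingMLPackage export(String xhtml) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            // Map the CSS font-family "Arial" to Word run fonts. Setting hAnsi
            // as well as ascii makes accented letters (ç, á, é, í, ã) use Arial too.
            RFonts arialRFonts = Context.getWmlObjectFactory().createRFonts();
            arialRFonts.setAscii("Arial");
            arialRFonts.setHAnsi("Arial");
            XHTMLImporterImpl.addFontMapping("Arial", arialRFonts);

            // Create an empty .docx package and append the converted XHTML to its body.
            wordMLPackage = WordprocessingMLPackage.createPackage();
            XHTMLImporter importer = new XHTMLImporterImpl(wordMLPackage);
            List<Object> content = importer.convert(xhtml, null); // null: no base URL
            wordMLPackage.getMainDocumentPart().getContent().addAll(content);
        } catch (Docx4JException e) {
            // ...
        }
        return wordMLPackage;
    }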
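The answer's code sets both `setAscii` and `setHAnsi`. Below is a simplified model of why both matter, loosely following the `w:rFonts` description in ECMA-376. Word uses the `w:ascii` font for code points U+0000 to U+007F and, for Latin text, the `w:hAnsi` font for characters above that range. The model ignores the East Asian (`w:eastAsia`) and complex-script (`w:cs`) slots and the `w:hint` attribute, so it is a sketch rather than Word's full rule set. `RunFontSlots` and its methods are hypothetical and not DocX4J API.

```java
// Simplified model (an assumption, not Word's complete algorithm):
// U+0000..U+007F -> w:ascii font, everything else -> w:hAnsi font.
// The w:eastAsia and w:cs slots and the w:hint attribute are ignored.
public final class RunFontSlots {

    private RunFontSlots() {}

    /** True if this code point would be rendered with the w:ascii font. */
    public static boolean usesAsciiSlot(int codePoint) {
        return codePoint <= 0x7F;
    }

    /**
     * Returns the characters of the text that this model assigns to the
     * w:hAnsi slot, i.e. those not covered by setAscii("Arial") alone.
     */
    public static String needsHAnsi(String text) {
        return text.codePoints()
                .filter(cp -> !usesAsciiSlot(cp))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(needsHAnsi("Ação é fácil")); // prints "çãéá"
    }
}
```

Under this model, calling only `setAscii("Arial")` would leave `ç`, `ã`, `é` and `á` to whatever font the document defaults supply, which is why the mapping above also sets `hAnsi`.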
    