Getting text style from docx using Apache POI

清酒与你 2021-01-14 02:41

I'm trying to get the style information from an MS docx file. I have no problem writing file content with added styles like bold, italic, font size, etc., but reading the fil

6 answers
  • 2021-01-14 03:10

    I found a very nice way to copy styles from one document to another. It is not as direct as I would have hoped but it works.

    1. Rename the source Word document so that it has a .zip extension
    2. Extract the contents
    3. Copy styles.xml into a string constant or read the file
    4. Copy the styles into your output document with the following code

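    // Parses the contents of styles.xml and installs them as the styles of the output document.
    // getDoc() is the author's own accessor, presumably returning the target XWPFDocument.
    public void copyStylesXml(String stylesXmlString) {
        try {
            CTStyles ctStyle = CTStyles.Factory.parse(stylesXmlString);
            XWPFStyles styles = getDoc().createStyles();
            styles.setStyles(ctStyle);
        } catch (Exception e) {
            log.warn(e, e);
        }
    }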
      

    The same approach works for copying list formats
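
    Steps 1 to 3 do not have to be done by hand, since a docx file can be opened as a zip archive directly. Below is a minimal sketch using only the JDK (Java 9 or later). The class name DocxPartReader and the method readPart are made up for illustration, and the sketch assumes the part is UTF-8 encoded and lives at word/styles.xml, which is the usual location.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Apache POI.
final class DocxPartReader {
    // Usual location of the styles part inside a .docx package (assumption).
    static final String STYLES_PART = "word/styles.xml";

    /** Reads one part of a .docx (a zip archive) fully into a String. */
    static String readPart(Path docx, String partName) throws IOException {
        // try-with-resources closes the archive once the bytes are in memory
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IOException("No part " + partName + " in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Assumes the part is UTF-8 encoded
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

    The returned string could then be passed to a method that parses it into a CTStyles object.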

  • 2021-01-14 03:15

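    This is a simple trick to get the bold property.

    run.getCTR().xmlText().contains("<w:b w:val=\"1\"/>") returns true if the run is bold and false otherwise. Note that it only matches this exact serialization: bold written as a bare <w:b/>, or inherited from a style, is not detected.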
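
    A string match depends on how the element happens to be serialized. As a hedged alternative, the run XML can be parsed and the w:b element inspected. The sketch below uses only the JDK's DOM parser; RunXmlBold and hasDirectBold are names invented for this example, and it assumes the input is well-formed XML with the w namespace declared. It looks only at direct formatting in the run's first w:rPr, not at bold inherited from a style such as Strong.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Apache POI.
final class RunXmlBold {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** True if the first w:rPr has a w:b child that is not switched off. */
    static boolean hasDirectBold(String runXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(runXml)));

        NodeList props = doc.getElementsByTagNameNS(W_NS, "rPr");
        if (props.getLength() == 0) {
            return false; // no run properties at all
        }
        // Only look at direct children of the run's own w:rPr
        for (Node n = props.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element
                    && W_NS.equals(n.getNamespaceURI())
                    && "b".equals(n.getLocalName())) {
                // A missing w:val means "on"; getAttributeNS returns "" when absent
                String val = ((Element) n).getAttributeNS(W_NS, "val");
                return !(val.equals("0") || val.equals("false") || val.equals("off"));
            }
        }
        return false;
    }
}
```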

  • 2021-01-14 03:23

    I gave up trying to use Apache POI and found another library, docx4j, which seems to do what I need: the properties I want to look at are now available. Once the docx file is loaded, you can view its content as XML, like below.

    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:ns27="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" mc:Ignorable="w14 wp14">
       <w:body>
          <w:p w:rsidR="009A66AB" w:rsidRDefault="000F4AD1">
             <w:r>
                <w:rPr>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>&quot;Hello, this is</w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="apple-converted-space"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t> </w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="Strong"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>bold text</w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="apple-converted-space"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t> </w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>and this is</w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="apple-converted-space"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t> </w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="Emphasis"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>italic text</w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="apple-converted-space"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t> </w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>an</w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>d this is</w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="apple-converted-space"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t> </w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rStyle w:val="Emphasis"/>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:b/>
                   <w:bCs/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:bdr w:val="none" w:color="auto" w:sz="0" w:space="0" w:frame="true"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>bold-italic text</w:t>
             </w:r>
             <w:r>
                <w:rPr>
                   <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica"/>
                   <w:color w:val="222222"/>
                   <w:sz w:val="23"/>
                   <w:szCs w:val="23"/>
                   <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
                </w:rPr>
                <w:t>&quot;</w:t>
             </w:r>
          </w:p>
          <w:sectPr w:rsidR="009A66AB">
             <w:pgSz w:w="11906" w:h="16838"/>
             <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
             <w:cols w:space="708"/>
             <w:docGrid w:linePitch="360"/>
          </w:sectPr>
       </w:body>
    </w:document>
    

  • 2021-01-14 03:26

    Okay, so based on the comments from Gagravarr, the solution is below, exactly as I wanted. Gagravarr basically answered the question, but I'm not sure how to give him credit other than by saying it here.

    // docx is an XWPFDocument that has already been loaded
    for (XWPFParagraph paragraph : docx.getParagraphs()) {
        int pos = 0; // running character offset within the paragraph
        for (XWPFRun run : paragraph.getRuns()) {
            System.out.println("Current run IsBold : " + run.isBold());
            System.out.println("Current run IsItalic : " + run.isItalic());
            // print the run's text one character at a time
            for (char c : run.text().toCharArray()) {
                System.out.print(c);
                pos++;
            }
            System.out.println();
        }
    }
    

    Output below

    Current run IsBold : false
    Current run IsItalic : false
    "Hello, this is 
    Current run IsBold : true
    Current run IsItalic : false
    bold text
    Current run IsBold : false
    Current run IsItalic : false
     and this is 
    Current run IsBold : false
    Current run IsItalic : true
    italic text
    Current run IsBold : false
    Current run IsItalic : false
     a
    Current run IsBold : false
    Current run IsItalic : false
    n
    Current run IsBold : false
    Current run IsItalic : false
    d this is 
    Current run IsBold : true
    Current run IsItalic : true
    bold-italic text
    Current run IsBold : false
    Current run IsItalic : false
    "
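
    The output shows Word splitting text with identical formatting across several runs (" a", "n", "d this is "). If that fragmentation gets in the way, adjacent runs with the same flags can be coalesced after reading. The following is only a sketch in plain Java: Segment and merge are hypothetical names, not POI types, and it assumes the text and the isBold()/isItalic() values of each run have already been copied into Segment objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI.
final class StyledSegments {
    /** Holds the text and flags read from one run. */
    static final class Segment {
        final String text;
        final boolean bold;
        final boolean italic;

        Segment(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Joins neighbouring segments whose bold and italic flags are equal. */
    static List<Segment> merge(List<Segment> runs) {
        List<Segment> merged = new ArrayList<>();
        for (Segment s : runs) {
            int last = merged.size() - 1;
            if (last >= 0
                    && merged.get(last).bold == s.bold
                    && merged.get(last).italic == s.italic) {
                // Same formatting as the previous segment: append the text
                Segment prev = merged.get(last);
                merged.set(last, new Segment(prev.text + s.text, s.bold, s.italic));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }
}
```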

  • 2021-01-14 03:30

    Here is a very good way to copy styles from another document. A little background: a docx file is really a zip file containing a number of XML files, including styles.xml. In the following code sample I read styles.xml, parse it into a CTStyles object, then set it on the current document. Here is most of the code. You can use the same approach to copy numbering.xml for your Word numbering.

    // Copy an existing styles.xml into this document to get its styles.
    // FILE_NAME_STYLES is the author's constant; inside a docx the styles part is normally "word/styles.xml".
    public void copyStylesFromDocument(String documentFileName) {
        log.debug("fileName " + documentFileName);
        try {
            InputStream is = CertificationReportHelper.getInputStreamFromZipFile(documentFileName, FILE_NAME_STYLES);
            CTStyles ctStyle = CTStyles.Factory.parse(is);
            XWPFStyles styles = getDoc().createStyles();
            styles.setStyles(ctStyle);
            log.info("Styles copied from file " + FILE_NAME_STYLES + " in document " + documentFileName);
        } catch (Exception e) {
            String msg = "Error copying styles from file " + FILE_NAME_STYLES + " in document " + documentFileName;
            addErrorMessage(msg, e);
            log.debug(e, e);
        }
    }

    // Helper in CertificationReportHelper. The ZipFile is deliberately left open:
    // closing it also closes the returned input stream and the operation fails.
    @SuppressWarnings("resource")
    public static InputStream getInputStreamFromZipFile(String zipFileName, String containedFile) {
        InputStream is = null;
        ZipFile zfile = null;
        try {
            zfile = new ZipFile(zipFileName);
            ZipEntry entry = zfile.getEntry(containedFile);
            log.trace(entry);
            if (entry != null) {
                is = zfile.getInputStream(entry);
                log.trace("created input stream for file " + containedFile + " from zip file " + zipFileName);
            } else {
                String msg = "Error getting input stream for file " + containedFile + " from zip file " + zipFileName;
                throw new ApplicationRuntimeException(msg);
            }
        } catch (Exception e) {
            String msg = "Error getting input stream for file " + containedFile + " from zip file " + zipFileName
                    + "  Message:" + e.getMessage();
            log.warn("*** Throwing exception " + msg);
            throw new ApplicationRuntimeException(msg, e);
        }
        // No finally block closing zfile: see the note above the method.
        return is;
    }
    
  • 2021-01-14 03:33

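    You can use paragraph.getCTP().getPPr().getRPr().isSetB(). Note that this reads the run properties of the paragraph mark rather than those of individual runs, and getPPr() or getRPr() may return null when the element is absent, so check for that first.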
