POI read sentence from Word document

Submitted by 别说谁变了你拦得住时间么 on 2021-02-08 07:28:33

Question


I have written a Java program that reads data from Excel and replaces the matching text in a Word document using Apache POI. The problem is that POI reads only a single word at a time, not the whole sentence:

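import java.io.FileOutputStream;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));
// Body paragraphs: replace the text run by run.
for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            if (text != null && text.contains("needle")) {
                text = text.replace("needle", "haystack");
                r.setText(text, 0);
            }
        }
    }
}
// Table cells hold their own paragraphs, so they are walked separately.
for (XWPFTable tbl : doc.getTables()) {
    for (XWPFTableRow row : tbl.getRows()) {
        for (XWPFTableCell cell : row.getTableCells()) {
            for (XWPFParagraph p : cell.getParagraphs()) {
                for (XWPFRun r : p.getRuns()) {
                    String text = r.getText(0);
                    // getText(0) can return null for runs without text.
                    if (text != null && text.contains("needle")) {
                        text = text.replace("needle", "haystack");
                        // Position 0 overwrites the text; setText(text) alone would append.
                        r.setText(text, 0);
                    }
                }
            }
        }
    }
}
// Close the output stream once the document has been written.
try (FileOutputStream out = new FileOutputStream("output.docx")) {
    doc.write(out);
}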

Answer 1:


Neither Word nor Excel has the concept of a sentence, and as a consequence neither does POI. In Excel you have to go to a lot of trouble to style individual words in a cell. That is not true of Word: every time you bold a word, insert something, change a letter, or change the font, Word breaks the text into a separate run. You can even end up with individual letters sitting in their own runs, which is why searching run by run misses text that spans more than one run.

To do what you want, you need to concatenate all the runs in the paragraph, parse the result for whatever your sentence separator is, and then discard false separators such as the period at the end of an abbreviation. This is not easy once you start thinking about it, and it is very language dependent. For example, English sentences typically end with a period (.), an exclamation mark (!), or a question mark (?), but a period also terminates an abbreviation, and the sentence terminator is sometimes followed by a quotation mark ("). English sentences have no start character, but some Spanish sentences do (¡ or ¿).
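The concatenate-then-split step can be sketched as follows. This is only an illustration under some assumptions: the class name SentenceSplitter and its methods are hypothetical, the list of strings stands in for the values you would collect by calling getText(0) on each run of a paragraph, and the JDK's java.text.BreakIterator is swapped in as the sentence detector in place of a hand-written parser. Do not rely on it to handle abbreviations correctly; those may still need the extra filtering described above.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Apache POI.
class SentenceSplitter {

    // Joins run texts in document order; null entries (runs without text) are skipped.
    static String joinRuns(List<String> runTexts) {
        StringBuilder sb = new StringBuilder();
        for (String t : runTexts) {
            if (t != null) {
                sb.append(t);
            }
        }
        return sb.toString();
    }

    // Splits the joined paragraph text into trimmed sentences using
    // the locale-specific sentence rules of BreakIterator.
    static List<String> sentences(List<String> runTexts, Locale locale) {
        String paragraph = joinRuns(runTexts);
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(paragraph);

        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }
}
```

Calling SentenceSplitter.sentences with the run texts "Hello wor", "ld. How " and "are you? Fine!" and Locale.ENGLISH first rebuilds the paragraph "Hello world. How are you? Fine!" and then returns three sentences, even though the first two sentences are split across run boundaries. Writing a replacement back into the document is a separate problem, because the new text has to be distributed over the original runs (or the runs collapsed into one, losing per-run formatting).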



Source: https://stackoverflow.com/questions/42120787/poi-read-sentence-from-word-document
