Question
I have written a Java program that reads data from Excel and replaces the matching text in a Word document using Apache POI. The problem is that POI reads only individual words, not whole sentences:
XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));
// Replace in the body paragraphs.
for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            if (text != null && text.contains("needle")) {
                text = text.replace("needle", "haystack");
                r.setText(text, 0);
            }
        }
    }
}
// Replace in the paragraphs inside table cells.
for (XWPFTable tbl : doc.getTables()) {
    for (XWPFTableRow row : tbl.getRows()) {
        for (XWPFTableCell cell : row.getTableCells()) {
            for (XWPFParagraph p : cell.getParagraphs()) {
                for (XWPFRun r : p.getRuns()) {
                    String text = r.getText(0);
                    // getText(0) returns null for runs that carry no text
                    if (text != null && text.contains("needle")) {
                        text = text.replace("needle", "haystack");
                        // position 0 overwrites the existing text; setText(text) would append
                        r.setText(text, 0);
                    }
                }
            }
        }
    }
}
// Close the output stream once the document has been written.
try (FileOutputStream out = new FileOutputStream("output.docx")) {
    doc.write(out);
}
Answer 1:
Neither Word nor Excel has the concept of a sentence, and as a consequence neither does POI. In Excel you have to go to a lot of trouble to style individual words in a cell, but not in Word: every time you bold a word, insert something, change a letter, or change the font, Word splits the text into a separate run. You can even end up with individual letters in their own runs. To do what you want, you need to concatenate all the runs in the paragraph, split the result on whatever your sentence separator is, and then discard false separators such as the period at the end of an abbreviation. This is not easy once you start thinking about it, and it is very language dependent. For example, English sentences typically end with a period (.), exclamation mark (!), or question mark (?), but a period also terminates an abbreviation, and sometimes the sentence terminator is followed by a closing quote ("). English sentences do not have a start character, but some sentences in Spanish do (¡ or ¿).
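The concatenate-then-split approach described above could be sketched roughly as follows. This is only a minimal illustration, not a complete solution: the class name SentenceSplitter and its two methods are hypothetical, the run texts are passed in as plain strings (in POI they would come from XWPFRun.getText(0) for each run of a paragraph), and the JDK's java.text.BreakIterator is swapped in as the sentence separator logic. BreakIterator applies locale-aware boundary rules but may still split after an abbreviation, so false separators would need extra handling.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of POI.
class SentenceSplitter {

    // Concatenate run texts in document order. With POI, these strings would be
    // the values of XWPFRun.getText(0); runs without text yield null and are skipped.
    static String joinRuns(List<String> runTexts) {
        StringBuilder sb = new StringBuilder();
        for (String t : runTexts) {
            if (t != null) {
                sb.append(t);
            }
        }
        return sb.toString();
    }

    // Split the joined paragraph text into sentences using the JDK's
    // locale-aware sentence boundary rules.
    static List<String> splitSentences(String paragraph, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(paragraph);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = paragraph.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // A sentence that Word has split across three runs, mid-word.
        List<String> runs = Arrays.asList("Hello wor", "ld. How are", " you?");
        String paragraph = joinRuns(runs);
        for (String sentence : splitSentences(paragraph, Locale.ENGLISH)) {
            System.out.println(sentence);
        }
    }
}
```

Working on the joined text means a match is found even when Word has split it across runs, but writing a replacement back into the document still requires deciding which run receives the new text and which runs are cleared.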
Source: https://stackoverflow.com/questions/42120787/poi-read-sentence-from-word-document