I want to do replacements in an MS Word (.docx) document using regular expressions (Java regex):
Example:
…, on the one hand, and %SOME
As you can see, doing replacements in an MS Word (.docx) document using regular expressions (Java regex) is not a good approach, since you can never be sure that the text to replace is contained in a single text run. A better approach is to use fields (merge fields or form fields) or content controls in Word.
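To illustrate the run problem, here is a minimal sketch in plain Java, without Apache POI. Each string stands for the text of one run. The class SplitRunDemo, its method names and the placeholder %NAME% are hypothetical and only serve the illustration; the split shown is one way Word might store the text, not a fixed rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: each String stands for the text of one Word text run.
class SplitRunDemo {

    // Hypothetical placeholder syntax, chosen for this sketch.
    static final Pattern PLACEHOLDER = Pattern.compile("%NAME%");

    // True if the placeholder occurs completely inside at least one single run.
    static boolean foundInSingleRun(List<String> runs) {
        for (String run : runs) {
            if (PLACEHOLDER.matcher(run).find()) {
                return true;
            }
        }
        return false;
    }

    // True if the placeholder occurs in the paragraph text, i.e. in all runs joined together.
    static boolean foundInParagraph(List<String> runs) {
        return PLACEHOLDER.matcher(String.join("", runs)).find();
    }

    public static void main(String[] args) {
        // Assumed storage: the visible text "%NAME%" ended up in three separate runs.
        List<String> runs = Arrays.asList("on the one hand, and ", "%", "NAME", "%");
        System.out.println(foundInSingleRun(runs)); // false
        System.out.println(foundInParagraph(runs)); // true
    }
}
```

A regex applied run by run misses the placeholder, and a match found in the joined paragraph text still has to be mapped back to the runs it spans, which is what makes this approach fragile.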
My favourites for such requirements are still the good old form fields in Word.
The first advantage is that, even without document protection, parts of a form field's content cannot be formatted differently, so the content cannot be torn apart into different runs (but see Note 1). The second advantage is that, because of the gray background, form fields are clearly visible in the document content. Another advantage is that document protection can be applied so that only filling in the form fields is possible, even in Word's GUI. This is really useful for protecting such contractual documents from unwanted changes.
(Note 1): At least Word prevents formatting parts of a form field's content differently and thereby tearing the content apart into different runs. Other word-processing software (Writer, for example) may not respect this restriction, though.
So I would have the Word template like so:
The gray fields are the good old form text fields in Word, named Text1, Text2 and Text3. The text field blocks look like:
<xml-fragment w:rsidR="00833656"
    ...
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    ... >
  <w:rPr>
    <w:rFonts w:eastAsia="Times-Roman"/>
    <w:color w:themeColor="text1" w:val="000000"/>
    <w:lang w:val="en-US"/>
  </w:rPr>
  <w:fldChar w:fldCharType="begin">
    <w:ffData>
      <w:name w:val="Text1"/>
      <w:enabled w:val="0"/>
      <w:calcOnExit w:val="0"/>
      <w:textInput>
        <w:default w:val="&lt;введите заказчика>"/>
      </w:textInput>
    </w:ffData>
  </w:fldChar>
</xml-fragment>
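What identifies a form field in such a fragment is the w:name element inside w:ffData of the w:fldChar with type "begin". As a rough sketch of that lookup, using only the JDK's DOM parser instead of Apache POI, something like the following could read the name. The class FormFieldNameDemo, the method formFieldName and the shortened sample XML are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: reads the form field name from the XML of a single run.
class FormFieldNameDemo {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the name of the form field started in runXml, or null if there is none.
    static String formFieldName(String runXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(runXml)));
            NodeList fldChars = doc.getElementsByTagNameNS(W, "fldChar");
            for (int i = 0; i < fldChars.getLength(); i++) {
                Element fldChar = (Element) fldChars.item(i);
                if (!"begin".equals(fldChar.getAttributeNS(W, "fldCharType"))) {
                    continue; // only the "begin" character carries w:ffData
                }
                NodeList names = fldChar.getElementsByTagNameNS(W, "name");
                if (names.getLength() > 0) {
                    return ((Element) names.item(0)).getAttributeNS(W, "val");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse run XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the fragment above, wrapped in a w:r element.
        String runXml =
              "<w:r xmlns:w='" + W + "'>"
            + "<w:fldChar w:fldCharType='begin'><w:ffData>"
            + "<w:name w:val='Text1'/><w:enabled w:val='0'/>"
            + "</w:ffData></w:fldChar></w:r>";
        System.out.println(formFieldName(runXml)); // Text1
    }
}
```

A field whose "begin" character has no w:ffData, for example a field that is not a form field, yields null here instead of a name.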
Then the following code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;

public class WordReplaceTextInFormFields {

    private static final String DECLARE_W =
        "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' ";

    // Sets the displayed text of the form field named ffname.
    // Only paragraphs directly in the document body are searched.
    private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
        boolean foundformfield = false;
        for (XWPFParagraph paragraph : document.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                XmlCursor cursor = run.getCTR().newCursor();
                cursor.selectPath(DECLARE_W + ".//w:fldChar/@w:fldCharType");
                while (cursor.hasNextSelection()) {
                    cursor.toNextSelection();
                    String fldCharType = ((SimpleValue) cursor.getObject()).getStringValue();
                    if ("begin".equals(fldCharType)) {
                        // A field starts here: step from the attribute up to w:fldChar and read the name.
                        cursor.toParent();
                        XmlObject[] names = cursor.getObject().selectPath(DECLARE_W + ".//w:ffData/w:name/@w:val");
                        // Fields that are not form fields carry no w:ffData, so names can be empty.
                        foundformfield = names.length > 0
                            && ffname.equals(((SimpleValue) names[0]).getStringValue());
                    } else if ("end".equals(fldCharType)) {
                        // End of a field: stop once the wanted form field has been processed.
                        if (foundformfield) return;
                        foundformfield = false;
                    }
                }
                // A run with text between "begin" and "end" holds the displayed value of the field.
                if (foundformfield && run.getCTR().getTList().size() > 0) {
                    run.getCTR().getTList().get(0).setStringValue(text);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));
             FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx")) {
            replaceFormFieldText(document, "Text1", "Моя Компания"); // "My Company"
            replaceFormFieldText(document, "Text2", "Аксель Джоачимович Рихтер"); // a person's name
            replaceFormFieldText(document, "Text3", "Доверенность"); // "Power of attorney"
            document.write(out);
        }
    }
}
This code needs the full jar of all of the schemas, ooxml-schemas-1.3.jar, as mentioned in FAQ-N10025.
Produces: