How do I \"flatten\" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?
Same question was answered here:
a quick
I thought I'd share our approach that worked with PDFBox 2+.
We've used the PDAcroForm.flatten()
method.
The fields needed some preprocessing and most importantly the nested field structure had to be traversed and DV and V checked for values.
Finally what worked was the following:
private static void flattenPDF(String src, String dst) throws IOException {
PDDocument doc = PDDocument.load(new File(src));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
PDResources resources = new PDResources();
acroForm.setDefaultResources(resources);
List<PDField> fields = new ArrayList<>(acroForm.getFields());
processFields(fields, resources);
acroForm.flatten();
doc.save(dst);
doc.close();
}
private static void processFields(List<PDField> fields, PDResources resources) {
fields.stream().forEach(f -> {
f.setReadOnly(true);
COSDictionary cosObject = f.getCOSObject();
String value = cosObject.getString(COSName.DV) == null ?
cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
try {
f.setValue(value);
} catch (IOException e) {
if (e.getMessage().matches("Could not find font: /.*")) {
String fontName = e.getMessage().replaceAll("^[^/]*/", "");
System.out.println("Adding fallback font for: " + fontName);
resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
try {
f.setValue(value);
} catch (IOException e1) {
e1.printStackTrace();
}
} else {
e.printStackTrace();
}
}
if (f instanceof PDNonTerminalField) {
processFields(((PDNonTerminalField) f).getChildren(), resources);
}
});
}
After reading about pdf reference guide, I have discovered that you can quite easily set read-only mode for AcroForm fields by adding "Ff" key (Field flags) with value 1. This is what documentation stands about that:
If set, the user may not change the value of the field. Any associated widget annotations will not interact with the user; that is, they will not respond to mouse clicks or change their appearance in response to mouse motions. This flag is useful for fields whose values are computed or imported from a database.
so the code could look like that (using pdfbox lib):
public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> acroFormFields = form.getFields();
System.out.println(String.format("found %d acroFrom fields", acroFormFields.size()));
for(PDField field: acroFormFields) {
makeAcroFieldReadOnly(field);
}
}
private static void makeAcroFieldReadOnly(PDField field) {
field.getDictionary().setInt("Ff",1);
}
Solution to flattening acroform AND retaining the form field values using pdfBox:
The solution that worked for me with pdfbox 2.0.1:
File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);
field.setValue("new-value");
pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();
I didn't need to do the 2 extra steps in the above solution link:
// correct the missing page link for the annotations
// Add the missing resources to the form
I created my pdf form in OpenOffice 4.1.1 and exported to pdf. The 2 items selected in the OpenOffice export dialogue were:
Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.
This is the code I came up with after synthesizing all of the answers I could find on the subject. This handles flattening text boxes, combos, lists, checkboxes, and radios:
public static void flattenPDF (PDDocument doc) throws IOException {
//
// find the fields and their kids (widgets) on the input document
// (each child widget represents an appearance of the field data on the page, there may be multiple appearances)
//
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> tmpfields = form.getFields();
PDResources formresources = form.getDefaultResources();
Map formfonts = formresources.getFonts();
PDAnnotation ann;
//
// for each input document page convert the field annotations on the page into
// content stream
//
List<PDPage> pages = catalog.getAllPages();
Iterator<PDPage> pageiterator = pages.iterator();
while (pageiterator.hasNext()) {
//
// get next page from input document
//
PDPage page = pageiterator.next();
//
// add the fonts from the input form to this pages resources
// so the field values will display in the proper font
//
PDResources pageResources = page.getResources();
Map pageFonts = pageResources.getFonts();
pageFonts.putAll(formfonts);
pageResources.setFonts(pageFonts);
//
// Create a content stream for the page for appending
//
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
//
// Find the appearance widgets for all fields on the input page and insert them into content stream of the page
//
for (PDField tmpfield : tmpfields) {
List widgets = tmpfield.getKids();
if(widgets == null) {
widgets = new ArrayList();
widgets.add(tmpfield.getWidget());
}
Iterator<COSObjectable> widgetiterator = widgets.iterator();
while (widgetiterator.hasNext()) {
COSObjectable next = widgetiterator.next();
if (next instanceof PDField) {
PDField foundfield = (PDField) next;
ann = foundfield.getWidget();
} else {
ann = (PDAnnotation) next;
}
if (ann.getPage().equals(page)) {
COSDictionary dict = ann.getDictionary();
if (dict != null) {
if(tmpfield instanceof PDVariableText || tmpfield instanceof PDPushButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if (ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSStream stream = (COSStream) ap.getDictionaryObject("N");
if (stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
contentStream.appendRawCommands("Q\n");
}
} else if (tmpfield instanceof PDChoiceButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if(ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSName cbValue = (COSName) dict.getDictionaryObject(COSName.AS);
COSDictionary d = (COSDictionary) ap.getDictionaryObject(COSName.D);
if (d != null) {
COSStream stream = (COSStream) d.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
if (!(tmpfield instanceof PDCheckbox)){
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
}
COSDictionary n = (COSDictionary) ap.getDictionaryObject(COSName.N);
if (n != null) {
COSStream stream = (COSStream) n.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
contentStream.appendRawCommands("Q\n");
}
}
}
}
}
}
// delete any field widget annotations and write it all to the page
// leave other annotations on the page
COSArrayList newanns = new COSArrayList();
List anns = page.getAnnotations();
ListIterator annotiterator = anns.listIterator();
while (annotiterator.hasNext()) {
COSObjectable next = (COSObjectable) annotiterator.next();
if (!(next instanceof PDAnnotationWidget)) {
newanns.add(next);
}
}
page.setAnnotations(newanns);
contentStream.close();
}
//
// Delete all fields from the form and their widgets (kids)
//
for (PDField tmpfield : tmpfields) {
List kids = tmpfield.getKids();
if(kids != null) kids.clear();
}
tmpfields.clear();
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
}
Full class here: https://gist.github.com/jribble/beddf7620536939f88db
This works for sure - I've ran into this problem, debugged all-night, but finally figured out how to do this :)
This is assuming that you have capability to edit the PDF in some way/have some control over the PDF.
First, edit the forms using Acrobat Pro. Make them hidden and read-only.
Then you need to use two libraries: PDFBox and PDFClown.
PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must be done first, then PDFBox (in that order. The other way around doesn't work).
Single field example code:
// PDF Clown code
File file = new File("Some file path");
Document document = file.getDocument();
Form form = file.getDocument.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");
PageStamper stamper = new PageStamper();
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());
// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = font.get(document, "some_path");
composer.setFont(font, 10);
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY();
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate));
// Actually delete the form field!
field.delete();
stamper.flush();
// Create new buffer to output to...
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard);
byte[] bytes = buffer.toByteArray();
// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
// Phew. Finally.
pdfDocument.save("Some file path");
Probably some typos here and there, but this should be enough to get the gist :)
This is the answer of Thomas, from the PDFBox-Mailinglist:
You will need to get the Fields over the COSDictionary. Try this code...
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray fields = acroFormDict.getDictionaryObject("Fields");
fields.clear();