PDFBox: How to “flatten” a PDF-form?

难免孤独 2020-12-09 06:32

How do I "flatten" a PDF form (remove the form field but keep the text of the field) with PDFBox?

Same question was answered here:

a quick

11 Answers
  •  时光说笑
    2020-12-09 07:13

    In order to really "flatten" an Acrobat form field there seems to be much more to do than appears at first glance. After examining the PDF standard I managed to achieve real flattening in three steps:

    1. save field value
    2. remove widgets
    3. remove form field

    All three steps can be done with PDFBox (I used 1.8.5). Below I will sketch how I did it. A very helpful tool for understanding what's going on is the PDF Debugger.

    Save the field

    This is the most complicated step of the three.

    In order to save the field's value you have to write its content into the PDF's page content for each of the field's widgets. The easiest way to do so is to draw each widget's appearance onto the widget's page.

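    // Draws the appearance of every widget of the field onto the widget's page.
    // getDocument, getWidgets and getPage are helper methods that are not shown here.
    void saveFieldValue( PDField field ) throws IOException
    {
        PDDocument document = getDocument( field );
        // see PDField.getWidget()
        for( PDAnnotationWidget widget : getWidgets( field ) )
        {
            PDPage parentPage = getPage( widget );
    
            // appendContent = true keeps the existing page content, compress = true
            try (PDPageContentStream contentStream = new PDPageContentStream( document, parentPage, true, true ))
            {
                writeContent( contentStream, widget );
            }
        }
    }
    
    // Wraps the widget's appearance stream in a form XObject and draws it at the widget's position.
    // getAppearanceStream and getPositioningTransformation are helper methods that are not shown here.
    void writeContent( PDPageContentStream contentStream, PDAnnotationWidget widget )
            throws IOException
    {
        PDAppearanceStream appearanceStream = getAppearanceStream( widget );
        PDXObject xobject = new PDXObjectForm( appearanceStream.getStream() );
        AffineTransform transformation = getPositioningTransformation( widget.getRectangle() );
    
        contentStream.drawXObject( xobject, transformation );
    }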
    

    The appearance is an XObject stream containing all of the widget's content (value, font, size, rotation, etc.). You simply need to place it at the right position on the page, which you can derive from the widget's rectangle.
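
    The positioning helper is not shown in this answer. A minimal sketch of what it might compute, assuming the appearance's bounding box starts at (0, 0), has the same size as the widget rectangle and carries no matrix of its own, is a plain translation to the rectangle's lower-left corner. The class WidgetPositionSketch and its Rect type are hypothetical stand-ins (not PDFBox classes) so that the sketch compiles with the JDK alone:

```java
import java.awt.geom.AffineTransform;

final class WidgetPositionSketch {

    /** Hypothetical stand-in for the widget rectangle, in page user-space units. */
    static final class Rect {
        final double lowerLeftX, lowerLeftY, upperRightX, upperRightY;

        Rect(double lowerLeftX, double lowerLeftY, double upperRightX, double upperRightY) {
            this.lowerLeftX = lowerLeftX;
            this.lowerLeftY = lowerLeftY;
            this.upperRightX = upperRightX;
            this.upperRightY = upperRightY;
        }
    }

    /**
     * Builds a transform that moves the appearance's origin (0, 0)
     * to the lower-left corner of the widget rectangle.
     * Assumption: no scaling or rotation is needed.
     */
    static AffineTransform positioningTransformation(Rect rect) {
        return AffineTransform.getTranslateInstance(rect.lowerLeftX, rect.lowerLeftY);
    }
}
```

    If the appearance's bounding box differs in size or origin from the widget rectangle, a scale step would have to be added as well; that case is left out here.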

    Remove widgets

    As noted above, each field may have multiple widgets. A widget controls how a form field is edited, which actions it triggers, how it is displayed when it is not being edited, and so on.

    In order to remove one you have to remove it from its page's annotations.

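    // Removes the widget from the annotation list of its page.
    // getPage and getMatchingCOSObjectable are helper methods that are not shown here.
    void removeWidget( PDAnnotationWidget widget ) throws IOException
    {
        PDPage widgetPage = getPage( widget );
        List<PDAnnotation> annotations = widgetPage.getAnnotations();
        PDAnnotation deleteCandidate = getMatchingCOSObjectable( annotations, widget );
        // write the list back only if something was actually removed
        if( deleteCandidate != null && annotations.remove( deleteCandidate ) )
            widgetPage.setAnnotations( annotations );
    }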
    

    Note that the annotations may not contain the exact PDAnnotationWidget instance, since it is only a kind of wrapper. You have to remove the one with the matching COSObject.
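
    The matching helper is not shown in this answer either. The idea can be sketched with plain Java: two different wrapper instances count as the same annotation when they wrap the same underlying object, so the comparison is done on the wrapped object by reference. Wrapper and findMatching below are hypothetical stand-ins for illustration, not PDFBox API:

```java
import java.util.List;

final class WrapperMatchSketch {

    /** Hypothetical stand-in for a wrapper class: it only holds a reference to a low-level object. */
    static final class Wrapper {
        private final Object cosObject;

        Wrapper(Object cosObject) {
            this.cosObject = cosObject;
        }

        Object getCOSObject() {
            return cosObject;
        }
    }

    /**
     * Returns the element of candidates that wraps the very same underlying
     * object as target, or null if there is none.
     */
    static Wrapper findMatching(List<Wrapper> candidates, Wrapper target) {
        for (Wrapper candidate : candidates) {
            // Compare the wrapped objects by identity, not the wrappers themselves.
            if (candidate.getCOSObject() == target.getCOSObject()) {
                return candidate;
            }
        }
        return null;
    }
}
```

    The element returned here is the one to pass to the list's remove method, because it is the instance the list actually contains.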

    Remove form field

    As the final step you remove the form field itself. This is not very different from the other posts above.

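    // Removes the field from the AcroForm's field list.
    // getFields and removeAll are helper methods that are not shown here.
    void removeFormfield( PDField field ) throws IOException
    {
        PDAcroForm acroForm = field.getAcroForm();
        List<PDField> acroFields = acroForm.getFields();
        List<PDField> removeCandidates = getFields( acroFields, field.getPartialName() );
        // write the list back only if something was actually removed
        if( removeAll( acroFields, removeCandidates ) )
            acroForm.setFields( acroFields );
    }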
    

    Note that I used a custom removeAll method here since the list's built-in removeAll() didn't work as expected for me.
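
    The custom removal is not shown in this answer. One possible shape, assuming that fields are identified by their partial name, collapses the lookup and the removal into a single pass and reports whether the list changed. Field and removeByPartialName are hypothetical stand-ins for illustration, not PDFBox API:

```java
import java.util.List;
import java.util.Objects;

final class RemoveByNameSketch {

    /** Hypothetical stand-in for a form field that is identified by its partial name. */
    static final class Field {
        final String partialName;

        Field(String partialName) {
            this.partialName = partialName;
        }
    }

    /**
     * Removes every field whose partial name equals the given name.
     * Returns true if at least one element was removed, so the caller
     * knows whether the list has to be written back.
     */
    static boolean removeByPartialName(List<Field> fields, String partialName) {
        return fields.removeIf(field -> Objects.equals(field.partialName, partialName));
    }
}
```

    Comparing on an explicit key avoids relying on how the wrapper classes implement equals, which is one plausible reason (an assumption, not something stated above) for the built-in removal to misbehave.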

    Sorry that I cannot provide all the code here, but with the above you should be able to write it yourself.
