PDFBox: Remove a single field from PDF

后悔当初 2021-01-29 06:25

The simplest way I can describe the problem is that we need to use PDFBox to remove only one field (e.g. a credit card number) from a PDF that HelloSign sends us.

1 Answer
  •  醉梦人生
    2021-01-29 07:22

    The code in this answer may appear somewhat generic, as it first determines a map of the fields in the document and then allows you to delete any combination of the text fields. Please be aware, though, that it has been developed with only the single example PDF from this question. Thus, I cannot be sure whether I correctly understood the way fields are marked for/by HelloSign and in particular the way HelloSign fills these fields.

    This answer presents two classes: one analyzes a HelloSign form, and the other manipulates it by clearing selected fields; the latter relies on the information gathered by the former. Both classes are built upon the PDFBox PDFTextStripper utility class.

    The code has been developed for the current PDFBox development version 2.1.0-SNAPSHOT. Most likely it works with all 2.0.x versions, too.

    HelloSignAnalyzer

    This class analyzes the last page of the given PDDocument, looking for the sequences

    • [$varname ] which appear to define placeholders for placing form field contents, and
    • [def:$varname|type|req|signer|display|label] which appear to define properties of the placeholders.
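    To make the two marker formats concrete, the following stdlib-only sketch applies the same scanning rules as writeString in the listing below to a plain string: a `[$` starts a placeholder whose name ends at the first space or at `]`, and a `[def:$` starts a pipe-separated definition. The class HelloSignMarkers and the sample strings are hypothetical; only the marker syntax is taken from the analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of PDFBox or of the classes in this answer) that
// applies the marker rules of HelloSignAnalyzer.writeString to a plain string.
class HelloSignMarkers
{
    // Names from "[$name ...]" placeholders; the name ends at the first space.
    static List<String> placeholderNames(String text)
    {
        List<String> names = new ArrayList<>();
        int position = -1;
        while ((position = text.indexOf('[', position + 1)) >= 0)
        {
            int endPosition = text.indexOf(']', position);
            if (endPosition < 0)
                break; // no closing bracket anywhere after this point
            if (endPosition > position + 1 && text.charAt(position + 1) == '$')
            {
                String name = text.substring(position + 2, endPosition);
                int space = name.indexOf(' ');
                names.add(space >= 0 ? name.substring(0, space) : name);
            }
        }
        return names;
    }

    // Pieces of each "[def:$name|type|req|signer|display|label]" definition.
    static List<List<String>> definitions(String text)
    {
        List<List<String>> result = new ArrayList<>();
        int position = -1;
        while ((position = text.indexOf('[', position + 1)) >= 0)
        {
            int endPosition = text.indexOf(']', position);
            if (endPosition < 0)
                break;
            if (endPosition > position + 5 && text.startsWith("def:$", position + 1))
                result.add(Arrays.asList(text.substring(position + 6, endPosition).split("\\|")));
        }
        return result;
    }
}
```

    A definition with only four pieces, such as `[def:$date2|date|req|signer2]`, leaves display and label unset, which matches the `null` values shown for the date fields in the sample output further below.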

    The analyzer creates a collection of HelloSignField instances, each of which describes one such placeholder. Each instance also holds the value of the respective field if text is found positioned over the placeholder.
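    The position test (inField) and the value assembly (checkForValue) of the HelloSignField class below can be tried out in isolation. This sketch copies the tolerances of inField (3 units vertically, 1 unit horizontally) and the rule of checkForValue that inserts a space when the next glyph starts more than half a space width after the end of the previous one. PlacedGlyph and PlaceholderBox are hypothetical stand-ins for PDFBox's TextPosition and for HelloSignField, reduced to the values the original code reads.

```java
import java.util.List;

// Hypothetical stand-in for PDFBox's TextPosition, reduced to the values that
// HelloSignField reads: text matrix translation, glyph width, space width, text.
class PlacedGlyph
{
    final float x, y, width, widthOfSpace;
    final String unicode;

    PlacedGlyph(float x, float y, float width, float widthOfSpace, String unicode)
    {
        this.x = x;
        this.y = y;
        this.width = width;
        this.widthOfSpace = widthOfSpace;
        this.unicode = unicode;
    }
}

// Hypothetical mirror of the position test and value assembly of HelloSignField.
class PlaceholderBox
{
    final float x, y, width;
    String value = "";
    float lastX = 0;

    PlaceholderBox(float x, float y, float width)
    {
        this.x = x;
        this.y = y;
        this.width = width;
    }

    // Same tolerances as inField: 3 units vertically, 1 unit horizontally.
    boolean inField(float xPos, float yPos)
    {
        return yPos >= y - 3 && yPos <= y + 3 && xPos >= x - 1 && xPos <= x + width + 1;
    }

    // Appends glyphs inside the box; a gap of more than half a space width becomes " ".
    void checkForValue(List<PlacedGlyph> glyphs)
    {
        for (PlacedGlyph glyph : glyphs)
        {
            if (inField(glyph.x, glyph.y))
            {
                if (glyph.x > lastX + glyph.widthOfSpace / 2 && value.length() > 0)
                    value += " ";
                value += glyph.unicode;
                lastX = glyph.x + glyph.width;
            }
        }
    }
}
```

    With a box at (90, 580) and the glyphs 1, 2 and 3 followed by an M after a wider gap, the assembled value is "123 M"; a glyph at y = 560 is ignored because it lies outside the vertical tolerance.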

    Furthermore, the analyzer stores the name of the last xobject drawn directly by the page content; in the sample document, this is where HelloSign draws its field contents.

    import java.io.IOException;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    import org.apache.pdfbox.contentstream.operator.Operator;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.pdfbox.text.TextPosition;
    
    public class HelloSignAnalyzer extends PDFTextStripper
    {
        // Describes one HelloSign placeholder and the text found drawn over it.
        public class HelloSignField
        {
            public String getName()
            { return name; }
            public String getValue()
            { return value; }
            public float getX()
            { return x; }
            public float getY()
            { return y; }
            public float getWidth()
            { return width; }
            public String getType()
            { return type; }
            public boolean isOptional()
            { return optional; }
            public String getSigner()
            { return signer; }
            public String getDisplay()
            { return display; }
            public String getLabel()
            { return label; }
            public float getLastX()
            { return lastX; }
    
            String name = null;
            String value = "";
            float x = 0, y = 0, width = 0;
            String type = null;
            boolean optional = false;
            String signer = null;
            String display = null;
            String label = null;
    
            float lastX = 0;
    
            @Override
            public String toString()
            {
                return String.format("[Name: '%s'; Value: `%s` Position: %s, %s; Width: %s; Type: '%s'; Optional: %s; Signer: '%s'; Display: '%s', Label: '%s']",
                        name, value, x, y, width, type, optional, signer, display, label);
            }
    
            // Appends the characters drawn inside this field to value; a gap of more
            // than half a space width to the previous character becomes a space.
            void checkForValue(List<TextPosition> textPositions)
            {
                for (TextPosition textPosition : textPositions)
                {
                    if (inField(textPosition))
                    {
                        float textX = textPosition.getTextMatrix().getTranslateX();
                        if (textX > lastX + textPosition.getWidthOfSpace() / 2 && value.length() > 0)
                            value += " ";
                        value += textPosition.getUnicode();
                        lastX = textX + textPosition.getWidth();
                    }
                }
            }
    
            boolean inField(TextPosition textPosition)
            {
                float yPos = textPosition.getTextMatrix().getTranslateY();
                float xPos = textPosition.getTextMatrix().getTranslateX();
    
                return inField(xPos, yPos);
            }
    
            // Hard-coded tolerances: 3 units vertically, 1 unit horizontally.
            boolean inField(float xPos, float yPos)
            {
                if (yPos < y - 3 || yPos > y + 3)
                    return false;
    
                if (xPos < x - 1 || xPos > x + width + 1)
                    return false;
    
                return true;
            }
        }
    
        public HelloSignAnalyzer(PDDocument pdDocument) throws IOException
        {
            super();
            this.pdDocument = pdDocument;
        }
    
        // Analyzes the document once and returns the fields found, keyed by name.
        public Map<String, HelloSignField> analyze() throws IOException
        {
            if (!analyzed)
            {
                fields = new HashMap<>();
    
                // only the last page is processed
                setStartPage(pdDocument.getNumberOfPages());
                getText(pdDocument);
    
                analyzed = true;
            }
            return Collections.unmodifiableMap(fields);
        }
    
        public String getLastFormName()
        {
            return lastFormName;
        }
    
        //
        // PDFTextStripper overrides
        //
        @Override
        protected void writeString(String text, List<TextPosition> textPositions) throws IOException
        {
            // Collect values drawn over placeholders that have already been located.
            for (HelloSignField field : fields.values())
            {
                field.checkForValue(textPositions);
            }
    
            int position = -1;
            while ((position = text.indexOf('[', position + 1)) >= 0)
            {
                int endPosition = text.indexOf(']', position);
                if (endPosition < 0)
                    break; // no closing bracket follows
                if (endPosition > position + 1 && text.charAt(position + 1) == '$')
                {
                    // "[$varname ]": a placeholder whose position and width define the field area
                    String fieldName = text.substring(position + 2, endPosition);
                    int spacePosition = fieldName.indexOf(' ');
                    if (spacePosition >= 0)
                        fieldName = fieldName.substring(0, spacePosition);
                    HelloSignField field = getOrCreateField(fieldName);
    
                    // assumes that indices in text and in textPositions correspond one-to-one
                    TextPosition start = textPositions.get(position);
                    field.x = start.getTextMatrix().getTranslateX();
                    field.y = start.getTextMatrix().getTranslateY();
                    TextPosition end = textPositions.get(endPosition);
                    field.width = end.getTextMatrix().getTranslateX() + end.getWidth() - field.x;
                }
                else if (endPosition > position + 5 && "def:$".equals(text.substring(position + 1, position + 6)))
                {
                    // "[def:$varname|type|req|signer|display|label]": properties of a placeholder
                    String definition = text.substring(position + 6, endPosition);
                    String[] pieces = definition.split("\\|");
                    if (pieces.length == 0)
                        continue;
                    HelloSignField field = getOrCreateField(pieces[0]);
    
                    if (pieces.length > 1)
                        field.type = pieces[1];
                    if (pieces.length > 2)
                        field.optional = !"req".equals(pieces[2]);
                    if (pieces.length > 3)
                        field.signer = pieces[3];
                    if (pieces.length > 4)
                        field.display = pieces[4];
                    if (pieces.length > 5)
                        field.label = pieces[5];
                }
            }
    
            super.writeString(text, textPositions);
        }
    
        @Override
        protected void processOperator(Operator operator, List<COSBase> operands) throws IOException
        {
            // Track the xobject currently being drawn; record it as lastFormName only
            // if it is drawn directly by the page content, not from within another xobject.
            String currentFormName = formName;
            if (operator != null && "Do".equals(operator.getName()) && operands != null && operands.size() > 0)
            {
                COSBase base0 = operands.get(0);
                if (base0 instanceof COSName)
                {
                    formName = ((COSName)base0).getName();
                    if (currentFormName == null)
                        lastFormName = formName;
                }
            }
            try
            {
                super.processOperator(operator, operands);
            }
            finally
            {
                formName = currentFormName;
            }
        }
    
        //
        // helper methods
        //
        HelloSignField getOrCreateField(String name)
        {
            HelloSignField field = fields.get(name);
            if (field == null)
            {
                field = new HelloSignField();
                field.name = name;
                fields.put(name, field);
            }
            return field;
        }
    
        //
        // inner member variables
        //
        final PDDocument pdDocument;
        boolean analyzed = false;
        Map<String, HelloSignField> fields = null;
        String formName = null;
        String lastFormName = null;
    }
    

    (HelloSignAnalyzer.java)
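    The bookkeeping in processOperator above (remember the current form name, set it while a Do operator is processed, restore it in a finally block, and record lastFormName only when no other xobject is active) can be tried out without PDFBox. The sketch below models each content stream as a hypothetical map entry listing the xobject names it draws, in order; the recursion stands in for PDFBox descending into a form xobject.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model: each content stream is reduced to the ordered list of
// xobject names it draws with "Do"; recursion mimics PDFBox entering a form.
class LastFormTracker
{
    private final Map<String, List<String>> drawsByStream;
    private String formName = null;
    private String lastFormName = null;

    LastFormTracker(Map<String, List<String>> drawsByStream)
    {
        this.drawsByStream = drawsByStream;
    }

    String lastTopLevelForm(String pageStream)
    {
        process(pageStream);
        return lastFormName;
    }

    private void process(String stream)
    {
        for (String name : drawsByStream.getOrDefault(stream, Collections.<String>emptyList()))
        {
            String currentFormName = formName;
            formName = name;
            if (currentFormName == null)
                lastFormName = name; // only xobjects drawn directly by the page count
            try
            {
                process(name); // descend into the xobject's own content
            }
            finally
            {
                formName = currentFormName; // restore, as the analyzer does
            }
        }
    }
}
```

    For a page that draws Fm0 and then Xi0, each of which draws a further nested xobject, the tracker reports Xi0: nested xobjects never overwrite the result.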

    Usage

    One can apply the HelloSignAnalyzer to a document as follows:

    PDDocument pdDocument = PDDocument.load(...);
    
    HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
    
    Map<String, HelloSignAnalyzer.HelloSignField> fields = helloSignAnalyzer.analyze();
    
    System.out.printf("Found %s fields:\n\n", fields.size());
    
    for (Map.Entry<String, HelloSignAnalyzer.HelloSignField> entry : fields.entrySet())
    {
        System.out.printf("%s -> %s\n", entry.getKey(), entry.getValue());
    }
    
    System.out.printf("\nLast form name: %s\n", helloSignAnalyzer.getLastFormName());
    

    (PlayWithHelloSign.java test method testAnalyzeInput)

    For the OP's sample document, the output is:

    Found 8 fields:
    
    var1001 -> [Name: 'var1001'; Value: `123 Main St.` Position: 90.0, 580.0; Width: 165.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'Address', Label: 'address1']
    var1004 -> [Name: 'var1004'; Value: `12345` Position: 210.0, 564.0; Width: 45.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'Postal Code', Label: 'zip']
    var1002 -> [Name: 'var1002'; Value: `TestCity` Position: 90.0, 564.0; Width: 65.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'City', Label: 'city']
    var1003 -> [Name: 'var1003'; Value: `AA` Position: 161.0, 564.0; Width: 45.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'State', Label: 'state']
    date2 -> [Name: 'date2'; Value: `2016/12/09` Position: 397.0, 407.0; Width: 124.63202; Type: 'date'; Optional: false; Signer: 'signer2'; Display: 'null', Label: 'null']
    signature1 -> [Name: 'signature1'; Value: `` Position: 88.0, 489.0; Width: 236.624; Type: 'sig'; Optional: false; Signer: 'signer1'; Display: 'null', Label: 'null']
    date1 -> [Name: 'date1'; Value: `2016/12/09` Position: 397.0, 489.0; Width: 124.63202; Type: 'date'; Optional: false; Signer: 'signer1'; Display: 'null', Label: 'null']
    signature2 -> [Name: 'signature2'; Value: `` Position: 88.0, 407.0; Width: 236.624; Type: 'sig'; Optional: false; Signer: 'signer2'; Display: 'null', Label: 'null']
    
    Last form name: Xi0
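    The order of the fields in this output is simply the iteration order of the HashMap that analyze() fills, so it carries no meaning. If a stable, name-sorted listing is preferred, the map can be copied into a TreeMap before printing; a minimal sketch with a hypothetical helper name:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: copies an unordered field map (such as the one returned
// by analyze()) into a TreeMap so entries are listed in ascending name order.
class SortedFieldListing
{
    static <V> Map<String, V> sortedByName(Map<String, V> fields)
    {
        return new TreeMap<>(fields);
    }
}
```

    In the usage code above, one would then iterate over `SortedFieldListing.sortedByName(fields).entrySet()` instead of `fields.entrySet()`.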
    

    HelloSignManipulator

    This class makes use of the information a HelloSignAnalyzer has gathered to clear the contents of text fields given by their name.

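    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    
    import org.apache.pdfbox.contentstream.operator.MissingOperandException;
    import org.apache.pdfbox.contentstream.operator.Operator;
    import org.apache.pdfbox.contentstream.operator.OperatorProcessor;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.graphics.PDXObject;
    import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
    import org.apache.pdfbox.pdmodel.graphics.form.PDTransparencyGroup;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.pdfbox.util.Matrix;
    
    public class HelloSignManipulator extends PDFTextStripper
    {
        public HelloSignManipulator(HelloSignAnalyzer helloSignAnalyzer) throws IOException
        {
            super();
            this.helloSignAnalyzer = helloSignAnalyzer;
            // replaces the "Do" operator processor registered by PDFTextStripper
            addOperator(new SelectiveDrawObject());
        }
    
        public void clearFields(Iterable<String> fieldNames) throws IOException
        {
            try
            {
                Map<String, HelloSignAnalyzer.HelloSignField> fieldMap = helloSignAnalyzer.analyze();
                List<HelloSignAnalyzer.HelloSignField> selectedFields = new ArrayList<>();
                for (String fieldName : fieldNames)
                {
                    HelloSignAnalyzer.HelloSignField field = fieldMap.get(fieldName);
                    if (field == null)
                        throw new IllegalArgumentException("Unknown HelloSign field: " + fieldName);
                    selectedFields.add(field);
                }
                fields = selectedFields;
    
                PDDocument pdDocument = helloSignAnalyzer.pdDocument;
                setStartPage(pdDocument.getNumberOfPages());
                getText(pdDocument);
            }
            finally
            {
                fields = null;
            }
        }
    
        // Handles "Do": the xobject the analyzer identified as last form is processed
        // while its operators are copied into a new form xobject, which then replaces
        // the original in the page resources.
        class SelectiveDrawObject extends OperatorProcessor
        {
            @Override
            public void process(Operator operator, List<COSBase> arguments) throws IOException
            {
                if (arguments.size() < 1)
                {
                    throw new MissingOperandException(operator, arguments);
                }
                COSBase base0 = arguments.get(0);
                if (!(base0 instanceof COSName))
                {
                    return;
                }
                COSName name = (COSName) base0;
    
                // only the target xobject is processed; xobjects nested in it are copied unchanged
                if (replacement != null || !name.getName().equals(helloSignAnalyzer.getLastFormName()))
                {
                    return;
                }
    
                if (context.getResources().isImageXObject(name))
                {
                    throw new IllegalArgumentException("The form xobject to edit turned out to be an image.");
                }
    
                PDXObject xobject = context.getResources().getXObject(name);
    
                if (xobject instanceof PDTransparencyGroup)
                {
                    throw new IllegalArgumentException("The form xobject to edit turned out to be a transparency group.");
                }
                else if (xobject instanceof PDFormXObject)
                {
                    PDFormXObject form = (PDFormXObject) xobject;
                    // new form xobject with the same geometry and resources as the original
                    PDFormXObject formReplacement = new PDFormXObject(helloSignAnalyzer.pdDocument);
                    formReplacement.setBBox(form.getBBox());
                    formReplacement.setFormType(form.getFormType());
                    formReplacement.setMatrix(form.getMatrix().createAffineTransform());
                    formReplacement.setResources(form.getResources());
                    OutputStream outputStream = formReplacement.getContentStream().createOutputStream(COSName.FLATE_DECODE);
                    replacement = new ContentStreamWriter(outputStream);
    
                    try
                    {
                        // processing the original form copies its operators via processOperator
                        context.showForm(form);
                    }
                    finally
                    {
                        outputStream.close();
                        replacement = null;
                    }
                    // let the page use the filtered copy instead of the original form
                    getResources().put(name, formReplacement);
                }
            }
    
            @Override
            public String getName()
            {
                return "Do";
            }
        }
    
        //
        // PDFTextStripper overrides
        //
        @Override
        protected void processOperator(Operator operator, List<COSBase> operands) throws IOException
        {
            if (replacement != null)
            {
                // while copying the target form, drop Tj/TJ operators starting inside a selected field
                boolean copy = true;
                if (TjTJ.contains(operator.getName()))
                {
                    Matrix transformation = getTextMatrix().multiply(getGraphicsState().getCurrentTransformationMatrix());
                    float xPos = transformation.getTranslateX();
                    float yPos = transformation.getTranslateY();
                    for (HelloSignAnalyzer.HelloSignField field : fields)
                    {
                        if (field.inField(xPos, yPos))
                        {
                            copy = false;
                        }
                    }
                }
    
                if (copy)
                {
                    replacement.writeTokens(operands);
                    replacement.writeToken(operator);
                }
            }
            super.processOperator(operator, operands);
        }
    
        //
        // member variables
        //
        final HelloSignAnalyzer helloSignAnalyzer;
        final Collection<String> TjTJ = Arrays.asList("Tj", "TJ");
        Iterable<HelloSignAnalyzer.HelloSignField> fields;
        ContentStreamWriter replacement = null;
    }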
    

    (HelloSignManipulator.java)
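    The heart of the manipulator is the filtering in processOperator: while the target form is being processed, every operator is copied into the replacement stream except Tj and TJ operators whose starting point lies inside one of the selected fields. The stdlib-only sketch below reproduces just this filtering step. The Op and Field classes with precomputed coordinates are hypothetical simplifications; the real code computes the position from the text matrix and the current transformation matrix when the operator is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContentFilter
{
    // Hypothetical simplified operator: name, operands as text, and the
    // (already transformed) position at which it starts drawing.
    static class Op
    {
        final String name; final String operands; final float x, y;
        Op(String name, String operands, float x, float y)
        { this.name = name; this.operands = operands; this.x = x; this.y = y; }
    }

    // Hypothetical field rectangle using the tolerances of HelloSignField.inField.
    static class Field
    {
        final float x, y, width;
        Field(float x, float y, float width) { this.x = x; this.y = y; this.width = width; }
        boolean inField(float xPos, float yPos)
        {
            return yPos >= y - 3 && yPos <= y + 3 && xPos >= x - 1 && xPos <= x + width + 1;
        }
    }

    static final List<String> TJ_TJ = Arrays.asList("Tj", "TJ");

    // Copies every operator except Tj/TJ operators that start inside a selected field.
    static List<Op> filter(List<Op> ops, List<Field> selected)
    {
        List<Op> copy = new ArrayList<>();
        for (Op op : ops)
        {
            boolean keep = true;
            if (TJ_TJ.contains(op.name))
            {
                for (Field field : selected)
                    if (field.inField(op.x, op.y))
                        keep = false;
            }
            if (keep)
                copy.add(op);
        }
        return copy;
    }
}
```

    Note that, as in the original, only the start of each Tj/TJ is tested: a text-showing operator that starts inside a field is removed as a whole, while one that starts outside but runs into the field is kept.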

    Usage: Clear single field

    One can apply the HelloSignManipulator to a document as follows to clear a single field:

    PDDocument pdDocument = PDDocument.load(...);
    
    HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
    
    HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
    
    helloSignManipulator.clearFields(Collections.singleton("var1001"));
    
    pdDocument.save(...);
    

    (PlayWithHelloSign.java test method testClearAddress1Input)
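    Field names such as var1001 are generated by HelloSign and do not say much. As the analyzer output above shows, the def: markers of the sample also carry labels such as address1 or zip, so one may prefer to select fields by label. A minimal sketch with a hypothetical helper, kept free of PDFBox types so it runs on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: translates labels from the [def:...] markers into the
// HelloSign field names that HelloSignManipulator.clearFields expects.
class FieldNameResolver
{
    // labelsByName maps a field name (e.g. "var1001") to its label (e.g. "address1");
    // with the real classes it could be filled from getName() and getLabel().
    static List<String> namesForLabels(Map<String, String> labelsByName, List<String> labels)
    {
        List<String> names = new ArrayList<>();
        for (String label : labels)
        {
            boolean found = false;
            for (Map.Entry<String, String> entry : labelsByName.entrySet())
            {
                if (label.equals(entry.getValue()))
                {
                    names.add(entry.getKey()); // all fields sharing the label are selected
                    found = true;
                }
            }
            if (!found)
                throw new IllegalArgumentException("No HelloSign field with label '" + label + "'");
        }
        return names;
    }
}
```

    The resulting list can be passed directly to clearFields. In the sample output, the signature and date fields have no label, so those can only be selected by name.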

    Usage: Clear multiple fields at once

    One can apply the HelloSignManipulator to a document as follows to clear multiple fields at once:

    PDDocument pdDocument = PDDocument.load(...);
    
    HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
    
    HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
    
    helloSignManipulator.clearFields(Arrays.asList("var1004", "var1003", "date2"));
    
    pdDocument.save(...);
    

    (PlayWithHelloSign.java test method testClearZipStateDate2Input)

    Usage: Clear multiple fields successively

    One can apply the HelloSignManipulator to a document as follows to clear multiple fields successively:

    PDDocument pdDocument = PDDocument.load(...);
    
    HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
    
    HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
    
    helloSignManipulator.clearFields(Collections.singleton("var1004"));
    helloSignManipulator.clearFields(Collections.singleton("var1003"));
    helloSignManipulator.clearFields(Collections.singleton("date2"));
    
    pdDocument.save(...);
    

    (PlayWithHelloSign.java test method testClearZipStateDate2SuccessivelyInput)
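    Clearing fields successively should leave the same content as clearing them at once: the analyzer caches its results, so every call works with the original field positions, and each further call merely drops more Tj/TJ operators from the already rewritten form. The stdlib-only sketch below checks this order independence on a hypothetical, heavily reduced model in which each text-showing operator is represented only by the name of the field it starts in (or null if it starts outside all fields).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical model: each text-showing operator is reduced to the name of the
// field its start position falls into, or null if it is outside all fields.
class SuccessiveClearing
{
    // Keeps every operator that does not start inside one of the given fields.
    static List<String> clear(List<String> textOps, Collection<String> fieldNames)
    {
        List<String> kept = new ArrayList<>();
        for (String field : textOps)
        {
            if (field == null || !fieldNames.contains(field))
                kept.add(field);
        }
        return kept;
    }
}
```

    Because each call only removes elements and never reorders or adds any, applying the calls one after another yields the same list as a single call with all names.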

    Caveat

    These classes are mere proofs of concept. On the one hand, they are based on a single example HelloSign file, so there is a considerable chance that important details have been missed. On the other hand, they contain some built-in assumptions, e.g. the hard-coded position tolerances in the HelloSignField method inField.

    Furthermore, manipulating signed HelloSign files in general might be a bad idea. If I understood their concept correctly, they store a hash of each signed document to allow verification of the content, and if the document is manipulated as shown above, the hash value won't match anymore.
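    The effect described above can be illustrated with a plain digest: the sketch below hashes some bytes before and after part of them is removed. SHA-256 and the sample byte strings are assumptions for illustration only; which hash algorithm HelloSign actually uses, and over which bytes, is not known here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: SHA-256 over a whole byte array is an assumption and not
// necessarily what HelloSign actually computes.
class HashCheck
{
    static String sha256Hex(byte[] data)
    {
        try
        {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
                hex.append(String.format("%02x", b));
            return hex.toString();
        }
        catch (NoSuchAlgorithmException e)
        {
            // every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```

    Hashing a content fragment such as `BT (123 Main St.) Tj ET` and the same fragment with the Tj operation removed yields two unrelated digests, so a stored hash of the original can no longer be matched.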
