The simplest way I can describe the problem is that we need to use PDFBox to remove only one field (e.g. a credit card number) from a PDF that HelloSign sends to us.
The code in this answer may appear somewhat generic, as it first determines a map of the fields in the document and then allows deleting any combination of the text fields. Please be aware, though, that it has been developed with only the single example PDF from this question. Thus, I cannot be sure that I correctly understood the way fields are marked for/by HelloSign, and in particular the way HelloSign fills these fields.
This answer presents two classes: one analyzes a HelloSign form, and the other manipulates it by clearing selected fields, relying on the information gathered by the former. Both classes are built upon the PDFBox PDFTextStripper utility class.
The code has been developed for the current PDFBox development version 2.1.0-SNAPSHOT. Most likely it works with all 2.0.x versions, too.
This class analyzes the given PDDocument, looking for the sequences
[$varname ]
which appear to define placeholders for placing form field contents, and
[def:$varname|type|req|signer|display|label]
which appear to define properties of the placeholders. It creates a collection of HelloSignField instances, each of which describes such a placeholder. These instances also contain the value of the respective field if text could be found located over the placeholder.
Furthermore, it stores the name of the last XObject drawn directly by the page content, which in the case of the sample document is where HelloSign draws its field contents.
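The tag syntax described above can be tried out without PDFBox. The following is a minimal sketch, assuming the page text is available as a plain String; the real class works on PDFBox TextPosition lists so that it can also record coordinates. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch of the tag scanning done in
// HelloSignAnalyzer.writeString, operating on a plain String instead of
// PDFBox text positions.
final class HelloSignTagScanner {
    // Holder for what one [def:$...] tag declares (names are illustrative).
    static final class Definition {
        String name, type, signer, display, label;
        boolean optional;
    }

    // Returns the placeholder names found in "[$name ...]" tags.
    static List<String> placeholderNames(String text) {
        List<String> names = new ArrayList<>();
        int position = -1;
        while ((position = text.indexOf('[', position + 1)) >= 0) {
            int end = text.indexOf(']', position);
            if (end < 0)
                break; // no closing bracket anywhere after this point
            if (end > position + 1 && text.charAt(position + 1) == '$') {
                String name = text.substring(position + 2, end);
                int space = name.indexOf(' ');
                names.add(space >= 0 ? name.substring(0, space) : name);
            }
        }
        return names;
    }

    // Parses the body of a "[def:$...]" tag: "name|type|req|signer|display|label".
    static Definition parseDefinition(String body) {
        String[] pieces = body.split("\\|");
        Definition d = new Definition();
        d.name = pieces[0];
        if (pieces.length > 1) d.type = pieces[1];
        if (pieces.length > 2) d.optional = !"req".equals(pieces[2]);
        if (pieces.length > 3) d.signer = pieces[3];
        if (pieces.length > 4) d.display = pieces[4];
        if (pieces.length > 5) d.label = pieces[5];
        return d;
    }
}
```

As in the original, anything other than the literal "req" in the third position is treated as optional, and missing trailing pieces simply stay null.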
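import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class HelloSignAnalyzer extends PDFTextStripper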
{
public class HelloSignField
{
public String getName()
{ return name; }
public String getValue()
{ return value; }
public float getX()
{ return x; }
public float getY()
{ return y; }
public float getWidth()
{ return width; }
public String getType()
{ return type; }
public boolean isOptional()
{ return optional; }
public String getSigner()
{ return signer; }
public String getDisplay()
{ return display; }
public String getLabel()
{ return label; }
public float getLastX()
{ return lastX; }
String name = null;
String value = "";
float x = 0, y = 0, width = 0;
String type = null;
boolean optional = false;
String signer = null;
String display = null;
String label = null;
float lastX = 0;
@Override
public String toString()
{
return String.format("[Name: '%s'; Value: `%s` Position: %s, %s; Width: %s; Type: '%s'; Optional: %s; Signer: '%s'; Display: '%s', Label: '%s']",
name, value, x, y, width, type, optional, signer, display, label);
}
void checkForValue(List<TextPosition> textPositions)
{
for (TextPosition textPosition : textPositions)
{
if (inField(textPosition))
{
float textX = textPosition.getTextMatrix().getTranslateX();
if (textX > lastX + textPosition.getWidthOfSpace() / 2 && value.length() > 0)
value += " ";
value += textPosition.getUnicode();
lastX = textX + textPosition.getWidth();
}
}
}
boolean inField(TextPosition textPosition)
{
float yPos = textPosition.getTextMatrix().getTranslateY();
float xPos = textPosition.getTextMatrix().getTranslateX();
return inField(xPos, yPos);
}
boolean inField(float xPos, float yPos)
{
if (yPos < y - 3 || yPos > y + 3)
return false;
if (xPos < x - 1 || xPos > x + width + 1)
return false;
return true;
}
}
public HelloSignAnalyzer(PDDocument pdDocument) throws IOException
{
super();
this.pdDocument = pdDocument;
}
public Map<String, HelloSignField> analyze() throws IOException
{
if (!analyzed)
{
fields = new HashMap<>();
// only the last page of the document is analyzed
setStartPage(pdDocument.getNumberOfPages());
getText(pdDocument);
analyzed = true;
}
return Collections.unmodifiableMap(fields);
}
public String getLastFormName()
{
return lastFormName;
}
//
// PDFTextStripper overrides
//
@Override
protected void writeString(String text, List<TextPosition> textPositions) throws IOException
{
{
for (HelloSignField field : fields.values())
{
field.checkForValue(textPositions);
}
}
int position = -1;
while ((position = text.indexOf('[', position + 1)) >= 0)
{
int endPosition = text.indexOf(']', position);
if (endPosition < 0)
continue;
if (endPosition > position + 1 && text.charAt(position + 1) == '$')
{
String fieldName = text.substring(position + 2, endPosition);
int spacePosition = fieldName.indexOf(' ');
if (spacePosition >= 0)
fieldName = fieldName.substring(0, spacePosition);
HelloSignField field = getOrCreateField(fieldName);
TextPosition start = textPositions.get(position);
field.x = start.getTextMatrix().getTranslateX();
field.y = start.getTextMatrix().getTranslateY();
TextPosition end = textPositions.get(endPosition);
field.width = end.getTextMatrix().getTranslateX() + end.getWidth() - field.x;
}
else if (endPosition > position + 5 && "def:$".equals(text.substring(position + 1, position + 6)))
{
String definition = text.substring(position + 6, endPosition);
String[] pieces = definition.split("\\|");
if (pieces.length == 0)
continue;
HelloSignField field = getOrCreateField(pieces[0]);
if (pieces.length > 1)
field.type = pieces[1];
if (pieces.length > 2)
field.optional = !"req".equals(pieces[2]);
if (pieces.length > 3)
field.signer = pieces[3];
if (pieces.length > 4)
field.display = pieces[4];
if (pieces.length > 5)
field.label = pieces[5];
}
}
super.writeString(text, textPositions);
}
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException
{
String currentFormName = formName;
if (operator != null && "Do".equals(operator.getName()) && operands != null && operands.size() > 0)
{
COSBase base0 = operands.get(0);
if (base0 instanceof COSName)
{
formName = ((COSName)base0).getName();
if (currentFormName == null)
lastFormName = formName;
}
}
try
{
super.processOperator(operator, operands);
}
finally
{
formName = currentFormName;
}
}
//
// helper methods
//
HelloSignField getOrCreateField(String name)
{
HelloSignField field = fields.get(name);
if (field == null)
{
field = new HelloSignField();
field.name = name;
fields.put(name, field);
}
return field;
}
//
// inner member variables
//
final PDDocument pdDocument;
boolean analyzed = false;
Map<String, HelloSignField> fields = null;
String formName = null;
String lastFormName = null;
}
(HelloSignAnalyzer.java)
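How lastFormName ends up naming the last XObject drawn at page level can be illustrated by a small simulation of the processOperator override. This is a sketch under assumptions: content streams are modelled as plain lists of the XObject names they draw with Do, and LastFormTracker is a hypothetical name. As in the original, every page-level Do counts, so an image drawn after the forms would also become the "last form".

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of how HelloSignAnalyzer.processOperator tracks
// lastFormName. Each content stream is modelled as the list of XObject
// names it draws with "Do".
final class LastFormTracker {
    private final Map<String, List<String>> formContents; // form name -> names it draws
    private String formName = null;     // form currently being processed, null at page level
    private String lastFormName = null; // last XObject drawn directly by the page

    LastFormTracker(Map<String, List<String>> formContents) {
        this.formContents = formContents;
    }

    // Mirrors processOperator for a "Do" operator.
    private void processDo(String name) {
        String currentFormName = formName;
        formName = name;
        if (currentFormName == null)
            lastFormName = formName; // only page-level Do operations are recorded
        try {
            // Recurse into the form's own content, as showForm would.
            for (String nested : formContents.getOrDefault(name, Collections.<String>emptyList()))
                processDo(nested);
        } finally {
            formName = currentFormName; // restore the enclosing state
        }
    }

    // Processes a page whose content stream draws the given XObjects in order.
    String analyzePage(List<String> pageDraws) {
        for (String name : pageDraws)
            processDo(name);
        return lastFormName;
    }
}
```

The save-and-restore of formName in a finally block is what keeps Do operations nested inside a form from overwriting lastFormName.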
One can apply the HelloSignAnalyzer to a document as follows:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
Map<String, HelloSignAnalyzer.HelloSignField> fields = helloSignAnalyzer.analyze();
System.out.printf("Found %s fields:\n\n", fields.size());
for (Map.Entry<String, HelloSignAnalyzer.HelloSignField> entry : fields.entrySet())
{
System.out.printf("%s -> %s\n", entry.getKey(), entry.getValue());
}
System.out.printf("\nLast form name: %s\n", helloSignAnalyzer.getLastFormName());
(PlayWithHelloSign.java test method testAnalyzeInput)
For the OP's sample document, the output is:
Found 8 fields:

var1001 -> [Name: 'var1001'; Value: `123 Main St.` Position: 90.0, 580.0; Width: 165.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'Address', Label: 'address1']
var1004 -> [Name: 'var1004'; Value: `12345` Position: 210.0, 564.0; Width: 45.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'Postal Code', Label: 'zip']
var1002 -> [Name: 'var1002'; Value: `TestCity` Position: 90.0, 564.0; Width: 65.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'City', Label: 'city']
var1003 -> [Name: 'var1003'; Value: `AA` Position: 161.0, 564.0; Width: 45.53601; Type: 'text'; Optional: false; Signer: 'signer1'; Display: 'State', Label: 'state']
date2 -> [Name: 'date2'; Value: `2016/12/09` Position: 397.0, 407.0; Width: 124.63202; Type: 'date'; Optional: false; Signer: 'signer2'; Display: 'null', Label: 'null']
signature1 -> [Name: 'signature1'; Value: `` Position: 88.0, 489.0; Width: 236.624; Type: 'sig'; Optional: false; Signer: 'signer1'; Display: 'null', Label: 'null']
date1 -> [Name: 'date1'; Value: `2016/12/09` Position: 397.0, 489.0; Width: 124.63202; Type: 'date'; Optional: false; Signer: 'signer1'; Display: 'null', Label: 'null']
signature2 -> [Name: 'signature2'; Value: `` Position: 88.0, 407.0; Width: 236.624; Type: 'sig'; Optional: false; Signer: 'signer2'; Display: 'null', Label: 'null']

Last form name: Xi0
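The Value entries above are assembled glyph by glyph in checkForValue, which inserts a space whenever the gap to the end of the previous glyph exceeds half the width of a space. A minimal sketch of that heuristic, assuming glyphs are given as hypothetical (x, width, text) records already known to lie inside the field, and a single space width for all glyphs (the original takes it from each TextPosition):

```java
import java.util.List;

// Hypothetical standalone version of HelloSignField.checkForValue: glyphs are
// (x, width, text) records instead of PDFBox TextPosition objects, and all
// of them are assumed to lie inside the field already.
final class FieldValueBuilder {
    static final class Glyph {
        final float x, width;
        final String text;

        Glyph(float x, float width, String text) {
            this.x = x;
            this.width = width;
            this.text = text;
        }
    }

    static String buildValue(List<Glyph> glyphs, float widthOfSpace) {
        StringBuilder value = new StringBuilder();
        float lastX = 0;
        for (Glyph g : glyphs) {
            // Insert a space if the gap after the previous glyph exceeds half a space width.
            if (g.x > lastX + widthOfSpace / 2 && value.length() > 0)
                value.append(' ');
            value.append(g.text);
            lastX = g.x + g.width; // remember where this glyph ends
        }
        return value.toString();
    }
}
```

The value.length() > 0 check prevents a leading space, since the first glyph is always far from the initial lastX of 0.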
This class makes use of the information a HelloSignAnalyzer has gathered to clear the contents of text fields given by their name.
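import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.contentstream.operator.MissingOperandException;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.OperatorProcessor;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDTransparencyGroup;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.util.Matrix;

public class HelloSignManipulator extends PDFTextStripper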
{
public HelloSignManipulator(HelloSignAnalyzer helloSignAnalyzer) throws IOException
{
super();
this.helloSignAnalyzer = helloSignAnalyzer;
addOperator(new SelectiveDrawObject());
}
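public void clearFields(Iterable<String> fieldNames) throws IOException
{
try
{
Map<String, HelloSignAnalyzer.HelloSignField> fieldMap = helloSignAnalyzer.analyze();
List<HelloSignAnalyzer.HelloSignField> selectedFields = new ArrayList<>();
for (String fieldName : fieldNames)
{
HelloSignAnalyzer.HelloSignField field = fieldMap.get(fieldName);
if (field != null) // skip names the analyzer did not find instead of failing later
selectedFields.add(field);
}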
fields = selectedFields;
PDDocument pdDocument = helloSignAnalyzer.pdDocument;
setStartPage(pdDocument.getNumberOfPages());
getText(pdDocument);
}
finally
{
fields = null;
}
}
class SelectiveDrawObject extends OperatorProcessor
{
@Override
public void process(Operator operator, List<COSBase> arguments) throws IOException
{
if (arguments.size() < 1)
{
throw new MissingOperandException(operator, arguments);
}
COSBase base0 = arguments.get(0);
if (!(base0 instanceof COSName))
{
return;
}
COSName name = (COSName) base0;
if (replacement != null || !name.getName().equals(helloSignAnalyzer.getLastFormName()))
{
return;
}
if (context.getResources().isImageXObject(name))
{
throw new IllegalArgumentException("The form xobject to edit turned out to be an image.");
}
PDXObject xobject = context.getResources().getXObject(name);
if (xobject instanceof PDTransparencyGroup)
{
throw new IllegalArgumentException("The form xobject to edit turned out to be a transparency group.");
}
else if (xobject instanceof PDFormXObject)
{
PDFormXObject form = (PDFormXObject) xobject;
PDFormXObject formReplacement = new PDFormXObject(helloSignAnalyzer.pdDocument);
formReplacement.setBBox(form.getBBox());
formReplacement.setFormType(form.getFormType());
formReplacement.setMatrix(form.getMatrix().createAffineTransform());
formReplacement.setResources(form.getResources());
OutputStream outputStream = formReplacement.getContentStream().createOutputStream(COSName.FLATE_DECODE);
replacement = new ContentStreamWriter(outputStream);
context.showForm(form);
outputStream.close();
getResources().put(name, formReplacement);
replacement = null;
}
}
@Override
public String getName()
{
return "Do";
}
}
//
// PDFTextStripper overrides
//
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException
{
if (replacement != null)
{
boolean copy = true;
if (TjTJ.contains(operator.getName()))
{
Matrix transformation = getTextMatrix().multiply(getGraphicsState().getCurrentTransformationMatrix());
float xPos = transformation.getTranslateX();
float yPos = transformation.getTranslateY();
for (HelloSignAnalyzer.HelloSignField field : fields)
{
if (field.inField(xPos, yPos))
{
copy = false;
}
}
}
if (copy)
{
replacement.writeTokens(operands);
replacement.writeToken(operator);
}
}
super.processOperator(operator, operands);
}
//
// inner member variables
//
final HelloSignAnalyzer helloSignAnalyzer;
final Collection<String> TjTJ = Arrays.asList("Tj", "TJ");
Iterable<HelloSignAnalyzer.HelloSignField> fields;
ContentStreamWriter replacement = null;
}
(HelloSignManipulator.java)
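The core decision in processOperator (copy every operator into the replacement stream except Tj/TJ text operators that start inside one of the selected fields) can be modelled without PDFBox. In this simplified sketch, operators are reduced to hypothetical (name, x, y) records, where x and y stand for the translation part of the text matrix multiplied by the current transformation matrix; operands are ignored, and the class names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the filtering in HelloSignManipulator.processOperator.
final class ContentFilter {
    static final class Op {
        final String name;
        final float x, y; // position the text would start at (ignored for non-text ops)

        Op(String name, float x, float y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    // A field to clear, using the same tolerances as HelloSignField.inField.
    static final class Box {
        final float x, y, width;

        Box(float x, float y, float width) {
            this.x = x;
            this.y = y;
            this.width = width;
        }

        boolean contains(float xPos, float yPos) {
            return yPos >= y - 3 && yPos <= y + 3 && xPos >= x - 1 && xPos <= x + width + 1;
        }
    }

    // Returns the names of the operators that would be copied to the replacement stream.
    static List<String> filter(List<Op> ops, List<Box> cleared) {
        List<String> kept = new ArrayList<>();
        for (Op op : ops) {
            boolean copy = true;
            if (op.name.equals("Tj") || op.name.equals("TJ")) {
                for (Box box : cleared)
                    if (box.contains(op.x, op.y))
                        copy = false; // text starts inside a cleared field: drop it
            }
            if (copy)
                kept.add(op.name);
        }
        return kept;
    }
}
```

Only the text-showing operators are dropped; surrounding operators such as BT, Tf, Td and ET stay in place, so the graphics and text state of the rewritten form remain consistent.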
One can apply the HelloSignManipulator to a document as follows to clear a single field:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
helloSignManipulator.clearFields(Collections.singleton("var1001"));
pdDocument.save(...);
(PlayWithHelloSign.java test method testClearAddress1Input)
One can apply the HelloSignManipulator to a document as follows to clear multiple fields at once:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
helloSignManipulator.clearFields(Arrays.asList("var1004", "var1003", "date2"));
pdDocument.save(...);
(PlayWithHelloSign.java test method testClearZipStateDate2Input)
One can apply the HelloSignManipulator to a document as follows to clear multiple fields successively:
PDDocument pdDocument = PDDocument.load(...);
HelloSignAnalyzer helloSignAnalyzer = new HelloSignAnalyzer(pdDocument);
HelloSignManipulator helloSignManipulator = new HelloSignManipulator(helloSignAnalyzer);
helloSignManipulator.clearFields(Collections.singleton("var1004"));
helloSignManipulator.clearFields(Collections.singleton("var1003"));
helloSignManipulator.clearFields(Collections.singleton("date2"));
pdDocument.save(...);
(PlayWithHelloSign.java test method testClearZipStateDate2SuccessivelyInput)
These classes are mere proofs of concept. On the one hand, they are built based on a single example HelloSign file, so there is a huge chance of having missed important details. On the other hand, there are some built-in assumptions, e.g. the fixed tolerances (3 units vertically, 1 unit horizontally) in the HelloSignField method inField.
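One way to make the inField assumption explicit, so it can be tuned for other HelloSign documents, is to pass the tolerances in as parameters. The following is a hypothetical variant (FieldBounds is not part of the classes above); the two-argument overload keeps the original 1-unit horizontal and 3-unit vertical tolerances.

```java
// Hypothetical variant of HelloSignField.inField in which the tolerances the
// original hard-codes (1 unit horizontally, 3 units vertically) are parameters.
final class FieldBounds {
    final float x, y, width;

    FieldBounds(float x, float y, float width) {
        this.x = x;
        this.y = y;
        this.width = width;
    }

    boolean contains(float xPos, float yPos, float toleranceX, float toleranceY) {
        if (yPos < y - toleranceY || yPos > y + toleranceY)
            return false; // baseline too far above or below the placeholder
        if (xPos < x - toleranceX || xPos > x + width + toleranceX)
            return false; // start position left or right of the placeholder
        return true;
    }

    // Behaves like the original inField(xPos, yPos).
    boolean contains(float xPos, float yPos) {
        return contains(xPos, yPos, 1, 3);
    }
}
```

Which tolerances are appropriate for other HelloSign files is not known from the single sample; larger values make it more likely that text belonging to a neighbouring field is picked up or cleared as well.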
Furthermore, manipulating signed HelloSign files in general might be a bad idea. If I understood their concept correctly, they store a hash of each signed document to allow verification of the content, and if the document is manipulated as shown above, the hash value won't match anymore.
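Why such a manipulation breaks a stored hash can be illustrated with the JDK's MessageDigest. SHA-256 here is only a stand-in: which hash HelloSign actually records, and over which bytes, is an assumption not confirmed by this answer, and DocumentHash is a hypothetical helper.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: SHA-256 stands in for whatever hash HelloSign records.
final class DocumentHash {
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    // True if the document bytes still match a hash recorded at signing time.
    static boolean matches(byte[] data, String recordedHex) {
        return sha256Hex(data).equals(recordedHex);
    }
}
```

Any change to the saved bytes, such as the rewritten form XObject produced by clearFields, yields a different digest, so a check like matches against the originally recorded value would fail.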