I am trying to verify a digitally signed PDF document in Java.
I'm using Apache PDFBox 2.0.6 to get the signature and the original PDF that was signed, then I'm using B
You appear to have a misconception concerning the `getSignedContent` method in particular and PDF signing in general.
> I'm using Apache PDFBox 2.0.6 to get the signature and the original PDF that was signed
If by "the original PDF that was signed" you mean a PDF before it entered the signing process, then the second part of your task is impossible for generic signed PDFs.
The reason is that, before the actual signature is created, the original PDF is prepared for signing.
This preparation may amount to as little as adding a value dictionary (including a gap for later injection of the signature container) to a pre-existing empty signature field, as an incremental update that leaves the original PDF as an untouched leading part of the resulting signed document.
On the other hand, it may also involve a number of further changes to the document.
If the document was not signed before, these changes need not be applied as incremental updates. Instead, all the objects (changed or unchanged) may be re-ordered or renumbered, indirect objects may become direct ones and vice versa, unused objects may be dropped, duplicate objects may be merged into a single one, fonts of form fields made read-only may be reduced to the glyphs actually used, etc.
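To make the difference between the two cases tangible, here is a minimal sketch, assuming you have both a file you consider the original and the signed file as byte arrays. The class `PrefixCheck` and its method `isUntouchedPrefix` are hypothetical illustration helpers, not PDFBox API, and the short ASCII strings merely stand in for real PDF files. Only in the pure incremental-update case can the check succeed; after a full rewrite it fails even if the document may look the same in a viewer.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: checks whether a candidate "original" file survives
 * unchanged as the first bytes of a signed file. This can only hold if the
 * signing preparation merely appended an incremental update.
 */
public class PrefixCheck {

    /** Returns true if every byte of original appears, in order, at the start of signed. */
    static boolean isUntouchedPrefix(byte[] original, byte[] signed) {
        if (original.length > signed.length) {
            return false;
        }
        for (int i = 0; i < original.length; i++) {
            if (original[i] != signed[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy stand-ins, not real PDFs.
        byte[] original = "%PDF-1.7 ... %%EOF\n".getBytes(StandardCharsets.US_ASCII);
        // Incremental update: new objects are appended after the original bytes.
        byte[] incremental = ("%PDF-1.7 ... %%EOF\n" + "12 0 obj ... %%EOF\n")
                .getBytes(StandardCharsets.US_ASCII);
        // Full rewrite: objects may have been re-ordered or renumbered.
        byte[] rewritten = "%PDF-1.7 (re-ordered objects) ... %%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);

        System.out.println(isUntouchedPrefix(original, incremental)); // true
        System.out.println(isUntouchedPrefix(original, rewritten));   // false
    }
}
```

A successful prefix check says nothing about signature validity; it only tells you that the preparation was done as an incremental update.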
Only for this prepared PDF is the actual signature then created and embedded in the gap left in the signature value dictionary.
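The position of that gap is recorded in the signature dictionary's ByteRange entry, an array `[offset1 length1 offset2 length2]` describing the two signed byte runs around the gap. The sketch below (hypothetical class `ByteRangeDemo`, plain Java without PDFBox) illustrates conceptually what "the signed bytes" are: the two runs concatenated, with the gap left out. It is not PDFBox's implementation, the toy string only imitates a signed file, and here the gap is taken to include the angle brackets of the hex string.

```java
import java.nio.charset.StandardCharsets;

public class ByteRangeDemo {

    /**
     * Concatenates the two byte runs described by a ByteRange array
     * [offset1 length1 offset2 length2]; everything between them (the gap
     * holding the signature container) is skipped.
     */
    static byte[] signedContent(byte[] file, int[] byteRange) {
        int off1 = byteRange[0], len1 = byteRange[1];
        int off2 = byteRange[2], len2 = byteRange[3];
        byte[] result = new byte[len1 + len2];
        System.arraycopy(file, off1, result, 0, len1);
        System.arraycopy(file, off2, result, len1, len2);
        return result;
    }

    public static void main(String[] args) {
        // Toy stand-in for a signed file: the <...> part plays the role of the gap.
        String file = "HEAD/Contents <0A0B0000> TAIL";
        int gapStart = file.indexOf('<');
        int gapEnd = file.indexOf('>') + 1;
        int[] byteRange = {0, gapStart, gapEnd, file.length() - gapEnd};

        byte[] signed = signedContent(file.getBytes(StandardCharsets.US_ASCII), byteRange);
        // The result is the file with the gap cut out.
        System.out.println(new String(signed, StandardCharsets.US_ASCII));
    }
}
```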
If you apply your calls

```java
byte[] origPDF = doc.getSignatureDictionaries().get(0).getSignedContent(signedPDF);
byte[] signature = doc.getSignatureDictionaries().get(0).getContents(signedPDF);
```

to the signed document, `origPDF` contains the bytes of the signed document except the gap in the signature value dictionary, and `signature` contains the (hex-decoded) contents of the gap.
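The gap holds a PDF hex string, typically padded with zeros up to the reserved size. A minimal sketch of the hex decoding follows, using a hypothetical helper `decodeHexString` (not the PDFBox method): white space inside the hex string is ignored, and an odd final digit is treated as if followed by `0`, as the PDF hex-string syntax prescribes. The sample value is a toy byte sequence, not a real signature container.

```java
import java.util.Arrays;

public class GapDecodeDemo {

    /** Hypothetical helper: decodes a PDF hex string such as "<3082...>" into raw bytes. */
    static byte[] decodeHexString(String pdfHex) {
        String hex = pdfHex.trim();
        if (hex.startsWith("<") && hex.endsWith(">")) {
            hex = hex.substring(1, hex.length() - 1);
        }
        hex = hex.replaceAll("\\s", ""); // white space inside a hex string is ignored
        if (hex.length() % 2 != 0) {
            hex = hex + "0";             // an odd final digit counts as if followed by 0
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy gap: a few payload bytes followed by zero padding.
        byte[] contents = decodeHexString("<3003020101 00000000>");
        System.out.println(Arrays.toString(contents)); // [48, 3, 2, 1, 1, 0, 0, 0, 0]
    }
}
```

The trailing zeros in the output are padding from the reserved space, not part of the payload.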
So `origPDF` in particular contains all the changes made during the preparation; calling it `orig` is, therefore, highly misleading.
Furthermore, as the gap originally reserved for the signature container is missing, these bytes very likely no longer form a valid PDF: a PDF contains cross references pointing to the starting offsets (counted from the start of the file) of its objects, and as the gap is missing, all bytes after its former position have moved, so offsets pointing there are now wrong.
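The offset problem can be illustrated with a toy example. The sketch below (hypothetical class `OffsetShiftDemo`; the string only imitates PDF syntax and is not a valid PDF) records where an object starts, removes the gap the way the signed-content bytes lack it, and shows that the object has moved by exactly the gap length. A cross-reference entry still storing the old offset would therefore point to the wrong place. Since the toy string is pure ASCII, character indices equal byte offsets.

```java
public class OffsetShiftDemo {

    /** Returns the text with the characters in [gapStart, gapEnd) removed. */
    static String removeGap(String file, int gapStart, int gapEnd) {
        return file.substring(0, gapStart) + file.substring(gapEnd);
    }

    public static void main(String[] args) {
        // Toy "file": a signature dictionary with a reserved hex string, then another object.
        String file = "1 0 obj <</Contents <00000000>>> endobj\n2 0 obj (after gap) endobj\n";
        int storedOffset = file.indexOf("2 0 obj"); // what a cross-reference entry would hold
        int gapStart = file.indexOf("<0");
        int gapEnd = file.indexOf('>', gapStart) + 1;

        String signedBytes = removeGap(file, gapStart, gapEnd);
        int actualOffset = signedBytes.indexOf("2 0 obj");

        System.out.println("stored offset:  " + storedOffset);
        System.out.println("actual offset:  " + actualOffset);
        System.out.println("gap length:     " + (gapEnd - gapStart));
    }
}
```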
Thus, your `origPDF` merely contains the ensemble of signed bytes, which may be very different from the file you consider the original one.
Your `verifySig` completely ignores the `SubFilter` of the signature field value dictionary. Depending on that value, the signature bytes you retrieve using `getContents` might have entirely different contents.
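As a rough orientation, the sketch below maps the common `SubFilter` values to what the Contents bytes hold for each. In PDFBox 2.0 the value can be read with `PDSignature.getSubFilter()`; the class `SubFilterDemo` and its method are hypothetical, and the descriptions are a summary for dispatching, not a verification routine. Note in particular that the signed-content bytes are the detached content of the CMS structure only for the `*.detached` variants.

```java
public class SubFilterDemo {

    /** Hypothetical summary of what the /Contents bytes hold for common SubFilter values. */
    static String describeContents(String subFilter) {
        if (subFilter == null) {
            return "no SubFilter given: the signature format cannot be determined";
        }
        switch (subFilter) {
            case "adbe.pkcs7.detached":
            case "ETSI.CAdES.detached":
                // The ByteRange bytes are the detached content signed by the CMS structure.
                return "CMS SignedData, detached: signs the ByteRange bytes directly";
            case "adbe.pkcs7.sha1":
                // The CMS structure encapsulates a digest, not the document bytes.
                return "CMS SignedData encapsulating the SHA-1 digest of the ByteRange bytes";
            case "adbe.x509.rsa_sha1":
                // No CMS container at all; the certificates live elsewhere.
                return "bare PKCS#1 RSA signature value; certificates are in the /Cert entry";
            case "ETSI.RFC3161":
                // Document time stamp rather than a signature by a person.
                return "RFC 3161 time-stamp token whose message imprint covers the ByteRange bytes";
            default:
                return "unknown SubFilter: do not guess the format";
        }
    }

    public static void main(String[] args) {
        String[] values = {"adbe.pkcs7.detached", "adbe.pkcs7.sha1", "adbe.x509.rsa_sha1"};
        for (String sf : values) {
            System.out.println(sf + " -> " + describeContents(sf));
        }
    }
}
```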
So without your signed PDF, further review of that method does not make sense.