Comparing a signed PDF to an unsigned PDF using document hash

前端未结


不要未来只要你来

After extensive Google searches, I'm starting to wonder if I'm missing the point of digital signatures in some way.

This is fundamentally what I believe I should be ab

2 Answers

夕颜

2021-02-14 02:58
A strategy of verifying the integrity of a signed PDF:
1. Don't send out an unsigned PDF in the first place. Using iText (the Java version, for Linux-friendly applications), sign and certify the document with the certification level CERTIFIED_FORM_FILLING.
2. Have the end user add their signature to a form field and send the document back. This works because form changes permitted by that certification level don't break the document certification.
3. Validate both signatures and the document certification.
You should be able to figure out how to do all of this from the iText documentation: http://itextpdf.sourceforge.net/howtosign.html

To verify that a certified document is the same as the original, all you would need to do is compare its document metadata with the original's. The title comes to mind as a potentially good candidate.

To read the title from a PDF for comparison with iText, you could use this code:
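```
import com.itextpdf.text.pdf.PdfReader; // iText 5.x package
PdfReader reader = new PdfReader("AsignedPDF.pdf");
String title = reader.getInfo().get("Title"); // null if the Info dictionary has no Title
reader.close();
```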
南笙

2021-02-14 03:09
About this:

"This is because I not only want to verify that the signed PDF is authentic, but also that it's the same unsigned PDF I have on record"

Assuming you just want to know that a document you get on your server is authentic:

When creating a signed document, you can choose to sign only part of the file or the entire document. If you use a "whole document" signature and the document you get back on your server is "authentic" (that is, signature verification succeeded), then it is certainly the same document you have on record.
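Whether a signature really is a "whole document" signature can be read from its /ByteRange entry (described in the PDF specification excerpt at the end of this answer). The Java sketch below is a hypothetical helper, not an iText API: it assumes you have already extracted the four ByteRange numbers and know the file length, and it checks that the signed bytes start at byte 0 and end at the last byte of the file, leaving only the gap for the signature value unsigned.

```java
/**
 * Hypothetical helper (not an iText API): checks whether a signature's /ByteRange
 * [offset1, length1, offset2, length2] spans the whole file except the signature gap.
 */
final class ByteRangeCoverage {

    private ByteRangeCoverage() {}

    static boolean coversWholeFile(int[] byteRange, long fileLength) {
        // This sketch only handles the usual two-pair form of /ByteRange
        if (byteRange == null || byteRange.length != 4) {
            return false;
        }
        int start1 = byteRange[0], len1 = byteRange[1];
        int start2 = byteRange[2], len2 = byteRange[3];
        if (start1 < 0 || len1 < 0 || start2 < 0 || len2 < 0) {
            return false;
        }
        long gapStart = (long) start1 + len1;   // first byte NOT covered by the signature
        return start1 == 0                      // signed bytes begin at the start of the file
                && gapStart <= start2           // the two ranges do not overlap
                && (long) start2 + len2 == fileLength; // signed bytes reach the end of the file
    }
}
```

If bytes were appended after signing (for example, by an incremental update), the second range ends before the end of the file and the check returns false. The sketch does not confirm that the gap holds only the /Contents value; a fuller check would do that as well.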

It's worth mentioning that there are two types of PDF signatures: approval signatures and certification signatures. From Adobe's document Digital Signatures in PDF:
(...) approval signatures, where someone signs a document to show consent, approval, or acceptance. A certified document is one that has a certification signature applied by the originator when the document is ready for use. The originator specifies what changes are allowed; choosing one of three levels of modification permitted:
- no changes
- form fill-in only
- form fill-in and commenting
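
In the PDF specification, these three levels correspond to the /P value (1, 2 or 3) in the DocMDP transform parameters of the certification signature; level 2 is the one selected with iText's CERTIFIED_FORM_FILLING in the first answer. As a minimal sketch, the enum below encodes which kinds of later changes each level permits. The change categories are hypothetical, chosen for illustration, and level 2's permission to instantiate page templates is not modeled.

```java
/**
 * Sketch of the three certification (DocMDP) levels. The /P values follow the PDF
 * specification: 1 = no changes, 2 = form fill-in and signing, 3 = additionally annotations.
 */
enum CertificationLevel {
    NO_CHANGES(1),
    FORM_FILL_IN(2),
    FORM_FILL_IN_AND_COMMENTING(3);

    final int p; // value of /P in the DocMDP transform parameters dictionary

    CertificationLevel(int p) {
        this.p = p;
    }

    /** Hypothetical categories of changes made after certification. */
    enum Change { FILL_FORM_FIELD, ADD_SIGNATURE, ADD_ANNOTATION, EDIT_PAGE_CONTENT }

    boolean permits(Change change) {
        switch (change) {
            case FILL_FORM_FIELD:
            case ADD_SIGNATURE:
                return p >= 2; // allowed at levels 2 and 3
            case ADD_ANNOTATION:
                return p >= 3; // commenting is only allowed at level 3
            default:
                return false;  // page content edits are not permitted at any level
        }
    }

    static CertificationLevel fromP(int p) {
        for (CertificationLevel level : values()) {
            if (level.p == p) {
                return level;
            }
        }
        throw new IllegalArgumentException("Unknown DocMDP /P value: " + p);
    }
}
```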

Assuming you want to match a certain signed document that you got on your server with its unsigned equivalent in a database:

For document identification, I would suggest handling it separately. Once a document can be opened, a hash (MD5, for example) can be computed from the concatenation of the decompressed content of all its pages and then compared with the equivalent hash of the original document, which can be generated once and stored in a database.

The reason I would do it this way is that it is independent of the type of signature used on the document. Even when form fields are edited, annotations are added, or new signatures are created, the page content is not modified by any of these operations; it always remains the same.

If you are using iText, you can get the page content as a byte array with the method PdfReader.getPageContent and use the result to compute an MD5 hash.

The code in Java might look like this:
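```
import com.itextpdf.text.pdf.PdfReader; // iText 5.x
import java.security.MessageDigest;

PdfReader reader = new PdfReader("myfile.pdf");
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
int pageCount = reader.getNumberOfPages();
// iText page numbers are 1-based
for (int i = 1; i <= pageCount; i++) {
    // Decompressed content stream bytes of page i
    byte[] buf = reader.getPageContent(i);
    messageDigest.update(buf, 0, buf.length);
}
reader.close();
byte[] hash = messageDigest.digest();
```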
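To complete the identification step, the digest has to be stored and compared later. Here is a minimal JDK-only sketch with a hypothetical class name: it assumes the per-page content bytes have already been extracted (for example with getPageContent as above), hex-encodes the digest so it can be kept in a database column, and compares a received document's fingerprint with the stored value.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper: fingerprints page content and compares it with a stored value. */
final class PageContentFingerprint {

    private PageContentFingerprint() {}

    /** Hex digest of the concatenated page content, in page order. */
    static String fingerprint(List<byte[]> pageContents, String algorithm) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported digest: " + algorithm, e);
        }
        for (byte[] page : pageContents) {
            md.update(page); // updating page by page == digesting the concatenation
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** True if the received page content matches the stored hex fingerprint. */
    static boolean matches(String storedHex, List<byte[]> receivedPageContents, String algorithm) {
        return fingerprint(receivedPageContents, algorithm).equalsIgnoreCase(storedHex);
    }
}
```

Passing "SHA-256" instead of "MD5" works unchanged with MessageDigest; MD5 is fine for spotting accidental differences, but MD5 collisions can be produced deliberately. Also, because pages are simply concatenated, the same bytes split differently across pages give the same fingerprint; prefixing each page with its length would distinguish them if that matters.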
Additionally, if the server receives a file that went out unsigned and came back signed, the signature may cover only part of the file rather than all of it. In that case, the signature digests alone might not be enough to identify the file.

From the PDF specification:

Signatures are created by computing a digest of the data (or part of the data) in a document, and storing the digest in the document.(...) There are two defined techniques for computing a reproducible digest of the contents of all or part of a PDF file:

- A byte range digest is computed over a range of bytes in the file, indicated by the ByteRange entry in the signature dictionary. This range is typically the entire file, including the signature dictionary but excluding the signature value itself (the Contents entry).

- An object digest (PDF 1.5) is computed by selectively walking a subtree of objects in memory, beginning with the referenced object, which is typically the root object. The resulting digest, along with information about how it was computed, is placed in a signature reference dictionary (...).
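
The byte range technique can be illustrated with plain JDK code. This sketch uses a hypothetical class name and assumes the /ByteRange numbers have already been parsed from the signature dictionary; it feeds only the listed offset/length pairs into a MessageDigest, so whatever sits in the gap (the signature value in /Contents) does not affect the result.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of a byte range digest over [offset1, length1, offset2, length2, ...]. */
final class ByteRangeDigest {

    private ByteRangeDigest() {}

    static byte[] digest(byte[] file, int[] byteRange, String algorithm) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold offset/length pairs");
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported digest: " + algorithm, e);
        }
        for (int i = 0; i < byteRange.length; i += 2) {
            int offset = byteRange[i];
            int length = byteRange[i + 1];
            // Written as a subtraction to avoid int overflow in offset + length
            if (offset < 0 || length < 0 || length > file.length - offset) {
                throw new IllegalArgumentException("ByteRange pair out of bounds at index " + i);
            }
            md.update(file, offset, length); // bytes inside the gap are never fed in
        }
        return md.digest();
    }
}
```

A validator would then compare this digest with the one protected by the signature itself; extracting that value from the signature object in /Contents is outside this sketch.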