Comparing a signed PDF to an unsigned PDF using document hash


After extensive Google searches, I'm starting to wonder if I'm missing the point of digital signatures in some way.

This is fundamentally what I believe I should be ab

2 Answers
  •  南笙 (original poster)
    2021-02-14 03:09

    About this:

    "This is because I not only want to verify that the signed PDF is authentic, but also that it's the same unsigned PDF I have on record"

    Assuming you just want to know that a document you get on your server is authentic:

    When creating a signed document, you have the choice of signing only one part of the file, or the entire document. You can then use a "whole document" signature, and if the document you get back on your server is "authentic" (which means that the verification of the signature succeeded), then it is for sure the same document you have on record.

    It's worth mentioning that there are two types of PDF signatures, approval signatures and certification signatures. From the document Digital Signatures in PDF from Adobe:

    (...) approval signatures, where someone signs a document to show consent, approval, or acceptance. A certified document is one that has a certification signature applied by the originator when the document is ready for use. The originator specifies what changes are allowed; choosing one of three levels of modification permitted:

    • no changes
    • form fill-in only
    • form fill-in and commenting

    Assuming you want to match a signed document that you received on your server with its unsigned equivalent in a database:

    For document identification, I would suggest dealing with it separately. Once a document can be opened, a hash (MD5, for example) can be computed from the concatenation of the decompressed content of all its pages and then compared to the same kind of hash from the original document (which can be generated once and stored in a database).

    The reason I would do it this way is that it is independent of the type of signature used on the document. Even when form fields are edited in a PDF file, annotations are added, or new signatures are created, the page content is not modified; it stays the same.

    If you are using iText, you can get a byte array of the page content with the method PdfReader.getPageContent and use the result to compute an MD5 hash.

    The code in Java might look like this:

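    import java.security.MessageDigest;
    import com.itextpdf.text.pdf.PdfReader; // assumes iText 5; older releases use com.lowagie.text.pdf

    PdfReader reader = new PdfReader("myfile.pdf");
    MessageDigest messageDigest = MessageDigest.getInstance("MD5");
    int pageCount = reader.getNumberOfPages();
    // iText page numbers are 1-based
    for (int i = 1; i <= pageCount; i++) {
        // Decoded content stream of page i
        byte[] buf = reader.getPageContent(i);
        messageDigest.update(buf, 0, buf.length);
    }
    reader.close();
    // Digest over the concatenated content of all pages
    byte[] hash = messageDigest.digest();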
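
The comparison against the hash kept on record could then look roughly like the sketch below. It uses only the JDK. The names `PageContentHash`, `digest`, `toHex`, and `matchesStored` are hypothetical, and the `List<byte[]>` stands in for the arrays that PdfReader.getPageContent would return, one per page, in page order.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper: hashes page content and compares it with a stored value. */
final class PageContentHash {

    /** Digest over the concatenation of all page content arrays, in list order. */
    static byte[] digest(List<byte[]> pageContents, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            for (byte[] page : pageContents) {
                md.update(page);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    /** Lowercase hex form, convenient for storing in a database column. */
    static String toHex(byte[] hash) {
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** True when the page content hashes to the hex value kept on record. */
    static boolean matchesStored(List<byte[]> pageContents, String storedHex, String algorithm) {
        return toHex(digest(pageContents, algorithm)).equalsIgnoreCase(storedHex);
    }
}
```

A call such as `PageContentHash.matchesStored(pages, storedHex, "MD5")` mirrors the MD5 choice above; passing `"SHA-256"` instead is a drop-in change if collision resistance matters for your use case.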
    

    Additionally, if the server receives a file that went out unsigned and came back signed, the signature may cover just one part of the file rather than all of it. In this scenario, the signature digests might not be enough to identify the file.
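
One way to tell whether a signature covers the whole file is to look at the ByteRange entry of its signature dictionary, which the specification excerpt below describes. The following is a minimal sketch, assuming the ByteRange has already been read out of the PDF as two offset/length pairs; `ByteRangeCoverage` and `coversWholeFile` are hypothetical names, and the check does not confirm that the gap between the ranges is exactly the Contents value.

```java
/** Hypothetical helper: checks whether a signature's ByteRange spans the whole file. */
final class ByteRangeCoverage {

    /**
     * byteRange holds two offset/length pairs: [offset1, length1, offset2, length2].
     * Returns true when the first range starts at byte 0, the second range ends at
     * the end of the file, and the ranges leave exactly one gap between them
     * (assumed here to be the signature value).
     */
    static boolean coversWholeFile(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false;
        }
        long offset1 = byteRange[0];
        long length1 = byteRange[1];
        long offset2 = byteRange[2];
        long length2 = byteRange[3];

        boolean startsAtZero = offset1 == 0 && length1 > 0;
        boolean hasSingleGap = offset2 > offset1 + length1;
        boolean endsAtEof = offset2 + length2 == fileLength;
        return startsAtZero && hasSingleGap && endsAtEof;
    }
}
```

If the file was changed after signing through an incremental update, the second range would end before the end of the file and the check would return false.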

    From the PDF specification (emphasis mine):

    Signatures are created by computing a digest of the data (or part of the data) in a document, and storing the digest in the document. (...) There are two defined techniques for computing a reproducible digest of the contents of all or part of a PDF file:

    - A byte range digest is computed over a range of bytes in the file, indicated by the ByteRange entry in the signature dictionary. This range is typically the entire file, including the signature dictionary but excluding the signature value itself (the Contents entry).

    - An object digest (PDF 1.5) is computed by selectively walking a subtree of objects in memory, beginning with the referenced object, which is typically the root object. The resulting digest, along with information about how it was computed, is placed in a signature reference dictionary (...).
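
The byte range digest from the first item can be sketched with the JDK alone. This is an illustration under assumptions, not a signature verifier: `ByteRangeDigest` is a hypothetical name, the file is assumed to fit in memory, and the ByteRange is assumed to have been parsed already into offset/length pairs. Verifying a real signature also involves the signature container stored in Contents, which is not shown.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: digests only the bytes named by a ByteRange array. */
final class ByteRangeDigest {

    /**
     * byteRange holds offset/length pairs, e.g. [offset1, length1, offset2, length2].
     * Bytes outside the ranges (typically the Contents value) are skipped.
     */
    static byte[] digest(byte[] file, int[] byteRange, String algorithm) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold offset/length pairs");
        }
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            for (int i = 0; i < byteRange.length; i += 2) {
                // Feed one range: start offset, then number of bytes
                md.update(file, byteRange[i], byteRange[i + 1]);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unknown digest algorithm: " + algorithm, e);
        }
    }
}
```

Because the gap between the ranges is skipped, two files that differ only in that gap produce the same digest, which is what lets the signature value be written into the file after the digest has been computed.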
