Message digest of pdf in digital signature

╄→尐↘猪︶ㄣ submitted on 2019-12-17 20:17:41

Question


I want to manually verify the integrity of a signed PDF. So far I have managed the following:

  • Got the value of the '/Contents' entry from the PDF (using PyPDF2). This is a DER-encoded PKCS#7 certificate.

According to the PDF specification, the message digest of the PDF data is stored along with the certificate in the /Contents entry. I have tried a lot but cannot get at the digest value, which I would eventually compare with the hash of the PDF content (the part specified by /ByteRange).

  • PDF specification snapshot: [image not preserved]

I don't understand the last part, which says to write the signature object data into the dictionary. Where does this write actually happen, and how can I extract the message digest?


Answer 1:


(This is more a comment than an answer. Due to the size and formatting restrictions of comments, I put it into an answer nonetheless.)

A signature in a PDF

In a prior question the OP already inserted a sketch illustrating a signature embedded in a PDF for the SubFilter values ETSI.CAdES.detached, adbe.pkcs7.detached, and adbe.pkcs7.sha1: [sketch not preserved]

But this is merely a sketch, and interpreting it too literally may leave the incorrect impression that the value of the Contents entry in the signature dictionary is something like a list containing a "Certificate", a "Signed message digest" and a "Timestamp". Furthermore, calling this list the "Signature value" can also be confusing, as that name is also used for a small part of the content; see below.

The actual content is specified (cf. this document) as:

When PKCS#7 signatures are used, the value of Contents shall be a DER-encoded PKCS#7 binary data object containing the signature. The PKCS#7 object shall conform to RFC3852 Cryptographic Message Syntax.

(As an aside: while the specification here requires the data object to be DER-encoded, there are many signed PDFs in the wild that use the much less strict BER encoding for the object as a whole and DER only for those parts that RFC 3852 itself requires to be DER-encoded.)

The PKCS#7 binary data object

The PKCS#7 binary data object containing the signature and conforming to RFC 3852 is, more precisely, a ContentInfo object with a SignedData content, often called a "signature container".

According to RFC 3852:

The CMS associates a content type identifier with a content. The syntax MUST have ASN.1 type ContentInfo:

  ContentInfo ::= SEQUENCE {
    contentType ContentType,
    content [0] EXPLICIT ANY DEFINED BY contentType }

The signed-data content type shall have ASN.1 type SignedData:

  SignedData ::= SEQUENCE {
    version CMSVersion,
    digestAlgorithms DigestAlgorithmIdentifiers,
    encapContentInfo EncapsulatedContentInfo,
    certificates [0] IMPLICIT CertificateSet OPTIONAL,
    crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
    signerInfos SignerInfos }

Here you see the optional certificates collection, which usually contains at least the signer certificate and often also its chain of issuer certificates. This is the "Certificate" from the sketch above.

You also see the signerInfos structure, which contains the actual signing information:

  SignerInfos ::= SET OF SignerInfo

Per-signer information is represented in the type SignerInfo:

  SignerInfo ::= SEQUENCE {
    version CMSVersion,
    sid SignerIdentifier,
    digestAlgorithm DigestAlgorithmIdentifier,
    signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
    signatureAlgorithm SignatureAlgorithmIdentifier,
    signature SignatureValue,
    unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }

  SignedAttributes ::= SET SIZE (1..MAX) OF Attribute

  Attribute ::= SEQUENCE {
    attrType OBJECT IDENTIFIER,
    attrValues SET OF AttributeValue }

(Here you see the structure the RFCs call the SignatureValue. As already mentioned, calling the whole signature container "Signature value", as the sketch above does, can be confusing because an entity of a type with that very name already exists down here.)

You are after the message digest of the signed PDF byte ranges for an adbe.pkcs7.detached type PDF signature. There are actually two possibilities:
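
To make the nesting above concrete, here is a minimal sketch in plain Python that walks from the ContentInfo down to the individual SignerInfo field lists. It is an illustration under stated assumptions, not a complete parser: the helper names (read_tlv, children, decode_oid, signer_infos) are made up for this example, only single-byte tags and definite lengths are handled (so BER containers with indefinite lengths are rejected), and signerInfos is taken to be the last field of SignedData, as in the definition quoted above.

```python
SIGNED_DATA_OID = "1.2.840.113549.1.7.2"  # id-signedData


def read_tlv(buf, pos=0):
    """Return (tag, value, next_pos) for the TLV that starts at pos.

    Assumption of this sketch: single-byte tags, definite lengths only.
    """
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length == 0x80:
        raise ValueError("indefinite length (BER) is not handled here")
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length


def children(value):
    """Split the content of a constructed value into (tag, value) pairs."""
    out, pos = [], 0
    while pos < len(value):
        tag, val, pos = read_tlv(value, pos)
        out.append((tag, val))
    return out


def decode_oid(value):
    """Turn the content bytes of an OBJECT IDENTIFIER into dotted notation."""
    subids, cur = [], 0
    for byte in value:
        cur = (cur << 7) | (byte & 0x7F)
        if not byte & 0x80:
            subids.append(cur)
            cur = 0
    first = min(subids[0] // 40, 2)
    return ".".join(map(str, [first, subids[0] - 40 * first] + subids[1:]))


def signer_infos(contents):
    """Return one list of (tag, value) fields per SignerInfo in Contents."""
    tag, content_info, _ = read_tlv(contents)        # ContentInfo SEQUENCE
    if tag != 0x30:
        raise ValueError("Contents does not start with a SEQUENCE")
    (_, oid), (_, wrapped) = children(content_info)[:2]
    if decode_oid(oid) != SIGNED_DATA_OID:
        raise ValueError("contentType is not signed-data")
    _, signed_data, _ = read_tlv(wrapped)            # SignedData inside [0] EXPLICIT
    _, infos = children(signed_data)[-1]             # signerInfos SET is the last field
    return [children(info) for _, info in children(infos)]
```

Because read_tlv only consumes the declared length of the outer SEQUENCE, any zero padding that follows the container inside the Contents string is ignored. In the returned field lists, a field with tag 0xA0 is the signedAttrs collection and the last OCTET STRING (tag 0x04) is the SignatureValue.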

  • In the rare case of the simplest SignerInfo instances, there are no SignedAttributes. In this case the SignatureValue is the result of the signature algorithm applied directly to the signed byte ranges.

    If the signature algorithm is based on RSA, you can retrieve the document digest value by decoding the value using the signer's public key (from the signer's certificate) and extracting the digest from the decoded DigestInfo object.

    DigestInfo ::= SEQUENCE {
      digestAlgorithm DigestAlgorithmIdentifier,
      digest Digest }
    

    If the signature algorithm is based on DSA or ECDSA, you cannot retrieve the digest value at all. These algorithms only allow you to check whether a digest value you provide (e.g. one obtained by hashing the signed byte ranges of the document as you have retrieved it) is the originally signed one.

  • In the far more common case of SignerInfo instances with SignedAttributes, you have to search these SignedAttributes for the message digest attribute which is identified by

     id-messageDigest OBJECT IDENTIFIER ::= { iso(1) member-body(2)
        us(840) rsadsi(113549) pkcs(1) pkcs9(9) 4 }
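
Whichever case applies, the value to compare against is a digest you compute yourself over the signed byte ranges. A minimal sketch using only hashlib follows; the function name digest_byte_range is made up for this example, and the "sha256" default is an assumption, since the algorithm actually used has to be read from the digestAlgorithm field of the SignerInfo.

```python
import hashlib


def digest_byte_range(pdf_bytes, byte_range, algorithm="sha256"):
    """Hash the parts of the file selected by a /ByteRange array.

    byte_range is the flat list [offset1, length1, offset2, length2, ...]
    taken from the signature dictionary. The default algorithm is only an
    assumption; use the one named in the SignerInfo.
    """
    if len(byte_range) % 2:
        raise ValueError("ByteRange must contain offset/length pairs")
    h = hashlib.new(algorithm)
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        h.update(pdf_bytes[offset:offset + length])
    return h.digest()
```

For a typical single signature the array has four entries, covering everything before and everything after the Contents string, so the hash is taken over the file with the signature container itself left out.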
    

As already mentioned in comments, though, I cannot explain how to drill down here using Python or openssl. You will need some tool which knows these specific ASN.1 structures or ASN.1 structures in general.
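
Failing such a tool, a hand-rolled tag-length-value reader is enough to locate the attribute. The following is only a sketch under stated assumptions: the names _tlvs and message_digest are made up for this example, the input is the encoding of a single SignerInfo with definite lengths throughout, and the OID is compared as raw bytes rather than decoded.

```python
# Content bytes of the OID 1.2.840.113549.1.9.4 (id-messageDigest)
MESSAGE_DIGEST_OID = bytes.fromhex("2a864886f70d010904")


def _tlvs(buf):
    """Yield (tag, value) for each TLV in buf.

    Assumption of this sketch: single-byte tags, definite lengths only.
    """
    pos = 0
    while pos < len(buf):
        tag, length = buf[pos], buf[pos + 1]
        pos += 2
        if length == 0x80:
            raise ValueError("indefinite length (BER) is not handled here")
        if length & 0x80:  # long form: low 7 bits give the number of length bytes
            n = length & 0x7F
            length = int.from_bytes(buf[pos:pos + n], "big")
            pos += n
        yield tag, buf[pos:pos + length]
        pos += length


def message_digest(signer_info):
    """Return the messageDigest value of one encoded SignerInfo.

    Returns None when the SignerInfo has no signed attributes or no
    messageDigest attribute among them.
    """
    _, fields = next(_tlvs(signer_info))             # SignerInfo SEQUENCE
    for tag, value in _tlvs(fields):
        if tag != 0xA0:                              # signedAttrs is [0] IMPLICIT
            continue
        for _, attr in _tlvs(value):                 # each Attribute SEQUENCE
            (_, attr_type), (_, attr_values) = list(_tlvs(attr))[:2]
            if attr_type == MESSAGE_DIGEST_OID:
                _, digest = next(_tlvs(attr_values)) # OCTET STRING inside the SET
                return digest
    return None
```

The returned bytes are what you would compare with your own hash of the signed byte ranges. Note that a match only shows that the document agrees with the signed attributes; the signature over those attributes still has to be verified separately.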



Source: https://stackoverflow.com/questions/28408047/message-digest-of-pdf-in-digital-signature
