Question
I want to manually verify the integrity of a signed PDF. So far I have managed the following:
- I got the value of the /Contents entry from the PDF (using PyPDF2). This is a DER-encoded PKCS#7 certificate.
Now, as per the PDF specification, the message digest of the PDF data is stored along with the certificate in the /Contents entry. I have tried a lot, but I am not able to get the digest value, which I would eventually compare with the hash of the PDF content (as specified by /ByteRange).
- PDF specification snapshot (image not reproduced here):
I don't understand the last part, which says to write the signature object data into the dictionary. Where does this write actually happen, and how can I extract the message digest?
Answer 1:
(This is more a comment than an answer. Due to the size and formatting restrictions of comments, I put it into an answer nonetheless.)
A signature in a PDF
In a prior question the OP already inserted a sketch illustrating a signature embedded in a PDF in case of SubFilter ETSI.CAdES.detached, adbe.pkcs7.detached, or adbe.pkcs7.sha1:
But this is merely a sketch, and interpreting it too literally may leave the incorrect impression that the value of the Contents entry in the signature dictionary is something like a list containing a "Certificate", a "Signed message digest" and a "Timestamp". Furthermore, calling this list the "Signature value" can be confusing, as that name is also used for a small part of the content; see below.
The actual content is specified (cf. this document) as:
When PKCS#7 signatures are used, the value of Contents shall be a DER-encoded PKCS#7 binary data object containing the signature. The PKCS#7 object shall conform to RFC3852 Cryptographic Message Syntax.
(As an aside: While the specification here requires the data object to be DER-encoded, there are many signed PDFs in the wild which use some much less strict BER-encoding for the object as a whole and DER only for parts also required by RFC3852 to be DER-encoded.)
The PKCS#7 binary data object
The PKCS#7 binary data object containing the signature conforming to RFC3852 more exactly is a ContentInfo object with a SignedData content, often named a "signature container".
According to RFC 3852
The CMS associates a content type identifier with a content. The syntax MUST have ASN.1 type ContentInfo:
ContentInfo ::= SEQUENCE {
  contentType ContentType,
  content [0] EXPLICIT ANY DEFINED BY contentType }
The signed-data content type shall have ASN.1 type SignedData:
SignedData ::= SEQUENCE {
  version CMSVersion,
  digestAlgorithms DigestAlgorithmIdentifiers,
  encapContentInfo EncapsulatedContentInfo,
  certificates [0] IMPLICIT CertificateSet OPTIONAL,
  crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
  signerInfos SignerInfos }
Here you see the optional collection certificates, in which usually at least the signer certificate and often also its chain of issuer certificates are contained. This is the "Certificate" from the sketch above.
You also see the signerInfos structure, which contains the actual signing information:
SignerInfos ::= SET OF SignerInfo
Per-signer information is represented in the type SignerInfo:
SignerInfo ::= SEQUENCE {
  version CMSVersion,
  sid SignerIdentifier,
  digestAlgorithm DigestAlgorithmIdentifier,
  signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
  signatureAlgorithm SignatureAlgorithmIdentifier,
  signature SignatureValue,
  unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }
SignedAttributes ::= SET SIZE (1..MAX) OF Attribute
Attribute ::= SEQUENCE {
  attrType OBJECT IDENTIFIER,
  attrValues SET OF AttributeValue }
(Here you see the structure the RFCs call the SignatureValue ... as already mentioned, the sketch above calling the whole signature container "Signature value" can be confusing, as down here there already is an entity of a type with that name.)
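A quick sanity check before drilling further is to confirm that the Contents value really is a ContentInfo whose contentType is signed-data (OID 1.2.840.113549.1.7.2 in RFC 3852). The following is a minimal sketch in plain Python, assuming definite-length encoding and single-byte tags; read_tlv, decode_oid and content_type are illustrative names, not functions of any library.

```python
SIGNED_DATA_OID = "1.2.840.113549.1.7.2"  # id-signedData


def read_tlv(data, offset=0):
    """Return (tag, content_start, content_end) of the element at offset."""
    tag = data[offset]
    length = data[offset + 1]
    pos = offset + 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        if count == 0:
            # Indefinite length is BER only; this sketch does not handle it.
            raise ValueError("indefinite length is not supported")
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def decode_oid(body):
    """Decode OBJECT IDENTIFIER content bytes into dotted notation."""
    # Splitting the first byte this way holds for first arcs 0 and 1,
    # which covers the 1.2.840... identifiers relevant here.
    arcs = [body[0] // 40, body[0] % 40]
    value = 0
    for byte in body[1:]:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the last byte of an arc
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


def content_type(container):
    """Return the contentType OID of a ContentInfo as a dotted string."""
    tag, start, _ = read_tlv(container, 0)
    if tag != 0x30:
        raise ValueError("expected a SEQUENCE at the top level")
    tag, oid_start, oid_end = read_tlv(container, start)
    if tag != 0x06:
        raise ValueError("expected an OBJECT IDENTIFIER as first field")
    return decode_oid(container[oid_start:oid_end])
```

If content_type returns anything other than SIGNED_DATA_OID, the structures described here do not apply to that container.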
You are after the message digest of the signed PDF byte ranges for an adbe.pkcs7.detached type PDF signature. There actually are two possibilities:
- In the rare case of the simplest SignerInfo instances, there are no SignedAttributes. In this case the SignatureValue is the value of a signature algorithm applied directly to the signed byte ranges.
  If the signature algorithm is based on RSA, you can retrieve the document digest value by decoding the value using the signer's public key (from his certificate) and extracting the digest from the decoded DigestInfo object:
  DigestInfo ::= SEQUENCE { digestAlgorithm DigestAlgorithmIdentifier, digest Digest }
  If the signature algorithm is based on DSA or ECDSA, you cannot retrieve the digest value at all. These algorithms only allow you to check whether a digest value you provide (e.g. obtained by hashing the signed byte ranges of the document as you have retrieved it) is the originally signed one.
- In the far more common case of SignerInfo instances with SignedAttributes, you have to search these SignedAttributes for the message digest attribute, which is identified by
  id-messageDigest OBJECT IDENTIFIER ::= { iso(1) member-body(2) us(840) rsadsi(113549) pkcs(1) pkcs9(9) 4 }
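Whichever case applies, the value to compare against is the hash of the bytes selected by /ByteRange. A minimal sketch of that side, assuming the file is already in memory and the /ByteRange array has been read as a flat list of integers; byte_range_digest is an illustrative name, and the default of SHA-256 is only an assumption, since the algorithm has to be the one named by digestAlgorithm in the SignerInfo.

```python
import hashlib


def byte_range_digest(pdf_bytes, byte_range, algorithm="sha256"):
    """Hash the parts of the file selected by a /ByteRange array.

    byte_range is the flat list from the signature dictionary:
    [offset1, length1, offset2, length2, ...].
    """
    if len(byte_range) % 2:
        raise ValueError("ByteRange must consist of offset/length pairs")
    digest = hashlib.new(algorithm)
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        chunk = pdf_bytes[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("ByteRange reaches beyond the end of the file")
        digest.update(chunk)  # ranges are hashed in order, as one stream
    return digest.digest()
```

The result is a bytes object that can be compared directly with a digest value recovered from the signature container.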
As already mentioned in comments, though, I cannot explain how to drill down here using Python or openssl. You will need some tool which knows these specific ASN.1 structures or ASN.1 structures in general.
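For readers who want to experiment without such a tool, a generic walk over definite-length DER needs little code. The following is a minimal sketch in plain Python, not a vetted verifier: it assumes single-byte tags, definite lengths (so BER containers with indefinite lengths are rejected), and looks only at the first SignerInfo. The names read_tlv, children and find_message_digest are illustrative.

```python
# Content bytes of the OID 1.2.840.113549.1.9.4 (id-messageDigest).
MESSAGE_DIGEST_OID = bytes.fromhex("2a864886f70d010904")


def read_tlv(data, offset):
    """Return (tag, content_start, content_end) of the element at offset."""
    tag = data[offset]
    length = data[offset + 1]
    pos = offset + 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        if count == 0:
            raise ValueError("indefinite length is not supported")
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def children(data, start, end):
    """List (tag, content_start, content_end) of elements in [start, end)."""
    items = []
    pos = start
    while pos < end:
        tag, c_start, c_end = read_tlv(data, pos)
        items.append((tag, c_start, c_end))
        pos = c_end
    return items


def find_message_digest(container):
    """Return the messageDigest of the first SignerInfo, or None."""
    _, start, end = read_tlv(container, 0)            # ContentInfo
    _, s, e = children(container, start, end)[1]      # content [0] EXPLICIT
    _, s, e = children(container, s, e)[0]            # SignedData
    _, s, e = children(container, s, e)[-1]           # signerInfos (last field)
    _, s, e = children(container, s, e)[0]            # first SignerInfo
    for tag, a_start, a_end in children(container, s, e):
        if tag != 0xA0:                               # signedAttrs is [0]
            continue
        for _, attr_start, attr_end in children(container, a_start, a_end):
            oid, values = children(container, attr_start, attr_end)[:2]
            if container[oid[1]:oid[2]] == MESSAGE_DIGEST_OID:
                # attrValues is a SET holding one OCTET STRING
                _, v_start, v_end = children(container, values[1], values[2])[0]
                return container[v_start:v_end]
    return None  # no signedAttrs, or no messageDigest attribute among them
```

Called on the raw bytes of the Contents value, find_message_digest follows only the outer length, so any bytes after the container (such as padding) are ignored. A return value of None corresponds to the first, rarer case described above.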
Source: https://stackoverflow.com/questions/28408047/message-digest-of-pdf-in-digital-signature