I\'m decoding ASN1 (as used in X.509 for HTTPS certificates). I\'m doing pretty well, but there is a thing that I just cannot find and understandable documentation for.
It sounds like you need an introduction to ASN.1 tagging. There are two angles to approach this from. X.690 defines BER/CER/DER encoding rules. As such, it answers the question of how tags are encoded. X.680 defines ASN.1 itself. As such, it defines the syntax and rules for tagging. Both specifications can be found on the ITU-T website. I'll give you a quick overview.
Tags are used in BER/DER/CER to identify types. They are especially useful for distinguishing the components of a SEQUENCE and the alternatives of a CHOICE.
A tag combines a tag class and a tag number. The tag classes are UNIVERSAL, APPLICATION, PRIVATE, and CONTEXT-SPECIFIC. The UNIVERSAL class is basically used for the built-in types. APPLICATION is typically used for user-defined types. CONTEXT-SPECIFIC is typically used for the components inside constructed types (SEQUENCE, CHOICE, SEQUENCE OF). Syntactically, when tags are specified in an ASN.1 module, they are written inside brackets: [ tag_class tag_number ]; for CONTEXT-SPECIFIC, the tag_class is omitted. Thus, [APPLICATION 10] or [0].
While every ASN.1 type has an associated tag, syntactically, there is also the "TaggedType", which is used by an ASN.1 author to specify the tag to encode a type with. Basically, a TaggedType puts a tag prefix ahead of a type. For example:
MyType ::= SEQUENCE {
field_with_tagged_type [0] UTF8String
}
The tag in a TaggedType is either explicit or implicit. If explicit, this means that I want the original tag to be explicitly encoded. If implicit, this means I am happy to have only the tag that I specified be encoded. In the explicit case, the BER encoding results in a nested TLV (tag-length-value): the outer tag ([0] in the example above), the length, and then another TLV as the value. In the example, this inner TLV would have a tag of [UNIVERSAL 12] for the UTF8String.
Whether the tag is explicit or implicit depends upon how you write the tag and the tagging environment. For example:
MyType2 ::= SEQUENCE {
field_with_explicit_tag [0] EXPLICIT UTF8String OPTIONAL,
field_with_implicit_tag [1] IMPLICIT UTF8String OPTIONAL,
field_with_tag [2] UTF8String OPTIONAL
}
If you specify neither IMPLICIT nor EXPLICIT, there are some rules that define whether the tag is explicit or implicit (see X.680 31). These rules take into consideration the tagging environment defined for the ASN.1 module. The ASN.1 module may specify the tagging environment as IMPLICIT TAGS, EXPLICIT TAGS, or AUTOMATIC TAGS. Roughly speaking, if you don't specify IMPLICIT or EXPLICIT for a tag, the tag will be explicit if the tagging environment is EXPLICIT and implicit if the tagging environment is IMPLICIT or AUTOMATIC. An automatic tagging environment is basically the same as an IMPLICIT tagging environment, except that unique tags are automatically assigned for members of SEQUENCE and CHOICE types.
Note that in the above example, the three components of MyType2 are all optional. In BER/CER/DER, a decoder will know what component is present based on the encoded tag (which obviously better be unique).
ASN.1 BER and DER use ASN.1 TAGS to unambiguously identify certain components in an encoded stream. There are 4 classes of ASN.1 tags: UNIVERSAL, APPICATION, PRIVATE, and context-specific. The [0] is a context-specific tag since there is no tag class keword in front of it. UNIVERSAL is reserved for built-in types in ASN.1. Most often you see context specific tags to eliminate potential ambiguity in a SEQUENCE which contains OPTIONAL elements. If you know you are receiving two items that are not optional, one after the other, you know which is which even if their tags are the same. However, if the first one is optional, the two must have different tags, or you would not be able to tell which one you had received if only one was present in the encoding.
Most often today, ASN.1 specification use "AUTOMATIC TAGS" so that you don't have to worry about this kind of disambiguation in messages since components of SEQUENCE, SET and CHOICE will automatically get context specific tags starting with [0], [1], [2], etc. for each component.
You can find more information on ASN.1 tags at http://www.oss.com/asn1/resources/books-whitepapers-pubs/asn1-books.html where two free downloadable books are available.
Another excellent resource is http://asn1-playground.oss.com where you can try variations of ASN.1 specifications with different tags in an online compiler and encoder/decoder. There you can see the effects of tag changes on encodings.
[0] is a context-specific tagged type, meaning that to figure out what meaning it gives to the fields (if the "Constructed" flag is set) or data value (if "Constructed" flag is not set) it wraps; you have to know in what context it appears in.
In addition, you also need to know what kind of object the sender and receiver are exchanging in the DER stream, ie. the "ASN.1 module".
Let's say they're exchanging a Certificate Signing Request, and [0] appears as the 4th field inside a SEQUENCE inside the root SEQUENCE:
SEQUENCE
SEQUENCE
INTEGER 0
SEQUENCE { ... }
SEQUENCE { ... }
[0] { ... }
}
}
Then according to RFC2968, which defines the DER contents for Certificate Signing Request, Appendix A, which defines the ASN.1 Module, the meaning of that particular field is sneakily defined as "Attributes" and "Should have the Constructed flag set":
attributes [0] Attributes{{ CRIAttributes }}
You can also go the other way and see that "attributes" must be the 4th field inside the first sequence inside the root sequence and tagges as [0] by looking at the root sequence definition (section 4: "the top-level type CertificationRequest"), finding the CertificationRequestInfo placement inside that, and finding where the "attributes" item is located inside the CertificationRequestInfo, and finally seeing how it is tagged.
ASN.1 is so simple, I don't understand how more people don't get it.. "It's just a TLV format.." ;-)
I finally worked through this and thought that I would provide some insight for anyone still trying to understand this. In my example, as in the one above, I was using an X.509 certificate in DER format. I came across the "A0 03 02 01 02" sequence and could not figure out how that translated to a version number of 2. So if you are having the same problem, here is how that works.
The A0 tells you it is a "Context-Specific" field, a "Constructed" tag, and has the type value of 0x00. Immediately, the context-specific tells you not to use the normal type fields for DER/BER. Instead, given this is a X.509 certificate, the type value is labeled in the RFC 5280, p 116. There you will see four fields with markers on them of [0], [1], [2], and [3], standing for "version", "issuerUniqueID", "subjectUniqueID", and "extension", respectively. So in this case, a value of A0 tells you that this is one of the X.509 context-specific fields, specifically the "version" type. That takes care of the "A0" value.
The "03" value is just your length, as you might expect.
Since this was identified as "Constructed", the data should represent a normal DER/BER object. The "02 01 02" is the actual version number you are looking for, expressed as an Integer. "02" is the standard BER encoding of Integer, "01" is your length, and "02" is your value, or in this case, your version number.
So given that X.509 defines 4 context-specific types, you should expect to see "A0", "A1", "A2", and "A3" anywhere in the certificate. Hopefully the information provided above will now make more sense and help you better understand what those marker represent.