What are the reasons for choosing ASN.1?

Some reasons for choosing ASN.1 are:

  • ASN.1 allows implementers to choose whatever programming language best suits them, and within that language to select the binding for data types that are most appropriate for their applications. For example, if ASN.1 defines a type as a collection of items, you are free to represent it as a linked list, array, etc., depending on what works best for your language/application.
  • ASN.1 allows you to define messages in such a way that if new fields are added to the message in the future, your old applications that do not understand the new fields will continue to work just fine with newer applications that understand the new fields. This way you don't have to switch all implementations to the new version of the message at the same time. With such messages, older implementations are aware that they should expect and ignore new fields in messages.
  • ASN.1 allows you to impose constraints on fields in a message. For example, you can indicate that an integer type should carry only the values 1, 2, 7-10, or that a character string should be between 20-30 bytes in length.
  • ASN.1 allows you to express relationship between fields of a message. For example, you can indicate that if a given field contains a 7 then another field must be present.
  • ASN.1 allows you to define OPTIONAL fields for which little or no data is transmitted if there is no data present for them.
  • ASN.1 allows the author of the message specification (e.g., a standards writer) to indicate to implementers the nature of the fields in a message in a clear and concise manner.
  • ASN.1 boosts productivity by freeing protocol designers to describe the layout of messages without them having to delve into the details of the bits and bytes of how data will look while in transit between two machines.
  • By defining messages using a formal, compilable notation, ASN.1 boosts productivity through tools that convert messages described in ASN.1 to languages such as C, C++ or Java, and that generate encoders/decoders, minimizing or eliminating the need to figure out how to serialize the data for transmission.

Is there any trade-off to using extensibility?

Use of extensibility will result in a slightly more complex header file being generated if you wish to relay the value that was received (i.e., if you want unexpected values to be returned to you by the decoder, instead of being ignored by the decoder). In the majority of cases it is sufficient for an "older" version to ignore the extension values received from a "newer" version, because it typically will not know what to do with them.

As far as your application code goes, in general it does not result in larger or more complex code.

If you are using PER the encodings will be a little larger if you use type extensibility. Type extensibility has no effect on the size of BER encodings.

Unless you know ahead of time that you will never need to extend a given type, you should define it as extensible.

Does tagging affect encoded data in PER?

In general, encoded data looks the same no matter what the tags.

The only exception is the encoding of choice types. In PER, each alternative of a choice type is identified by an index. Those indexes are assigned to the alternatives in an order that depends on the tag of each alternative. When automatic tagging is used, the indexes correspond to the order in which the alternatives are defined.

How are open type values encoded in PER?

In PER open types are encoded the same as values of unconstrained OCTET STRING types. This means that the length can be one or two bytes, or that the encoding is fragmented if the length is > 16K bytes, etc.
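Those one- and two-byte length forms can be sketched in a few lines of Python. This is an illustrative model, not a library API; the function name is made up, and the fragmented form for lengths of 16K or more is deliberately left out:

```python
def per_length_determinant(n: int) -> bytes:
    """Encode an (aligned) PER length determinant for an unconstrained
    length n, as used for open types and unconstrained OCTET STRINGs."""
    if n <= 127:
        # One byte: 0xxxxxxx
        return bytes([n])
    if n <= 16383:
        # Two bytes: 10xxxxxx xxxxxxxx
        return bytes([0x80 | (n >> 8), n & 0xFF])
    # Lengths of 16K or more use the fragmented form, in which a single
    # byte 0xC1..0xC4 announces a fragment of 16K..64K items; that form
    # is not modeled in this sketch.
    raise NotImplementedError("fragmented lengths (>= 16K) not modeled")
```

For example, `per_length_determinant(5)` yields the single byte 05, while `per_length_determinant(300)` yields the two bytes 81 2C.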

What is OER?

OER stands for Octet Encoding Rules. OER was published as Rec. ITU-T X.696 | ISO/IEC 8825-7, and was designed to be easy to implement and to produce encodings more compact than those produced by the Basic Encoding Rules (BER). In addition to reducing the effort of developing handwritten encoder/decoders, the use of OER can decrease bandwidth utilization (though not as much as the Packed Encoding Rules), save CPU cycles, and lower encoding/decoding latency.

What is ASN.1 and its Encoding Rules?

The International Organization for Standardization (ISO), the International Electrotechnical Commission (IEC) and the International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) (formerly known as the International Telegraph and Telephone Consultative Committee (CCITT)) have established Abstract Syntax Notation One (ASN.1) and its encoding rules as a standard for describing and encoding messages. ASN.1 is a formal notation for abstractly describing data to be exchanged between distributed computer systems. Encoding rules are sets of rules used to transform data specified in the ASN.1 notation into a standard format that can be decoded by any system that has a decoder based on the same set of rules.

Why does the presence of a named bit list influence whether trailing zero bits are encoded or not in PER and DER?

Named bit lists influence whether trailing 0 bits are encoded because they are typically used to specify the semantics of particular bits in a bitmap. For example, you might have:

Capabilities ::= BIT STRING {eject(0), rewind(1), retension(2)}

Here the first three bits of every Capabilities value carry the same distinct meaning. Contrast this with a BIT STRING that has no named bit list, which is simply a container for arbitrary binary data for which particular bits don't carry predefined semantics. For example:

BinaryString ::= BIT STRING

The semantics of the Capabilities values '1'B, '10'B and '100'B are the same (they all mean the eject capability is supported but not the rewind or retension capabilities). Further, these carry the same semantics as '1000'B, '10000'B, etc., and a receiving application must be prepared to accept such values. Since the value '1'B has the same semantics as all these other values, PER and DER require that you encode it in the canonically simplest form that satisfies any subtype constraint placed on it, in this case as the single-bit value '1'B. This way there is a single way to encode a given value of a BIT STRING that has a named bit list.
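The canonical rule amounts to trimming trailing zero bits. Here is a minimal sketch that models a BIT STRING value as a string of '0'/'1' characters; it ignores any lower bound a size constraint might impose:

```python
def canonical_named_bitstring(bits: str) -> str:
    """Trim trailing zero bits from a BIT STRING value that has a named
    bit list, as PER and DER require (size-constraint lower bounds are
    ignored in this simplified model)."""
    return bits.rstrip("0")

# '1'B, '10'B and '100'B all collapse to the same canonical form:
canonical = {canonical_named_bitstring(v) for v in ("1", "10", "100")}
```

All three inputs reduce to the single form "1", which is exactly why a conforming encoder has only one way to emit this abstract value.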

How are CHOICE extension additions encoded in PER?

X.691(2008) clause 23.8 stipulates that CHOICE extension addition values be encoded as if they were the value of an open type. The encoding of open type values are always prefixed with a length of the value that follows. The reason that PER requires CHOICE extension addition values be encoded as open type values is that an older version receiver will not know the type definition of the encoded value. Thus, it always prefixes the encoding of the CHOICE extension addition value so an older version application which is receiving will know how to skip past the value of the type which is unknown to it.
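Because of that length prefix, an older-version decoder can skip the unknown value without knowing its type. A sketch of the skipping logic, assuming the aligned-PER length forms (the fragmented form for values of 16K bytes or more is not handled, and the function name is illustrative):

```python
def skip_open_type(buf: bytes, pos: int) -> int:
    """Return the position just past an open type value starting at pos,
    by reading its length prefix and jumping over the contents."""
    first = buf[pos]
    if first <= 127:                       # one-byte length: 0xxxxxxx
        length, pos = first, pos + 1
    elif first & 0xC0 == 0x80:             # two-byte length: 10xxxxxx xxxxxxxx
        length, pos = ((first & 0x3F) << 8) | buf[pos + 1], pos + 2
    else:
        raise NotImplementedError("fragmented open type not handled")
    return pos + length
```

For example, given a buffer starting with a length byte 02 followed by two content bytes, the function returns 3, i.e., the offset of whatever follows the unknown value.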

I am using the indefinite-length form of encoding. Why is the primitive/constructed bit always set to 1?

This is because the restriction on the use of the indefinite-length form requires that indefinite-length encodings always be constructed. Values of simple types, like INTEGER, BOOLEAN, etc., must use the definite-length form of encoding. This restriction ensures that constructed types such as SEQUENCE, SET, etc. can be safely encoded using the indefinite-length form, since the decoder can always correctly determine where the end-of-contents octets (two zero octets) appear.
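A minimal sketch of building such an encoding; the tag value and contents are illustrative, and only the wrapping is modeled:

```python
def ber_indefinite(tag: int, contents: bytes) -> bytes:
    """Wrap already-encoded contents in a BER indefinite-length TLV.
    Bit 0x20 of the identifier octet is the primitive/constructed bit,
    and it must be set for the indefinite-length form."""
    if not tag & 0x20:
        raise ValueError("indefinite length requires a constructed encoding")
    # Length octet 0x80 announces indefinite length; the value is then
    # terminated by the end-of-contents octets 00 00.
    return bytes([tag, 0x80]) + contents + b"\x00\x00"

# A SEQUENCE (tag 0x30, constructed bit already set) holding INTEGER 5:
encoded = ber_indefinite(0x30, b"\x02\x01\x05")
```

The inner INTEGER keeps its definite-length form (02 01 05), while the outer SEQUENCE uses 30 80 ... 00 00.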

Do I have to manually add tags to CHOICE and SET components which have the same ASN.1 type? Or is there a way to automatically add the needed tags?

You can add the AUTOMATIC TAGS keywords to your module definition statement to instruct an ASN.1 tool to automatically add the needed tags for component differentiation. In particular, the OSS ASN.1 compiler provides all necessary tags when the AUTOMATIC TAGS keywords are present.

The following excerpt illustrates the use of the AUTOMATIC TAGS keywords:

Module DEFINITIONS AUTOMATIC TAGS ::= BEGIN

ChoiceA ::= CHOICE {
    first  INTEGER,
    second INTEGER
}

END

Note how the elements of the CHOICE above do not have any context-sensitive tags applied to them.

When decoding a PER-encoded PDU, IA5String characters appear to be decoded as if 1 were subtracted from their numeric values. Why?

This is a common error that results from a typo in the ASN.1 syntax on the encoder or the decoder side. It is very easy to accidentally omit (or add) a space character. For example, suppose on the encoder side IA is defined as:

IA ::= IA5String (FROM ("0123456789No.*,"))

but on the decoder side as:

IA ::= IA5String (FROM ("0123456789No. *,")) <-- space is here

with an additional space character " ". This leads to the value

a IA ::= "1234"

being decoded in PER as "0123" instead of "1234": the extra character in the decoder's permitted alphabet shifts every character index by one. The permitted alphabet in PER plays a crucial role in how a value is encoded/decoded.
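The mismatch can be reproduced with a few lines that model PER's index-based encoding for permitted alphabets. This is a simplification: real PER also packs the indexes into fixed-width bit-fields, which is omitted here:

```python
# The two (mismatched) permitted alphabets, in PER's sorted order:
encoder_alphabet = sorted(set("0123456789No.*,"))
decoder_alphabet = sorted(set("0123456789No. *,"))   # extra space character

# The encoder writes each character as its index in its own alphabet...
indexes = [encoder_alphabet.index(c) for c in "1234"]

# ...but the decoder maps those indexes through its shifted alphabet:
decoded = "".join(decoder_alphabet[i] for i in indexes)
```

Running this reproduces the off-by-one character shift described above: every decoded character comes out one position lower in the alphabet than the one that was sent.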

Can you explain UTF8String and how it is encoded?

UniversalString and UTF8String both support exactly the same character set, and for both the first 64K characters are the set of characters of BMPString. Notice that the first 128 characters of BMPString are the same set of abstract characters as IA5String (we use the term "abstract" to point out that they are effectively the same characters, even though their encodings differ), and since BMPString is a subset of UniversalString and UTF8String, it follows that IA5String constitutes the first 128 abstract characters of these character string types.

So UTF8String is not composed of a different set of characters than BMPString and UniversalString; it is simply a different way of encoding exactly the same set of characters that BMPString and UniversalString encode. Let's talk about how it is actually encoded.

In short, if the first bit of the first byte of a character is 0, the character is one byte long; if you look at the character map you will see that this set of characters (of which there are 128, naturally) is U.S. ASCII (i.e., IA5String).

If the first 3 bits of a character are 110, the character is 2 bytes long and has the form 110xxxxx 10xxxxxx, where the x's are the significant bits and the two leading 1's indicate that the character is two bytes long.

If the first 4 bits of a character are 1110, the character is 3 bytes long and has the form 1110xxxx 10xxxxxx 10xxxxxx, where the x's are the significant bits and the three leading 1's indicate that the character is three bytes long.

If the first 5 bits of a character are 11110, the character is 4 bytes long and has the form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, where the x's are the significant bits and the four leading 1's indicate that the character is four bytes long.

If the first 6 bits of a character are 111110, the character is 5 bytes long and has the form 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, where the x's are the significant bits and the five leading 1's indicate that the character is five bytes long.

If the first 7 bits of a character are 1111110, the character is 6 bytes long and has the form 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, where the x's are the significant bits and the six leading 1's indicate that the character is six bytes long.
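These byte patterns can be condensed into a small illustrative encoder. It follows the forms listed above (up to the 6-byte form that RFC 2279 allowed); the function name is made up for this sketch:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point cp using the UTF-8 byte patterns
    described above (RFC 2279 allows forms of up to 6 bytes)."""
    if cp < 0x80:
        return bytes([cp])                     # 0xxxxxxx: U.S. ASCII
    for nbytes, lead in ((2, 0xC0), (3, 0xE0), (4, 0xF0), (5, 0xF8), (6, 0xFC)):
        # An n-byte form carries 5*n + 1 significant bits in total.
        if cp < 1 << (5 * nbytes + 1):
            shift = 6 * (nbytes - 1)
            out = [lead | (cp >> shift)]       # lead byte: n 1's, a 0, top bits
            while shift:
                shift -= 6
                out.append(0x80 | ((cp >> shift) & 0x3F))  # 10xxxxxx
            return bytes(out)
    raise ValueError("code point out of range")
```

For instance, the euro sign (code point 20AC hex) falls into the 3-byte form and encodes as E2 82 AC.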

For more information about UTF8String, please refer to RFC 2279.

Should the starting bit of a PER ALIGNED encoding of extension additions be octet-aligned or octet-unaligned?

It should be added as an octet-unaligned bit-field.
Sections 19.7 and 19.8 of X.691(2008) say that the encoding of extension additions starts with a bit mask whose bits indicate the presence of particular extensions. The bit mask, in turn, is prefixed with its length, which according to 19.8 is encoded as a "normally small length".
X.691(2008) states that the encoding of a "normally small length" starts with a single-bit bit-field that is either 0 or 1 (0 if the number of extensions is <= 64 and 1 otherwise).
The term "bit-field" is explained in section 3.7.3 of X.691(2008), which is followed by this clarifying note:
Note: If the use of this term is followed by the words "octet-aligned in the ALIGNED variant", this means that the bit-field is required to begin on an octet boundary in the complete encoding for the aligned variant of PER.
Since X.691(2008) does not explicitly state that this single-bit bit-field is octet-aligned, alignment on an octet boundary is not required.

See clause 11.1.4 of X.691(2008) for the difference in how bit-fields are used in constructing the complete encoding, as opposed to how octet-aligned bit-fields are used.
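As a bit-level sketch of that "normally small length" encoding (the function name is illustrative, and the larger form used when the count exceeds 64 is not modeled):

```python
def normally_small_length(n: int) -> str:
    """Encode a 'normally small' length n (n >= 1) as a string of bits.
    For n <= 64, a single 0 bit is followed by (n - 1) in 6 bits; the
    form for n > 64 (leading 1 bit + a general length) is not modeled."""
    if 1 <= n <= 64:
        return "0" + format(n - 1, "06b")
    raise NotImplementedError("lengths > 64 not modeled in this sketch")
```

So a bit-mask length of 1 encodes as the seven bits 0000000, and a length of 3 as 0000010; the leading single bit is exactly the octet-unaligned bit-field discussed above.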

Is there a way to skip decoding some unwanted fields in a SET/SEQUENCE in BER?

Yes, you can do that but only in BER/DER/CER, not in PER/UPER due to the nature of PER. Consider the following ASN.1 syntax:




S1 ::= SEQUENCE {
    b INTEGER,
    c IA5String,
    d OCTET STRING
}

S2 ::= SEQUENCE {
    b INTEGER,
    ...,
    ...,
    d OCTET STRING
}

s S1 ::= {
    b 25,
    c "xx",
    d '11'H
}

Based on the above syntax, you can encode the S1 PDU but decode it using the S2 PDU, the type of which makes use of ASN.1 extensibility. The first "..." marks the start of the extension and the second one marks its end. The fields, d, that follow the second "..." continue the extension root. In BER/DER/CER the decoder would simply skip all the fields that are in between the two extension markers and continue decoding with the field d.
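The skipping is possible because every BER field is a self-delimiting TLV (tag, length, value) triplet, so a decoder can jump past a field without understanding it. A minimal sketch, restricted to single-byte tags and definite lengths:

```python
def skip_ber_tlv(buf: bytes, pos: int) -> int:
    """Return the position just past the BER TLV starting at pos.
    Sketch only: assumes a single-byte tag and a definite length."""
    pos += 1                        # skip the identifier (tag) octet
    first = buf[pos]
    if first < 0x80:                # short definite form: length in one octet
        length, pos = first, pos + 1
    else:                           # long definite form: 0x80+n, then n octets
        n = first & 0x7F
        length = int.from_bytes(buf[pos + 1:pos + 1 + n], "big")
        pos += 1 + n
    return pos + length

# Contents of a value like s above: INTEGER 25, IA5String "xx", OCTET STRING '11'H
buf = bytes.fromhex("02011916027878040111")
after_b = skip_ber_tlv(buf, 0)      # jump past the INTEGER field
```

Calling the function repeatedly is exactly how a decoder working from S2 would hop over the unknown field c and land on d.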

Why does zero padding appear in PER ALIGNED encodings of short constrained restricted character string types?

Let's consider:

N ::= NumericString (SIZE(0..3))

n N ::= "27"

versus:

N ::= NumericString (SIZE(0..4))

n N ::= "27"

Clause 30.5.7 of X.691(2008) says:
30.5.7 If "aub" does not equal "alb" or is greater than or equal to 64K, then 11.9 shall be invoked to add the bit-field preceded by a length determinant with n as a count of the characters in the character string with a lower bound for the length determinant of "alb" and an upper bound of "aub". The bit-field shall be added as a field (octet-aligned in the ALIGNED variant) if "aub" times "b" is greater than or equal to 16, but shall otherwise be added as a bit-field that is not octet-aligned. This completes the procedures of this subclause.

Since NumericString characters occupy 4 bits each in ALIGNED PER (so "b" = 4), with SIZE(0..3) we get:

aub * b = 3 * 4 = 12 < 16

so the character string is added as a bit-field that is not octet-aligned, with no padding. With SIZE(0..4) we get:

aub * b = 4 * 4 = 16

so the character string is added as an octet-aligned field, which is what introduces the zero padding in question once the upper bound exceeds 3.
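The decisive test from 30.5.7 boils down to a one-line predicate. The names aub and b follow the clause's own notation; the function name itself is illustrative:

```python
def charstring_field_is_octet_aligned(aub: int, b: int) -> bool:
    """True if the character-string bit-field must be octet-aligned
    (and hence possibly zero-padded) under X.691(2008) 30.5.7, where
    aub is the upper bound on the count of characters and b is the
    number of bits per character (4 for NumericString in ALIGNED PER)."""
    return aub * b >= 16

# SIZE(0..3): 12 bits maximum, no alignment; SIZE(0..4): 16 bits, aligned.
no_padding = charstring_field_is_octet_aligned(3, 4)
padding = charstring_field_is_octet_aligned(4, 4)
```

This is why shortening the size constraint from SIZE(0..4) to SIZE(0..3) makes the padding disappear.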

What is the difference in usage of the extension marker in type definitions and Information Object Sets? Is the extension marker invisible?

The extension marker is invisible as far as the type definition is concerned, but not invisible as far as simple table constraints and component relation constraints are concerned.
There is a distinction between the type itself being extensible, versus the object set that constrains it being extensible. In the case of the type being extensible it innately can assume any value allowed by the extensible constraint. For example,

INTEGER (1..8, ...)

can assume any valid value at any time. Contrast this with an INTEGER type that is constrained using a simple table constraint, where such a type can assume only those values that happen to be contained in the information object set at the instant in time when the type is being encoded/decoded. This can vary from minute to minute as a program runs, since the set of objects within an extensible information object set can vary at runtime.
The distinction is less significant in the case of BER, DER and CER where the extensibility of a type does not play a role in how it is encoded, but it plays a major role in PER. In PER values of types defined with an extension marker "..." are encoded with a 1-bit prefix which when set to 0 means that the value that follows is in the extension root and thus is encoded in an optimized form. (E.g., values 1-8 in the above example would be encoded in 3 bits). However, when set to 1 it means that the value that follows is encoded in a more generic form. (E.g., values not in the range 1-8 in the above example occupy 16 bits or more).
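The 1-bit extension prefix for the INTEGER (1..8, ...) example can be sketched at the bit level. Only root values are modeled; the generic form used when the extension bit is 1 is left out, and the function name is made up for this sketch:

```python
def encode_int_1_8_ext(v: int) -> str:
    """PER-style bits for a value of INTEGER (1..8, ...): a 1-bit
    extension prefix, then root values packed as (v - 1) in 3 bits.
    Values outside the root (prefix bit 1, generic form) not modeled."""
    if 1 <= v <= 8:
        return "0" + format(v - 1, "03b")   # extension bit 0 + 3-bit offset
    raise NotImplementedError("non-root values use the generic form")
```

So the value 1 encodes in 4 bits as 0000 and the value 8 as 0111, illustrating the optimized root form mentioned above.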

Can you explain how type extensibility works in PER?

Consider the following two ASN.1 syntax definitions: 

A ::= SEQUENCE {     -- defined in v1
    a INTEGER,
    ...
}

A ::= SEQUENCE {     -- defined in v2
    a INTEGER,
    ...,
    b BOOLEAN        -- extension addition introduced in v2
}

The purpose behind type extensibility is to allow V1 applications that don't understand the new fields to receive V2 messages that have fields that it does not recognize and treat them as though they were sent by a V1 application, and likewise, for V2 applications to receive V1 messages that are missing fields. If the V2 application receives a message that is missing mandatory extension additions, it can safely assume that the message was originated by a V1 application.

A mandatory field defined after the extension marker MUST be encoded whenever there is a bit for it in the extension addition bitmap that says which extension addition values are present/absent. Thus, if there is an extension addition x defined after the mandatory extension addition y and a value for x is present in the encoding, then a value for y MUST be present. Also, if the mandatory extension addition y is the last component in the SEQUENCE and a bit is present for it in the extension addition bitmap, then that bit must be set to 1, for the very presence of the bit indicates that the originator of the message knew about this extension addition, and as such its presence is mandatory. Only when the message is being relayed from an earlier version of the message definition that did not define the mandatory extension addition can it be omitted (in which case there would be no bit for it in the extension addition bitmap). This is pointed out in ITU-T Recommendation X.680(2008) 25.15 Note 2:

"ComponentType"s that are extension additions but not contained within an "ExtensionAdditionGroup" should always be encoded if they are not marked OPTIONAL or DEFAULT, except when the abstract value is being relayed from a sender that is using an earlier version of the abstract syntax in which the "ComponentType" is not defined.

In other words, PER treats extension additions marked OPTIONAL exactly the same as extension additions that are not OPTIONAL.

How are GeneralString, GraphicString, etc. different from other widely used string types in ASN.1?

GeneralString, GraphicString, TeletexString and VideotexString all have the characteristic that they allow escape sequences in specifying characters. Thus, a character in one of these types may occupy one octet, or two, or three, etc., and the number of octets per character is not necessarily fixed for a given character string value; it can vary. Contrast this with IA5String, PrintableString, VisibleString, NumericString, BMPString and UniversalString, which all have a fixed number of bits per character and are thus called known-multiplier character string types or fixed-width character string types. The likes of GeneralString, etc. are variable-width character string types.

Would I ever need to encode/decode Information Objects?

No, you never encode or decode information objects. They serve only to provide information on how components of messages are related, how to determine the type of the value within an open type, etc.

Is there any difference between encoding signed vs. unsigned INTEGER in BER?

All INTEGERs in BER are encoded as signed integers, using the minimum number of content octets. Note that, because of this minimality requirement, the first nine bits of an encoded integer value are never all 1's or all 0's.
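The minimal-octet rule can be sketched as follows (contents octets only, not the tag and length; the function name is illustrative):

```python
def ber_integer_contents(v: int) -> bytes:
    """Minimal two's-complement contents octets for a BER INTEGER.
    Minimality is exactly why the first nine bits are never all ones
    or all zeros: a leading all-0 or all-1 octet would be redundant."""
    length = 1
    while True:
        try:
            return v.to_bytes(length, "big", signed=True)
        except OverflowError:
            length += 1
```

For example, the unsigned-looking value 255 needs two octets, 00 FF, because a lone FF would read as -1 in two's complement; the leading 00 octet is followed by a 1 bit, so the first nine bits are not all zeros.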

Is there any limitation on the tag number in ASN.1?

ASN.1 imposes no limit on the tag number, but the NIST Stable Implementation Agreements (1991) and its European and Asian counterparts limit the size of tags to 16383.

What does DEFAULT {} mean in ASN.1?

DEFAULT in general means that it is semantically indistinguishable whether the value was encoded or not. In general it means that you can choose to omit the value if it is the default value, though some encoding rules (e.g., DER) mandate that the value never be encoded if it is the default value.

DEFAULT {} is valid only for BIT STRING that has a named bit list, SET OF and SEQUENCE OF. In the case of BIT STRING it means that the default value is the empty string (length 0), and in the case of SET OF and SEQUENCE OF it means a value with 0 occurrences.

Could I encode something with DER and then decode it with BER?

Yes, you can. All valid DER encodings are valid BER encodings, but the opposite is not always true.

What are the Canonical Encoding Rules (CER)?

It is similar to BER in the sense that all valid CER encodings are valid BER encodings. Whereas BER allows multiple ways to encode most values, CER stipulates that only a single one of those ways is allowed for a given value (e.g., BER says for BOOLEAN a value of 00 is FALSE and any non-zero value is TRUE, while CER says that 00 is FALSE and FF is TRUE, and the values 01-FE are not valid CER encodings). It is similar to DER in most ways, since DER also stipulates a single way to encode any given value. Where they differ most is:

1. DER uses definite-length encoding, while CER uses indefinite-length encoding.

2. DER requires string types to be encoded in the primitive form, while CER requires that string types be encoded in the primitive form if they are 1000 octets or less in length, and in the constructed form with 1000-octet segments (except possibly the last segment) if they are more than 1000 octets long.

3. In DER components of a SET must be sorted at runtime. In CER components of a SET are presorted based on the tags, using the same algorithm that is utilized in PER.
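The BOOLEAN example above can be sketched as follows (contents octet only; the function names are illustrative):

```python
def ber_decode_boolean(contents: bytes) -> bool:
    """BER: 00 is FALSE, any non-zero contents octet is TRUE."""
    return contents != b"\x00"

def cer_encode_boolean(value: bool) -> bytes:
    """CER/DER: TRUE must be encoded as FF, FALSE as 00."""
    return b"\xff" if value else b"\x00"

# A BER decoder accepts any of 01..FF as TRUE; a canonical encoder
# (CER or DER) only ever produces FF for TRUE:
accepted = ber_decode_boolean(b"\x2a")
produced = cer_encode_boolean(True)
```

Round-tripping any BER-legal TRUE octet through the canonical encoder collapses it to the single permitted form FF, which is the essence of a canonical encoding rule.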

What is the significance of an OPTIONAL element in extension additions?

Consider the type:

MySeq ::= SEQUENCE {
    a  INTEGER,
    ...,
    e1 INTEGER (0..65535),
    e2 INTEGER (0..65535) OPTIONAL
}

Extension addition items are all "optional" for versions of implementations in which those items are not defined (e.g., in version 1 of MySeq e1 and e2 are not defined), but in versions of implementations in which the extension additions are defined (e.g., let's say e1 and e2 are defined in version 2) an extension addition item which is not marked OPTIONAL is mandatory for that version, and those items that are marked OPTIONAL are optional to that version. Thus, if e1 and e2 above are defined in version 2 but not version 1, a version 2 implementation is required to always transmit e1 if it is originating the message because it is not marked OPTIONAL, but can omit e2. If it is not originating the message (e.g., it is forwarding a message that was received from a version 1 implementation) then it is free to omit e1 and e2 if they were not present in the message. An implication of this is that you can never have e2 present in a message if e1 is absent.