This table comes from a format specification we use:
Ignore the 'Logical' column above; it is about enums. Also ignore the "*", which just marks the suggested value to reserve when a value must serve as an in-band null indicator.
What the table calls 'Mod Twos Complement' is what our already-proposed DFDL 2.0 feature calls 'offsetBinary'.
So this table suggests the need not only for 'unsignedBinary' (already mentioned) but also for two others: 'signPlusMagnitudeBinary' and 'onesComplementBinary'.
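To make the differences between these representations concrete, here is a minimal sketch that interprets the same n-bit pattern under each of them. The representation names are illustrative labels taken from this discussion ("twosComplement" is my own label), not settled DFDL property values, and the function name is hypothetical. The offsetBinary case assumes the conventional bias of 2^(n-1).

```python
REPS = ["unsignedBinary", "twosComplement", "offsetBinary",
        "signPlusMagnitudeBinary", "onesComplementBinary"]


def decode_binary(raw: int, nbits: int, rep: str) -> int:
    """Interpret the nbits-wide unsigned bit pattern `raw` under `rep`.

    Hypothetical helper; rep names are illustrative labels only.
    """
    if nbits < 2 or not 0 <= raw < (1 << nbits):
        raise ValueError("raw must be an nbits-wide bit pattern, nbits >= 2")
    mask = (1 << nbits) - 1
    negative = bool(raw >> (nbits - 1))          # the most significant bit
    if rep == "unsignedBinary":
        return raw
    if rep == "twosComplement":
        return raw - (1 << nbits) if negative else raw
    if rep == "offsetBinary":
        # Assumes the conventional bias of 2**(nbits - 1) (excess-K).
        return raw - (1 << (nbits - 1))
    if rep == "signPlusMagnitudeBinary":
        magnitude = raw & (mask >> 1)            # all bits except the sign bit
        return -magnitude if negative else magnitude
    if rep == "onesComplementBinary":
        # A negative value is the bitwise inverse of its magnitude.
        return -(~raw & mask) if negative else raw
    raise ValueError(f"unknown representation: {rep}")


# Print every 3-bit pattern under each representation.
for raw in range(8):
    print(f"{raw:03b}", [decode_binary(raw, 3, rep) for rep in REPS])
```

Note that sign-plus-magnitude and ones' complement each have two bit patterns that decode to zero (100 and 111 respectively in 3 bits), so an unparser for either would need to pick one.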
Point 4: Zig Zag Integer representation is getting popular
There's one other, more modern representation I know of, called zig-zag integers, popularized by Google Protocol Buffers. It's a clever representation and now seems to be used in many places.
Binary Value    Zig-Zag Value
000              0
001             -1
010              1
011             -2
100              2
101             -3
110              3
111             -4
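The mapping in the table amounts to "the low bit is the sign, the remaining bits are the magnitude." Here is a minimal sketch of encoding and decoding it for arbitrary-precision integers; the function names are hypothetical. Protocol Buffers documents the fixed-width 64-bit encoding as (n << 1) ^ (n >> 63) with an arithmetic shift; the version below avoids committing to a width.

```python
def zigzag_encode(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(u: int) -> int:
    """Inverse of zigzag_encode: low bit is the sign, the rest the magnitude."""
    if u < 0:
        raise ValueError("zig-zag values are unsigned")
    return (u >> 1) ^ -(u & 1)


# Reproduce the 3-bit table above.
for u in range(8):
    print(f"{u:03b}  {zigzag_decode(u)}")
```

The appeal of the scheme is that small-magnitude negative numbers become small unsigned numbers, which then pack well into a variable-length encoding.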
Point 5: Variable Length Binary Integers
There are also variable-length integer formats that are not just strings of bits. A common one I have seen is used by the ASN.1 BER representation: in each byte, an MSB of 1 indicates that the integer extends to an additional byte, and each byte contributes 7 bits to the value. Unsigned integers are just the concatenation of these 7-bit groups.
Signed integers are handled after the bits are concatenated. If the first bit of the concatenation is 1, the value is a negative twos complement value. Hence, if a positive value would have a first bit of 1, an additional byte containing 10000000 must be used as the most significant byte so that the first bit will not be 1.
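Here is a minimal decoding sketch of the scheme as described here, assuming the most significant 7-bit group comes first (which is what "concatenation" suggests) and that the byte with MSB 0 ends the integer. The function name is hypothetical, and this follows the description in this post rather than any particular ASN.1 library.

```python
from typing import Tuple


def decode_varint(data: bytes, signed: bool = False) -> Tuple[int, int]:
    """Decode one variable-length integer from the start of `data`.

    Each byte contributes its low 7 bits, most significant group first.
    A byte whose MSB is 1 means another byte follows; the byte with MSB 0
    ends the integer. Returns (value, bytes_consumed).
    """
    value = 0
    for i, byte in enumerate(data):
        value = (value << 7) | (byte & 0x7F)      # append this byte's 7 bits
        if (byte & 0x80) == 0:                    # MSB clear: last byte
            nbits = 7 * (i + 1)
            if signed and value >> (nbits - 1):   # first concatenated bit is 1
                value -= 1 << nbits               # negative twos complement
            return value, i + 1
    raise ValueError("data ended before a byte with MSB 0")


print(decode_varint(bytes([0x80, 0x40]), signed=True))
print(decode_varint(bytes([0x40]), signed=True))
```

The first call shows the extra leading 10000000 byte at work: without it, the single byte 0x40 would decode as -64 rather than 64.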
There is no way in DFDL to describe such a variable-length integer representation and get an integer in the infoset; you have to use a hexBinary byte array.
There is a need for a variable-length integer like this that supports not only explicit length (as used by ASN.1 BER) but also implicit length. In that case the last byte of the variable-length integer does not have the MSB set. Hence, a single byte can represent signed -64 to +63, or unsigned 0 to 127. Outside that range, multiple bytes must be used, each contributing 7 bits.
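The single-byte ranges follow from having 7 value bits per byte. Here is a minimal encoding sketch for the implicit-length form, assuming most-significant-group-first order and using the fewest bytes that can hold the value; the function name is hypothetical.

```python
def encode_varint(value: int, signed: bool = False) -> bytes:
    """Encode `value` as 7-bit groups, most significant group first.

    Every byte except the last has its MSB set. Uses the fewest bytes
    whose 7*n bits can hold the value (as twos complement if signed).
    """
    if not signed and value < 0:
        raise ValueError("negative value needs signed=True")
    nbytes = 1
    while True:
        nbits = 7 * nbytes
        if signed:
            fits = -(1 << (nbits - 1)) <= value < (1 << (nbits - 1))
        else:
            fits = value < (1 << nbits)
        if fits:
            break
        nbytes += 1
    bits = value & ((1 << nbits) - 1)        # twos complement in nbits bits
    out = bytearray()
    for i in range(nbytes - 1, -1, -1):
        group = (bits >> (7 * i)) & 0x7F
        out.append(group | (0x80 if i else 0))  # continuation bit except last
    return bytes(out)


for v in (-65, -64, 63, 64):
    print(v, encode_varint(v, signed=True).hex())
# -65 ff3f
# -64 40
# 63 3f
# 64 8040
```

Signed -64 and +63 fit in one byte; -65 and +64 each need a second byte.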
This suggests a need for several additional dfdl:binaryNumberRep enums.
Point 6: Extensibility by implementations is needed here
There are many other representations out there as well.
I think we should have a convention where there is a core set that all DFDL implementations must provide, and a convention by which DFDL implementations can provide additional support.
To me, a good way to do this is to allow the enum values for dfdl:binaryNumberRep to be not only regular enums (all of which are reserved) but also QNames, where the prefix can be for a namespace recognized by an implementation as providing an extended set of binary number representations. (Perhaps the dfdlx: prefix and namespace, or maybe we just allow implementation-specific namespaces?)
This means of extending enums for existing properties is not part of our existing 'experimental features' conventions, but I propose that it should be added.
To me, this is a good general way to allow property enums to be extended with experimental features in DFDL implementations. It applies to other places such as dfdl:binaryCalendarRep, and to numerous other properties where we are finding a need for additional enums and want to add them as experimental features.
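As a rough illustration of how an implementation might handle such property values, here is a sketch that treats an unprefixed value as a reserved core enum and a prefixed value as a QName resolved against in-scope namespace declarations. The namespace URI, the extension names, and the function are all hypothetical; the core set is illustrative only. An unrecognized value is rejected rather than silently ignored, which is one possible design choice.

```python
from typing import Dict, Optional, Tuple

# Illustrative core set: reserved plain enum values every implementation supports.
CORE_REPS = {"binary", "packed", "bcd", "ibm4690Packed"}

# Hypothetical extension namespace and names an implementation might recognize.
EXT_NS = "urn:example:binary-number-reps"
EXTENSION_REPS = {(EXT_NS, "zigZag"), (EXT_NS, "signPlusMagnitudeBinary")}


def resolve_rep(value: str, in_scope: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Resolve a binaryNumberRep value to (namespace or None, local name).

    A value without a prefix must be one of the reserved core enums; a value
    of the form 'prefix:local' is a QName whose prefix is looked up in the
    namespace declarations in scope (passed here as a plain dict).
    """
    if ":" not in value:
        if value not in CORE_REPS:
            raise ValueError(f"not a core enum value: {value!r}")
        return None, value
    prefix, local = value.split(":", 1)
    ns = in_scope.get(prefix)
    if ns is None:
        raise ValueError(f"undeclared namespace prefix: {prefix!r}")
    if (ns, local) not in EXTENSION_REPS:
        raise ValueError(f"this implementation does not support {{{ns}}}{local}")
    return ns, local


print(resolve_rep("binary", {}))
print(resolve_rep("ext:zigZag", {"ext": EXT_NS}))
```

Because the prefix is resolved to a namespace URI, two implementations could each offer a "zigZag" extension without their names colliding.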
That was long. Thanks for your consideration.
Mike Beckerle