Asterix is a Eurocontrol aviation-related messaging data format.
We are trying to create a DFDL schema for Asterix.
Asterix messages begin with a field specification, or fspec: a variable-length sequence of flag bytes whose bits indicate which fields are present in the message.
The fspec flags are a vector of bytes. Each byte contains 7 bits which are the presence-indicator flags, and an FX bit indicating whether there is another flags byte.
After the flags come the fields themselves, each of which, if present, has a fixed length. The fields are commonly numbers of various fixed sizes (int, float, byte, etc.).
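To make the fspec layout concrete, here is a minimal Python sketch of reading the flag bytes on the parse side. The function name parse_fspec is hypothetical, and the sketch assumes the presence flags occupy the 7 most significant bits of each byte, first flag first, with FX as the least significant bit.

```python
def parse_fspec(data: bytes) -> tuple[list[bool], int]:
    """Return (presence flags, number of flag bytes consumed).

    Assumption: bits 7..1 of each byte are presence flags (first flag in
    the most significant bit) and bit 0 is the FX bit.
    """
    flags: list[bool] = []
    for i, byte in enumerate(data):
        # Collect the 7 presence flags, most significant bit first.
        flags.extend(bool(byte & (1 << bit)) for bit in range(7, 0, -1))
        # FX = 0 means this was the last flag byte.
        if not byte & 0x01:
            return flags, i + 1
    raise ValueError("fspec ended without a flag byte whose FX bit is 0")
```

For example, the two bytes 0x81 0x40 would yield 14 flags, of which only the first and the ninth are set.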
The challenge is this: when unparsing, a given flag byte is needed only if either
- a field whose presence bit is in that flag byte is present in the message, or
- a subsequent flag byte is needed. In that case the presence flags in this byte could all be 0, but its FX bit will be 1 to indicate that another flag byte follows.
As an example, imagine a message has from 1 to 3 flag bytes at the start, allowing up to 21 optional fields to follow.
Flag byte 2 is needed if any of fields 8-14 exists, or if flag byte 3 exists.
Flag byte 2's FX bit is 1 if flag byte 3 exists.
For the sake of discussion, let's call the presence flags p1 to p21, so the fspec flags bytes are:
fspec1 fspec2 fspec3
p1, p2, p3, p4, p5, p6, p7, fx1 | p8, p9, p10, p11, p12, p13, p14, fx2 | p15, p16, p17, p18, p19, p20, p21
(fspec3 has a final unused bit. It uses only the first 7 bits of the byte.)
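The rule for which flag bytes are needed, and what their FX bits are, can be sketched in Python as follows. This is an illustration only: build_fspec is a hypothetical name, the bit positions (first flag in the most significant bit, FX in the least significant bit) are an assumption, and the sketch always emits fspec1 because the example has from 1 to 3 flag bytes.

```python
FLAGS_PER_BYTE = 7

def build_fspec(present: set[int], max_bytes: int = 3) -> bytes:
    """Build the flag bytes from the 1-based numbers of the present fields."""
    limit = max_bytes * FLAGS_PER_BYTE
    if any(n < 1 or n > limit for n in present):
        raise ValueError(f"field numbers must be in 1..{limit}")
    # The highest present field decides how many flag bytes are needed.
    # fspec1 is always emitted (assumption taken from the 1-to-3 example).
    highest = max(present, default=0)
    n_bytes = max(1, (highest + FLAGS_PER_BYTE - 1) // FLAGS_PER_BYTE)
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for k in range(FLAGS_PER_BYTE):
            if b * FLAGS_PER_BYTE + k + 1 in present:
                value |= 0x80 >> k  # assumed: first flag in the top bit
        if b < n_bytes - 1:
            value |= 0x01  # FX = 1: another flag byte follows
        out.append(value)
    return bytes(out)
```

With only f9 present, this produces two bytes: fspec1 with all presence flags 0 and FX set, then fspec2 with only p9 set. With only f15 present, fspec1 and fspec2 both carry nothing but their FX bits.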
We'd like to compute the entire flag byte vector including the existence of each flag byte, based on the existence of the subsequent fields, which we can call f1 to f21.
We do this with dfdl:newVariableInstance like so:
<dfdl:newVariableInstance ref="s:p1Exists">{ fn:exists(../f1) }</dfdl:newVariableInstance>
...
<dfdl:newVariableInstance ref="s:p21Exists">{ fn:exists(../f21) }</dfdl:newVariableInstance>
<dfdl:newVariableInstance ref="s:fspec3Exists">{ $s:p15Exists or $s:p16Exists or ... $s:p21Exists }</dfdl:newVariableInstance>
<dfdl:newVariableInstance ref="s:fspec2Exists">{ $s:fspec3Exists or $s:p8Exists or $s:p9Exists or ... $s:p14Exists }</dfdl:newVariableInstance>
That allows us to then easily compute the values of each flag bit, and of each FX bit.
But at unparse time we have no way to create the fspec2 or fspec3 element only when $s:fspec2Exists or $s:fspec3Exists, respectively, is true.
This does not appear to be possible in DFDL v1.0, because dfdl:occursCountKind 'expression' does not evaluate the expression at unparse time.
Nor does choice evaluate a choice dispatch key at unparse time.
So it appears we have no way to evaluate an expression at unparse time, and use the value of that expression to decide to create an element in the augmented infoset that was not present in the initial infoset at the start of unparsing.
Hence, we cannot cause the creation of these optional fspec flag bytes based on the existence of subsequent fields when unparsing.
It is illegal in Asterix for the last flag byte to contain only zero bits, so we cannot work around this by always emitting all three flag bytes.
So I believe DFDL cannot properly compute the fspec flag bytes based on the existence of the subsequent fields. I think application logic outside of the DFDL schema has to replicate the logic about whether a flag byte needs to exist or not, and place the corresponding fspec elements into the infoset.
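As a rough idea of what that application logic might look like, here is a Python sketch that adds the needed fspec elements to an infoset before unparsing. It is a simplification under stated assumptions: the infoset is modelled as a flat dict rather than a real infoset tree, add_fspec_elements is a hypothetical name, the element names fspec1 to fspec3, p1 to p21, fx1, fx2, and f1 to f21 follow the naming used in this discussion, and fspec1 is assumed to always be present.

```python
FLAGS_PER_BYTE = 7

def add_fspec_elements(fields: dict, max_bytes: int = 3) -> dict:
    """Return a new infoset dict with the needed fspecN entries placed first."""
    # Work backwards, mirroring $s:fspec3Exists and $s:fspec2Exists:
    # a flag byte exists if one of its own fields exists or a later byte exists.
    exists = [False] * max_bytes
    later_exists = False
    for b in range(max_bytes - 1, -1, -1):
        first = b * FLAGS_PER_BYTE + 1
        own = any(f"f{n}" in fields for n in range(first, first + FLAGS_PER_BYTE))
        # Assumption: fspec1 (b == 0) is always present.
        exists[b] = own or later_exists or b == 0
        later_exists = exists[b]

    infoset: dict = {}
    for b in range(max_bytes):
        if not exists[b]:
            break
        first = b * FLAGS_PER_BYTE + 1
        entry = {f"p{n}": int(f"f{n}" in fields)
                 for n in range(first, first + FLAGS_PER_BYTE)}
        if b < max_bytes - 1:
            # The last flag byte has no FX bit in this example layout.
            entry[f"fx{b + 1}"] = int(exists[b + 1])
        infoset[f"fspec{b + 1}"] = entry
    infoset.update(fields)  # the fields follow the flag bytes
    return infoset
```

The schema would then only need to compute nothing about existence at unparse time; it would simply unparse whichever fspec elements the application placed in the infoset.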
DFDL v1.0 comes very close. If we had either the ability to evaluate an expression and select a choice branch based on it at unparse time, or the ability to evaluate an occursCount expression at unparse time to determine whether an element exists in the infoset, we could fully capture the Asterix format.