The envelope/payload style of data format
is quite common, where the envelope provides control information and the
payload contains the business data. Examples are SWIFT and SAP IDocs. Typically
the envelope contains a tag that identifies the payload, which can be one
of many types. For SWIFT there are 300 possible types. To model this today
in DFDL requires an xs:choice with each type modeled as an xs:element branch
of the choice. A discriminator on each xs:element refers back to the envelope
tag element thus enabling the choice to be resolved.
There are two issues with this approach.
1) Performance. Even if the elements
in the branches are ordered for expected frequency, there will still be
cases when tens or hundreds of discriminators need to be evaluated before
the choice is resolved.
2) Tight coupling. When a new type is
added, a new element branch needs to be added to the choice.
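To make the performance issue concrete, here is a minimal Python sketch of resolving a choice by evaluating discriminators in schema order. It is only a model of the behaviour described above, not DFDL processor code: the branch names (MT100 to MT399), the envelope dictionary and the function names are all made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a branch is a name plus a discriminator that tests
# the envelope tag. Real SWIFT message types are not a contiguous range.
Envelope = Dict[str, str]
Branch = Tuple[str, Callable[[Envelope], bool]]


def make_branches(message_types: List[str]) -> List[Branch]:
    """Build one branch per message type, in schema order."""
    return [
        (f"MT{t}", (lambda env, t=t: env["MessageType"] == t))
        for t in message_types
    ]


def resolve_by_discriminators(
    branches: List[Branch], envelope: Envelope
) -> Tuple[str, int]:
    """Try each branch in order; return the match and the number of
    discriminators that had to be evaluated to find it."""
    evaluated = 0
    for name, discriminator in branches:
        evaluated += 1
        if discriminator(envelope):
            return name, evaluated
    raise ValueError("no branch of the choice matched the envelope tag")


# 300 illustrative branches: MT100 .. MT399
branches = make_branches([str(n) for n in range(100, 400)])

first = resolve_by_discriminators(branches, {"MessageType": "100"})
last = resolve_by_discriminators(branches, {"MessageType": "399"})
print(first)  # ('MT100', 1)
print(last)   # ('MT399', 300)
```

Ordering the branches by expected frequency helps the common case, but a payload whose branch sits late in the choice still pays for every discriminator before it.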
Action 145 proposes a faster way to resolve
a choice. This solves issue #1 and opens the door to a possible extension
to DFDL that solves issue #2.
Details:
A new dfdl:choice property is added
called dfdl:choiceBranchRef of type DFDL Expression. The expression must
evaluate to a QName which corresponds to one of the element branches of
the choice, and asserts 'known to exist' for that branch. Rules:
- The property behaves like dfdl:ref
and dfdl:hiddenGroupRef in that it is not possible to set a value in scope
by a dfdl:format annotation, and is only set at its point of use. This
is because there is nothing sensible that could be set in scope. But it
has the benefit that adding support for the property to existing DFDL implementations
will not suddenly cause errors to appear in existing DFDL schemas.
- Empty string is not an allowed value.
- The property is only used when parsing.
- All branches must be local elements
or element references. It is a schema definition error if any branch is
a sequence, a choice or a group reference.
- It is a processing error if the QName
does not resolve to one of the branches when parsing.
- It is a schema definition error if
a choice has the property set and also has dfdl:initiatedContent="yes"
set locally.
- Because the expression must return
a QName, the expression language must provide a constructor for creating
a QName from a namespace string and a name string. If you take SWIFT MT103
payload as an example, the tag in the envelope says "103" but
a DFDL schema would actually model the global MT103 element with name "Document"
and namespace "urn:swift:xsd:fin.103.2011".
So the dfdl:choiceBranchRef expression
would have to look like:
{fn:QName(fn:concat(fn:concat('urn:swift:xsd:fin.',
FinMessage/Block2/MessageType), '.2011'), 'Document')}
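For readers who want to trace what that expression computes, the following Python sketch mirrors it step by step. The QName type and the payload_qname function are hypothetical names introduced here, and the "2011" suffix is hard-coded only because the example expression hard-codes it.

```python
from typing import NamedTuple


class QName(NamedTuple):
    """Hypothetical stand-in for an expanded QName: namespace + local name."""
    namespace: str
    local_name: str


def payload_qname(message_type: str, release: str = "2011") -> QName:
    """Mirror the two concatenations in the example expression, then pair
    the resulting namespace with the fixed local name 'Document'."""
    namespace = "urn:swift:xsd:fin." + message_type + "." + release
    return QName(namespace, "Document")


# The envelope tag for an MT103 message is "103".
mt103 = payload_qname("103")
print(mt103.namespace)   # urn:swift:xsd:fin.103.2011
print(mt103.local_name)  # Document
```

The point of the sketch is that the envelope tag alone is not the branch name: it has to be combined with a namespace pattern before it identifies a branch of the choice.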
So we now have the ability to derive
a QName and apply it before we start to process a choice. That makes the
processing time for each branch of the choice independent of its order
in the schema.
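One way an implementation might exploit this is to index the branches of a choice by QName once, then look the computed QName up directly. The Python sketch below is an assumption about how that could be done, not a description of any existing DFDL processor; ProcessingError, build_branch_index and resolve_choice are illustrative names, and QNames are modelled as plain (namespace, local name) tuples.

```python
from typing import Dict, List, Tuple

# A QName is modelled as a (namespace, local name) tuple for illustration.
QName = Tuple[str, str]


class ProcessingError(Exception):
    """Stands in for a DFDL processing error."""


def build_branch_index(branch_qnames: List[QName]) -> Dict[QName, int]:
    """Map each branch QName to its position in the choice, built once."""
    return {qname: position for position, qname in enumerate(branch_qnames)}


def resolve_choice(index: Dict[QName, int], qname: QName) -> int:
    """Return the position of the branch named by qname. The cost does not
    depend on where the branch appears in the schema."""
    try:
        return index[qname]
    except KeyError:
        raise ProcessingError(
            f"QName {qname!r} does not match any branch of the choice"
        ) from None


# Illustrative branches following the namespace pattern of the example.
branch_qnames = [
    (f"urn:swift:xsd:fin.{n}.2011", "Document") for n in range(100, 400)
]
index = build_branch_index(branch_qnames)

print(resolve_choice(index, ("urn:swift:xsd:fin.103.2011", "Document")))  # 3
```

A QName that matches no branch raises ProcessingError, in line with the rule that an unresolved QName is a processing error when parsing.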
Issue #2 remains: when a new
payload is added, a new branch must still be added to the choice. One solution
is to allow xs:any wildcard elements back into DFDL, then provide
a property dfdl:wildcardRef which works in the same way as dfdl:choiceBranchRef.
So at the point of encountering the wildcard we know its resolution in
the schema. This obviously will require some further discussion,
but you can see how this ability to evaluate an expression and return a
QName can be used in multiple ways.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848