Open Grid Forum: Data Format Description Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 21 Nov 2007
Attendees
Mike Beckerle (IBM)
Geoff Judd (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
1. Introduction
Mike wished to use this meeting to
cover the hexBinary and base64Binary debate, and to discuss the use
of 'any' wildcards with minOccurs and maxOccurs.
2. 'Any' wildcards with minOccurs and maxOccurs
The DFDL specification presently disallows
the use of minOccurs and maxOccurs on 'any' wildcards, in contrast to XML
Schema. Simon stated that he sees no reason to forbid this, and that these
properties might be useful, for example where a structure may be followed
by arbitrary extensions.
Suman felt that the most common use
case would be minOccurs="0" and maxOccurs="1". Mike
wondered whether this would be useful within unordered groups, and thought
it would be a good way to model an 'all' group containing some known fields
and a number of unknown fields.
Mike took an action item to investigate
this further.
3. hexBinary and base64Binary
The working group has been discussing,
via email, the use of hexBinary and base64Binary types, along with 'enumeration'
and 'pattern' properties. Mike asked the group whether we should disallow
the 'enumeration' and 'pattern' properties on binary types (as they are
difficult to use, in particular with base64Binary), and whether we should
remove support for base64Binary (as it shares a value space with hexBinary,
and is therefore a synonym of hexBinary in DFDL).
Simon felt that this distinction remained
a useful hint to any component emitting XML based on a DFDL infoset. He
also observed that base64Binary is more commonly used than hexBinary, and
is preferred. Mike argued that while base 64 might be preferred to hexadecimal
in terms of space, hexadecimal is more readable. Steve observed that hexBinary
is commonly used.
While Mike felt that supporting fixed
and default binary values might in turn require support for patterns
and enumerations, the meeting agreed that there are use cases for default
and fixed values: for example, some file formats use "eyecatchers",
which are best expressed in hexadecimal. Mike pointed out that this could
be achieved using a string type, but suggested allowing 'default' and 'fixed'
for hexBinary. Simon suggested that something similar would be necessary
for base64Binary, as some values (such as passwords and identifiers) are
frequently expressed in base 64.
Sandy Gao (IBM) has been asked to comment
on whether there are any use cases where patterns are used with hexBinary
or base64Binary.
To conclude this discussion, Mike proposed
the following: to retain support for both base64Binary and hexBinary, with
identical content in DFDL; to allow both 'fixed' and 'default' for both
base64Binary and hexBinary; but (pending further information from Sandy)
to disallow 'pattern' and 'enumeration'.
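The proposal above treats the two types as having one value space. A short Python sketch can illustrate this: it decodes one fixed value written in each lexical form and compares the resulting bytes. The four-byte eyecatcher (the ZIP local file header signature, bytes 50 4B 03 04) and the helper names are illustrative assumptions and are not part of DFDL or of the proposal. The sketch also assumes that a 'fixed' binary value would be checked as a byte-for-byte comparison.

```python
import base64


def decode_hex_binary(lexical: str) -> bytes:
    # hexBinary lexical form: two hex digits per byte (upper or lower case).
    return bytes.fromhex(lexical)


def decode_base64_binary(lexical: str) -> bytes:
    # base64Binary lexical form; validate=True rejects non-alphabet characters.
    return base64.b64decode(lexical, validate=True)


# Hypothetical 'fixed' eyecatcher, declared once in each lexical form.
FIXED_HEX = "504B0304"
FIXED_B64 = "UEsDBA=="


def matches_fixed(data: bytes, fixed: bytes) -> bool:
    # Compare raw bytes, regardless of which lexical form declared the value.
    return data == fixed


if __name__ == "__main__":
    hex_value = decode_hex_binary(FIXED_HEX)
    b64_value = decode_base64_binary(FIXED_B64)
    print(hex_value == b64_value)                      # True
    print(matches_fixed(b"PK\x03\x04", hex_value))     # True
    print(base64.b64encode(hex_value).decode("ascii"))  # UEsDBA==
    print(hex_value.hex().upper())                     # 504B0304
```

Both declarations decode to the same bytes, so a parser that treats the types as synonyms would accept the same data in either case. The type chosen matters only for how the value is written in the schema or emitted as XML.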
[Simon and Suman left the meeting]
4. Array Prefixes and Suffixes
Mike asked whether the group was happy
with the omission of array prefixes and suffixes. We know how to add these
back should we ever need to, and there is a concern that including them
would lead to many more array properties being necessary. Steve was happy
with the present proposal.
5. Choice type and Length properties on Choice
Mike observed that there is a need to
distinguish between choice groups which are of constant length, and choice
groups where the length is determined by the relevant subelement. Where
the choice is unresolvable, it is not possible to have a choice of variable
length.
Steve felt that, as we are able to make
assertions, there would be very few cases where a choice is unresolvable.
Mike pointed out that in an unresolvable choice group, each arm would need
sufficiently consistent syntax for a parser to determine where it ends,
and that this could be modelled as arms carrying enough information to
discriminate between them. Geoff suggested that experience with IBM's MRM
technology shows this to be unusual.
The meeting considered two options.
We could specify two properties, one to select between constant length
and variable length; and one to select between resolvable and unresolvable.
In this option, the combination variable-length/unresolvable would be disallowed.
In the second option, we would have a single property with three possible
values: constant length, variable length or unresolvable. The meeting agreed
upon the second option.
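The reasoning behind the second option can be sketched in Python. The enum member names are hypothetical stand-ins for the three property values and are not agreed DFDL names. With two independent properties there are four combinations, and one of them (variable length with unresolvable) has to be ruled out by a separate rule. The single three-valued property admits exactly the three remaining cases, because an unresolvable choice is always of constant length.

```python
from enum import Enum
from itertools import product


class ChoiceLengthKind(Enum):
    # Hypothetical names for the single three-valued property (option 2).
    CONSTANT_LENGTH = "constantLength"
    VARIABLE_LENGTH = "variableLength"
    UNRESOLVABLE = "unresolvable"


def option1_valid(variable_length: bool, resolvable: bool) -> bool:
    # Option 1 uses two independent flags; the combination
    # variable-length + unresolvable is disallowed.
    return resolvable or not variable_length


def to_option2(variable_length: bool, resolvable: bool) -> ChoiceLengthKind:
    # Map a valid option-1 combination onto the single option-2 value.
    if not option1_valid(variable_length, resolvable):
        raise ValueError("a variable-length unresolvable choice is disallowed")
    if not resolvable:
        # An unresolvable choice is necessarily of constant length.
        return ChoiceLengthKind.UNRESOLVABLE
    if variable_length:
        return ChoiceLengthKind.VARIABLE_LENGTH
    return ChoiceLengthKind.CONSTANT_LENGTH


if __name__ == "__main__":
    combos = list(product([False, True], repeat=2))
    valid = [c for c in combos if option1_valid(*c)]
    print(len(combos), len(valid))  # 4 3
    print(sorted(to_option2(v, r).value for v, r in valid))
```

Under the single property, the disallowed combination cannot be expressed at all, so no additional validation rule is needed.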
When experimenting with the DFDL language
recently, Steve found specifying length on structures to be awkward. He
proposed removing 'lengthKind' on choice elements, and Mike added that
we would also wish to remove other associated properties such as 'initiator'
and 'terminator'. On reflection, the meeting decided to keep these properties,
noting that using them on a choice element is identical to wrapping the
choice element in a sequence element with the same properties.
6. Other business
There has, internally within IBM, been
a discussion regarding length prefixes on strings. Mike will circulate
a proposal to the working group, to allow prefix formats to be described
through annotations on simpleType definitions.
Meeting closed, 17:55 GMT
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK