Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
17:00 GMT, 21 Nov 2007

Attendees
Mike Beckerle (IBM)
Geoff Judd (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)


1. Introduction
Mike would like, in this meeting, to cover the hexBinary and base64Binary debate, and also to discuss the use of 'any' wildcards with minOccurs and maxOccurs.

2. 'Any' wildcards with minOccurs and maxOccurs
The DFDL specification presently disallows the use of minOccurs and maxOccurs on 'any' wildcards, in contrast to XML Schema. Simon stated that he sees no reason to forbid this, and that these properties might be useful, for example when structures could be followed by arbitrary extensions.

Suman felt that the most common use case would be minOccurs="0" and maxOccurs="1". Mike wondered whether this would be useful within unordered groups, and thought it would be a good way to model an 'all' group containing some known fields and a number of unknown fields.

Mike took an action item to investigate this further.

3. hexBinary and base64Binary
The working group has been discussing, via email, the use of hexBinary and base64Binary types, along with 'enumeration' and 'pattern' properties. Mike asked the group whether we should disallow the 'enumeration' and 'pattern' properties on binary types (as they are difficult to use, in particular with base64Binary), and whether we should remove support for base64Binary (as it shares a value space with hexBinary, and is therefore a synonym of hexBinary in DFDL).

Simon felt that this distinction remained a useful hint to any component emitting XML based on a DFDL infoset. He also observed that base64Binary is more commonly used than hexBinary, and is preferred. Mike argued that while base 64 might be preferred to hexadecimal in terms of space, hexadecimal is more readable. Steve observed that hexBinary is commonly used.
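As a small illustration of the point that the two types share a value space and differ only in lexical form, the following Python sketch renders the same octets both ways. The sample bytes and the function name are chosen purely for illustration and are not taken from the discussion.

```python
import base64


def lexical_forms(data: bytes) -> tuple[str, str]:
    """Return the (hexBinary, base64Binary) lexical forms of the same octets."""
    hex_form = data.hex().upper()
    b64_form = base64.b64encode(data).decode("ascii")
    return hex_form, b64_form


# Arbitrary sample value, chosen only for illustration.
sample = bytes([0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x01])
hex_form, b64_form = lexical_forms(sample)

# Both lexical forms decode back to the identical value.
assert bytes.fromhex(hex_form) == sample
assert base64.b64decode(b64_form) == sample

print(hex_form, len(hex_form))   # two characters per octet
print(b64_form, len(b64_form))   # four characters per three octets
```

The hexadecimal form shows each octet directly, which is the readability argument; the base 64 form is shorter, which is the space argument.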

While Mike felt that supporting fixed and default binary values might lead to requiring support for patterns and enumeration, the meeting agreed that there are use cases for default and fixed values - for example, some file formats use "eyecatchers" which are best expressed in hexadecimal. Mike pointed out that this could be achieved using a string type, but suggested allowing 'default' and 'fixed' for hexBinary. Simon suggested that something similar would be necessary for base64Binary, as some values (such as passwords and identifiers) are frequently expressed in base 64.

Sandy Gao (IBM) has been asked to comment on whether there are any use cases where patterns are used with hexBinary or base64Binary.

To conclude this discussion, Mike proposed the following: to retain support for both base64Binary and hexBinary, with identical content in DFDL; to allow both 'fixed' and 'default' for both base64Binary and hexBinary, but (pending further information from Sandy) to disallow 'pattern' and 'enumeration'.

[Simon and Suman left the meeting]

4. Array Prefixes and Suffixes
Mike asked whether the group was happy with the omission of array prefixes and suffixes. We know how to add these back should we ever need to, and there is a concern that including them would lead to many more array properties being necessary. Steve was happy with the present proposal.

5. Choice type and Length properties on Choice
Mike observed that there is a need to distinguish between choice groups which are of constant length, and choice groups where the length is determined by the relevant subelement. Where the choice is unresolvable, it is not possible to have a choice of variable length.

Steve felt that as we are able to make assertions, there would be very few cases where a choice is unresolvable. Mike pointed out that in an unresolvable choice group, each arm would need consistent enough syntax for a parser to be able to determine the end; and that this could be modelled as arms with enough information to discriminate. Geoff suggested that experience with IBM's MRM technology shows this to be unusual.

The meeting considered two options. We could specify two properties, one to select between constant length and variable length; and one to select between resolvable and unresolvable. In this option, the combination variable-length/unresolvable would be disallowed. In the second option, we would have a single property with three possible values: constant length, variable length or unresolvable. The meeting agreed upon the second option.
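The difference between the two options can be sketched in Python. All names below (ChoiceKind, the enum values, the two boolean flags) are hypothetical and used only for illustration; the minutes do not record the actual property name or its values. The sketch assumes that an unresolvable choice is necessarily of constant length, as stated under this item.

```python
from enum import Enum
from itertools import product


class ChoiceKind(Enum):
    """Option 2: one property with three values (hypothetical names)."""
    CONSTANT_LENGTH = "constantLength"
    VARIABLE_LENGTH = "variableLength"
    UNRESOLVABLE = "unresolvable"


def valid_two_property_combinations():
    """Option 1: two properties, with variable-length/unresolvable disallowed."""
    valid = []
    for variable_length, resolvable in product([False, True], repeat=2):
        if variable_length and not resolvable:
            continue  # the disallowed combination
        valid.append((variable_length, resolvable))
    return valid


def to_choice_kind(variable_length: bool, resolvable: bool) -> ChoiceKind:
    """Map an option 1 setting onto the single option 2 property."""
    if not resolvable:
        if variable_length:
            raise ValueError("variable-length/unresolvable is not allowed")
        return ChoiceKind.UNRESOLVABLE
    if variable_length:
        return ChoiceKind.VARIABLE_LENGTH
    return ChoiceKind.CONSTANT_LENGTH


combos = valid_two_property_combinations()
kinds = {to_choice_kind(v, r) for v, r in combos}
print(len(combos), len(kinds))
```

Under these assumptions the two options describe the same three cases; the single property simply makes the invalid fourth combination impossible to write.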

When experimenting with the DFDL language recently, Steve found specifying length on structures to be awkward. He proposed removing 'lengthKind' on choice elements, and Mike added that we would also wish to remove other associated properties such as 'initiator' and 'terminator'. On reflection, the meeting decided to keep these properties, noting that using these on a choice element is identical to wrapping the choice element in a sequence element with the same properties.

6. Other business
There has, internally within IBM, been a discussion regarding length prefixes on strings. Mike will circulate a proposal to the working group, to allow prefix formats to be described through annotations on simpleType definitions.
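For readers unfamiliar with the term, a length-prefixed string is one whose content is preceded by a field giving its length. The Python sketch below shows only that general idea; it does not represent the proposal itself, which has not yet been circulated. The function name, the default 2-byte big-endian prefix, and the ASCII encoding are all assumptions made for illustration.

```python
import struct


def read_prefixed_string(data: bytes, offset: int = 0,
                         prefix_format: str = ">H",
                         encoding: str = "ascii"):
    """Read one length-prefixed string; return (text, offset of next item).

    prefix_format is a struct format describing the prefix
    (assumed here: 2-byte big-endian unsigned length, counted in bytes).
    """
    prefix_size = struct.calcsize(prefix_format)
    (length,) = struct.unpack_from(prefix_format, data, offset)
    start = offset + prefix_size
    end = start + length
    if end > len(data):
        raise ValueError("length prefix runs past the end of the data")
    return data[start:end].decode(encoding), end


text, next_offset = read_prefixed_string(b"\x00\x05helloXYZ")
print(text, next_offset)
```

Varying prefix_format (for example "B" for a single-byte prefix) is one way to picture why the prefix format itself needs to be describable.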

Meeting closed, 17:55 GMT


Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK





