Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
17:00 GMT, 09 Jan 2008

Attendees
Mike Beckerle (Oco)
Geoff Judd (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)

Agenda
1. OGF 22 in Cambridge, MA
2. Level set on specification drafts
3. Expression Language
4. Nulls and defaults - can we drop useNullForDefault?
5. Other business

1. OGF22
The next OGF conference will be held February 25-29 in Cambridge, MA. As he is local, Mike is planning to attend to represent DFDL. The working group should decide what, if anything, we would like to present at the conference. Mike will enquire about the closing date for submissions, which is possibly Jan 11th.

2. Specification Drafts
Mike circulated draft 30 of the DFDL specification before Christmas, and had prepared a plan covering the contents of the next three drafts. The objective of the plan was to guide the group to the stage where the specification is no longer a limiting factor to progress, and implementations can proceed with a reasonable expectation that the specification will not change significantly. Steve mentioned that IBM are attempting to assign remaining work items internally, and wanted to coordinate this with the other working group members to avoid duplication of effort.

Due to the demands of his new role, Mike will need to pass some items that he had been hoping to tackle on to other people. He suggested that editorship of the specification should pass around the group with each draft, ideally to whoever would be making the most significant changes in that draft.

For the next draft, number 31, Steve suggested that Alan might be an appropriate editor as he is working on the expression language, which is a key subject for the next draft. Simon would also like to own a draft and will consider this one, but could not commit in the meeting.

3. Expression Language
The group has previously discussed difficulties with forward/backward references in expressions. Mike observed that forward-referencing expressions can occur in a DFDL schema but could only be used during unparse. Discussing whether it is feasible to police this statically, Mike reckoned that while it may be difficult to analyze an expression to see whether it referred forward or not, this would probably be a decidable problem (eg, follow the dfdl:outputValue chain).
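As an illustration only, here is a minimal Python sketch of one way such a static check might work. It assumes a much simplified model that was not discussed on the call: elements are listed in document order, and each element's expression has already been reduced to the list of element names it references. The function name `refers_forward` and this data model are hypothetical. "Following the chain" is interpreted here as following references transitively.

```python
from typing import Dict, List, Set


def refers_forward(order: List[str], refs: Dict[str, List[str]], start: str) -> bool:
    """Return True if the expression on `start` reaches, directly or through
    the expressions of the elements it references, an element that appears
    later than `start` in document order."""
    position = {name: i for i, name in enumerate(order)}
    seen: Set[str] = set()
    stack = list(refs.get(start, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already examined; also protects against cycles
        seen.add(name)
        if position[name] > position[start]:
            return True  # found a forward reference
        # Follow the chain: the referenced element may have its own expression.
        stack.extend(refs.get(name, []))
    return False


# Hypothetical schema: a length field computed from the payload that follows it.
order = ["length", "payload", "checksum"]
refs = {"length": ["payload"], "checksum": ["payload"]}
print(refers_forward(order, refs, "length"))    # length refers forward to payload
print(refers_forward(order, refs, "checksum"))  # checksum only refers backward
```

In this simplified model the check always terminates, because each element is visited at most once. Analysing real expressions to extract the referenced elements is the harder part and is not attempted here.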

Steve asked how we should specify the data type to be returned from an expression, there being two candidates:
a) the XML Schema type of the DFDL property
b) the 'resolved' data type of the DFDL property as needed by the parser
Take dfdl:length as an example. The XML Schema type is 'string' because the field can accept numeric literals, expressions, regular expressions, etc. But the parser will always want an integer.
Agreed that an expression should return the 'resolved' data type.

Steve asked whether, in the property descriptions, we should include the allowable return type from an expression. Mike believed that we should, as it may be distinct from the DFDL type for that field.
So, the dfdl:length property description in the spec needs to say exactly what the options are - eg, "a literal integer, or an expression that resolves to an integer, or a regular expression that resolves to an integer".

Using the XSD "maxOccurs" field as an example, which is normally an integer but may also be the token 'unbounded', Simon suggested that simply using the 'resolved' type may not be sufficient and that a processor will need to be aware that, in some cases, the result of an expression may not be the natural type. Mike concluded that we would need to specify both types as above and also any 'distinguished tokens'.

Finally, should a DFDL engine automatically cast an expression result to the 'resolved' type, or instead strictly enforce the return type of the expression? The group felt the latter option to be preferable.
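To make the preferred option concrete, the following is a minimal Python sketch of strict enforcement: the expression result is checked against the 'resolved' type and is never cast, with an allowance for 'distinguished tokens'. The table `PROPERTY_TYPES` and the function `check_expression_result` are hypothetical names, and the entries are illustrative rather than taken from the specification ("maxOccurs" appears only because it was the example used in the discussion).

```python
from typing import Union

# Hypothetical table: property name -> (resolved type, distinguished tokens).
# Illustrative entries only; not taken from the DFDL specification.
PROPERTY_TYPES = {
    "length": (int, frozenset()),
    "maxOccurs": (int, frozenset({"unbounded"})),
}


def check_expression_result(prop: str, value: object) -> Union[int, str]:
    """Strictly enforce the resolved type of an expression result (no casting)."""
    resolved_type, tokens = PROPERTY_TYPES[prop]
    # A distinguished token is accepted as-is, although it is not the resolved type.
    if isinstance(value, str) and value in tokens:
        return value
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, resolved_type) and not isinstance(value, bool):
        return value
    allowed = resolved_type.__name__
    if tokens:
        allowed += " or one of " + ", ".join(sorted(tokens))
    raise TypeError(f"{prop}: expected {allowed}, got {type(value).__name__} {value!r}")


print(check_expression_result("length", 12))
print(check_expression_result("maxOccurs", "unbounded"))
```

Under this sketch an expression that yields the string "12" for the length property is rejected rather than converted, which is the behaviour the group preferred.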

(Alan Powell joined the meeting)

4. Nulls and defaults
Steve would like to review his previous correspondence with Mike before discussing this further. It will be included in the agenda for next week's meeting.

5. Property Precedence
Geoff and Steve have been preparing a proposal for precedence using a mind map. Steve will distribute this initial proposal for wider review.

(Mike Beckerle and Suman Kalia left the meeting)

6. Entity references
Alan has been looking at the use of XML entity references to more easily allow non-printable characters to be written into DFDL documents, and has distributed a proposal within IBM. There are some issues around this at the moment (need DTD to define entities, allowable characters in XML 1.0 docs). Alan is looking at these.

This discussion in IBM had led to the concept of a mechanism to easily represent arbitrary whitespace, which is a common feature of text formats but which causes problems when modelling. Simon has experience with this concept and will send Steve a description of how PolarLake handle this.

Steve suggested we could handle this by allowing delimiters to be a list of allowable values, with the first used as a default on unparse. (We already have this idea for dfdl:nullValue). Simon observed that this could not handle arbitrary length whitespace. Steve said that we should have entities that cover that - like <WSP> and <OWSP> in IBM's WTX parser (the O meaning optional) - these are extremely useful. So then you could say things like (ignore incorrect entity syntax):

dfdl:separator ="x0Dx0A x0D"

meaning that the separator defaults to CRLF (x0Dx0A) on unparse, but CR (x0D) on its own is also accepted on parse.
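As an illustration only, a minimal Python sketch of the "list of allowable values" idea, using the two literal values from the example (x0Dx0A and x0D). The function names are hypothetical. Trying the longest candidate first is an assumption made here so that x0Dx0A is not consumed as x0D followed by stray data; it was not agreed on the call.

```python
from typing import List, Optional

# The two allowable separators from the example; the first is the unparse default.
SEPARATORS = ["\x0D\x0A", "\x0D"]


def match_separator(data: str, pos: int, allowed: List[str]) -> Optional[str]:
    """Return the longest allowed separator found at data[pos:], or None."""
    for sep in sorted(allowed, key=len, reverse=True):
        if sep and data.startswith(sep, pos):
            return sep
    return None


def split_records(data: str, allowed: List[str]) -> List[str]:
    """Parse: split data on any of the allowed separators."""
    records, start, pos = [], 0, 0
    while pos < len(data):
        sep = match_separator(data, pos, allowed)
        if sep is None:
            pos += 1
        else:
            records.append(data[start:pos])
            pos += len(sep)
            start = pos
    records.append(data[start:])
    return records


def join_records(records: List[str], allowed: List[str]) -> str:
    """Unparse: always write the first (default) separator."""
    return allowed[0].join(records)


parsed = split_records("a\x0D\x0Ab\x0Dc", SEPARATORS)
print(parsed)
print(repr(join_records(parsed, SEPARATORS)))
```

Note that a parse followed by an unparse normalises every separator to the default, so the round trip does not necessarily reproduce the original bytes.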

However, Steve also pointed out that in the EDI data format the choice of delimiter comes from an expression, adding to the complexity, because the allowable value of the delimiter is then <value from expression> concatenated with <entity>. Is that supported by the current spec? Eg:

dfdl:separator ="{..\delimiter} {..\delimiter}x0Dx0A {..\delimiter}x0D"
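The composition in this example can be sketched in Python as follows, assuming the expression has already been evaluated to a delimiter string at run time. The function `build_separators` is a hypothetical name, and the sketch says nothing about whether the current spec supports the construct.

```python
from typing import List, Sequence


def build_separators(delimiter: str,
                     suffixes: Sequence[str] = ("", "\x0D\x0A", "\x0D")) -> List[str]:
    """Concatenate the run-time delimiter with each fixed suffix, in order.

    The first entry (the bare delimiter) would be the default on unparse.
    """
    return [delimiter + suffix for suffix in suffixes]


# Example with an apostrophe as the delimiter value obtained from the data.
print([repr(s) for s in build_separators("'")])
```

The point of the sketch is that the list of allowable separators cannot be fixed when the schema is compiled; it has to be built once the delimiter value is known.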

Simon wondered if we could deal with this situation in a different way by perhaps handling it as 'delimiter padding' and having a DFDL option to allow/trim it. But he cautioned that we must avoid ambiguity - for example, to handle whitespace at the end of a delimiter which is followed by data which allows whitespace. Steve said that in that situation you have no choice but to explicitly model the whitespace and not use the arbitrary entities.
Geoff thought that if we did go for the trimming approach we may need to describe separate sets of rules for whitespace handling for the markup region and for the data region.
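The ambiguity Simon described can be demonstrated with a small Python sketch. The regular expressions chosen for <WSP> and <OWSP> are assumptions made for illustration (one or more, and zero or more, of space, tab, CR and LF); the minutes do not define the entities precisely, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed expansions of the whitespace entities (illustrative only).
ENTITY_PATTERNS = {"<WSP>": r"[ \t\r\n]+", "<OWSP>": r"[ \t\r\n]*"}


def compile_delimiter(spec: str) -> "re.Pattern[str]":
    """Turn a delimiter spec such as ',<OWSP>' into a compiled regex."""
    parts = re.split(r"(<O?WSP>)", spec)
    pattern = "".join(ENTITY_PATTERNS.get(p, re.escape(p)) for p in parts)
    return re.compile(pattern)


def split_fields(data: str, spec: str) -> List[str]:
    """Split data on every occurrence of the delimiter described by spec."""
    return compile_delimiter(spec).split(data)


data = "a,  b"
print(split_fields(data, ","))        # leading spaces stay with the second field
print(split_fields(data, ",<OWSP>"))  # leading spaces are absorbed by the delimiter
```

If the second field is allowed to begin with whitespace, the two results are both plausible readings of the same data, which is why the whitespace would have to be modelled explicitly in that situation.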

Steve will take an action to come up with a proposal.

7. Other business

Steve would like to discuss a model of ACORD AL3 length-prefixed data on the working group call, and will add an item to next week's agenda. Mike and Geoff have been corresponding on that.
Within IBM, some changes have been proposed to Mike's UML model of DFDL. This will be circulated to the working group when IBM comments are complete.

Meeting closed, 17:45 GMT

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
