All,

Here are the minutes from Wednesday's meeting - apologies for the delay. As ever, please let me know of any corrections.

Cheers,

Ian

Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
17:00 GMT, 30 Jan 2008

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Geoff Judd (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)

1. Specification drafts
Alan has not yet received any updates for the next draft of the specification, which is due soon. As Alan is on vacation from the end of next week, he is looking for input as soon as possible. The group recorded the following status for items targeted for the next specification draft:

Nulls/default/optionals - Mike and Steve will collaborate on this and hope to have a draft ready for the end of next week.
Description of schema components - Simon expects to have this ready by the end of the week.
Regular expressions for lengths - This is now targeted for the following specification draft ("vX+2")
Expression language - Alan has distributed a new draft for review and has asked for comments. Mike will review this next week.
valueCalc - Mike will write up at least a first draft, and aims to have this ready for next week.
Property precedence - Steve is looking for review comments on the previously distributed mindmap. The specification can include this information simply as a list, not as a "mindmap" tree. It may be useful to combine this information with the schema components diagram. The present proposal only handles parsing; Steve will extend this to cover unparsing.
Entities - The group has been having good discussions regarding the entities proposal.
White space handling - Whitespace seems to be largely handled by the entities proposal. Steve has been thinking about a way to introduce variable terminators to DFDL, which is discussed later in the meeting.

2. Expression Language
Alan distributed a new draft of the expression language proposal, and thanked Simon for his comments. As noted above, Mike will send review comments next week. The second part of the proposal document is text from the current specification draft, which will be replaced by the new proposal. This is included for "history" and is not intended to be reviewed.

3. Whitespace
The MIME header format allows for optional whitespace either side of a colon used to delimit the header name and its value. Steve felt that this justified the need for the proposed %OWSP; entity (along with the %WSP; entity). Mike wondered whether this led to a slippery slope where we try to handle other complex delimiters similarly, for example, case-insensitive delimiters. Steve would like to propose a more general way to handle complex delimiters in DFDL. Mike would have no objection to having both %WSP; and %OWSP; in the language if these could be described in terms of Steve's more general approach.

In PolarLake, the MIME header use case would be handled using a name field (terminated by a colon), with an optional field between the colon and the value which would consume any whitespace.
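
For illustration only, the optional-whitespace delimiter from the MIME use case might be sketched in Python as below. This is not part of any DFDL proposal: the names OWSP, parse_header and unparse_header are hypothetical, and %OWSP; is assumed here to mean "zero or more spaces or tabs".

```python
import re

# Assumption: %OWSP; is modelled as zero or more spaces or tabs.
OWSP = r"[ \t]*"

# Header name, then a colon with optional whitespace on either side, then the value.
HEADER = re.compile(rf"^(?P<name>[^:\s]+){OWSP}:{OWSP}(?P<value>.*)$")


def parse_header(line):
    """Split a header line into (name, value), dropping whitespace around the colon."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError(f"not a header line: {line!r}")
    return m.group("name"), m.group("value")


def unparse_header(name, value):
    """Write the header back out using one fixed form of the delimiter (an assumption)."""
    return f"{name}: {value}"
```

Note that the whitespace is treated as part of the delimiter, so it does not appear in either field; on output a single canonical form has to be chosen.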

4. Recursive definitions for delimiters
Steve has proposed describing initiators in terms of named types. For example, an initiator could be defined using a simple enumeration type to list its possible values, or one using a pattern facet or assertions.

This would extend DFDL's current use of such facets, which are presently used only for validation. Mike distinguished between 'format' and 'content' data, and suggested that this conceptually means that we can interpret facets during parsing for format data, but only at validation for content. Simon suggested a similar concept of "system data" vs. "user data". We would need to revisit speculative parsing to deal with this issue.
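
As a rough sketch of what interpreting facets during parsing could mean for an initiator, the Python below matches an initiator against either an enumeration of values or a pattern. The function names are hypothetical, and the longest-match rule for enumerations is an assumption, not something the group decided.

```python
import re


def match_enum_initiator(data, values):
    """Match an initiator described by an enumeration of allowed values.

    Assumption: the longest matching value wins. Returns (initiator, rest) or None.
    """
    for v in sorted(values, key=len, reverse=True):
        if data.startswith(v):
            return v, data[len(v):]
    return None


def match_pattern_initiator(data, pattern):
    """Match an initiator described by a pattern (a regular expression).

    Returns (initiator, rest) or None.
    """
    m = re.match(pattern, data)
    if m is None:
        return None
    return m.group(0), data[m.end():]
```

In both cases the facet decides where the initiator ends during parsing, rather than being checked afterwards as validation.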

An alternative, suggested by Simon, would be to relax the present restriction which allows an expression to return only one value. If an expression could return a sequence of values, as is allowed by XPath, then we could use expressions to describe delimiters with multiple possible values. On output, a DFDL unparser would write out the first value in the sequence. Mike observed that such a scheme would solve the 'quoting hell' problem present with simple, space-delimited, XML lists as presently allowed for the nullValues property.
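
Simon's alternative might be sketched as follows, with a Python list standing in for the sequence returned by an expression. The function names are hypothetical, and the tie-breaking rule (earliest position, then longest value) is an assumption made for the example.

```python
def parse_terminated(data, terminators):
    """Split data at the earliest occurrence of any terminator in the sequence.

    Assumption: if two terminators start at the same position, the longer one wins.
    Returns (content, matched_terminator, rest).
    """
    best = None
    for t in terminators:
        i = data.find(t)
        if i == -1:
            continue
        if best is None or i < best[0] or (i == best[0] and len(t) > len(best[1])):
            best = (i, t)
    if best is None:
        raise ValueError("no terminator found")
    i, t = best
    return data[:i], t, data[i + len(t):]


def unparse_terminated(content, terminators):
    """On output, write the first value in the sequence, as described above."""
    return content + terminators[0]
```

Because the values are separate items in a sequence, a value that itself contains a space needs no extra quoting, which is the point Mike raised about space-delimited lists.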

If we adopt Steve's approach, we may not be able to access DFDL constructs (variables, expressions or entities) in facets. Steve pointed out that XSD processors would typically reject an enumerated type, restricted to length 1, where the enumeration includes a DFDL entity - the XSD processor will not recognise the DFDL entity and instead treat this as a string of length greater than 1.
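
The length mismatch Steve described can be shown with a small Python sketch. The entity table is hypothetical, and mapping %WSP; to a single space is a simplification made only so that the two lengths can be compared.

```python
# Hypothetical entity table; mapping %WSP; to one space is a simplification.
ENTITY_VALUES = {"%WSP;": " "}


def xsd_length(value):
    """Length as a plain XSD processor sees it: the raw characters of the value."""
    return len(value)


def dfdl_length(value):
    """Length after a DFDL-aware processor has resolved a known entity."""
    return len(ENTITY_VALUES.get(value, value))
```

Under these assumptions the raw enumeration value "%WSP;" has length 5, so a length-1 restriction would be rejected before any DFDL processing takes place.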

Alan asked whether there are any dynamic cases, where the set of terminators is obtained from the document itself. Mike felt that this could be modelled using assertions, though this would leave the terminator on output undefined; and that a solution where complex types may be used as terminators would probably allow us to handle most of these cases. Steve mentioned the EDI format, which allows a document-specified delimiter to be used with optional whitespace.
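
A dynamic case of this kind might look like the Python sketch below. The format is invented for illustration and is not real EDI: the first character of the document is assumed to declare the delimiter, which may then appear with optional spaces or tabs on either side. The function names are hypothetical.

```python
import re


def parse_dynamic(doc):
    """Read the delimiter from the document itself, then split the remaining fields.

    Assumption: the first character declares the delimiter (not real EDI syntax).
    Returns (delimiter, fields).
    """
    if not doc:
        raise ValueError("empty document")
    delim, body = doc[0], doc[1:]
    # The delimiter may be surrounded by optional spaces or tabs.
    sep = re.compile(r"[ \t]*" + re.escape(delim) + r"[ \t]*")
    return delim, sep.split(body)


def unparse_dynamic(delim, fields):
    """Write the declaration and the fields, using the bare delimiter on output."""
    return delim + delim.join(fields)
```

The unparser here has to be told which delimiter to write, which reflects the concern that the output value must come from somewhere, either the schema or the infoset.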

How, asked Simon, should we handle the output value of a complex delimiter? This must come either from the DFDL schema or from the infoset. Steve suggested we use default properties in the subelements of the type, and Mike suggested we could similarly use outputValueCalc. Steve and Mike agreed that a terminator wouldn't be present in the infoset, by analogy to the similar mechanism used for length prefixes. Further, this mechanism might allow us to remove a number of properties related to terminators.

Although Steve had intended this mechanism to be used with simple types or elements, Mike and Suman thought it would be appropriate to allow complex types. Elements would allow the use of the 'default' attribute for use on output. Mike contrasted this to the prefix length solution, where a simple type is used: the value under prefix length is treated as an integer, so it is appropriate to handle it as such. Here, however, we are modelling syntactic constructs. Simon felt that users will think in terms of elements.

Steve will prepare some examples and a proposal for inclusion in the "vX+2" draft.

5. Other business
Simon will email the group with some questions about the UML schema components description.

Meeting closed, 18:10 GMT

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
