Open Grid Forum: Data Format Description
Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 09 Jan 2008
Attendees
Mike Beckerle (Oco)
Geoff Judd (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)
Agenda
1. OGF 22 in Cambridge, MA
2. Level set on specification drafts
3. Expression Language
4. Nulls and defaults - can we drop useNullForDefault?
5. Other business
1. OGF 22
The next OGF conference will be held February 25-29 in Cambridge, MA. As
he is local, Mike is planning to attend to represent DFDL. The working
group should decide what, if anything, we would like to present at the
conference, and Mike will enquire about the closing date for submissions
(possibly Jan 11th).
2. Specification Drafts
Mike circulated draft 30 of the DFDL specification before Christmas, and
had prepared a plan covering the contents of the next three drafts. The
objective of the plan was to guide the group to the stage where the
specification was no longer a limiting factor to progress and
implementations could proceed with a reasonable expectation that the
specification would not change significantly. Steve mentioned that IBM
are attempting to assign remaining work items internally, and wanted to
coordinate this with the other working group members to avoid duplication
of effort.
Due to the demands of his new role,
Mike will need to pass some items that he had been hoping to tackle on
to other people. He suggested that editorship of the specification should
pass around the group with each draft, ideally to whoever would be making
the most significant changes in that draft.
For the next draft, number 31, Steve suggested that Alan might be an
appropriate editor as he is working on the expression language, which is
a key subject for the next draft. Simon would also like to own a draft
and will consider this, but could not commit in the meeting.
3. Expression Language
The group has previously discussed difficulties
with forward/backward references in expressions. Mike observed that forward-referencing
expressions can occur in a DFDL schema but could only be used during unparse.
Discussing whether it is feasible to police this statically, Mike reckoned
that while it may be difficult to analyze an expression to see whether
it referred forward or not, this would probably be a decidable problem
(eg, follow the dfdl:outputValue chain).
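As a rough illustration of why static policing looks decidable, the sketch
below treats the schema as an ordered list of element names plus a map of
which elements each expression refers to, and follows the reference chain to
find anything declared later. The function name, the data layout and the
sample element names are assumptions made for this example only; they are
not taken from the specification.

```python
from typing import Dict, List, Set


def forward_references(order: List[str],
                       refs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each element to the later-declared elements it depends on.

    `order` lists element names in schema order. `refs` maps an element to
    the elements its expression mentions (hypothetical representation).
    Dependencies are followed transitively, as in "follow the chain".
    """
    position = {name: index for index, name in enumerate(order)}
    result: Dict[str, Set[str]] = {}
    for name in order:
        seen: Set[str] = set()
        stack = list(refs.get(name, ()))
        while stack:
            target = stack.pop()
            if target in seen:
                continue  # already visited; also guards against cycles
            seen.add(target)
            stack.extend(refs.get(target, ()))
        forward = {t for t in seen if position[t] > position[name]}
        if forward:
            result[name] = forward
    return result


# Hypothetical schema: 'length' is computed from 'payload', which follows it.
schema_order = ["length", "payload", "checksum"]
expression_refs = {"length": {"payload"}, "checksum": {"payload"}}
print(forward_references(schema_order, expression_refs))
```

An element reported here could only have its expression evaluated during
unparse, which matches Mike's observation above.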
Steve asked how we should specify the
data type to be returned from an expression, there being two candidates:
a) the XML Schema type of the DFDL property
b) the 'resolved' data type of the DFDL
property as needed by the parser
Take dfdl:length as an example. The
XML Schema type is 'string' because the field can accept numeric literals,
expressions, regular expression, etc. But the parser will always want an
integer.
Agreed that an expression should return
the 'resolved' data type.
Steve asked whether, in the property
descriptions, we should include the allowable return type from an expression.
Mike believed that we should, as it may be distinct from the DFDL type
for that field.
So, the dfdl:length property description
in the spec needs to say exactly what the options are - eg, "a literal
integer, or an expression that resolves to an integer, or a regular expression
that resolves to an integer".
Using the XSD "maxOccurs"
field as an example, which is normally an integer but may also be the token
'unbounded', Simon suggested that simply using the 'resolved' type may
not be sufficient and that a processor will need to be aware that, in some
cases, the result of an expression may not be the natural type. Mike concluded
that we would need to specify both types as above and also any 'distinguished
tokens'.
Finally, should a DFDL engine automatically cast an expression result to
the 'resolved' type, or instead strictly enforce the return type of the
expression? The group felt the latter option to be preferable.
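A minimal sketch of what strict enforcement might look like, assuming each
property declares a 'resolved' type and an optional set of distinguished
tokens. The function and parameter names are hypothetical and only
illustrate the preference recorded above; nothing here is defined by the
specification.

```python
from typing import Any, FrozenSet, Type


def check_expression_result(prop: str,
                            value: Any,
                            resolved_type: Type,
                            distinguished_tokens: FrozenSet[str] = frozenset()) -> Any:
    """Return `value` unchanged if it is acceptable for `prop`, else raise.

    No casting is attempted: a string "12" is rejected for an integer
    property rather than being converted.
    """
    # A distinguished token (for example 'unbounded') is accepted as-is.
    if isinstance(value, str) and value in distinguished_tokens:
        return value
    # bool is a subclass of int in Python, so reject it explicitly.
    if resolved_type is int and isinstance(value, bool):
        raise TypeError(f"{prop}: expected int, got bool")
    if not isinstance(value, resolved_type):
        raise TypeError(
            f"{prop}: expected {resolved_type.__name__}, "
            f"got {type(value).__name__}")
    return value


# Hypothetical usage: dfdl:length resolves to an integer.
print(check_expression_result("dfdl:length", 12, int))
# A maxOccurs-style property that also allows the token 'unbounded'.
print(check_expression_result("maxOccurs", "unbounded", int,
                              frozenset({"unbounded"})))
```

Under this approach an expression returning the string "12" for dfdl:length
is reported as an error instead of being silently cast.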
(Alan Powell joined the meeting)
4. Nulls and defaults
Steve would like to review his previous correspondence with Mike before
discussing this further. It will be included
in the agenda for next week's meeting.
5. Property Precedence
Geoff and Steve have been preparing
a proposal for precedence using a mind map. Steve will distribute this
initial proposal for wider review.
(Mike Beckerle and Suman Kalia left the meeting)
6. Entity references
Alan has been looking at the use of
XML entity references to more easily allow non-printable characters to
be written into DFDL documents, and has distributed a proposal within IBM.
There are some issues around this at the moment (need DTD to define entities,
allowable characters in XML 1.0 docs). Alan is looking at these.
This discussion in IBM had led to the
concept of a mechanism to easily represent arbitrary whitespace, which
is a common feature of text formats but which causes problems when modelling.
Simon has experience with this concept and will send Steve a description
of how PolarLake handle this.
Steve suggested we could handle this
by allowing delimiters to be a list of allowable values, with the
first used as a default on unparse. (We already have this idea for dfdl:nullValue).
Simon observed that this could not handle arbitrary length whitespace.
Steve said that we should have entities that cover that - like <WSP>
and <OWSP> in IBM's WTX parser (the O meaning optional) - these are
extremely useful. So then you could say things like (ignore incorrect
entity syntax):
dfdl:separator="x0Dx0A x0A"
meaning allow the separator to default
to CRLF but allow LF on its own.
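The list-of-delimiters idea can be sketched as follows, assuming the
separator has already been resolved to a list of literal strings (entity
syntax is left aside, as in the discussion). The function names are
hypothetical. Trying the longest candidate first is a design choice made for
this example, not something the group agreed.

```python
from typing import List, Optional


def match_separator(data: str, pos: int,
                    separators: List[str]) -> Optional[str]:
    """On parse, return whichever allowed separator occurs at `pos`.

    Longer candidates are tried first so that one value being a prefix of
    another cannot hide the longer match.
    """
    for candidate in sorted(separators, key=len, reverse=True):
        if data.startswith(candidate, pos):
            return candidate
    return None


def unparse_separator(separators: List[str]) -> str:
    """On unparse, the first value in the list is used as the default."""
    return separators[0]


allowed = ["\r\n", "\n"]  # CRLF is the default, LF alone is also accepted
print(repr(match_separator("a\r\nb", 1, allowed)))
print(repr(match_separator("a\nb", 1, allowed)))
print(repr(unparse_separator(allowed)))
```

As Simon notes below, a fixed list like this cannot describe whitespace of
arbitrary length.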
However, Steve also pointed out that
in the EDI data format the choice of delimiter comes from an expression,
adding to the complexity, because the allowable value of the delimiter
is then <value from expression> concatenated with <entity>.
Is that supported by the current spec? Eg:
dfdl:separator="{..\delimiter} {..\delimiter}x0Dx0A {..\delimiter}x0A"
Simon wondered if we could deal with
this situation in a different way by perhaps handling it as 'delimiter
padding' and having a DFDL option to allow/trim it. But he cautioned that
we must avoid ambiguity - for example, to handle whitespace at the end
of a delimiter which is followed by data which allows whitespace. Steve
said that in that situation you have no choice but to explicitly model the
whitespace and not use the arbitrary entities.
Geoff thought that if we did go for
the trimming approach we may need to describe separate sets of rules for
whitespace handling for the markup region and for the data region.
Steve will take an action to come up
with a proposal.
7. Other business
- Steve would like to discuss a model
of ACORD AL3 length-prefixed data on the working group call, and will add
an item to next week's agenda. Mike and Geoff have been corresponding on
that.
- Within IBM, some changes have been proposed
to Mike's UML model of DFDL. These will be circulated to the working group
when IBM comments are complete.
Meeting closed, 17:45 GMT
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK