All,
Here are the minutes from Wednesday's
meeting - apologies for the delay. As ever, please let me know of any corrections.
Cheers,
Ian
Open Grid Forum: Data Format Description
Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 30 Jan 2008
Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Geoff Judd (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
1. Specification drafts
Alan has not yet received any updates
for the next draft of the specification, which is due soon. As Alan is
on vacation from the end of next week, he is looking for input as soon
as possible. The group recorded the following status for items targeted
for the next specification draft:
- Nulls/default/optionals - Mike
and Steve will collaborate on this and hope to have a draft ready for the
end of next week.
- Description of schema components
- Simon expects to have this ready by the end of the week.
- Regular expressions for lengths
- This is now targeted for the following specification draft ("vX+2")
- Expression language - Alan has
distributed a new draft for review and has asked for comments. Mike will
review this next week.
- valueCalc - Mike will write up
at least a first draft, and aims to have this ready for next week.
- Property precedence - Steve is
looking for review comments on the previously distributed mindmap. The
specification can include this information simply as a list, not as a "mindmap"
tree. It may be useful to combine this information with the schema components
diagram. The present proposal only handles parsing; Steve will extend this
to cover unparsing.
- Entities - The group has been
having good discussions regarding the entities proposal.
- Whitespace handling - Whitespace
seems to be largely handled by the entities proposal. Steve has been thinking
about a way to introduce variable terminators to DFDL, which is discussed
later in the meeting.
2. Expression Language
Alan distributed a new draft of the
expression language proposal, and thanks Simon for his comments. As noted
above, Mike will send review comments next week. The second part of the
proposal document is text from the current specification draft, which will
be replaced by the new proposal. This is included for "history"
and is not intended to be reviewed.
3. Whitespace
The MIME header format allows for optional
whitespace either side of a colon used to delimit the header name and its
value. Steve felt that this justified the need for the proposed %OWSP; entity
(along with the %WSP; entity). Mike wondered whether this would lead to a slippery
slope where we try to handle other complex delimiters similarly, for example,
case-insensitive delimiters. Steve would like to propose a more general
way to handle complex delimiters in DFDL. Mike would have no objection
to having both %WSP; and %OWSP; in the language if these could be described
in terms of Steve's more general approach.
In PolarLake, the MIME header use case
would be handled using a name field (terminated by a colon), with an optional
field between the colon and the value which would consume any whitespace.
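As an illustration of the MIME header case discussed above, here is a minimal
Python sketch. It is not DFDL syntax and not PolarLake's implementation; the
names OWSP, HEADER, parse_header and unparse_header are hypothetical, and it
assumes that "optional whitespace" means zero or more spaces or tabs.

```python
import re

# Assumption: optional whitespace is zero or more spaces or tabs.
# This only approximates the proposed %OWSP; entity for illustration.
OWSP = r"[ \t]*"

# Name, optional whitespace, colon, optional whitespace, value.
HEADER = re.compile(rf"^(?P<name>[^:\s]+){OWSP}:{OWSP}(?P<value>.*)$")


def parse_header(line):
    """Split a header line into (name, value), skipping whitespace around the colon."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"not a header line: {line!r}")
    return match.group("name"), match.group("value")


def unparse_header(name, value):
    """Write the header using one fixed form of the delimiter (colon plus one space)."""
    return f"{name}: {value}"
```

Several input forms parse to the same pair, while output always uses a single
canonical form of the delimiter; the choice of ": " on output is an assumption
made for this sketch.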
4. Recursive definitions for delimiters
Steve has proposed describing initiators
in terms of named types. For example, an initiator could be defined using
a simple enumeration type to list its possible values, or one using a pattern
facet or assertions.
This would extend DFDL's current use
of such facets, which are presently used only for validation. Mike distinguished
between 'format' and 'content' data, and suggested that this conceptually
means that we can interpret facets during parsing for format data, but
only at validation for content. Simon suggested a similar concept of "system
data" vs. "user data". We would need to revisit speculative
parsing to deal with this issue.
An alternative, suggested by Simon,
would be to relax the present restriction which allows an expression to
only return one value. If an expression could return a sequence of values,
as is allowed by XPath, then we could use expressions to describe delimiters
with multiple possible values. On output, a DFDL unparser would write out
the first value in the sequence. Mike observed that such a scheme would
solve the 'quoting hell' problem present with simple, space-delimited,
XML lists as presently allowed for the nullValues property.
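Simon's alternative can be sketched in Python as follows. This is an
illustration only, not part of any proposal: the function names are
hypothetical, and trying the longest candidate first when parsing is an
assumption that the group did not discuss.

```python
def match_terminator(data, pos, terminators):
    """Return the terminator found at data[pos:], or None if none matches.

    Assumption: longer candidates are tried first, so "\r\n" wins over "\r".
    """
    for candidate in sorted(terminators, key=len, reverse=True):
        if data.startswith(candidate, pos):
            return candidate
    return None


def output_terminator(terminators):
    """On output, write the first value in the sequence."""
    return terminators[0]


# A real sequence can hold a space as one of its values ...
as_sequence = [";", " "]
# ... whereas a whitespace-separated list loses it when it is split.
as_space_list = "; ".split()
```

The last two lines show the quoting problem Mike described: the sequence keeps
both values, but splitting the space-delimited form yields only the semicolon.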
If we adopt Steve's approach, we may
not be able to access DFDL constructs (variables, expressions or entities)
in facets. Steve pointed out that XSD processors would typically reject
an enumerated type, restricted to length 1, where the enumeration includes
a DFDL entity - the XSD processor will not recognise the DFDL entity and
instead treat this as a string of length greater than 1.
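The length problem Steve described can be shown with a small Python sketch. It
only mimics a length check on the literal text of each enumeration value; it is
not a real XSD processor, and the function name and the sample enumeration are
hypothetical.

```python
def satisfies_length_facet(literal, length):
    """Mimic a length facet check, which counts the characters of the literal."""
    return len(literal) == length


# A sample enumeration for a type restricted to length 1.
# "%WSP;" is meant as a DFDL entity, but a processor that does not know
# about DFDL entities sees it as a five-character string.
enumeration = [",", ";", "%WSP;"]

rejected = [value for value in enumeration if not satisfies_length_facet(value, 1)]
```

Here only the entity is rejected, because the check sees its literal text
rather than the single character it is intended to stand for.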
Alan asked whether there are any dynamic
cases, where the set of terminators is obtained from the document itself.
Mike felt that this could be modelled using assertions, though this would
leave the terminator on output undefined; and that a solution where complex
types may be used as terminators would probably allow us to handle most
of these cases. Steve mentioned the EDI format, which allows a document-specified
delimiter to be used with optional whitespace.
How, asked Simon, should we handle the
output value of a complex delimiter? This must come either from the DFDL
schema or from the infoset. Steve suggested we use default properties in
the subelements of the type, and Mike suggested we could similarly use
outputValueCalc. Steve and Mike agreed that a terminator wouldn't be present
in the infoset, by analogy to the similar mechanism used for length prefixes.
Further, this mechanism might allow us to remove a number of properties
related to terminators.
Although Steve had intended this mechanism
to be used with simple types or elements, Mike and Suman thought it would
be appropriate to allow complex types. Elements would allow the use of
the 'default' attribute for use on output. Mike contrasted this to the
prefix length solution, where a simple type is used: the value under prefix
length is treated as an integer, so it is appropriate to handle it as such.
Here, however, we are modelling syntactic constructs. Simon felt that users
will think in terms of elements.
Steve will prepare some examples and
a proposal for inclusion in the "vX+2" draft.
5. Other business
Simon will email the group with some
questions about the UML schema components description.
Meeting closed, 18:10 GMT
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK