A small correction, with thanks to Simon: it was Steve (rather than Simon) who had previously attracted a reasonable
audience at the OGF conference.
Ian
Open Grid Forum: Data Format Description Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 23 Jan 2008
Attendees
Mike Beckerle (Oco)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
Apologies
Steve Hanson (IBM), Suman Kalia (IBM)
1. OGF22
The DFDL session at OGF22 is now booked
for the Monday afternoon, and Mike has registered to attend. Mike will
present our updated status, and Alan promised to upload the most recently
presented set of slides to GridForge so that Mike can update them. Alan asked
whether we should try to drum up interest in the DFDL session to encourage
attendance; Simon thought that advertising might not make much difference,
noting that Steve had attracted a reasonable audience when he presented.
2. Specification drafts
Steve and Alan had previously assigned
ownership of individual items from Mike's plan of contents for the next
few drafts. Alan will assemble the next draft, due at the end of the month,
and asked for input as soon as possible.
Looking at the plan for the next draft ("vX+1"),
the group reported the following status:
- Nulls/default/optionals - Mike
reported no update.
- Description of schema components
- Simon is still working on this.
- Regular expressions for lengths
- Alan reported no progress.
- Expression language - Alan will
shortly distribute a new version of the proposal for review.
- valueCalc - Mike has yet to
write this.
- Property precedence - Following
last week's discussion on the call, members are asked to provide review
comments. Mike will add this item to next week's agenda.
- Entities - Alan's recent proposal
is to be discussed on the current call.
- White space handling - Discussion
is ongoing, and Steve is to make a proposal.
The plan also covers subsequent versions
of the specification; the status of those items is as follows:
- Supplements - Steve is working
to update the supplements.
- Speculative parsing - IBM has been
internally discussing and reviewing WTX functionality, though no documentation
covering this presently exists.
3. UML diagrams
Simon is revising the UML diagrams which
describe the DFDL schema components. The previous meeting minutes included
a number of comments on these diagrams, and the group took this opportunity
to look at some of those comments:
"...I think it would be better
to use the open source XML schema model as source model and show relationship
of DFDL Annotations attached to the XSD schema model" - Mike noted
that DFDL makes use of annotations on objects which are absent from the
XSD schema model, and hence that it may be unnatural to base the DFDL schema
model directly on the XSD model. Simon suggested that it would be cleanest
to describe a modified version of the XSD model, including those XSD elements
that we need to annotate, and to use this as the basis for the DFDL model.
"The current diagram suggests
that 'variable definition' can both be part of a format base or as a standalone
annotation (outside of a format). Is this true?" - Mike's view
was that variable definitions need not be part of a format block, so
yes, this is true.
Mike agreed to respond further to the
set of comments by email.
4. Review of Entities proposal
Alan has distributed a proposal covering
entities in DFDL, intended to allow characters which are disallowed by
XML 1.0 (or XML 1.1) to be included in DFDL schemas. The entities follow
a syntax similar to XML's, using % instead of & as the escape character,
with an additional mechanism for specifying raw data. The latter is intended
to supersede the escaping mechanism described in current versions of the
specification (which also uses % as an escape).
The group felt that the raw data entities should be described not in terms
of characters and character sets, but in terms of bytes. If they were treated
as characters, schemas might need to be rewritten when moving from single-byte
to double-byte character sets; further, this would incorrectly imply that some
codepage conversion is involved.
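A minimal Python sketch of the group's concern, not taken from the proposal: encoding the same control character under a single-byte and a double-byte character set gives different raw bytes. The choice of U+0085 (NEL) and of the latin-1 and UTF-16 codecs is an assumption made purely for illustration.

```python
# Illustration only: the same character yields different raw bytes
# depending on the character set, which is why raw-data entities
# are better specified as bytes than as characters.
nel = "\u0085"  # NEL control character, chosen arbitrarily for illustration

single_byte = nel.encode("latin-1")    # ISO-8859-1: a single byte
double_byte = nel.encode("utf-16-be")  # UTF-16 big-endian: two bytes

print(single_byte.hex())  # 85
print(double_byte.hex())  # 0085
```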
The proposal also introduces a list
of predefined names for certain common control characters. Mike asked whether
these are the existing XML names - Alan replied that XML does not define
names for control characters.
Ian asked how we should represent the
literal % character in strings given this form of escaping. The present
draft of the specification uses "%%" to handle this; Simon suggested
a string like "%pc;". The meeting felt that %% might be marginally
preferable.
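To make the %% option concrete, here is a hedged Python sketch of an entity expander that treats %% as a literal % and %NAME; as a named entity. The entity names used (NUL, HT, LF, CR) and the function name expand_entities are hypothetical placeholders, not the names defined in Alan's proposal.

```python
import re

# Hypothetical entity table: the real names are defined in the proposal;
# these are placeholders for illustration only.
ENTITIES = {"NUL": "\x00", "HT": "\t", "LF": "\n", "CR": "\r"}

# Matches either a doubled "%%" or a named entity of the form "%NAME;".
_ESCAPE = re.compile(r"%%|%([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text: str) -> str:
    """Expand %NAME; entities and the %% escape for a literal percent sign."""

    def replace(match: re.Match) -> str:
        if match.group(0) == "%%":
            return "%"
        name = match.group(1)
        if name not in ENTITIES:
            raise ValueError(f"unknown entity: %{name};")
        return ENTITIES[name]

    return _ESCAPE.sub(replace, text)


print(repr(expand_entities("100%% done%CR;%LF;")))  # '100% done\r\n'
```

Because %% is consumed before any entity is recognised, the string "%%CR;" expands to the literal text "%CR;" rather than a carriage return. In this sketch a % that begins neither %% nor %NAME; is passed through unchanged, while an unrecognised name raises an error.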
Finally, the proposal defines some labels
which aim to reduce the complexity of dealing with whitespace and newlines.
The %NL; entity represents a newline on "the target platform"
- Mike observed that DFDL presently has no concept of a target
platform. Alan felt it important that a single DFDL schema be able to generate
output documents targeted at different platforms. Mike proposed that we
introduce a new property, "generatedNewLine", which describes
the meaning of %NL; during unparse, and that %NL; should accept
any common newline representation during parse. The group discussed whether
this could instead be handled using a list of optional newline values;
however, this would not support schema portability. Simon suggested we introduce
another new property to mean that %NL; should be the conventional newline
representation on the platform on which an engine is running; however, Mike
pointed out that this simply requires appropriate configuration of the
generatedNewLine property.
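A minimal Python sketch of Mike's proposal, assuming "common newline representations" means CRLF, LF and CR: parsing accepts any of them for %NL;, while unparsing emits whatever the generatedNewLine property is set to. The function names parse_lines and unparse_lines are hypothetical, and the generated_new_line parameter merely stands in for the proposed property.

```python
import re

# Parse side: accept any common newline representation for %NL;
# (assumed here to be CRLF, LF or CR). CRLF is tried before a lone CR.
NL_PATTERN = re.compile(r"\r\n|\n|\r")


def parse_lines(data: str) -> list:
    """Split data on any common newline (CRLF, LF or CR)."""
    return NL_PATTERN.split(data)


def unparse_lines(fields: list, generated_new_line: str = "\r\n") -> str:
    """Join fields with the newline given by generated_new_line, a stand-in
    for the generatedNewLine property proposed on the call."""
    return generated_new_line.join(fields)


records = parse_lines("a\r\nb\nc\rd")
print(records)                             # ['a', 'b', 'c', 'd']
print(repr(unparse_lines(records, "\n")))  # 'a\nb\nc\nd'
```

Simon's "platform newline" case then amounts to configuring generated_new_line with the running platform's convention (os.linesep in Python), which is Mike's point that no extra property is needed.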
%WSP; and %OWSP; are introduced to mean
any whitespace and optional whitespace, respectively. These will be useful
in describing some formats which allow arbitrary whitespace, such as MIME. Mike pointed
out that we could model such whitespace using hidden fields, but that these
entities may make a schema clearer. PolarLake have found that only one
such label is necessary, meaning "one or more whitespace characters",
and that it needs to be available only as a delimiter. Mike agreed
that this label may represent a special type of delimiter rather than a
general-purpose entity. Alan would like to work through the potential use
cases to see if we can restrict it in this fashion, and will update the
proposal to specify that these labels relate to just one character. Simon suggested
we could introduce an extra label, perhaps %WSP*;, to match multiple whitespace
characters.
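The following hedged Python sketch shows one way whitespace labels could be compiled into delimiter-matching regular expressions. The meanings assigned (%WSP; as exactly one whitespace character, %OWSP; as zero or one, and the multi-character label Simon suggested, written here as %WSP*;, as one or more), the whitespace set, and the function name delimiter_to_regex are all assumptions for illustration.

```python
import re

# Assumed whitespace set; the proposal would need to define this precisely.
WS = r"[ \t\r\n]"

# Assumed meanings of the whitespace labels discussed on the call:
#   %WSP;   exactly one whitespace character
#   %OWSP;  zero or one whitespace character
#   %WSP*;  one or more whitespace characters (Simon's suggested label)
LABELS = {"%WSP;": WS, "%OWSP;": WS + "?", "%WSP*;": WS + "+"}


def delimiter_to_regex(delimiter: str) -> re.Pattern:
    """Translate a delimiter containing whitespace labels into a regex."""
    # The capturing group keeps the labels in the split result.
    parts = re.split(r"(%WSP\*;|%OWSP;|%WSP;)", delimiter)
    # Labels become whitespace patterns; everything else is matched literally.
    return re.compile("".join(LABELS.get(p, re.escape(p)) for p in parts))


# A MIME-style header separator: a colon followed by a run of whitespace.
sep = delimiter_to_regex(":%WSP*;")
print(sep.split("Content-Type:   text/plain"))  # ['Content-Type', 'text/plain']
```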
Meeting closed, 18:15
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK