Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, January-13-2010
Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Steve Marting (Progeny)
Stephanie Fetzer (IBM)
Suman Kalia (IBM)
Peter Lambros (IBM)
Tim Kimber (IBM)
Apologies
1. 045 - Discriminators
Stephanie took us through her email, Subject: [DFDL-WG] Bob & Steph's WTX 'Discriminators' write-up.
WTX Identifiers are similar to DFDL discriminators.
-Discriminators may only be placed on
the physical representation of a group. That is why we see them on
partition groups and sequence groups but not on choice groups (or unordered
groups – covered below).
In partitioned groups we have a subtype of each possible group – so each
possible group may have a discriminator.
When WTX expresses choice groups it expresses them as a group containing
all of the possible child groups – so at the top level ‘choice group’
there is no component of the actual group content, so there is no use for a discriminator.
But each choice which may itself be a group may have a discriminator. Choice
groups are special in that the choice model construct simply lists the
components and only one may occur...at this level a discriminator on one
of the choices may not be very useful. Inside of each choice’s components
a discriminator could be used to indicate the existence of that choice.
-The WTX UI does not allow discriminators on the components of Unordered
Groups. This may be due to the fact that the position of the discriminator
has significance (all rules at or above the discriminator must evaluate
to true). If the group is unordered it would be difficult to enforce.
Will need to discuss for DFDL.
-A group may have either zero or one discriminators. No group may
have more than one discriminator.
-The discriminator may have two significant parts
o its location (mandatory). The discriminator
is placed on a component of a group and makes all of the cardinality and
rules at that point and above become part of its concept.
o its rule (optional)
A group with a component which has a discriminator should have some ‘rule’
associated with it. In WTX if there is no explicit rule then the implicit
rule is ‘PRESENT($)’. We will need to decide if such implied rules will
be allowed in DFDL.
-A group may only have a discriminator on a mandatory component. Once again,
this impacts a choice group where by definition all components are optional
– which will not have a discriminator.
This has been an issue of debate in WTX. We could have implemented checking
on optional elements quite easily. Over the years this has been questioned
(as our UI allows them to be placed on optional elements) but once we explained
the way the engine worked no customers perceived this as a deficiency.
In DFDL we will need to determine if this is needed.
-In WTX we do allow a discriminator to be placed on a mandatory fixed size
array (a repeating mandatory component with n:n cardinality). Its
component rule can either refer to the entirety of the array (PRESENT($)
meaning the whole of the array is present) or can call out a specific rule
against one of the iterations. This is not done often in practice.
-In WTX it is common to have multiple levels of discriminators when we
are working with nested groups.
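The WTX placement rules listed above can be sketched as a small checker. This is only an illustration of the rules as described in these minutes, not WTX or DFDL code: the `Component` and `Group` classes, the `kind` strings and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    name: str
    min_occurs: int = 1
    has_discriminator: bool = False
    rule: Optional[str] = None  # explicit discriminator rule, if any


@dataclass
class Group:
    name: str
    kind: str  # hypothetical labels: "sequence", "partition", "choice", "unordered"
    components: List[Component] = field(default_factory=list)


def effective_rule(component: Component) -> str:
    # With no explicit rule, WTX applies the implicit rule PRESENT($).
    return component.rule if component.rule is not None else "PRESENT($)"


def check_discriminators(group: Group) -> List[str]:
    """Return the placement rules (as listed in the minutes) that a group breaks."""
    problems = []
    marked = [c for c in group.components if c.has_discriminator]
    if len(marked) > 1:
        problems.append("group has more than one discriminator")
    if marked and group.kind == "unordered":
        problems.append("discriminator on a component of an unordered group")
    for c in marked:
        # All components of a choice group are optional by definition.
        if group.kind == "choice" or c.min_occurs == 0:
            problems.append(f"discriminator on optional component '{c.name}'")
    return problems
```

For example, a sequence group with discriminators on both a mandatory and an optional component is reported twice: once for having two discriminators and once for the optional placement.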
We discussed whether DFDL should disallow discriminators on unordered groups or groups with floating elements. Agreed that discriminators should be allowed.
Also discussed whether timing 'before/after' was required, as WTX only has 'after'. Decided to keep the timing property.
Suggested that discriminators should not be allowed on variable length arrays, to be consistent with not being allowed on optional elements.
Mike agreed to write up the rules in DFDL terms and extend them to cover other points of uncertainty besides choices.
2. Zero length elements
Steve H took us through his email subject: [DFDL] zero length (was Re:
Fw: TDS length reference) ** updated **
This proposes that zero length fields
should not be a processing error
Proposal:
1. Parsing
Simple elements
1) It is not a schema definition
error nor a processing error if a length is being used to extract
data and it is zero. This covers dfdl:lengthKind implicit, explicit, prefixed
and endOfParent (when parent length is known). The result is 'empty content'.
(Note that for implicit, XSDL allows maxLength/length facet to be 0, so
disallowing it for others is not consistent).
2) It is not a processing error
if scanning for data and the length of the returned bytes is zero. This
applies to dfdl:lengthKind delimited, pattern and endOfParent (when
parent length is not known). The result is 'empty content'. (This is just
stating the obvious).
(The above two rules ensure that it
is possible to apply empty content to trigger optional, nil value or default
value processing regardless of data type and dfdl:lengthKind).
3) Optional, nil and default processing
are applied as per spec.
4) If the element is required, and nil
value or default value is not used, and empty string is not in the lexical
space of the element's type, then it is a processing error.
The two initiator related properties
dfdl:nilValueInitiatorPolicy and dfdl:defaultValueInitiatorPolicy define
whether nils and defaults are applied when initiated empty content is found,
they don't affect the definition of empty content or what it means for
the type.
[Note: If you recall, this discussion
was triggered by a customer that was using an expression to calculate the
length of a standard text decimal. He wanted a length of 0 to mean that 0 ended up
in the infoset. He can achieve this by making the element required with
a default value of 0.]
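The simple-element parsing rules above can be sketched as a single decision function. This is a rough illustration under stated assumptions: the function name, its parameters and the order in which optional, nil and default processing are tried are all assumptions (the proposal only says they are applied "as per spec"), and the two initiator policy properties are ignored.

```python
from typing import Optional, Tuple


class ProcessingError(Exception):
    """Stands in for a DFDL processing error."""


def resolve_empty_content(
    required: bool,
    empty_is_nil: bool = False,
    default: Optional[str] = None,
    empty_in_lexical_space: bool = False,
) -> Tuple[str, Optional[str]]:
    """Decide what empty content means for a simple element.

    Finding empty content is never an error in itself (rules 1 and 2);
    an error arises only when nothing can supply a value (rule 4).
    """
    if not required:
        return ("absent", None)    # optional element: nothing is added
    if empty_is_nil:
        return ("nil", None)       # nil value processing applies
    if default is not None:
        return ("value", default)  # default value processing applies
    if empty_in_lexical_space:
        return ("value", "")       # e.g. an empty string is a legal value
    raise ProcessingError("empty content is not valid for this required element")
```

With these assumptions, the customer scenario in the note corresponds to `resolve_empty_content(required=True, default="0")`, which yields the value "0" for the infoset.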
Complex elements
It is possible to get returned empty
content for a complex element for cases 1) and 2) above.
1) If the complex element is optional
then it is not added to the infoset.
2) If the complex element does not have
an initiator specified & is required then it is added to the infoset.
3) If the element has an initiator specified
then dfdl:defaultValueInitiatorPolicy applies
-
required => element is added to infoset only if initiator is present
(processing error if no initiator & empty content)
-
prohibited => element is added to infoset only if initiator is not present
(initiator implies real content follows so processing error if initiator
& empty content)
4) If the complex element is added to
the infoset, then the parser processes the child content of the complex
type. This may or may not cause a processing error.
<tk>I presume a processing
error would be caused by
- any group having an initiator
or terminator ( same as 5. below )
- any group having a prefix
or postfix delimiter
- any group with more than
one member having an infix delimiter
- any required element within
the complex element having an initiator and dfdl:defaultValueInitiatorPolicy="required"
- any required element within
the complex element having a terminator
- any required element which
does not have a default value specified, and for which a zero-length representation
is illegal
- other error scenarios?
</tk>
<smh>Correct. Basically
you are going through the element's content (model group plus children)
and attempting to parse. When you extract the data you get back empty content.
This may or may not cause a processing error. This was agreed on the call as
the correct behaviour. In summary, for empty content to be valid for the
complex element then it must also be valid for at least one content model</smh>
If it doesn't then default value
processing applies for required child elements. If we don't do this then
we will not create default values for all missing required simple elements,
and that would be wrong.
5) If the contained sequence or choice
has an initiator or terminator then it is a processing error.
<tk>
So it's OK to have a choice
among the children of the complex element? If so, the specification should
define the rules for picking a branch of the choice. The DFDL processor
*could* always pick the first branch, but what if the first branch triggers
a processing error and a different branch would not have done?
</tk>
<smh>I think it's the
same as with real content. Parser will start against the first branch of
the choice and see where it gets. Usual speculative parsing rules apply.
If it has not discriminated successfully and a processing error occurs
it will cause backtracking and the next branch will be tried. If it finds
a valid content model for the empty content we are ok. If it doesn't it's
a processing error.</smh>
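The branch selection described in the reply above might look like the following sketch. It is not taken from any DFDL implementation: the branch functions, the `ID:` initiator and the `discriminated` flag on the exception are hypothetical stand-ins used to show backtracking over empty content.

```python
class ProcessingError(Exception):
    def __init__(self, message, discriminated=False):
        super().__init__(message)
        # True if the branch had already been discriminated when it failed.
        self.discriminated = discriminated


def parse_choice(branches, data):
    """Try each branch in schema order; backtrack on undiscriminated errors."""
    failures = []
    for name, parse_branch in branches:
        try:
            return name, parse_branch(data)
        except ProcessingError as err:
            if err.discriminated:
                raise  # the branch was selected, so no backtracking
            failures.append(f"{name}: {err}")
    raise ProcessingError("no branch accepts the content (" + "; ".join(failures) + ")")


def initiated_branch(data):
    # Hypothetical branch whose content must start with the initiator 'ID:'.
    if not data.startswith(b"ID:"):
        raise ProcessingError("initiator 'ID:' not found")
    return {"id": data[3:].decode("ascii")}


def defaulted_branch(data):
    # Hypothetical branch with one required child that defaults to "0".
    return {"count": data.decode("ascii") or "0"}


result = parse_choice(
    [("withId", initiated_branch), ("withCount", defaulted_branch)], b""
)
```

Here the first branch fails on empty content because its initiator is missing, the parser backtracks, and the second branch supplies a valid content model through its default value.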
2. Unparsing
Simple elements
Data in the infoset can result in empty
content being added to the bit stream (ie, nothing), with an accompanying
0 value in any length prefix or length expression field, if appropriate
to the dfdl:lengthKind.
Complex elements
The absence from the infoset of a required
complex element will cause any specified initiator to be output, plus if
there are required children then default values will be output for those
children. If we don't do this then we will not create default values for
nested missing required simple elements, and that would be wrong. This
enables creation of a sparse infoset containing just the elements with
explicit values, with the rest defaulting regardless of nesting.
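The sparse-infoset behaviour for unparsing can be illustrated with a short sketch. The dictionary-based schema shape, the `complete_infoset` name and the use of `ValueError` for a required element with no default are assumptions made for this example; initiators and all other representation properties are left out.

```python
def complete_infoset(declarations, infoset):
    """Fill in defaults for missing required elements, at any nesting depth.

    Each declaration is a dict with 'name' and 'required', plus either
    'children' (complex element) or an optional 'default' (simple element).
    """
    completed = {}
    for decl in declarations:
        name = decl["name"]
        if "children" in decl:
            # A missing required complex element is still descended into,
            # so that its required children receive their defaults.
            if name in infoset or decl["required"]:
                completed[name] = complete_infoset(decl["children"], infoset.get(name, {}))
        elif name in infoset:
            completed[name] = infoset[name]
        elif decl["required"]:
            if "default" not in decl:
                raise ValueError(f"required element '{name}' has no value and no default")
            completed[name] = decl["default"]
    return completed


schema = [
    {"name": "header", "required": True, "children": [
        {"name": "version", "required": True, "default": "1"},
        {"name": "note", "required": False},
    ]},
    {"name": "body", "required": True, "default": ""},
]

full = complete_infoset(schema, {"body": "payload"})
```

Only `body` was supplied, yet the result also contains `header` with its defaulted `version` child, which is the nested defaulting the proposal asks for.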
3. Choices
Worth noting that the concept of 'required'
for the elements of a choice does not apply. Even if minOccurs > 0.
4. Outstanding Issues
Is it ok to reuse dfdl:defaultValueInitiatorPolicy
for complex elements? Should it be renamed? Should we add a separate property
for complex elements?
Steve H to propose new name for dfdl:defaultValueInitiatorPolicy
3. Difference between dfdl:lengthKind 'delimited' and 'endOfParent'
'delimited' means the item is delimited by the item’s terminator (if specified)
or an enclosing construct’s separator or the end
of the enclosing construct designated by its known length
or its terminator.
The only difference with dfdl:lengthKind='endOfParent' is that the
latter includes the 'end of the data stream' and applies to binary fields.
We should either
- Add 'end of data stream' to delimited
and remove 'endOfParent'
- Make 'endOfParent' be specifically for
only 'end of data stream'
Short discussion. Alan agreed to try
to write up description of endOfParent for review
4. Go through remaining actions
Not enough time
5 Draft 037 review
From comments:
a. DFDL Subset of XML Schema
(TBD: need means for an implementation to indicate it is using non-standard extensions?)
Believe that this was to allow users to indicate they are using unsupported
schema components. Agreed to defer from DFDL v1
b. Question whether infoset MUST be in schema order. Request for 'bitstream order'
Short discussion. Main reason for schema order is to allow the infoset to
be validated against a schema. Agreed to leave as schema order
c. Dealing with 'Grammar ambiguity' errors
Not discussed
6 Review Schedule
Activity | | Schedule | Who
Complete Action items | | - 18 Dec 2009 | WG
Complete Spec | Write up work items | – 23 Dec 2009 | AP
| Restructure and complete specification | - 23 Dec 2009 | AP
| Issue Draft 038 | 23 Dec 2009 |
WG review | WG review | 7 Dec – 08 Jan 2010 | WG
| Incorporate review comments | 4 Jan - 29 Jan 2010 | AP +
| Issue Draft 039 | 15 Jan 2010 |
| Incorporate review comments | 4 Jan - 29 Jan 2010 | AP +
| Issue Draft 040 | 29 Jan 2010 |
Initial OGF Editor Review | Initial Editor review | 1 Feb - 1 Mar 2010 | OGF
| Initial GFSG review | 1 Feb - 1 Mar 2010 |
| Issue Draft 041 | 1 Mar 2010 |
OGF Public Comment period (60 days) | | 1 Mar - 30 Apr 2010 | OGF
OGF 28 Munich | | 15-19 March 2010 |
Incorporate comments | Incorporate comments | 28 May 2010 |
| Issue Draft 042 | 28 May 2010 |
Final OGF Editor Review | Final Editor review | June 2010 | OGF
| final GFSG review | June 2010 |
| Issue Final specification | 30 June 2010 |
Publish proposed recommendation | | 1 July 2010 |
Grid recommendation process | | 1 Jan - 1 April 2011 |
Meeting closed, 15:20
Next call 20 January 2010 13:00 UK
Next action: 074
Actions raised at this meeting
Current Actions:
No
| Action
|
|
|
045
| 20/05 AP: Speculative Parsing
27/05: Pseudo code has been circulated.
Review for next call
03/06: Comments received and will be
incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with
the way the algorithm is documented, need to find a better way.
29/07: No Progress
05/08: No Progress. Will document behaviour
as a set of rules.
12/08: No Progress
...
16/09: no progress
30/09: AP distributed proposal and others
commented. Brief discussion AP to incorporate update and reissue
07/10: Updated proposal was discussed. Comments
will be incorporated into the next version.
14/10: Alan to update proposal to include
array scenario where minOccurs > 0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed see
minutes
04/11: Discussed semantics of discriminators
on arrays. MB to produce examples
11/11: Absorbing action 033 into 045.
Maybe decorated discriminator kinds are needed after all. MB and SF
to continue with examples.
18/11: Went through WTX implementation
of example. SF to gather more documentation about WTX discriminator rules.
25/11: Further discussion. Will get
more WTX documentation. Need to confirm that no changes are needed to the Resolving
Uncertainty doc.
04/11: Further discussion about arrays.
09/12: Reviewed proposed discriminator
semantic.
16/12: Reviewed discriminator examples
and WTX semantic.
23/12: SF to provide better description
of WTX behaviour and invite B Connolley to next call
06/01:B Connolly not available. SF to
provide more complete description.
13/01: Stephanie took us through a description
of WTX identifiers. Mike agreed to write up in DFDL terms.
|
049
| 20/05 AP Built-in specification description
and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these
from test cases)
15/07: No progress. Once available,
the examples in the spec should use the dfdl:defineFormat annotations they
provide.
...
14/10: no progress
21/10: Discussed the real need for this
being in the specification. It seemed that the main value is that it defines
a schema location for downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for
CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource
to complete this action it can be deferred
13/01: no progress
|
064
| MB/SH Request WG presentation at OGF
28
25/11: Session requested
04/12: no update
09/12: no update
16/12: SH has changed request to a general
session rather than a WG session in the hope of attracting more people.
23/12: no update
06/01: not heard anything yet
13/01: no update
|
066
| Investigate format for defining test
cases
25/11:IBM to see if it is possible to
publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
|
068
| Should the roots of messages be designated?
09/12: Yes. New dfdl:documentRoot property
Closed
16/12: reopened and decided to drop
property subject to agreement from SKK and SF
23/12: SKK review decision to drop dfdl:documentRoot
13/01: closed
|
071
| Semantics of length=0, nil handling
and defaults.
23/12:SH no update
06/01: SH has started
13/01: SH proposal review. Minor updates
to be made
|
073
| SH: Control of overpunching zoned positive
sign
13/01: no update
|
| |
Closed actions
No
| Action
|
056
| MB Resolve lengthUnits=bits including
fillbytes
12/08: No Progress
...
28/10: no progress
04/11: MB to look at lengthUnits = bits
11/11: no update
18/11: no update
25/11: no update
04/12: no update. Alan will set up a
separate call to progress this action.
09/12: no update. Alan will set up a
separate call to progress this action.
16/12: MB, SH and AP had a separate
call. MB to distribute proposal
23/12: Discussed proposal. MB will update
06/01: V4 discussed and approved
13/01: Mike updated proposal. Closed
|
068
| Should the roots of messages be designated?
09/12: Yes. New dfdl:documentRoot property
Closed
16/12: reopened and decided to drop
property subject to agreement from SKK and SF
23/12: SKK review decision to drop dfdl:documentRoot
13/01: closed
|
|
|
| |
Work items:
No
| Item
| target version
| status
|
005
| Improvements on property descriptions
|
| not started
|
011
| How speculative parsing works (combining
choice and variable-occurrence - currently these are separate) (from
action 045)
|
| awaiting completion of action 045
|
012
| Reordering the properties discussion: move
representation earlier, improve flow of topics
|
| not started
|
036
| Update dfdl schema with change properties
| ongoing
|
|
038
| Improve length section including bit
handling
|
| some improvement in 036
|
042
| Mapping of the DFDL infoset to XDM
| none
| not required for V1 specification
|
069
| ICU fractional seconds
|
|
|
070
| Write DFDL primer
|
|
|
071
| Write test cases.
|
|
|
072
| it is a processing error if the number
of occurrences in the data does not match the value of the expression or
prefix
|
|
|
073
| Rename dfdl:separatorPolicy="required"
to "always".
|
|
|
074
| - Last 'postFix' separator is not optional
- Terminators are mandatory.
- dfdl:documentFinalTerminatorCanBeMissing
- dfdl:documentFinalSeparatorCanBeMissing
(Action (70))
|
|
|
075
| Remove occursCountKind="useAvailableSpace".
|
|
|
076
| dfdl:documentRoot, will
be defined that can only be on global elements.
The DFDL spec does not have to define
the format of parameters to the DFDL processor but will indicate that it
must be possible to address any element.
Agreed that ANY element within the schema
can be the starting point for parsing or unparsing.
dfdl:documentRoot no longer required
|
|
|
077
| 'delimited'
means the item is delimited by the item’s terminator (if specified) or
an enclosing construct’s separator or end of the enclosing construct designated
by its known length or its terminator.
The definition of EndOfParent also needs
improving.
|
|
|
078
| document UPA checks
|
|
|
079
| Restrictions on use of 'special'
entities in regular expressions
|
|
|
080
| LengthUnit=bits (A056)
|
|
|
| |
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898