January 2010 - dfdl-wg - lists.ogf.org

Minutes for OGF DFDL Working Group Call, January-13-2010
by Alan Powell 15 Jan '10

Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, January-13-2010

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Steve Marting (Progeny)
Stephanie Fetzer (IBM)
Suman Kalia (IBM)
Peter Lambros (IBM)
Tim Kimber (IBM)

Apologies

1. 045 - Discriminators

Stephanie took us through her email
Subject: [DFDL-WG] Bob & Steph's WTX 'Discriminators' write-up

WTX Identifiers are similar to DFDL discriminators.

- Discriminators may only be placed on the physical representation of a group. That is why we see them on partition groups and sequence groups but not on choice groups (or unordered groups, covered below). In partitioned groups we have a subtype of each possible group, so each possible group may have a discriminator. When WTX expresses choice groups it expresses them as a group containing all of the possible child groups, so at the top level 'choice group' there is no component of the actual group content, and so no use for a discriminator. But each choice, which may itself be a group, may have a discriminator. Choice groups are special in that the choice model construct simply lists the components and only one may occur... at this level a discriminator on one of the choices may not be very useful. Inside each choice's components a discriminator could be used to indicate the existence of that choice.

- The WTX UI does not allow discriminators on the components of Unordered Groups. This may be due to the fact that the position of the discriminator has significance (all rules at or above the discriminator must evaluate to true). If the group is unordered it would be difficult to enforce. Will need to discuss for DFDL.

- A group may have either zero or one discriminators. No group may have more than one discriminator.

- The discriminator may have two significant parts
  o its location (mandatory). The discriminator is placed on a component of a group and makes all of the cardinality and rules at that point and above become part of its concept.
  o its rule (optional). A group with a component which has a discriminator should have some 'rule' associated with it. In WTX if there is no explicit rule then the implicit rule is 'PRESENT($)'. We will need to decide if such implied rules will be allowed in DFDL.

- A group may only have a discriminator on a mandatory component. Once again, this impacts a choice group where by definition all components are optional, which will not have a discriminator. This has been an issue of debate in WTX. We could have implemented checking on optional elements quite easily. Over the years this has been questioned (as our UI allows them to be placed on optional elements) but once we explained the way the engine worked no customers perceived this as a deficiency. In DFDL we will need to determine if this is needed.

- In WTX we do allow a discriminator to be placed on a mandatory fixed size array (a repeating mandatory component with n:n cardinality). Its component rule can either refer to the entirety of the array (PRESENT($) meaning the whole of the array is present) or can call out a specific rule against one of the iterations. This is not done often in practice.

- In WTX it is common to have multiple levels of discriminators when we are working with nested groups.

We discussed whether DFDL should disallow discriminators on unordered groups or groups with floating elements. Agreed that discriminators should be allowed.

Also discussed whether timing 'before/after' was required, as WTX only has 'after'. Decided to keep the timing property.

Suggested that discriminators should not be allowed on variable length arrays, to be consistent with not being allowed on optional elements.

Mike agreed to write up the rules in DFDL terms and extend them to cover other points of uncertainty besides choices.

2. Zero length elements

Steve H took us through his email
Subject: [DFDL] zero length (was Re: Fw: TDS length reference) ** updated **

This proposes that zero length fields should not be a processing error.

Proposal:

1. Parsing

Simple elements

1) It is not a schema definition error nor a processing error if a length is being used to extract data and it is zero. This covers dfdl:lengthKind implicit, explicit, prefixed and endOfParent (when parent length is known). The result is 'empty content'. (Note that for implicit, XSDL allows maxLength/length facet to be 0, so disallowing it for others is not consistent).

2) It is not a processing error if scanning for data and the length of the returned bytes is zero. This applies to dfdl:lengthKind delimited, pattern and endOfParent (when parent length is not known). The result is 'empty content'. (This is just stating the obvious).

(The above two rules ensure that it is possible to apply empty content to trigger optional, nil value or default value processing regardless of data type and dfdl:lengthKind).

3) Optional, nil and default processing are applied as per spec.

4) If the element is required, and nil value or default value is not used, and empty string is not in the lexical space of the element's type, then it is a processing error.

The two initiator related properties dfdl:nilValueInitiatorPolicy and dfdl:defaultValueInitiatorPolicy define whether nils and defaults are applied when initiated empty content is found; they don't affect the definition of empty content or what it means for the type.

[Note: If you recall, this discussion was triggered by a customer that was using an expression to calculate the length of a standard text decimal. He wanted 0 length to mean that 0 ended up in the infoset. He can achieve this by making the element required with a default value of 0.]

Complex elements

It is possible to get returned empty content for a complex element for cases 1) and 2) above.

1) If the complex element is optional then it is not added to the infoset.

2) If the complex element does not have an initiator specified & is required then it is added to the infoset.

3) If the element has an initiator specified then dfdl:defaultValueInitiatorPolicy applies
   - required => element is added to infoset only if initiator is present (processing error if no initiator & empty content)
   - prohibited => element is added to infoset only if initiator is not present (initiator implies real content follows so processing error if initiator & empty content)

4) If the complex element is added to the infoset, then the parser processes the child content of the complex type. This may or may not cause a processing error.

<tk>I presume a processing error would be caused by
- any group having an initiator or terminator (same as 5. below)
- any group having a prefix or postfix delimiter
- any group with more than one member having an infix delimiter
- any required element within the complex element having an initiator and dfdl:defaultValueInitiatorPolicy="required"
- any required element within the complex element having a terminator
- any required element which does not have a default value specified, and for which a zero-length representation is illegal
- other error scenarios?
</tk>

<smh>Correct. Basically you are going through the element's content (model group plus children) and attempting to parse. When you extract the data you get back empty content. This may or may not cause a processing error. This was agreed on the call as the correct behaviour. In summary, for empty content to be valid for the complex element it must also be valid for at least one content model.</smh>

If it doesn't then default value processing applies for required child elements. If we don't do this then we will not create default values for all missing required simple elements, and that would be wrong.

5) If the contained sequence or choice has an initiator or terminator then it is a processing error.

<tk>So it's OK to have a choice among the children of the complex element? If so, the specification should define the rules for picking a branch of the choice. The DFDL processor *could* always pick the first branch, but what if the first branch triggers a processing error and a different branch would not have done?</tk>

<smh>I think it's the same as with real content. Parser will start against the first branch of the choice and see where it gets. Usual speculative parsing rules apply. If it has not discriminated successfully and a processing error occurs it will cause backtracking and the next branch will be tried. If it finds a valid content model for the empty content we are ok. If it doesn't it's a processing error.</smh>

2. Unparsing

Simple elements

Data in the infoset can result in empty content being added to the bit stream (i.e., nothing), with an accompanying 0 value in any length prefix or length expression field, if appropriate to the dfdl:lengthKind.

Complex elements

The absence from the infoset of a required complex element will cause any specified initiator to be output, plus if there are required children then default values will be output for those children. If we don't do this then we will not create default values for nested missing required simple elements, and that would be wrong. This enables creation of a sparse infoset containing just the elements with explicit values, with the rest defaulting regardless of nesting.

3. Choices

Worth noting that the concept of 'required' for the elements of a choice does not apply, even if minOccurs > 0.

4. Outstanding Issues

Is it ok to reuse dfdl:defaultValueInitiatorPolicy for complex elements? Should it be renamed? Should we add a separate property for complex elements?

Steve H to propose new name for dfdl:defaultValueInitiatorPolicy.

3. Difference between dfdl:lengthKind 'delimited' and 'endOfParent'

'delimited' means the item is delimited by the item's terminator (if specified) or an enclosing construct's separator or the end of the enclosing construct designated by its known length or its terminator. The only difference with dfdl:lengthKind='endOfParent' is that the latter includes the 'end of the data stream' and applies to binary fields.

We should either
- add 'end of data stream' to delimited and remove 'endOfParent', or
- make 'endOfParent' be specifically for only 'end of data stream'.

Short discussion. Alan agreed to try to write up a description of endOfParent for review.

4. Go through remaining actions

Not enough time.

5. Draft 037 review

From comments:

a. DFDL Subset of XML Schema (TBD: need means for an implementation to indicate it is using non-standard extensions?)
   Believe that this was to allow users to indicate they are using unsupported schema components. Agreed to defer from DFDL v1.

b. Question whether infoset MUST be in schema order. Request for 'bitstream order'.
   Short discussion. Main reason for schema order is to allow the infoset to be validated against a schema. Agreed to leave as schema order.

c. Dealing with 'Grammar ambiguity' errors
   Not discussed.

6. Review Schedule

Activity                                    Schedule               Who
Complete Action items                       - 18 Dec 2009          WG
Complete Spec
  Write up work items                       - 23 Dec 2009          AP
  Restructure and complete specification    - 23 Dec 2009          AP
  Issue Draft 038                           23 Dec 2009
WG review
  WG review                                 7 Dec - 08 Jan 2010    WG
  Incorporate review comments               4 Jan - 29 Jan 2010    AP +
  Issue Draft 039                           15 Jan 2010
  Incorporate review comments               4 Jan - 29 Jan 2010    AP +
  Issue Draft 040                           29 Jan 2010
Initial OGF Editor Review
  Initial Editor review                     1 Feb - 1 Mar 2010     OGF
  Initial GFSG review                       1 Feb - 1 Mar 2010
  Issue Draft 041                           1 Mar 2010
OGF Public Comment period (60 days)         1 Mar - 30 Apr 2010    OGF
OGF 28 Munich                               15-19 March 2010
Incorporate comments
  Incorporate comments                      28 May 2010
  Issue Draft 042                           28 May 2010
Final OGF Editor Review
  Final Editor review                       June 2010              OGF
  final GFSG review                         June 2010
  Issue Final specification                 30 June 2010
Publish proposed recommendation             1 July 2010
Grid recommendation process                 1 Jan - 1 April 2011

Meeting closed, 15:20

Next call 20 January 2010 13:00 UK

Next action: 074

Actions raised at this meeting

No   Action

Current Actions:

No   Action

045  20/05 AP: Speculative Parsing
     27/05: Pseudo code has been circulated. Review for next call
     03/06: Comments received and will be incorporated
     09/06: Progress but not discussed
     17/06: Discussed briefly
     24/06: No Progress
     01/07: No Progress
     15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way.
     29/07: No Progress
     05/08: No Progress. Will document behaviour as a set of rules.
     12/08: No Progress
     ...
     16/09: no progress
     30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate update and reissue
     07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
     14/10: Alan to update proposal to include array scenario where minOccurs > 0
     21/10: Updated proposal reviewed
     28/10: Updated proposal reviewed, see minutes
     04/11: Discussed semantics of discriminators on arrays. MB to produce examples
     11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
     18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
     25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.
     04/11: Further discussion about arrays.
     09/12: Reviewed proposed discriminator semantic.
     16/12: Reviewed discriminator examples and WTX semantic.
     23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call
     06/01: B Connolly not available. SF to provide more complete description.
     13/01: Stephanie took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.

049  20/05 AP: Built-in specification description and schemas
     03/06: not discussed
     24/06: No Progress
     24/06: No Progress (hope to get these from test cases)
     15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
     ...
     14/10: no progress
     21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
     28/10: no progress
     04/11: no progress
     11/11: no update
     18/11: no update
     25/11: Agreed to try to produce for CSV and fixed formats
     04/12: no update
     09/12: no update
     16/12: no update
     23/12: no update
     06/01: no progress. If there is no resource to complete this action it can be deferred
     13/01: no progress

064  MB/SH: Request WG presentation at OGF 28
     25/11: Session requested
     04/12: no update
     09/12: no update
     16/12: SH has changed request to a general session rather than a WG session in the hope of attracting more people.
     23/12: no update
     06/01: not heard anything yet
     13/01: no update

066  Investigate format for defining test cases
     25/11: IBM to see if it is possible to publish its test case format.
     04/12: no update
     09/12: no update
     16/12: reminder sent to project manager
     23/12: SH will send another reminder.
     06/01: Another reminder will be sent
     13/01: no update

068  Should the roots of messages be designated?
     09/12: Yes. New dfdl:documentRoot property. Closed
     16/12: reopened and decided to drop property subject to agreement from SKK and SF
     23/12: SKK to review decision to drop dfdl:documentRoot
     13/01: closed

071  Semantics of length=0, nil handling and defaults.
     23/12: SH no update
     06/01: SH has started
     13/01: SH proposal reviewed. Minor updates to be made

073  SH: Control of overpunching zoned positive sign
     13/01: no update

Closed actions

No   Action

056  MB: Resolve lengthUnits=bits including fillbytes
     12/08: No Progress
     ...
     28/10: no progress
     04/11: MB to look at lengthUnits = bits
     11/11: no update
     18/11: no update
     25/11: no update
     04/12: no update. Alan will set up a separate call to progress this action.
     09/12: no update. Alan will set up a separate call to progress this action.
     16/12: MB, SH and AP had a separate call. MB to distribute proposal
     23/12: Discussed proposal. MB will update
     06/01: V4 discussed and approved
     13/01: Mike updated proposal. Closed

068  Should the roots of messages be designated?
     09/12: Yes. New dfdl:documentRoot property. Closed
     16/12: reopened and decided to drop property subject to agreement from SKK and SF
     23/12: SKK to review decision to drop dfdl:documentRoot
     13/01: closed

Work items:

No   Item (target version, status)

005  Improvements on property descriptions
     Status: not started
011  How speculative parsing works (combining choice and variable-occurrence - currently these are separate) (from action 045)
     Status: awaiting completion of action 045
012  Reordering the properties discussion: move representation earlier, improve flow of topics
     Status: not started
036  Update dfdl schema with change properties
     Status: ongoing
038  Improve length section including bit handling
     Status: some improvement in 036
042  Mapping of the DFDL infoset to XDM
     Target version: none. Status: not required for V1 specification
069  ICU fractional seconds
070  Write DFDL primer
071  Write test cases.
072  It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix
073  Rename dfdl:separatorPolicy="required" to "always".
074  - Last 'postFix' separator is not optional
     - Terminators are mandatory.
     - dfdl:documentFinalTerminatorCanBeMissing
     - dfdl:documentFinalSeparatorCanBeMissing (Action (70))
075  Remove occursCountKind="useAvailableSpace".
076  dfdl:documentRoot will be defined so that it can only be on global elements. The DFDL spec does not have to define the format of parameters to the DFDL processor but will indicate that it must be possible to address any element. Agreed that ANY element within the schema can be the starting point for parsing or unparsing. dfdl:documentRoot no longer required
077  'delimited' means the item is delimited by the item's terminator (if specified) or an enclosing construct's separator or end of the enclosing construct designated by its known length or its terminator. The definition of endOfParent also needs improving.
078  Document UPA checks
079  Restrictions on use of 'special' entities in regular expressions
080  LengthUnit=bits (A056)

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM
email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
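
[Illustration] The simple-element parsing rules proposed under item 2 of these minutes (zero length elements) can be sketched roughly as below. This is only a sketch and not text from the proposal: the class, the field names and the order in which nil, optional and default handling are tried are all assumptions made for the example, since the minutes only say these are applied "as per spec".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class ProcessingError(Exception):
    """Raised when empty content is not acceptable for an element."""


@dataclass
class SimpleElement:
    # All field names are hypothetical, chosen for this sketch only.
    name: str
    required: bool                        # True when the element must occur
    empty_is_nil: bool = False            # assumption: empty content matches the nil value
    default: Optional[str] = None         # default value, if one is defined
    empty_in_lexical_space: bool = False  # e.g. True for a string, False for a decimal


def resolve_empty_content(elem: SimpleElement) -> Tuple[str, Optional[str]]:
    """Decide what zero-length ('empty') content means for a simple element.

    Rules 1) and 2): reaching this function is never itself an error.
    Rule 3): nil, optional and default processing are tried (order assumed).
    Rule 4): a required element with no nil or default, whose type does not
    accept the empty string, is a processing error.
    """
    if elem.empty_is_nil:
        return ("nil", None)
    if not elem.required:
        return ("absent", None)
    if elem.default is not None:
        return ("value", elem.default)
    if elem.empty_in_lexical_space:
        return ("value", "")
    raise ProcessingError(f"empty content not valid for required element {elem.name!r}")
```

The customer case mentioned in the note (a zero-length text decimal that should yield 0) corresponds to `SimpleElement("amount", required=True, default="0")`, for which the sketch returns the default value.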

Unparsing dfdl:lengthKind='pattern'
by Alan Powell 14 Jan '10

The Specification currently says that on unparsing dfdl:lengthKind='pattern' behaves like dfdl:lengthKind='implicit', but since we limited 'implicit' to certain logical/representation combinations this is no longer correct.

Proposals

1. When unparsing complex elements with dfdl:lengthKind='pattern', the length is the length of the children (same as 'implicit').

   When unparsing simple elements with dfdl:lengthKind='pattern', the length is the implicit length for those that have one. For the rest it is the length of the data supplied in the infoset, converted to the representation with no padding/filling.

   - For string/text and hexbinary/binary that is reasonable
   - For number/text/standard and zoned it is governed by the numberPattern
   - For number/binary/binary use 'implicit' lengths
   - For number/binary/packed or bcd use minimum number of bytes
   - For Calendar/text it is governed by the calendarPattern
   - For Calendar/binary/packed or bcd use minimum number of bytes
   - For Calendar/binary/binarymilliseconds or binaryseconds use implicit lengths
   - For Boolean/text use the length of the true/false rep.
   - For Boolean/binary use implicit length (32)

2. A new property dfdl:patternOutputLengthKind

3. We could limit dfdl:lengthKind 'pattern' to complex elements so that there is always a lengthKind for child simple elements, but this would introduce 'unnecessary' complex items in the infoset when there is only one child.

Alan Powell
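
[Illustration] Proposal 1 above amounts to a lookup from logical type and representation to a length rule. The sketch below shows one way that table could be written down. The enum, the key spelling and the function names are invented for the example, and the byte-count helper assumes the usual layouts (one digit per nibble, with packed decimal carrying one extra sign nibble), which the proposal itself does not spell out.

```python
from enum import Enum
from typing import Optional


class LengthRule(Enum):
    DATA_LENGTH = "length of the infoset data in its representation, no padding"
    NUMBER_PATTERN = "governed by the number pattern"
    CALENDAR_PATTERN = "governed by the calendar pattern"
    IMPLICIT = "implicit length for the type"
    MIN_BYTES = "minimum number of bytes"
    BOOLEAN_REP = "length of the true/false representation"


# Keys are (logical type, representation, variant); names are hypothetical.
_RULES = {
    ("string", "text", None): LengthRule.DATA_LENGTH,
    ("hexBinary", "binary", None): LengthRule.DATA_LENGTH,
    ("number", "text", "standard"): LengthRule.NUMBER_PATTERN,
    ("number", "text", "zoned"): LengthRule.NUMBER_PATTERN,
    ("number", "binary", "binary"): LengthRule.IMPLICIT,
    ("number", "binary", "packed"): LengthRule.MIN_BYTES,
    ("number", "binary", "bcd"): LengthRule.MIN_BYTES,
    ("calendar", "text", None): LengthRule.CALENDAR_PATTERN,
    ("calendar", "binary", "packed"): LengthRule.MIN_BYTES,
    ("calendar", "binary", "bcd"): LengthRule.MIN_BYTES,
    ("calendar", "binary", "binaryMilliseconds"): LengthRule.IMPLICIT,
    ("calendar", "binary", "binarySeconds"): LengthRule.IMPLICIT,
    ("boolean", "text", None): LengthRule.BOOLEAN_REP,
    ("boolean", "binary", None): LengthRule.IMPLICIT,
}


def pattern_unparse_rule(logical: str, representation: str,
                         variant: Optional[str] = None) -> LengthRule:
    """Look up the unparse length rule for a simple element with lengthKind='pattern'."""
    try:
        return _RULES[(logical, representation, variant)]
    except KeyError:
        raise ValueError(f"no rule listed for {logical}/{representation}/{variant}")


def min_bytes(digits: int, packed: bool) -> int:
    """Smallest byte count for `digits` decimal digits.

    Assumption: bcd uses one nibble per digit; packed adds one sign nibble.
    """
    nibbles = digits + 1 if packed else digits
    return (nibbles + 1) // 2
```

For instance, a five-digit number needs three bytes in either layout under these assumptions, while a four-digit number needs three bytes packed and two bytes as bcd.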

Re: [DFDL-WG] dfdl-wg Digest, Vol 41, Issue 15
by Tim Kimber 14 Jan '10

A correction re: the meaning of lengthKind="delimited". Current text: 'delimited' means the item is delimited by the item’s terminator (if specified) or an enclosing construct’s separator or the end of the enclosing construct designated by its known length or its terminator. the only difference with dfdl:lentghKind='endOfParent' is that the latter includes the 'end of the data stream' and applies to binary fields. My proposed text: 'delimited' means the item is delimited by any of - the item’s terminator (if specified) - an enclosing construct’s separator or terminator - the end of an enclosing construct designated by its known length - the end of the data stream If applied to a field with representation='text', lengthKind='endOfParent' means the same as 'delimited' If applied to a field with representation='binary', lengthKind='endOfParent' means the item is terminated by either of - the end of an enclosing construct designated by its known length - the end of the data stream We must not exclude the 'end of data stream' case from delimited, otherwise some fairly basic infix-delimited scenarios will not work. regards, Tim Kimber, Common Transformation Team, Hursley, UK Internet: kimbert(a)uk.ibm.com Tel. 01962-816742 Internal tel. 246742 From: dfdl-wg-request(a)ogf.org To: dfdl-wg(a)ogf.org Date: 14/01/2010 11:04 Subject: dfdl-wg Digest, Vol 41, Issue 15 Send dfdl-wg mailing list submissions to dfdl-wg(a)ogf.org To subscribe or unsubscribe via the World Wide Web, visit http://www.ogf.org/mailman/listinfo/dfdl-wg or, via email, send a message with subject or body 'help' to dfdl-wg-request(a)ogf.org You can reach the person managing the list at dfdl-wg-owner(a)ogf.org When replying, please edit your Subject line so it is more specific than "Re: Contents of dfdl-wg digest..." Today's Topics: 1. 
Minutes for OGF DFDL Working Group Call, January-13-2010 (Alan Powell) ----- Message from Alan Powell <alan_powell(a)uk.ibm.com> on Thu, 14 Jan 2010 11:03:42 +0000 ----- To: dfdl-wg(a)ogf.org Subject: [DFDL-WG] Minutes for OGF DFDL Working Group Call, January-13-2010 Open Grid Forum: Data Format Description Language Working Group OGF DFDL Working Group Call, January-13-2010 Attendees Mike Beckerle (Oco) Steve Hanson (IBM) Alan Powell (IBM) Steve Marting (Progeny) Stephanie Fetzer (IBM) Suman Kalia (IBM) Peter Lambros (IBM) Tim Kimber(IBM) Apologies 1. 045 - Disciminators Stephanie took us through her email Subject: [DFDL-WG] Bob & Steph's WTX 'Discriminators' write-up WTX Identifiers are similar to DFDL discriminators -Discriminators may only be placed on the physical representation of a group. That is why we see them on partition groups and sequence groups but not on choice groups (or unordered groups – covered below). In partitioned groups we have a subtype of each possible group – so each possible group may have a discriminator. When WTX expresses choice groups it expresses them as a group containing all of the possible child groups – so at the top level ‘choice group’ there is no component of the actual group content- so no use for a discriminator. But each choice which may itself be a group may have a discriminator. Choice groups are special in that the choice model construct simply lists the components and only one may occur...at this level a discriminator on one of the choices may not be very useful. Inside of each choice’s components a discriminator could be used to indicate the existence of that choice. -The WTX UI does not allow discriminators on the components of Unordered Groups. This may be due to the fact that the position of the discriminator has significance (all rules at or above the discriminator must evaluate to true). If the group is unordered it would be difficult to enforce. Will need to discuss for DFDL. 
-A group may have either zero or one discriminators. No group may have more than one discriminator. -The discriminator may have two significant parts o it’s location (mandatory). The discriminator is placed on a component of a group and makes all of the cardinality and rules at that point and above become part of it's concept. o it’s rule (optional) A group with a component which has a discriminator should have some ‘rule’ associated with it. In WTX if there is no explicit rule then the implicit rule is ‘PRESENT($)’. We will need to decide if such implied rules will be allowed in DFDL. -A group may only have a discriminator on a mandatory component. Once again, this impacts a choice group where by definition all components are optional – which will not have a discriminator. This has been an issue of debate in WTX. We could have implemented checking on optional elements quite easily Over the years this has been questioned (as our UI allows them to be placed on optional elements) but once we explained the way the engine worked no customers perceived this as a deficiency. In DFDL we will need to determine if this is needed. -In WTX we do allow a discriminator to be placed on a mandatory fixed size array (a repeating mandatory component with n:n cardinality). It’s component rule can either refer to the entirety of the array (PRESENT($) meaning the whole of the array is present) or can call out a specific rule against one if the iterations. This is not done often in practice. -In WTX it is common to have multiple levels of discriminators when we are working with nested groups. We discussed whether DFDL should not allow discriminators on unordered groups or groups with floating elements. Agree that discriminators should be allowed Also discussed whether timing 'before/after' was required are WTX only has after. Decided to keep timing property. Suggested should not be allowed on variable length arrays to be consistent with not being allowed on optional elements. 
Mike agreed to write up rules in dfdl terms and extent to cover other points of uncertainty besides choices. 2. Zero length elements Steve H took us through his email subject: [DFDL] zero length (was Re: Fw: TDS length reference) ** updated ** This proposes that zero length fields should not be a processing error Proposal: 1. Parsing Simple elements 1) It is not a schema definition error nor a processing error if a length is being used to extract data and it is zero. This covers dfdl:lengthKind implicit, explicit, prefixed and endOfParent (when parent length is known). The result is 'empty content'. (Note that for implicit, XSDL allows maxLength/length facet to be 0, so disallowing it for others is not consistent). 2) It is not a processing error if scanning for data and the length of the returned bytes is zero. This applies to dfdl:lengthKind delimited, pattern and endOfParent (when parent length is not known). The result is 'empty content'. (This is just stating the obvious). (The above two rules ensure that it is possible to apply empty content to trigger optional, nil value or default value processing regardless of data type and dfdl:lengthKind). 3) Optional, nil and default processing are applied as per spec. 4) If the element is required, and nil value or default value is not used, and empty string is not in the lexical space of the element's type, then it is a processing error. The two initiator related properties dfdl:nilValueInitiatorPolicy and dfdl:defaultValueInitiatorPolicy define whether nils and defaults are applied when initiated empty content is found, they don't affect the definition of empty content or what it means for the type. [Note: If you recall, this discussion was triggered by a customer that was using an expression to calculate the length of a standard text decimal. He wanted 0 length to mean 0 ended up in the infoset. He can achieve this by making the element required with a default value of 0.] 
Complex elements It is possible to get returned empty content for a complex element for cases 1) and 2) above. 1) If the complex element is optional then it is not added to the infoset. 2) If the complex element does not have an initiator specified & is required then it is added to the infoset. 3) If the element has an initiator specified then dfdl:defaultValueInitiatorPolicy applies - required => element is added to infoset only if initiator is present (processing error if no initiator & empty content) - prohibited => element is added to infoset only if initiator is not present (initiator implies real content follows so processing error if initiator & empty content) 4) If the complex element is added to the infoset, then the parser processes the child content of the complex type. This may or may not cause a processing error. <tk>I presume a processing error would be caused by - any group having an initiator or terminator ( same as 5. below ) - any group having a prefix or postfix delimiter - any group with more than one member having an infix delimiter - any required element within the complex element having an initiator and dfdl:defaultValueInitiatorPolicy="required" - any required element within the complex element having a terminator - any required element which does not have a default value specified, and for which a zero-length representation is illegal - other error scenarios? </tk> <smh>Correct. Basically you are going through the element's content (model group plus children) and attempting to parse. When you extract the data you get back empty content. This may or not cause a processing error. This was agreed on the call as the correct behaviour. In summary, for empty content to be valid for the complex element then it must also be valid for at least one content model</smh> If it doesn't then default value processing applies for required child elements. 
If we don't do this then we will not create default values for all missing required simple elements, and that would be wrong. 5) If the contained sequence or choice has an initiator or terminator then it is a processing error. <tk> So it's OK to have a choice among the children of the complex element? If so, the specification should define the rules for picking a branch of the choice. The DFDL processer *could* always pick the first branch, but what if the first branch triggers a processing error and a different branch would not have done? </tk> <smh>I think it's the same as with real content. Parser will start against the first branch of the choice and see where it gets. Usual speculative parsing rules apply. If it has not discriminated successfully and a processing error occurs it will cause backtracking and the next branch will be tried. If it finds a valid content model for the empty content we are ok. If it doesn't it's a processing error.</smh> 2. Unparsing Simple elements Data in the infoset can result in empty content being added to the bit stream (ie, nothing), with an accompanying 0 value in any length prefix or length expression field, if appropiate to the dfdl:lengthKind. Complex elements The absence from the infoset of a required complex element will cause any specified initiator to be output, plus if there are required children then default values will be output for those children. If we don't do this then we will not create default values for nested missing required simple elements, and that would be wrong. This enables creation of a sparse infoset containing just the elements with explicit values, with the rest defaulting regardless of nesting. 3. Choices Worth noting that the concept of 'required' for the elements of a choice does not apply. Even if minOccurs > 0. 4. Outstanding Issues Is it ok to reuse dfdl:defaultValueInitiatorPolicy for complex elements? Should it be renamed? Should we add a separate property for complex elements? 
Steve H to propose a new name for dfdl:defaultValueInitiatorPolicy.

3. Difference between dfdl:lengthKind='delimited' and 'endOfParent'
'delimited' means the item is delimited by the item’s terminator (if specified), or an enclosing construct’s separator, or the end of the enclosing construct designated by its known length or its terminator. The only difference with dfdl:lengthKind='endOfParent' is that the latter includes the 'end of the data stream' and applies to binary fields. We should either:
- add 'end of data stream' to 'delimited' and remove 'endOfParent', or
- make 'endOfParent' apply specifically and only to 'end of data stream'.
Short discussion. Alan agreed to try to write up a description of endOfParent for review.

4. Go through remaining actions
Not enough time.

5. Draft 037 review
From comments:
a. DFDL Subset of XML Schema (TBD: need means for an implementation to indicate it is using non-standard extensions?) We believe that this was to allow users to indicate they are using unsupported schema components. Agreed to defer from DFDL v1.
b. Question whether infoset MUST be in schema order. Request for 'bitstream order'. Short discussion. The main reason for schema order is to allow the infoset to be validated against a schema. Agreed to leave as schema order.
c.
Dealing with 'Grammar ambiguity' errors Not discussed 6 Review Schedule Activity Schedule Who Complete Action items - 18 Dec 2009 WG Complete Spec Write up work items – 23 Dec 2009 AP Restructure and complete specification - 23 Dec 2009 AP Issue Draft 038 23 Dec 2009 WG review WG review 7 Dec – 08 Jan 2010 WG Incorporate review comments 4 Jan - 29 Jan 2010 AP + Issue Draft 039 15 Jan 2010 Incorporate review comments 4 Jan - 29 Jan 2010 AP + Issue Draft 040 29 Jan 2010 Initial OGF Editor Review Initial Editor review 1 Feb - 1 Mar 2010 OGF Initial GFSG review 1 Feb - 1 Mar 2010 Issue Draft 041 1 Mar 2010 OGF Public Comment period (60 days) 1 Mar - 30 Apr 2010 OGF OGF 28 Munich 15-19 March 2010 Incorporate comments Incorporate comments 28 May 2010 Issue Draft 042 28 May 2010 Final OGF Editor Review Final Editor review June 2010 OGF final GFSG review June 2010 Issue Final specification 30 June 2010 Publish proposed recommendation 1 July 2010 Grid recommendation process 1 Jan - 1 April 2011 Meeting closed, 15:20 Next call 20 January 2010 13:00 UK Next action: 074 Actions raised at this meeting No Action Current Actions: No Action 045 20/05 AP: Speculative Parsing 27/05: Psuedo code has been circulated. Review for next call 03/06: Comments received and will be incorporated 09/06: Progress but not discussed 17/06: Discussed briefly 24/06: No Progress 01/07: No Progress 15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way. 29/07: No Progress 05/08: No Progress. Will document behaviour as a set of rules. 12/08: No Progress ... 16/09: no progress 30/09: AP distributed proposal and others commented. Brief discussion AP to incorporate update and reissue 07/10: Updated proposal was discussed.Comments will be incorporated into the next version. 
14/10: Alan to update proposal to include array scenario where minOccurs > 0 21/10: Updated proposal reviewed 28/10: Updated proposal reviewed see minutes 04/11: Discussed semantics of disciminators on arrays. MB to produce examples 11/11: Absorbing action 033 into 045. Maybe decorated discrminator kinds are needed after all. MB and SF to continue with examples. 18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules. 25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes need to Resolving Uncertainty doc. 04/11: Further discussion about arrays. 09/12: Reviewed proposed discriminator semantic. 16/12: Reviewed discriminator examples and WTX semantic. 23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call 06/01:B Connolly not available. SF to provide more complete description. 13/01: Stephaine took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms. 049 20/05 AP Built-in specification description and schemas 03/06: not discussed 24/06: No Progress 24/06: No Progress (hope to get these from test cases) 15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide. ... 14/10: no progress 21/10: Discussed the real need for this being in the specification. It seemed that the main value is it define a schema location for downloading 'known' defaults from the web. 28/10: no progress 04/11: no progress 11/11: no update 18/11: no update 25/11: Agreed to try to produce for CSV and fixed formats 04/12: no update 09/12: no update 16/12: no update 23/12: no update 06/01: no progress. 
If there is no resource to complete this action it can be deferred 13/01:no progess 064 MB/SH Request WG presentation at OGF 28 25/11: Session requested 04/12: no update 09/12: no update 16/12: SH has changed request to a general session rather tha WG in the hope of attracting more people. 23/12: no update 06/01: not heard anything yet 13/01: no update 066 Investigate format for defining test cases 25/11:IBM to see if it is possible to publish its test case format. 04/12: no update 09/12: no update 16/12: reminded dent to project manager 23/12: SH will send another reminder. 06/01: Another reminder will be sent 13/01: no update 068 Should the roots of messages be designated.? 09/12: Yes. New dfdl:documentRoot property Closed 16/12: reopened and decided to drop property subject to agreement from SKK and SF 23/12: SKK review decision to drop dfdl:documentRoot 13/01: closed 071 Semantics of length=0, nil handling and defaults. 23/12:SH no update 06/01: SH has started 13/01: SH proposal review. Minor updates to be made 073 SH: Control of overpunching zoned positive sign 13/01: no update Closed actions No Action 056 MB Resolve lengthUnits=bits including fillbytes 12/08: No Progress ... 28/10: no progress 04/11: MB to look at lengthUnits = bits 11/11: no update 18/11: no update 25/11: no update 04/12: no update. ALan will set up a separate call to progress this action. 09/12: no update. ALan will set up a separate call to progress this action. 16/12: MB, SH and AP had a separate call. MB to distribute proposal 23/12: Discussed proposal. MB will updated 06/01: V4 discussed and approved 13/01: Mike updated proposal. Closed 068 Should the roots of messages be designated.? 09/12: Yes. 
New dfdl:documentRoot property Closed 16/12: reopened and decided to drop property subject to agreement from SKK and SF 23/12: SKK review decision to drop dfdl:documentRoot 13/01: closed Work items: No Item target version status 005 Improvements on property descriptions not started 011 How speculative parsing works (combining choice and variable-occurence - currently these are separate) (from action 045) awaiting completion of actions 045 012 Reordering the properties discussion: move representation earlier, improve flow of topics not started 036 Update dfdl schema with change properties ongoing 038 Improve length section including bit handling some improvement in 036 042 Mapping of the DFDL infoset to XDM none not required for V1 specification 069 ICU fractional seconds 070 Write DFDL primer 071 Write test cases. 072 it is a processing error if the number of occurrences in the data does not match the value of the expression or prefix 073 Rename dfdl:separatorPolicy="required" to "always". 074 - Last 'postFix' separator is not optional - Terminators are mandatory. - dfdl:documentFinalTerminatorCanBeMissing - dfdl:documentFinalSeparatorCanBeMissing (Action (70)) 075 Remove occursCountKind="useAvailableSpace". 076 dfdl:documentRoot, will be defined that can only be on global elements. The DFDL spec does not have to define the format of parameters to the DFDL processor but will indicate that it must be possible to adresss any element. Agreed that ANY element within the schema cane be the starting point for parsing or unparsing. dfdl:documentRoot no longer required 077 'delimited' means the item is delimited by the item’s terminator (if specified) or an enclosing construct’s separator or end of the enclosing construct designated by its known length or its terminator. The definition of EndOfParent also needs improving. 
078 document UPA checks 079 Restrictions on use of 'special' entities in regular expressions 080 LengthUnit=bits (A056) Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898 Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -- dfdl-wg mailing list dfdl-wg(a)ogf.org http://www.ogf.org/mailman/listinfo/dfdl-wg Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Fw: [DFDL] zero length (was Re: Fw: TDS length reference) ** updated **
by Steve Hanson 13 Jan '10


For discussion on today's call.....

Regards
Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair, Hursley, UK,
Internet: smh(a)uk.ibm.com, Phone (+44)/(0) 1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 13/01/2010 12:46 -----

** Complex element design added. Please review **
------------------------------------------------------------------------------------------------------
The proposal extends the earlier work done in this area and described by spec sections 15.13 and 5.7. To paraphrase those sections:

5.7 The 'default' attribute is used to provide the logical value of a required element while parsing when the representation is empty (content length is zero).

15.13 When we get 'empty content' from an element, and the element is optional, then it is not present and is not added to the infoset. When we get empty content from an element, and the element is required, then we start to look at nil handling and default handling properties.
- If the properties are such that the empty string is a nil value then the infoset value is the special value nil.
- If the properties are such that there is a default value specified then the infoset value is the default value.
- Otherwise, if the empty string is valid for the type (i.e., the type is derived from xs:string) then the infoset value is a zero-length string.

So we know what empty content is and how it is applied to simple elements. We need to define when it is possible to get empty content and what it means to elements of complex type or of non-string simple type.

Proposal:

1. Parsing

Simple elements

1) It is neither a schema definition error nor a processing error if a length is being used to extract data and it is zero. This covers dfdl:lengthKind implicit, explicit, prefixed and endOfParent (when parent length is known). The result is 'empty content'. (Note that for implicit, XSDL allows the maxLength/length facet to be 0, so disallowing it for the others would not be consistent.)
2) It is not a processing error if, when scanning for data, the length of the returned bytes is zero. This applies to dfdl:lengthKind delimited, pattern and endOfParent (when parent length is not known). The result is 'empty content'. (This is just stating the obvious.)

(The above two rules ensure that it is possible to apply empty content to trigger optional, nil value or default value processing regardless of data type and dfdl:lengthKind.)

3) Optional, nil and default processing are applied as per spec.

4) If the element is required, and nil value or default value is not used, and the empty string is not in the lexical space of the element's type, then it is a processing error.

The two initiator-related properties dfdl:nilValueInitiatorPolicy and dfdl:defaultValueInitiatorPolicy define whether nils and defaults are applied when initiated empty content is found; they don't affect the definition of empty content or what it means for the type.

[Note: If you recall, this discussion was triggered by a customer who was using an expression to calculate the length of a standard text decimal. He wanted a length of 0 to mean that the value 0 ended up in the infoset. He can achieve this by making the element required with a default value of 0.]

Complex elements

It is possible to get empty content returned for a complex element in cases 1) and 2) above.

1) If the complex element is optional then it is not added to the infoset.
2) If the complex element does not have an initiator specified and is required then it is added to the infoset.
3) If the element has an initiator specified then dfdl:defaultValueInitiatorPolicy applies:
- required => element is added to the infoset only if the initiator is present (processing error if no initiator and empty content)
- prohibited => element is added to the infoset only if the initiator is not present (an initiator implies real content follows, so processing error if initiator and empty content)
4) If the complex element is added to the infoset, then the parser processes the child content of the complex type. This may or may not cause a processing error. If it doesn't, then default value processing applies for required child elements. If we don't do this then we will not create default values for all missing required simple elements, and that would be wrong.
5) If the contained sequence or choice has an initiator or terminator then it is a processing error.

2. Unparsing

Simple elements
Data in the infoset can result in empty content being added to the bit stream (i.e., nothing), with an accompanying 0 value in any length prefix or length expression field, if appropriate to the dfdl:lengthKind.

Complex elements
The absence from the infoset of a required complex element will cause any specified initiator to be output, plus, if there are required children, default values will be output for those children. If we don't do this then we will not create default values for nested missing required simple elements, and that would be wrong. This enables creation of a sparse infoset containing just the elements with explicit values, with the rest defaulting regardless of nesting.

3. Choices
Worth noting that the concept of 'required' does not apply to the elements of a choice, even if minOccurs > 0.

4. Outstanding Issues
Is it OK to reuse dfdl:defaultValueInitiatorPolicy for complex elements? Should it be renamed? Should we add a separate property for complex elements?
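As an illustration only, the simple-element parsing rules in the proposal above (optional, then nil, then default, then zero-length string, else processing error) might be sketched as follows. The class and function names are hypothetical and not part of DFDL; the sketch assumes nil handling is checked before default handling, following the order in which section 15.13 is paraphrased, and it ignores the two initiator policy properties.

```python
from dataclasses import dataclass
from typing import Optional


class ProcessingError(Exception):
    """Raised when empty content is not acceptable for a required element."""


ABSENT = object()  # hypothetical marker: element is not added to the infoset
NIL = object()     # hypothetical marker: the special infoset value nil


@dataclass
class SimpleElement:
    name: str
    required: bool
    is_string: bool = False        # type is derived from xs:string
    empty_is_nil: bool = False     # properties make the empty string a nil value
    default: Optional[str] = None  # value of the 'default' attribute, if any


def resolve_empty_content(elem: SimpleElement):
    """Decide the infoset outcome when parsing returned 'empty content'."""
    if not elem.required:
        return ABSENT              # optional: not present in the infoset
    if elem.empty_is_nil:
        return NIL                 # nil handling (assumed to be checked first)
    if elem.default is not None:
        return elem.default        # default value processing
    if elem.is_string:
        return ""                  # empty string is in the lexical space
    # Rule 4: required, no nil or default, and empty string is not valid.
    raise ProcessingError(f"empty content is not valid for '{elem.name}'")


# The customer case from the note: zero length should yield 0 in the infoset.
amount = SimpleElement("amount", required=True, default="0")
print(resolve_empty_content(amount))
```

The customer scenario is covered by making the element required and giving it a default of 0, so that a calculated length of zero falls through to default value processing rather than to the error case.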


Bob & Steph's WTX 'Discriminators' write-up
by Stephanie Fetzer 13 Jan '10


All:

Here are the WTX identification/discrimination behaviours 'on paper', written with the help of Bob Connolly, our WTX core engine expert. I had attempted to create a single description of the concept but ended up needing to split the core behaviour from the modelling options in order to get some of the wrinkles discussed. The attempt here is to share information on the use of this concept within WTX with the (outside of IBM) DFDL WG in such a way that it does not go into the specifics of the WTX implementation.
___________________________________________________________________________________________________

WTX use of IDENTIFIERS for distinguishing data

This is a brief description of the use of WTX identifiers to describe limitations that could be imposed on the implementation of DFDL discriminators. We will attempt to word this in terms that are not specific to WTX and are a bit more XML-centric than we would normally describe WTX processing. This may dilute some of the specificity of the descriptions, but the spirit of the concepts is the important thing here.

WHAT IS A DISCRIMINATOR?:

Discriminators allow us to evaluate a data component and determine if it 'is known to exist'. The distinction that we are making is that there are situations when parsing when we need to determine if some data is an 'invalid instance' of one data component as opposed to 'not that data component'. This concept has multiple advantages: it lets us be more specific when modeling our data than we could be without the concept, and it allows us to terminate one branch of speculative parsing quicker than we might otherwise be able to without the concept.

HOW ARE DISCRIMINATORS USED IN WTX CORE:

The core of WTX uses discriminators in the parsing of data in three main ways:

1-We are parsing a group which is identifiable in some other way; for example, it is a group partitioned by initiator. In this case we functionally have a choice between two or more groups.
So the decision here is which path or partition/choice group (if any) will be found to exist. If we don’t find the initiator in the data then this partition/choice group is not ‘known to exist’. If we find the initiator in the data and...

-If we find the initiator in the data and we do not have a discriminator on a component of the partition/choice group then this type is ‘known to exist’.

-If we find the initiator in the data and if we have a discriminator set on a component of the partition group and if the rule on the discriminator and all rules on earlier components evaluate to true then this group is 'known to exist'.

Note: There is no requirement in WTX that the component on which the identifier attribute is set has to have a specified rule; all mandatory instances of components have an implied rule of PRESENT($) in WTX. In this document, when I say that the rule evaluates to true there is really more to this concept: we could say that everything up to that point is 'valid', meaning that all rules, cardinality constraints and other enforced facets (restrictions, size, presentation checks) pass. In WTX the lines between validation and parsing are blurred, so these distinctions may be a bit different in a DFDL implementation. But we will at minimum need to word this in such a way that we can specify that not only do the rules evaluate to true but the cardinality checks also pass. In DFDL we may want to make having a rule a requirement if we choose to use the currently documented discriminator construct.
-If we find the initiator in the data and if we have a discriminator set on a component of the partition group and if the rule on the discriminator evaluates to false or if any rule on a component prior to the component carrying the discriminator evaluates to false then this group is ‘known to not exist’.

2-We are evaluating a group and have determined that something is wrong with it, and the group we are evaluating has a discriminator and we have not found it (or evaluated its rule) yet, so we say that it is ‘known to not exist’.

3-A rule on a component in the group has evaluated to false and that rule is on or above the component with the discriminator. This group is ‘known to not exist’. Note: this is similar to number 2, but number 2 is another reason for failure, such as missing a mandatory component.

4-A rule on a component in the group has evaluated to true and that rule is on the component with the discriminator. This group is ‘known to exist’. Note: as we are processing the component with the discriminator, the assumption is that we would not be processing this rule if all previously occurring checks and rule checks had not evaluated to true.

5-A rule on a component in the group has evaluated to false and that rule is after the component with the discriminator. The rule on the component with the discriminator and the rules above it evaluated to true. This group is ‘known to exist’ but is invalid.

6-All rules on all components in the group have evaluated to true, including the rule on the component with the discriminator. This group is ‘known to exist’ and is valid.

7-All rules on all components in the group have evaluated to true and there is no component with a discriminator. This group is ‘known to exist’ and is valid. Without the identifier we process the group to the end (last component) before determining that it is ‘known to exist’.

HOW ARE DISCRIMINATORS IMPLEMENTED IN WTX MODELS?:

WTX imposes many limitations on the expression of identifiers on the model.
-Discriminators may only be placed on the physical representation of a group. That is why we see them on partition groups and sequence groups but not on choice groups (or unordered groups, covered below). In partitioned groups we have a subtype of each possible group, so each possible group may have a discriminator. When WTX expresses choice groups it expresses them as a group containing all of the possible child groups, so at the top level ‘choice group’ there is no component of the actual group content and so no use for a discriminator. But each choice, which may itself be a group, may have a discriminator. Choice groups are special in that the choice model construct simply lists the components and only one may occur...at this level a discriminator on one of the choices may not be very useful. Inside each choice’s components a discriminator could be used to indicate the existence of that choice.

-The WTX UI does not allow discriminators on the components of Unordered Groups. This may be due to the fact that the position of the discriminator has significance (all rules at or above the discriminator must evaluate to true). If the group is unordered it would be difficult to enforce. Will need to discuss for DFDL.

-A group may have either zero or one discriminators. No group may have more than one discriminator.

-The discriminator may have two significant parts:
o its location (mandatory). The discriminator is placed on a component of a group and makes all of the cardinality and rules at that point and above become part of its concept.
o its rule (optional). A group with a component which has a discriminator should have some ‘rule’ associated with it. In WTX if there is no explicit rule then the implicit rule is ‘PRESENT($)’. We will need to decide if such implied rules will be allowed in DFDL.

-A group may only have a discriminator on a mandatory component. Once again, this impacts a choice group, where by definition all components are optional, and which will therefore not have a discriminator. This has been an issue of debate in WTX. We could have implemented checking on optional elements quite easily. Over the years this has been questioned (as our UI allows them to be placed on optional elements) but once we explained the way the engine worked no customers perceived this as a deficiency. In DFDL we will need to determine if this is needed.

-In WTX we do allow a discriminator to be placed on a mandatory fixed size array (a repeating mandatory component with n:n cardinality). Its component rule can either refer to the entirety of the array (PRESENT($) meaning the whole of the array is present) or can call out a specific rule against one of the iterations. This is not done often in practice.

-In WTX it is common to have multiple levels of discriminators when we are working with nested groups.

Stephanie Fetzer
WebSphere Common Transformation Industry Packs - Software Engineer
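As a rough sketch of cases 3 to 7 in the write-up above, the position-sensitive behaviour of a discriminator can be modelled with a list of per-component results and the index of the component carrying the discriminator. Everything here is hypothetical naming, not WTX or DFDL API: each boolean stands for "the rule and cardinality checks on this component pass", and the sketch assumes that a failure in a group with no discriminator leaves the group not known to exist, which the write-up implies but does not state outright.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class Existence(Enum):
    KNOWN_TO_EXIST = "known to exist"
    KNOWN_TO_NOT_EXIST = "known to not exist"


def classify_group(
    rule_results: Sequence[bool],
    discriminator_index: Optional[int],
) -> Tuple[Existence, bool]:
    """Return (existence, is_valid) for a group evaluated in component order.

    rule_results[i] is True when all checks on component i pass.
    discriminator_index is the position of the component carrying the
    discriminator, or None when the group has no discriminator.
    """
    for i, passed in enumerate(rule_results):
        if passed:
            continue
        if discriminator_index is not None and i > discriminator_index:
            # Case 5: failure after the discriminator; exists but is invalid.
            return Existence.KNOWN_TO_EXIST, False
        # Case 3: failure on or above the discriminator.
        # Also used (as an assumption) when there is no discriminator at all.
        return Existence.KNOWN_TO_NOT_EXIST, False
    # Cases 6 and 7: every rule passed, with or without a discriminator.
    return Existence.KNOWN_TO_EXIST, True


# Discriminator on the second component, failure on the third component.
print(classify_group([True, True, False], discriminator_index=1))
```

The useful property for speculative parsing is the middle branch: once the discriminator has been passed, a later failure is reported as an invalid instance of this group rather than as a reason to try another branch.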


Comments on DFDL Spec V0.37 - Remove dfdl:format annotation from complex type
by Suman Kalia 13 Jan '10


Chapter 1: When the scoping rules were simplified in V0.37 of the spec, we removed the dfdl:format annotation from complex types, but most of our examples in chapter 1 haven't been updated to reflect this. The examples should be updated to show this annotation at schema level.

<xs:complexType name="example1">
  <xs:annotation>
    <xs:appinfo>
      <dfdl:format representation="binary" byteOrder="bigEndian"
                   lengthKind="implicit" binaryFloatRepresentation="ieee" />
    </xs:appinfo>
  </xs:annotation>
  <xs:sequence>
    <xs:element name="w" type="int"/>
    <xs:element name="x" type="int"/>
    <xs:element name="y" type="double"/>
    <xs:element name="z" type="float" />
  </xs:sequence>
</xs:complexType>

Section 7.1 (page 47): The table should be updated to remove the dfdl:format annotation from complex type.

Schema object     DFDL annotation
xs:choice         dfdl:choice
xs:complexType    dfdl:format

Suman Kalia
IBM Toronto Lab
WMB Toolkit Architect and Development Lead
WebSphere Business Integration Application Connectivity Tools
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.h…
Tel : 905-413-3923 T/L 969-3923
Fax : 905-413-4850 T/L 969-4850
Internet ID : kalia(a)ca.ibm.com


Agenda for OGF DFDL WG call 13 January 2010- 13:00 UK (8:00 ET)
by Alan Powell 12 Jan '10


1. 045 - Disciminators Discussion with Stephanie 2. Zero length elements Review Steve H email 3. Difference between dfdl:lenghtKind= Delimited and endOfParent 'delimited' means the item is delimited by the item?s terminator (if specified) or an enclosing construct?s separator or the end of the enclosing construct designated by its known length or its terminator. the only difference with dfdl:lentghKind='endOfParent' is that the latter includes the 'end of the data stream' and applies to binary fields. We should either Add 'end of data stream' to delimited and remove 'endOfParent' Make 'endOfParent' be specifically for only 'end of data stream' 4. Go through remaining actions 5 Draft 037 review >From comments: a DFDL Subset of XML Schema (TBD: need means for an implementation to indicate it is using non-standard extensions?) b. Question whether infoset MUST be in schema order. Request for 'bitstream order' c. Dealing with 'Grammar ambiguity' errors 6 Review Schedule Activity Schedule Who Complete Action items - 18 Dec 2009 WG Complete Spec Write up work items ? 23 Dec 2009 AP Restructure and complete specification - 23 Dec 2009 AP Issue Draft 038 23 Dec 2009 WG review WG review 7 Dec ? 
08 Jan 2010 WG Incorporate review comments 4 Jan - 29 Jan 2010 AP + Issue Draft 039 15 Jan 2010 Incorporate review comments 4 Jan - 29 Jan 2010 AP + Issue Draft 040 29 Jan 2010 Initial OGF Editor Review Initial Editor review 1 Feb - 1 Mar 2010 OGF Initial GFSG review 1 Feb - 1 Mar 2010 Issue Draft 041 1 Mar 2010 OGF Public Comment period (60 days) 1 Mar - 30 Apr 2010 OGF OGF 28 Munich 15-19 March 2010 Incorporate comments Incorporate comments 28 May 2010 Issue Draft 042 28 May 2010 Final OGF Editor Review Final Editor review June 2010 OGF final GFSG review June 2010 Issue Final specification 30 June 2010 Publish proposed recommendation 1 July 2010 Grid recommendation process 1 Jan - 1 April 2011 Current Actions: No Action 045 20/05 AP: Speculative Parsing 27/05: Psuedo code has been circulated. Review for next call 03/06: Comments received and will be incorporated 09/06: Progress but not discussed 17/06: Discussed briefly 24/06: No Progress 01/07: No Progress 15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way. 29/07: No Progress 05/08: No Progress. Will document behaviour as a set of rules. 12/08: No Progress ... 16/09: no progress 30/09: AP distributed proposal and others commented. Brief discussion AP to incorporate update and reissue 07/10: Updated proposal was discussed.Comments will be incorporated into the next version. 14/10: Alan to update proposal to include array scenario where minOccurs > 0 21/10: Updated proposal reviewed 28/10: Updated proposal reviewed see minutes 04/11: Discussed semantics of disciminators on arrays. MB to produce examples 11/11: Absorbing action 033 into 045. Maybe decorated discrminator kinds are needed after all. MB and SF to continue with examples. 18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules. 25/11: Further discussion. Will get more WTX documentation. 
Need to confirm that no changes need to Resolving Uncertainty doc. 04/11: Further discussion about arrays. 09/12: Reviewed proposed discriminator semantic. 16/12: Reviewed discriminator examples and WTX semantic. 23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call 06/01:B Connolly not available. SF to provide more complete description. 049 20/05 AP Built-in specification description and schemas 03/06: not discussed 24/06: No Progress 24/06: No Progress (hope to get these from test cases) 15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide. ... 14/10: no progress 21/10: Discussed the real need for this being in the specification. It seemed that the main value is it define a schema location for downloading 'known' defaults from the web. 28/10: no progress 04/11: no progress 11/11: no update 18/11: no update 25/11: Agreed to try to produce for CSV and fixed formats 04/12: no update 09/12: no update 16/12: no update 23/12: no update 06/01: no progress. If there is no resource to complete this action it can be deferred 056 MB Resolve lengthUnits=bits including fillbytes 12/08: No Progress ... 28/10: no progress 04/11: MB to look at lengthUnits = bits 11/11: no update 18/11: no update 25/11: no update 04/12: no update. ALan will set up a separate call to progress this action. 09/12: no update. ALan will set up a separate call to progress this action. 16/12: MB, SH and AP had a separate call. MB to distribute proposal 23/12: Discussed proposal. MB will updated 06/01: V4 discussed and approved 064 MB/SH Request WG presentation at OGF 28 25/11: Session requested 04/12: no update 09/12: no update 16/12: SH has changed request to a general session rather tha WG in the hope of attracting more people. 23/12: no update 06/01: not heard anything yet 066 Investigate format for defining test cases 25/11:IBM to see if it is possible to publish its test case format. 
04/12: no update 09/12: no update 16/12: reminded dent to project manager 23/12: SH will send another reminder. 06/01: Another reminder will be sent 068 Should the roots of messages be designated.? 09/12: Yes. New dfdl:documentRoot property Closed 16/12: reopened and decided to drop property subject to agreement from SKK and SF 23/12: SKK review decision to drop dfdl:documentRoot 071 Semantics of length=0, nil handling and defaults. 23/12:SH no update 06/01: SH has started 073 SH: Control of overpunching zoned positive sign Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898 Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Floating elements and unordered groups
by Tim Kimber 12 Jan '10


Hi all,

I know this area of the specification was only recently resolved, and I think there may be an inconsistency in the v0.37 wording.

Section 16, re: sequenceKind says:
"The children of an unordered sequence must be xs:element."

Section 16.5 Floating Elements says:
"An ordered sequence of n element children with either n or n-1 of those children with dfdl:floating="true" is equivalent to an unordered sequence with the same n element children with dfdl:floating="false". A complex element with dfdl:floating="true" can have as its content model a sequence with elements that also have dfdl:floating="true"."

Now suppose that, instead of N element children, there are N-1 floating element children + one non-floating group. This group will be equivalent to an unordered group with a non-element member.

If the specification was intending to make life easy for implementers, then it should probably disallow groups in any non-ordered context, including when sequenceKind='ordered' and there is at least one floating component. But I think that would be too restrictive. I would be happy for the restriction to be lifted entirely. Given that unordered groups can have dfdl:initiated="false", it will sometimes be necessary to find the correct member by trial and error (speculative parsing) anyway. I don't think it's any more difficult to speculatively parse a group than to speculatively parse a complex element.

If I've missed something, and it turns out that the restriction is useful, then we should
a) tighten up the wording to say that if a group with N members has N or N-1 floating members, then it must be validated as if it was an unordered group.
b) consider lifting the restriction in cases where dfdl:initiated="true" (because it makes things so much easier for the DFDL processor)

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert(a)uk.ibm.com
Tel. 01962-816742 Internal tel. 246742

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
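
The scenario in this message can be made concrete with a minimal Python sketch. It is not DFDL and is not taken from the specification; it only models the "N or N-1 floating members" test from section 16.5 and the stricter validation described in option (a). The names Child, behaves_as_unordered and check_sequence are hypothetical, and requiring at least one floating member is an assumption made here so that a one-member sequence is not flagged.

```python
from dataclasses import dataclass


@dataclass
class Child:
    name: str
    kind: str               # "element" or "group" (a nested sequence or choice)
    floating: bool = False  # models dfdl:floating


def behaves_as_unordered(children):
    """True if N or N-1 of the N children are floating.

    Assumption: at least one child must actually be floating.
    """
    n = len(children)
    floating = sum(1 for c in children if c.floating)
    return floating >= max(1, n - 1)


def check_sequence(children):
    """Option (a): validate an effectively unordered sequence as unordered,
    i.e. report every member that is not an element."""
    if not behaves_as_unordered(children):
        return []
    return [
        f"{c.name}: non-element member of an effectively unordered group"
        for c in children
        if c.kind != "element"
    ]


# N-1 floating elements plus one non-floating group: the case in question.
problem_case = [
    Child("a", "element", floating=True),
    Child("b", "element", floating=True),
    Child("g", "group"),
]
print(check_sequence(problem_case))
```

Under option (b), a processor could skip this check whenever dfdl:initiated="true"; that variation is not modelled in the sketch.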


Re: [DFDL-WG] dfdl-wg Digest, Vol 41, Issue 8
by Tim Kimber 08 Jan '10

08 Jan '10

My understanding was different from what is quoted below. 'delimited' can always be terminated by the end of the data or the end of an enclosing known-length element. Without that rule, the simplest scenario involving an infix delimiter will not work correctly (or will force the user to set lengthKind='endOfParent' on the final group member, which would be very unintuitive).

There is a difference between endOfParent and delimited, though: 'delimited' is not allowed when representation='binary', whereas 'endOfParent' is.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert(a)uk.ibm.com
Tel. 01962-816742 Internal tel. 246742

From: dfdl-wg-request(a)ogf.org
To: dfdl-wg(a)ogf.org
Date: 08/01/2010 18:02
Subject: dfdl-wg Digest, Vol 41, Issue 8

Today's Topics: 1. dfdl:lengthKind= 'delimited' and 'endOfParent' (Alan Powell)

----- Message from Alan Powell <alan_powell(a)uk.ibm.com> on Fri, 8 Jan 2010 17:10:51 +0000 -----
To: dfdl-wg(a)ogf.org
Subject: [DFDL-WG] dfdl:lengthKind= 'delimited' and 'endOfParent'

Since we extended the meaning of dfdl:lengthKind='delimited' to include

"'delimited' means the item is delimited by the item’s terminator (if specified) or an enclosing construct’s separator or the end of the enclosing construct designated by its known length or its terminator."

the only difference from dfdl:lengthKind='endOfParent' is that the latter includes the 'end of the data stream'.

We should either
- Add 'end of data stream' to delimited and remove 'endOfParent', or
- Make 'endOfParent' be specifically for only 'end of data stream'

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
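
The infix-delimiter scenario mentioned in the reply above can be illustrated with a small Python sketch. It is only a model of the scanning behaviour, not a DFDL processor: the function name parse_delimited_items and its parameters are hypothetical, and the enclosing known-length element is modelled simply by slicing the input to that length before parsing.

```python
def parse_delimited_items(data, separator, count):
    """Parse `count` delimited items separated by an infix `separator`.

    With an infix separator nothing follows the final item, so that item
    can only be ended by the end of the available data (or, if the caller
    sliced the input, by the end of the enclosing known-length element).
    """
    items = []
    pos = 0
    for index in range(count):
        end = data.find(separator, pos)
        if end == -1:
            if index != count - 1:
                raise ValueError(f"separator missing after item {index}")
            end = len(data)  # final item: ended by the end of the data
        items.append(data[pos:end])
        pos = end + len(separator)
    return items


# Final item ended by the end of the data stream.
print(parse_delimited_items("a,b,c", ",", 3))    # ['a', 'b', 'c']

# Final item ended by the end of an enclosing element of known length 5.
stream = "a,b,cXYZ"
print(parse_delimited_items(stream[:5], ",", 3))  # ['a', 'b', 'c']
```

If 'delimited' could not be ended this way, the last call to find would have nothing to stop at, which is the situation that would otherwise force lengthKind='endOfParent' onto the final member.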


dfdl:lengthKind= 'delimited' and 'endOfParent'
by Alan Powell 08 Jan '10

08 Jan '10

Since we extended the meaning of dfdl:lengthKind='delimited' to include

"'delimited' means the item is delimited by the item's terminator (if specified) or an enclosing construct's separator or the end of the enclosing construct designated by its known length or its terminator."

the only difference from dfdl:lengthKind='endOfParent' is that the latter includes the 'end of the data stream'.

We should either
- Add 'end of data stream' to delimited and remove 'endOfParent', or
- Make 'endOfParent' be specifically for only 'end of data stream'

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
