November 2009 - dfdl-wg - lists.ogf.org

Array clarifications (was Fw: Minutes for OGF DFDL Working Group Call, November-25-2009)
by Steve Hanson 30 Nov '09

Discussed points 1-3 with Alan.

1) Alan thinks we didn't decide to limit discriminators to choices - specifically that the optional scenario needs it. Steve thought this was just handled by speculative parsing, and that we limited discriminators to choices to simplify things for 1.0. More discussion needed in WG.

2) The minuted definition of dfdl:separatorPolicy="required" is in fact correct - it causes the parser to expect maxOccurs separators, and the unparser to output maxOccurs separators. Otherwise it is not possible for dfdl:occursCountKind="parsed" to handle extra separators that are included to make parsing unambiguous. The combination of "required" and maxOccurs="unbounded" is therefore an error. This keeps the rule simple, but you can end up with some slightly odd combinations (eg, "required" plus stop value is possible).

3) A correction is needed to my revised dfdl:occursCountKind description, due to work item 061:

061 Change maxOccurs violations from processing error to validation error (if not 'fixed') 037

---------------------------------------------------------

Further discussion on array processing:

4) occursCountKind="expression". Is it a processing error if the number of occurrences in the data does not match the value of the expression? It was noted that the dfdl:outputValueCalc expression of a count field should use the dfdl:countWithDefault() function to ensure default values are taken into account.

5) occursCountKind="useAvailableSpace". On unparsing, unused space should be padded with dfdl:fillByte (added below). But if the number in the infoset is a lot less than the box can hold, how do you know when re-parsing how many are in the box? Also, if we are trying to fit things into a box, does it matter if items are left over? I suggested this was an error below. Need Mike's input as he has seen the use cases for this.

6) Rename dfdl:separatorPolicy="required" to "always".

7) Noted that dfdl:separatorPolicy="suppress" and "suppressAtEnd" have the same behaviour for an array.

Regards

Steve Hanson
Programming Model Architect, WebSphere Message Brokers, OGF DFDL WG Co-Chair, Hursley, UK,
Internet: smh(a)uk.ibm.com, Phone (+44)/(0) 1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 30/11/2009 10:02 -----

From: Steve Hanson/UK/IBM
To: dfdl-wg(a)ogf.org
Date: 27/11/2009 09:41
Subject: Re: [DFDL-WG] Minutes for OGF DFDL Working Group Call, November-25-2009

Three things:

1) I thought we had decided to limit dfdl:discriminator to choices? Apologies if I didn't keep up with the discussion.

2) I don't think we got the rules right for dfdl:separatorPolicy="required" and arrays. Yes, separators must be output for all items, but 'all' depends on dfdl:occursCountKind and not just maxOccurs. In fact, the only time that maxOccurs has significance when parsing/unparsing is when dfdl:occursCountKind="fixed" - and in that case maxOccurs cannot be unbounded. It should therefore not be a schema definition error when maxOccurs="unbounded" & dfdl:separatorPolicy="required".

occursCountKind (Enum)
Specifies how the actual number of occurrences is to be established.
Valid values: 'fixed', 'expression', 'parsed', 'stopValue' or 'useAvailableSpace'.
'fixed' means use the value of the maxOccurs on the declaration. It is a schema definition error if the value for minOccurs is not equal to maxOccurs.
'expression' means use the value of the dfdl:occursCount property.
'parsed' means that the number of occurrences is determined by normal speculative parsing such as discriminating by the initiator.
'useAvailableSpace' means the occurrences fill the available space which is limited by a containing construct.
'stopValue' means look for a logical stop value which signifies the end of the occurrences.
Annotation: dfdl:element

3) I think the dfdl:occursCountKind entry needs beefing up to address explicitly the parsing and unparsing behaviour for each enum. I know some of this stuff is elsewhere but I think it should also be here for clarity. Here's my first pass. I have highlighted where I'd like WG clarification.

occursCountKind (Enum)
Specifies how the actual number of occurrences is to be established.
Valid values: 'fixed', 'expression', 'parsed', 'stopValue' or 'useAvailableSpace'.
'fixed'. Use the value of maxOccurs on the element declaration. It is a schema definition error if the value for minOccurs is not equal to maxOccurs. On parsing, maxOccurs are expected. On unparsing, maxOccurs are output.
'expression' means use the value of the dfdl:occursCount property. On parsing, dfdl:occursCount are expected. On unparsing, dfdl:occursCount are output.
'parsed' means that the number of occurrences is determined by the data itself. On parsing, this is established using normal speculative parsing such as discriminating by the initiator. On unparsing, all infoset items are output.
'useAvailableSpace' means the occurrences fill the available space which is limited by a containing construct. On parsing, this is established using normal parsing rules. On unparsing, infoset items are output until the available space is exhausted or no items remain. Any unfilled space is filled with the dfdl:fillByte property value.
'stopValue' means use the value(s) of the dfdl:occursStopValue property. On parsing, look for a logical stop value which signifies the end of the occurrences. On unparsing, all infoset items are output followed by a logical stop value. The stop value does not appear in the infoset, and does not contribute to the occurrences count.
On parsing and unparsing, after default rules are applied, it is a processing error (if 'fixed') or a validation error (otherwise) if the number of occurrences does not lie between minOccurs and maxOccurs inclusive.
On unparsing, it is a processing error if items remain in the infoset after the designated occurrences have been output.
Annotation: dfdl:element

Regards

Steve Hanson
Programming Model Architect, WebSphere Message Brokers, OGF DFDL WG Co-Chair, Hursley, UK,
Internet: smh(a)uk.ibm.com, Phone (+44)/(0) 1962-815848

From: Alan Powell/UK/IBM@IBMGB
To: dfdl-wg(a)ogf.org
Date: 26/11/2009 16:59
Subject: [DFDL-WG] Minutes for OGF DFDL Working Group Call, November-25-2009
Sent by: dfdl-wg-bounces(a)ogf.org

Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, November-25-2009

Attendees
Suman Kalia (IBM)
Steve Hanson (IBM)
Mike Beckerle (Oco)
Alan Powell (IBM)
Steve Marting (Progeny)
Peter Lambros (IBM)
Tim Kimber (IBM)

Apologies
Stephanie Fetzer (IBM)

1. 045 Resolving points of uncertainty - Discriminators

Continued discussion of semantics of discriminators and arrays. Briefly reviewed Alan's update of the example to show discriminators only propagating to their parent. Felt that following the WTX example was the best approach. IBM will more fully document WTX behaviour.

Discussed whether discriminators should only resolve choices but decided against, as they are valuable to, for example, find an optional element that subsequently has a parsing error.

Also discussed that discriminators should only resolve the element they are defined on and whether that meant paths were not needed. Decided paths were needed for other purposes anyway.

Alan noted that the order of evaluation had not been defined for floating elements. Agreed this should be - the element in that position - followed by the floating elements in the order they are defined in the schema.

2. 045 - parsing rules for determining length

Discussed Tim's suggestion that dfdl:lengthKind alone should indicate how an item is extracted from the data:

explicit - The parser extracts a fixed number of characters/bytes from the input document as directed by dfdl:length (which may be a DFDL expression, and may resolve to the value of the previous sibling).
prefixed - The parser extracts a fixed number of characters/bytes from the input document as directed by the prefix length. Note the similarity with the DFDL expression scenario above.
implicit - The parser extracts a fixed number of characters/bytes from the input document as directed by the implicit length of the element.
delimited - The parser extracts from the input document all characters between the current buffer position and the next unescaped item of in-scope terminating markup.
pattern - The parser extracts from the input document all characters which match the specified pattern.
endOfParent - The parser extracts from the input document all remaining characters/bytes allowed by the representation properties of its parent groups/elements.

The terminating markup is only scanned for when lengthKind is 'delimited', and when it is 'endOfParent' and the end of the parent is delineated by markup. This means that for formats such as the Swift 52A segment that distinguishes fields according to the length of data found, the lengthKind is delimited rather than explicit and a dfdl:assert is needed to test the length.

dfdl:length is only examined when dfdl:lengthKind is explicit.

It was agreed to change the function names of dfdl:length to dfdl:representationLength and dfdl:lengthWithoutPadding to RepresentationLengthWithoutParsing to avoid confusion with the XPath length function.

Didn't discuss if this changes escaping rules.

3. SeparatorPolicy=require and defaulting arrays and sequences

Discussed the semantics of dfdl:separatorPolicy with variable length arrays. Agreed that the behaviour should be the same for arrays as for sequences.
'suppressAtEnd' - separators must be output up to the last required item. That is up to the xs:minOccurs of a variable array. For simple items a default value will be output for missing required items.
'required' - separators must be output for all items. That is up to xs:maxOccurs. It is a schema definition error if 'required' and xs:maxOccurs="unbounded" are both specified.
'suppress' - separators are not output for missing items, but for arrays there are no missing items so this is the same as 'suppressAtEnd'.

Note: a group is always 'required' so at least one member must be present. However a group could be wrapped in an element which could be optional.

4. Clarification of postfix separators, terminators, finalTerminatorCanBeMissing

Agreed to postpone to next call due to lack of time.

5. Go through remaining actions

Updated below.

6. Test suite for DFDL

DFDL will be much more usable if it is accompanied by a set of tests that provide dfdl schema, sample data and expected results. IBM will investigate whether it will be possible to publish the format of the test cases that it is developing.

7. OGF28 Call for papers

Steve H has sent a request for a slot in the agenda.

8. Plan to finish DFDL v1

MB agreed to investigate the tool available on gridforge for tracking problems with the spec. Agreed to start using this after version 37.

Meeting closed, 15:10

Next call 02 December 13:00 UK

Next action: 068

Actions raised at this meeting

No Action
066 Investigate format for defining test cases. IBM to see if it is possible to publish its test case format.
067 Investigate problem tracking tools.

Current Actions:

No Action

012 AP/SH: Update decimalCalendarScheme
10/9: Not allocated yet
17/9: No update
24/9: Add calendar binary formats to actions
22/10: No progress
16/1: proposal distributed and discussed. Will be redistributed
21/1: add locale
04/02: changed from locale to specific properties
18/2: Need more investigation of ICU strict/lax behaviour.
08/04: Not discussed
22/04: AP to complete asap once the ICU strict/lax behaviour is understood.
29/04: No progress
06/05: No progress
13/05: Calendar has been added to latest spec version v034 but still a few details to clarify.
20/05: No Progress
...
09/06: No Progress (low priority)
17/06: SH to check ICU code for lax calendar behaviour
24/06: no progress
...
12/08: no progress
19/08: Inconsistencies are being found in ICU behaviour so Calendars need reviewing again.
26/08: Specific three character short time zones may not be maintained during round tripping when there is more than one short form for a time zone offset. Because dates and datetimes in the infoset only maintain a time zone offset, on unparsing it isn't possible to say which short form will be selected for a particular offset when there is more than one possible. Need to document.
09/09: no progress
...
14/10: no progress
21/10: Will produce a list of known issues.
28/10: Discussed ICU fractional seconds behaviour. SF to send latest understanding.
04/11: no progress
11/11: no update
18/11: no update
25/11: no update

037 All: Approach for XML Schema 1.0 UPA checks.
22/04: Several non-XML models, when expressed in their most obvious DFDL Schema form, would fail XML Schema 1.0 Unique Particle Attribution checks that police model ambiguity. And even re-jigging the model sometimes fails to fix this. Note this is equally applicable to XML Schema 1.1 and 1.0. While the DFDL parser/unparser can happily resolve the ambiguities, the issue is one of definition. If an XSD editor that implements UPA checks is used to create DFDL Schema, then errors will be flagged.
DFDL may have to adopt the position that:
a) DFDL parser/unparser will not implement some/all UPA checks (exact checks tbd)
b) XML Schema editors that implement UPA checks will not be suitable for all DFDL models
c) If DFDL annotations are removed, the resulting pure XSD will not always be valid (ie, the equivalent XML is ambiguous and can't be modelled by XML Schema 1.0)
Ongoing in case another solution can be found.
29/04: Will ask DG and S Gao for opinion before closing
06/05: Discussed S Gao email and suggestions. Decided need to review all XML UPA rules and decide which apply to dfdl.
20/05: SH or SKK to investigate
27/05: No Progress
03/06: The concern is that some dfdl schemas will fail UPA check when validation is turned on or when edited using tooling that enforces UPA checks. Renaming fields will resolve some/most issues. Need documentation that describes issue and best practice.
17/06: no change
24/06: no change
01/07: no progress
...
12/08: No Progress (lower priority)
19/08: Clarify that this action is to go through the XML UPA checks to assess impact on dfdl schemas and advise best practice. Name clashes are just one example. SH or SKK
26/08: No Progress (lower priority)
09/09: no progress
...
04/11: no progress
11/11: Steve has started to look at this. He has requested a 'consumable' definition of the UPA rules from the XSD WG members. Even non-normative Appendix H in the XSD 1.0 spec is hard to consume.
18/11: no update
25/11: Steve H has not found a simpler definition so may just go through them.

045 20/05 AP: Speculative Parsing
27/05: Pseudo code has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way.
29/07: No Progress
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate update and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
14/10: Alan to update proposal to include array scenario where minOccurs > 0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed, see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce examples
11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.

049 20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats

056 MB Resolve lengthUnits=bits including fillbytes
12/08: No Progress
...
28/10: no progress
04/11: MB to look at lengthUnits = bits
11/11: no update
18/11: no update
25/11: no update

063 Write DFDL primer and test cases.
11/11: no update
25/11: no update

064 MB/SH Request WG presentation at OGF 28
25/11: Session requested

065 Resolve parsing rules for various lengthKinds
25/11: Agreed dfdl:lengthKind defines how to extract the data. Didn't discuss if this changes escaping.

066 Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.

067 25/11: Investigate problem tracking tools.

Closed actions

No Action

Work items:

No Item target version status
005 Improvements on property descriptions not started
011 How speculative parsing works (combining choice and variable-occurrence - currently these are separate) (from action 045) awaiting completion of actions 045
012 Reordering the properties discussion: move representation earlier, improve flow of topics not started
033 Numeric data - what physical reps are allowed for what logical types (from action 020) 037 ensure all behaviour documented
036 Update dfdl schema with change properties ongoing
038 Improve length section including bit handling some improvement in 036
042 Mapping of the DFDL infoset to XDM none not required for V1 specification
051 Revised scoping rules (from action 051) 037
058 textPadCharacter %#rxx limitation and split to textxxxxPadCharacter 037
059 limit terminatorCanBeMissing to last element in schema. Ignore elsewhere. 037
060 New empty string semantic for dfdl:binaryBooleanTrueRep 037
061 Change maxOccurs violations from processing error to validation error (if not 'fixed') 037
062 Drop calendarUseZForUTC. describe zU, IU and TU symbols 037
063 DefineFormat can contain only one active format. Drop baseFormat 037
064 Define how encoding, byteorder and floating point format externally 037
065 Refactor dfdl:textNumberFormat to remove dfdl:numberBase. 037
066 document scope of selectors 037
067 document floating evaluation order 037
068 change dfdl:length to dfdl:representationLength and dfdl:lengthWithoutPadding to RepresentationLengthWithoutParsing

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg(a)ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
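
For anyone who wants to experiment with the occurrence-count rule in the dfdl:occursCountKind proposal quoted in this message, here is a minimal Python sketch. It covers only two statements from that proposal: 'fixed' requires minOccurs to equal maxOccurs (otherwise a schema definition error), and a count outside minOccurs..maxOccurs is a processing error for 'fixed' but a validation error for the other kinds. The names check_schema, classify_count and UNBOUNDED are hypothetical and come from no DFDL implementation; using None for maxOccurs="unbounded" is an assumption of the sketch.

```python
from typing import Optional

# Hypothetical stand-in for maxOccurs="unbounded" (an assumption of this sketch).
UNBOUNDED = None

KINDS = {"fixed", "expression", "parsed", "stopValue", "useAvailableSpace"}


def check_schema(kind: str, min_occurs: int, max_occurs: Optional[int]) -> None:
    """Raise ValueError for the schema definition error named in the proposal."""
    if kind not in KINDS:
        raise ValueError(f"unknown occursCountKind: {kind!r}")
    # 'fixed' takes its count from maxOccurs, so minOccurs must equal it
    # (which also rules out an unbounded maxOccurs).
    if kind == "fixed" and min_occurs != max_occurs:
        raise ValueError("schema definition error: 'fixed' needs minOccurs == maxOccurs")


def classify_count(kind: str, count: int, min_occurs: int,
                   max_occurs: Optional[int]) -> Optional[str]:
    """Return None when count lies within the bounds, else the class of error."""
    within = count >= min_occurs and (max_occurs is UNBOUNDED or count <= max_occurs)
    if within:
        return None
    # Work item 061: only 'fixed' keeps the violation as a processing error.
    return "processing error" if kind == "fixed" else "validation error"


if __name__ == "__main__":
    print(classify_count("parsed", 5, 1, 3))
    print(classify_count("fixed", 2, 3, 3))
    print(classify_count("stopValue", 7, 0, UNBOUNDED))
```

The sketch deliberately leaves out how the count itself is established for each kind, since points 4 and 5 above are still open.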

Using gridforge trackers for comments/corrections and formal feedback
by Mike Beckerle 27 Nov '09

I took the action item to examine the gridforge facilities for formalizing public commentary and feedback. The complexity is that there are 4 mechanisms: "trackers", "tasks", "discussions", and "wiki". Another workgroup I used to participate in (DAIS) appears to have settled on trackers and limited wiki, and I'm not sure about the wiki part.

Based on this I kicked off use of the trackers by creating one for us, and adding one item to track, which is the lengthKind='bits' action item. Take a look here: http://forge.gridforum.org/sf/go/projects.dfdl-wg/tracker.specification_ver…

There is richness here which would allow attachment of draft doc sections, test case files, etc. if needed. One can have many trackers - which are like master organizing categories of items to track. I am not sure it is helpful to have more than one overall containing "tracker". DAIS WG does have more than one, but I suggest we start with only one and add more only if necessary.

The DFDL Wiki is hopelessly obsolete. I will delete the bogus content there so that it is at least not misleading.

Minutes for OGF DFDL Working Group Call, November-25-2009
by Alan Powell 27 Nov '09

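
The separator rules minuted under item 3 of the November-25 call can be illustrated with a small Python sketch. It computes, for an array being unparsed, how many item positions are written with separators under each dfdl:separatorPolicy value. The function name separated_positions and the use of None for maxOccurs="unbounded" are hypothetical. Treating 'suppressAtEnd' as max(items present, minOccurs) is one reading of "up to the last required item" and is an assumption, not something the minutes state.

```python
from typing import Optional


def separated_positions(policy: str, present: int, min_occurs: int,
                        max_occurs: Optional[int]) -> int:
    """Number of array positions the unparser writes under a separatorPolicy."""
    if policy == "required":
        # Minuted rule: separators for all items, that is up to maxOccurs,
        # so an unbounded maxOccurs is a schema definition error.
        if max_occurs is None:
            raise ValueError("schema definition error: 'required' with unbounded maxOccurs")
        return max_occurs
    if policy in ("suppressAtEnd", "suppress"):
        # Minuted rule: output up to the last required item (minOccurs).
        # Assumption: items actually present beyond minOccurs are still written.
        # For arrays 'suppress' is minuted as behaving like 'suppressAtEnd'.
        return max(present, min_occurs)
    raise ValueError(f"unknown separatorPolicy: {policy!r}")


if __name__ == "__main__":
    print(separated_positions("required", 2, 1, 5))
    print(separated_positions("suppressAtEnd", 0, 2, 5))
    print(separated_positions("suppress", 4, 2, None))
```

If "required" is renamed to "always", as suggested in the follow-up discussion, only the string literal in the first branch would change.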


Fw: Omitted array occurrences
by Steve Hanson 26 Nov '09

26 Nov '09

As discussed on the call, here is some more in-line below in green and tagged SMH+. A couple of other things when writing this up.
1) I think separatorPolicy="required" is misleading and I'm sure contributed to Tim's questions about behaviour. Here we are using "required" to mean that all delimiters are needed, even when the data itself is not required. I think we should use "always".
2) I'd forgotten that there is also separatorPolicy="suppress". In this case, any missing element does not get a separator. The spec states this "implies the children of the sequence must have dfdl:initiator specified." but it does not say whether the omission of an initiator is a schema definition error. Should it be?
Regards Steve Hanson Programming Model Architect, WebSphere Message Brokers, OGF DFDL WG Co-Chair, Hursley, UK, Internet: smh(a)uk.ibm.com, Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 25/11/2009 16:11 -----
From: Steve Hanson/UK/IBM To: dfdl-wg(a)ogf.org Date: 25/11/2009 12:37 Subject: Re: [DFDL-WG] Omitted array occurrences
Tim, Alan - my thoughts on this in blue (SMH).
Regards Steve Hanson Programming Model Architect, WebSphere Message Brokers, OGF DFDL WG Co-Chair, Hursley, UK, Internet: smh(a)uk.ibm.com, Phone (+44)/(0) 1962-815848
From: Alan Powell/UK/IBM@IBMGB To: Tim Kimber/UK/IBM@IBMGB Cc: dfdl-wg(a)ogf.org, dfdl-wg-bounces(a)ogf.org Date: 19/11/2009 17:30 Subject: Re: [DFDL-WG] Omitted array occurrences Sent by: dfdl-wg-bounces(a)ogf.org
Tim, Comments below. Need more discussion on this.
Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
From: Tim Kimber/UK/IBM@IBMGB To: dfdl-wg(a)ogf.org Date: 19/11/2009 12:07 Subject: [DFDL-WG] Omitted array occurrences
What should the DFDL unparser do when some or all of the elements of an array are missing?
I have found the following statements in v0.36 which seem relevant:
Section 5.2.1
The minOccurs value is used:
· to determine if an element declaration or reference is scalar or array
· to determine the required minimum number of occurrences of an array both when parsing and unparsing
Section 16.13 (Note: this definition of 'required' is a repeat of the definition in section 3)
Definition: 'required'
We define the term 'required' as follows:
· A scalar element is required.
· An element of a fixed-occurrence array is required.
· An element of a variable-occurrence array is required if its index is less than or equal to the value of minOccurs.
All other elements are not required.
...
On unparsing, if an element is required, and is not part of the logical data and the element has a default value specified then it is used, otherwise it is a processing error.
Section 17.3.1: Sequence groups and separators
re: the combination of separatorPolicy="suppressAtEnd" and sequenceKind="ordered":
All separators must be found in the data except that when the sequence has trailing optional items, the separators are suppressed for any final missing items. Note suppressAtEnd can only be used when there is no clash with delimiters from the containing structure.
My interpretation of the specification is:
a) if separatorPolicy="required" then the unparser should output a separator for all missing required elements (whether array members or not)
Is this an additional definition of a 'required' element? In which case the default value should be output. (Interestingly, because default is a schema property rather than a dfdl property you cannot set a default default.)
SMH: The definition of 'required' relates to the data. Here we are talking about whether to output syntax. Strike 'required' from Tim's interpretation and you have the correct interpretation.
b) if separatorPolicy="suppressAtEnd" then the unparser should output a separator for all non-trailing missing required elements
Should set the default for any required element so it won't be missing. "On unparsing, if an element is required, and is not part of the logical data and the element has a default value specified then it is used, otherwise it is a processing error."
SMH: Tim's interpretation is not complete. The correct interpretation is "...then the unparser should output a separator for all missing elements in the sequence up to and including the last required element.". It is only optional elements beyond the last required element that benefit from this property.
c) separators for missing elements must be output regardless of whether the element is required/optional, simple/complex, does/does not have a default value etc. I assume this because the term 'missing' is used rather than the very clearly-defined term 'required'.
Missing just means not in the infoset and is orthogonal to optional/required. If you accept this is an additional definition of required then no. But it then forces you to set defaults for minOccurs=0 elements which will only be used in this circumstance. I'm not sure what the default for complex elements would be: all the children must have a default?
SMH: If c) is trying to say that once you have decided, via a) and b), that a separator is needed, then whether it is simple/complex, does/does not have a default, is irrelevant, then I agree.
Reading between the lines, I also infer the following rules:
d) if an array has maxOccurs="unbounded" and it is missing from the infoset then the unparser will not output any separators for the array
e) if an array has maxOccurs!="unbounded" and it is missing from the infoset then the unparser will output a separator for each missing occurrence ( so it will output maxOccurs separators ).
If minOccurs > 0 then use default. If minOccurs = 0 then output nothing.
I don't think maxOccurs has any effect.
SMH+: The behaviour when dealing with a repeating element (minOccurs, maxOccurs) is analogous to dealing with a sequence. You treat up to and including minOccurs as 'required', and anything beyond as 'optional'. Then you apply separatorPolicy property. So "suppressAtEnd" means you only output delimiters up to and including minOccurs, and "required" means you output delimiters up to and including maxOccurs. There's clearly a problem with the combination of maxOccurs="unbounded" and separatorPolicy="required" - this should be a schema definition error.
SMH+: It is possible that some models are pretty ambiguous, and that we could be outputting something that is very difficult to parse. If it is possible to use the full DFDL armoury of parsing techniques (speculation, backtracking, data patterns, remodelling as choice and discriminators, etc) then that is a 1.0 limitation.
f) if an element contains a child group, and none of the group members are present in the infoset, then the group is 'missing' and the unparser will output a separator for it.
Not sure
SMH: This is establishing 'missing' for a local group. Sounds right to me. The separator will be output according to a) and b). But because a local group is (1:1) in DFDL, in practice you will always get a separator.
SMH+: If a local group needs to be optional it must be wrapped in a complex element.
Suggested changes to the specification:
- As a minimum, I think it would be useful for the specification to include a definition of 'missing'.
'Not in the infoset'
SMH: That's fine for unparsing only.
- DFDL does not allow min/maxOccurs on groups, so they implicitly have cardinality 1:1. Specification should specify the behaviour of the unparser when none of a group's members are present in the infoset.
Agree.
- The wording in 17.3.1 could be more accurate.
I don't think the word 'optional' should be there ( if validation is off then the unparser will tolerate missing required elements - No. 'required' is not part of validation). I think the words 'trailing' and 'final' are intended to mean the same - we should standardize on 'trailing'.
SMH: I agree the words could be improved. See my b) words above for example.
regards, Tim Kimber, Common Transformation Team, Hursley, UK Internet: kimbert(a)uk.ibm.com Tel. 01962-816742 Internal tel. 246742
-- dfdl-wg mailing list dfdl-wg(a)ogf.org http://www.ogf.org/mailman/listinfo/dfdl-wg
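To make the SMH+ rule for repeating elements concrete, here is a small Python sketch. It is only an illustration of the rule as worded in this thread, not text from the specification. The names `slots_to_emit`, `unparse_array` and `SchemaDefinitionError` are hypothetical, the separator is assumed to be infix, missing occurrences are written as empty content (defaulting is ignored), and the infoset is assumed never to hold more than maxOccurs occurrences.

```python
from typing import List, Optional


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error."""


def slots_to_emit(present: int, min_occurs: int,
                  max_occurs: Optional[int], policy: str) -> int:
    """Number of occurrence positions for which syntax is written.

    max_occurs=None models maxOccurs="unbounded".
    """
    if policy == "required":
        if max_occurs is None:
            # The thread suggests this combination should be rejected.
            raise SchemaDefinitionError(
                'separatorPolicy="required" with maxOccurs="unbounded"')
        # Delimiters are written up to and including maxOccurs.
        return max_occurs
    if policy == "suppressAtEnd":
        # Delimiters are written up to and including minOccurs, plus any
        # further occurrences that are actually present in the infoset.
        return max(present, min_occurs)
    raise ValueError("policy not covered by this sketch: " + policy)


def unparse_array(values: List[str], min_occurs: int,
                  max_occurs: Optional[int], policy: str,
                  sep: str = ",") -> str:
    """Join the occurrences, padding missing positions with empty content."""
    slots = slots_to_emit(len(values), min_occurs, max_occurs, policy)
    padded = values + [""] * (slots - len(values))
    return sep.join(padded)
```

For an array with minOccurs=1 and maxOccurs=4 holding two values, "required" keeps the positions of the two missing occurrences while "suppressAtEnd" drops them.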


OGF document approval process
by Alan Powell 26 Nov '09

26 Nov '09

After failing to get a response from both our area leads I have been in touch with the OGF editor who has pointed me at the OGF process document. http://www.ogf.org/documents/GFD.152.pdf. For recommendations such as DFDL there are two stages.
- Processing for Proposed Recommendation Documents
- Processing for Grid Recommendation Documents
Note that to become an OGF Recommendation there must be an expert review written summarizing operational experience. "Typically, this will mean that at least two interoperable implementations should be demonstrated"
Summary of Document Processing for Proposed Recommendation Documents
A document may be returned to an earlier phase of the document process, if deemed necessary.
Pre-submission check: Includes consensus within the OGF group, adherence to intellectual property guidelines, assignment of one or more corresponding authors, and group mailing list last call. At this point, OGF group chairs and the appropriate ADs should be informed of the intention to submit.
Submission: Suitably formatted document, with attention to required elements and intellectual property issues, is submitted to the OGF Editor.
Initial Editor review: The OGF Editor reviews the document for completeness, general content, formatting, etc. The OGF Editor will typically confirm the pre-submission check with the AD, who will shepherd the document through the GFSG review.
Initial GFSG review: Through the appropriate GFSG council, the GFSG will be given 15 days to read and comment on the document. At the end of that period, the AD will gain consensus from the GFSG as to whether the document is acceptable for advancement to Public Comment.
Public Comment: The document enters a 60-day Public Comment, with notification to the OGF community and general public (i.e., through the OGF's Web site and mailing lists).
Review of comments: Authors/editors are asked to respond to Public Comments, and may elect to prepare a new version of the document.
If substantial revisions are made, a further Public Comment will be sought.
Final GFSG review: The same process as the initial GFSG review, with attention to Public Comments and any further changes to the document.
Final Editor review: The OGF Editor will confirm the document is ready for publication.
Publication: The OGF Editor will assign a GFD-R-P document number and inform the OGF community of the new document.
Summary of Document Processing for Grid Recommendation Documents
A document may be returned to an earlier phase of the document process, if deemed necessary.
Passage of time: At least 6 months since publication as a GFD-R-P must pass.
Process check: When document authors or other interested parties inform the OGF Editor of the desire to move the document to Grid Recommendation status, the Editor will check that requirements are met and seek consensus from the AD that the document is ready for advancement to Grid Recommendation.
Document review: A written expert review, summarizing operational experience and documents published that reflect the readiness of the document to become a Grid Recommendation. Other parties may solicit this on behalf of the document authors. It may be submitted to the document authors, the AD, or directly to the OGF Editor.
Final document preparation: If small changes are sought to the final document, the submitters must propose them in the form of a replacement document per the Errata process described herein.
Public notice: The Editor will inform the OGF community of the intention to move the document to Grid Recommendation, including a summary of or link to the expert review.
Final review: The AD will present the result of the review and all other evidence (such as Experimental documents) to the appropriate Council, with a recommendation for whether to change the status to a Grid Recommendation. The Council will be given 15 days to read and comment on the document.
At the end of that period, the AD will gain consensus as to whether the document is acceptable for advancement.
Republication: The OGF Editor will replace GFD-R-P with GFD-R in the document, and apply any other needed changes. The Editor will inform the OGF community of the new document.
Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898


How to determine the length of an element which has text representation
by Tim Kimber 25 Nov '09

25 Nov '09

I'd like to record what was discussed and raise another point which Alan pointed out after the meeting.
Discussions in the meeting
- dfdl:lengthKind applies only to the element on which it is specified. It has no effect whatever on the parsing of child elements/groups.
- there may be some value in tolerating simple elements of type xs:string with dfdl:representation="binary". Might be useful for schemas where dfdl:representation="binary" throughout.
- Currently, the position of the WG is that parsers should *always* scan to extract the text representation if there is any terminating markup in scope. Even if lengthKind='explicit'.
- TK proposed the scheme outlined in his previous email, in which dfdl:lengthKind alone specifies how the parser should extract the text representation. If lengthKind="explicit", scanning is switched off and dfdl:length is used. If lengthKind="delimited" the text rep is extracted by scanning and length is ignored.
- A refinement was discussed whereby dfdl:length would be checked after a scan has been performed if dfdl:lengthKind="delimited". This would make the modeling of some common formats simpler, and avoid the need for a dfdl:assert to enforce the length constraint.
- MB raised the possibility that we could actually disallow dfdl:length if lengthKind='delimited'. This is the most conservative position, but general opinion was that it would be too restrictive. There still might be some value in disallowing dfdl:length for other lengthKinds.
Discussions after the meeting
- Alan pointed out that lengthKind="explicit" does not necessarily mean that the length of the field is fixed. dfdl:length might be specified as a DFDL expression. A common reason for doing that would be to obtain the element's length from an earlier integer field. As currently specified, if there was any markup in scope, the text rep would be extracted by scanning.
Restatement of my position after today's meeting: I'm now even more convinced that dfdl:lengthKind="explicit" should switch off scanning. Here's why:
a) The enumerations of lengthKind are explicit, implicit, prefixed, delimited, pattern, endOfParent. The presence of 'delimited' in that list means that in some users' minds, the other enumerations are going to be interpreted as *alternatives* to 'delimited'.
b) As currently specified, if there's markup in scope, scanning cannot be switched off by any means. Not even by setting lengthKind='explicit' AND obtaining dfdl:length from a previous integer field. I think that's very counter-intuitive.
regards, Tim Kimber, Common Transformation Team, Hursley, UK Internet: kimbert(a)uk.ibm.com Tel. 01962-816742 Internal tel. 246742
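The scheme TK proposed, together with the refinement discussed in the meeting, can be sketched in Python as follows. This is an illustration of the proposal only, not agreed WG behaviour. The function name `extract_text`, its parameters and the `ProcessingError` type are hypothetical, only the "explicit" and "delimited" kinds are covered, and the choice to take everything up to end of data when no delimiter is found is an assumption made for the sketch.

```python
from typing import Optional, Sequence, Tuple


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def extract_text(data: str, pos: int, length_kind: str,
                 length: Optional[int] = None,
                 delimiters: Sequence[str] = (),
                 check_length: bool = False) -> Tuple[str, int]:
    """Return (text, new_position) for one text element starting at pos."""
    if length_kind == "explicit":
        # Scanning is switched off: markup in scope is ignored entirely.
        if length is None:
            raise ValueError("explicit needs a length")
        if pos + length > len(data):
            raise ProcessingError("not enough data")
        return data[pos:pos + length], pos + length
    if length_kind == "delimited":
        # Scan for the earliest in-scope delimiter; length is not used
        # to extract the text.
        hits = [i for i in (data.find(d, pos) for d in delimiters) if i != -1]
        end = min(hits) if hits else len(data)  # assumption: run to end of data
        text = data[pos:end]
        # Refinement from the meeting: optionally check dfdl:length
        # after the scan instead of needing a separate assert.
        if check_length and length is not None and len(text) != length:
            raise ProcessingError("scanned length differs from dfdl:length")
        return text, end
    raise ValueError("length kind not covered by this sketch: " + length_kind)
```

With the data "AB,CD;" an explicit length of 4 yields "AB,C" even though a comma is in scope, which is the behaviour argued for in point b) above. The delimited kind stops at the comma and yields "AB".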


Omitted array occurrences
by Tim Kimber 25 Nov '09

25 Nov '09

What should the DFDL unparser do when some or all of the elements of an array are missing? I have found the following statements in v0.36 which seem relevant:
Section 5.2.1
The minOccurs value is used:
· to determine if an element declaration or reference is scalar or array
· to determine the required minimum number of occurrences of an array both when parsing and unparsing
Section 16.13 (Note: this definition of 'required' is a repeat of the definition in section 3)
Definition: 'required'
We define the term 'required' as follows:
· A scalar element is required.
· An element of a fixed-occurrence array is required.
· An element of a variable-occurrence array is required if its index is less than or equal to the value of minOccurs.
All other elements are not required.
...
On unparsing, if an element is required, and is not part of the logical data and the element has a default value specified then it is used, otherwise it is a processing error.
Section 17.3.1: Sequence groups and separators
re: the combination of separatorPolicy="suppressAtEnd" and sequenceKind="ordered":
All separators must be found in the data except that when the sequence has trailing optional items, the separators are suppressed for any final missing items. Note suppressAtEnd can only be used when there is no clash with delimiters from the containing structure.
My interpretation of the specification is:
a) if separatorPolicy="required" then the unparser should output a separator for all missing required elements (whether array members or not)
b) if separatorPolicy="suppressAtEnd" then the unparser should output a separator for all non-trailing missing required elements
c) separators for missing elements must be output regardless of whether the element is required/optional, simple/complex, does/does not have a default value etc. I assume this because the term 'missing' is used rather than the very clearly-defined term 'required'.
Reading between the lines, I also infer the following rules:
d) if an array has maxOccurs="unbounded" and it is missing from the infoset then the unparser will not output any separators for the array
e) if an array has maxOccurs!="unbounded" and it is missing from the infoset then the unparser will output a separator for each missing occurrence ( so it will output maxOccurs separators ).
f) if an element contains a child group, and none of the group members are present in the infoset, then the group is 'missing' and the unparser will output a separator for it.
Suggested changes to the specification:
- As a minimum, I think it would be useful for the specification to include a definition of 'missing'.
- DFDL does not allow min/maxOccurs on groups, so they implicitly have cardinality 1:1. Specification should specify the behaviour of the unparser when none of a group's members are present in the infoset.
- The wording in 17.3.1 could be more accurate. I don't think the word 'optional' should be there ( if validation is off then the unparser will tolerate missing required elements ). I think the words 'trailing' and 'final' are intended to mean the same - we should standardize on 'trailing'.
regards, Tim Kimber, Common Transformation Team, Hursley, UK Internet: kimbert(a)uk.ibm.com Tel. 01962-816742 Internal tel. 246742
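The definition of 'required' quoted from section 16.13, and the unparsing rule that follows it, can be sketched in Python. This is a reading of the quoted v0.36 text only, not normative. The names `is_required`, `value_to_unparse` and `ProcessingError` are hypothetical, indexes are assumed to be 1-based, a scalar is assumed to mean minOccurs=maxOccurs=1, and None stands for "not in the infoset" or "no default specified".

```python
from typing import Optional


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def is_required(index: int, min_occurs: int) -> bool:
    """True if the occurrence at 1-based index is 'required'.

    The three bullets collapse to one comparison under the assumptions
    above: a scalar has min_occurs == 1, a fixed-occurrence array has
    min_occurs == maxOccurs, and a variable-occurrence array is required
    up to and including min_occurs.
    """
    return 1 <= index <= min_occurs


def value_to_unparse(index: int, value: Optional[str],
                     default: Optional[str], min_occurs: int) -> Optional[str]:
    """Pick the content to write for one occurrence when unparsing."""
    if value is not None:
        return value          # present in the infoset
    if not is_required(index, min_occurs):
        return None           # optional and missing: no content
    if default is not None:
        return default        # required and missing: use the default
    raise ProcessingError("required element missing and no default")
```

Whether a separator is written for a missing occurrence is a separate question, governed by separatorPolicy rather than by this definition.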


Fw: Agenda for OGF DFDL WG call 25 November 2009 - 13:00UK (8:00 ET)
by Alan Powell 24 Nov '09

24 Nov '09

With current actions attached
Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
----- Forwarded by Alan Powell/UK/IBM on 24/11/2009 17:18 -----
From: Alan Powell/UK/IBM@IBMGB To: dfdl-wg(a)ogf.org Date: 24/11/2009 17:05 Subject: [DFDL-WG] Agenda for OGF DFDL WG call 25 November 2009 - 13:00UK (8:00 ET)
1. 045 Resolving points of uncertainty - Discriminators. Continue discussion of semantics of discriminators and arrays. See email from Alan
2. 045 - parsing rules for determining length. Continue discussion on how to 'turn off scanning' for text elements.
3. SeparatorPolicy=required and defaulting arrays and sequences. Tim has asked for clarification of the rules for when elements are missing and separator policy is required. See email
4. Clarification of postfix separators, terminators, finalTerminatorCanBeMissing. See email from Tim
5. Go through remaining actions
6. Test suite for DFDL. Discuss Mike's e-mail and what IBM is doing as part of its implementation work
7. OGF28 Call for papers
8. Plan to finish DFDL v1. How to track spec issues. Updated straw man schedule
Activity Schedule Who Resolve Action items - 23 Nov 2009 WG Write up work items 16 Nov - 4 Dec 2009 AP Restructure and complete specification 23 Nov - 4 Dec 2009 AP WG review 7 Dec - 18 Dec 2009 WG Incorporate review comments 4 Jan - 29 Jan 2010 AP + OGF Editor Review / Incorporate changes 1 Feb - 1 Mar 2010 OGF OGF Public Comment period (60 days) 1 Mar - 30 Apr 2010 OGF OGF 28 Munich 15-19 March 2010
Current Actions:
No Action
012 AP/SH: Update decimalCalendarScheme 10/9: Not allocated yet 17/9: No update 24/9: Add calendar binary formats to actions 22/10: No progress 16/1: proposal distributed and discussed. Will be redistributed 21/1: add locale, 04/02: changed from locale to specific properties 18/2: Need more investigation of ICU strict/lax behaviour.
08/04: Not discussed 22/04: AP to complete asap once the ICU strict/lax behaviour is understood. 29/04: No progress 06/05: No progress 13/05: Calendar has been added to latest spec version v034 but still a few details to clarify. 20/05: No Progress ... 09/06: No Progress (low priority) 17/06: SH to check ICU code for lax calendar behaviour 24/06: no progress ... 12/08: no progress 19/08: Inconsistencies are being found in ICU behaviour so Calendars need reviewing again. 26/08: Specific three character short time zones may not be maintained during round tripping when there is more than one short form for a time zone offset. Because dates and datetimes in the infoset only maintain a time zone offset, on unparsing it isn't possible to say which short form will be selected for a particular offset when there is more than one possible. Need to document. 09/09: no progress ... 14/10: no progress 21/10: Will produce a list of known issues. 28/10: Discussed ICU fractional seconds behaviour. SF to send latest understanding. 04/11: no progress 11/11: no update 18/11: no update
037 All: Approach for XML Schema 1.0 UPA checks. 22/04: Several non-XML models, when expressed in their most obvious DFDL Schema form, would fail XML Schema 1.0 Unique Particle Attribution checks that police model ambiguity. And even re-jigging the model sometimes fails to fix this. Note this is equally applicable to XML Schema 1.1 and 1.0. While the DFDL parser/unparser can happily resolve the ambiguities, the issue is one of definition. If an XSD editor that implements UPA checks is used to create DFDL Schema, then errors will be flagged.
DFDL may have to adopt the position that: a) DFDL parser/unparser will not implement some/all UPA checks (exact checks tbd) b) XML Schema editors that implement UPA checks will not be suitable for all DFDL models c) If DFDL annotations are removed, the resulting pure XSD will not always be valid (ie, the equivalent XML is ambiguous and can't be modelled by XML Schema 1.0) Ongoing in case another solution can be found. 29/04: Will ask DG and S Gao for opinion before closing 06/05: Discussed S Gao email and suggestions. Decided need to review all XML UPA rules and decide which apply to dfdl. 20/05: SH or SKK to investigate 27/05: No Progress 03/06: The concern is that some dfdl schemas will fail UPA check when validation is turned on or when edited using tooling that enforces UPA checks. Renaming fields will resolve some/most issues. Need documentation that describes issue and best practice. 17/06: no change 24/06: no change 01/07: no progress ... 12/08: No Progress (lower priority) 19/08: Clarify that this action is to go through the XML UPA checks to assess impact on dfdl schemas and advise on best practice. Name clashes are just one example. SH or SKK 26/08: No Progress (lower priority) 09/09: no progress ... 04/11: no progress 11/11: Steve has started to look at this. He has requested a 'consumable' definition of the UPA rules from the XSD WG members. Even non-normative Appendix H in the XSD 1.0 spec is hard to consume. 18/11: no update
045 20/05 AP: Speculative Parsing 27/05: Pseudo code has been circulated. Review for next call 03/06: Comments received and will be incorporated 09/06: Progress but not discussed 17/06: Discussed briefly 24/06: No Progress 01/07: No Progress 15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way. 29/07: No Progress 05/08: No Progress. Will document behaviour as a set of rules. 12/08: No Progress ... 16/09: no progress 30/09: AP distributed proposal and others commented.
Brief discussion. AP to incorporate update and reissue 07/10: Updated proposal was discussed. Comments will be incorporated into the next version. 14/10: Alan to update proposal to include array scenario where minOccurs > 0 21/10: Updated proposal reviewed 28/10: Updated proposal reviewed see minutes 04/11: Discussed semantics of discriminators on arrays. MB to produce examples 11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples. 18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
049 20/05 AP Built-in specification description and schemas 03/06: not discussed 24/06: No Progress 24/06: No Progress (hope to get these from test cases) 15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide. ... 14/10: no progress 21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web. 28/10: no progress 04/11: no progress 11/11: no update 18/11: no update
056 MB Resolve lengthUnits=bits including fillbytes 12/08: No Progress ... 28/10: no progress 04/11: MB to look at lengthUnits = bits 11/11: no update 18/11: no update
059 9/9: SH Define how encoding, byteorder and floating point format externally 16/09: no progress 07/10: no progress 14/10: no progress 21/10: SH to investigate 28/10: no progress 04/11: no progress 11/11: SH proposal accepted. One open issue - what is the full list of built-in variables? 18/11: added dfdl:binaryFloatRepresentation and dfdl:OutputNewLine. Action Closed
061 AP Refactor dfdl:textNumberFormat to remove dfdl:numberBase. 14/10: Base 2, 8, 16 numbers are invariably integers without formatting, use of pattern etc is overkill 21/10: no progress 28/10: no progress 04/11: no progress 11/11: Reviewed AP proposal, some comments to incorporate.
18/11: Approved latest draft subject to minor comments from SF and SKK. Closed
063 Write DFDL primer and test cases. 11/11: no update
064 MB/SH Request WG presentation at OGF 28
065 Resolve parsing rules for various lengthKinds
Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898


Agenda for OGF DFDL WG call 25 November 2009 - 13:00UK (8:00 ET)
by Alan Powell 24 Nov '09

24 Nov '09

1. 045 Resolving points of uncertainty - Discriminators. Continue discussion of semantics of discriminators and arrays. See email from Alan
2. 045 - parsing rules for determining length. Continue discussion on how to 'turn off scanning' for text elements.
3. SeparatorPolicy=required and defaulting arrays and sequences. Tim has asked for clarification of the rules for when elements are missing and separator policy is required. See email
4. Clarification of postfix separators, terminators, finalTerminatorCanBeMissing. See email from Tim
5. Go through remaining actions
6. Test suite for DFDL. Discuss Mike's e-mail and what IBM is doing as part of its implementation work
7. OGF28 Call for papers
8. Plan to finish DFDL v1. How to track spec issues. Updated straw man schedule
Activity Schedule Who Resolve Action items - 23 Nov 2009 WG Write up work items 16 Nov - 4 Dec 2009 AP Restructure and complete specification 23 Nov - 4 Dec 2009 AP WG review 7 Dec - 18 Dec 2009 WG Incorporate review comments 4 Jan - 29 Jan 2010 AP + OGF Editor Review / Incorporate changes 1 Feb - 1 Mar 2010 OGF OGF Public Comment period (60 days) 1 Mar - 30 Apr 2010 OGF OGF 28 Munich 15-19 March 2010
Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898


DFDL Discriminators discussion
by Alan Powell 24 Nov '09

24 Nov '09

Following on from Stephanie's WTX example on last week's call I think that restricting discriminators to resolving their parents provides the level of control that we need. I have updated Mike's example with this syntax. Note I have flattened the schema as I find that easier to read and 'enhanced' some of the dfdl properties. To illustrate the flexibility, if 'L' had a test="{ fn:exists( . ) }" discriminator then if an L record had been found then the parse would fail rather than backtracking to Blob.
Alan Powell MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England Notes Id: Alan Powell/UK/IBM email: alan_powell(a)uk.ibm.com Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
