September 2010 - dfdl-wg

DFDL V1 draft 43 is available
by Alan Powell 30 Sep '10

DFDL V1 draft 43 is available from GridForge: http://forge.gridforum.org/sf/docman/do/downloadDocument/projects.dfdl-wg/d…

Change history (latest entry at the top please)

Version: 043
Author/Contributor: Alan Powell
Date (yyyy-mm-dd): 2010-09-30
History:
- Changed calendar pattern character range to (0-24)
- Added description of lengthPattern property
- xs:fixed is for validation only
- Removed dfdl:hidden annotation and added hiddenGroupRef property to sequence
- Improved property syntax form description
- Added test pattern to assert and discriminator
- Added message property to discriminator
- Changed long form of assert and discriminator to be consistent with format properties
- Changed regular expression language to Java or Perl
- xs:fixed is not used during parsing except to provide a default value
- It is a schema definition error if the empty sequence is the content of a complex type
- Added dfdl:UTF16Width to say whether UTF-16 is fixed or variable width
- Removed appendix A (UTF-16 is a variable width encoding unless UTF16Width is fixed)
- Clarified syntax of default value expression in defineVariable and newVariableInstance
- Added Conformance and Optional Features sections
- Changed DFDL functions teston/off, seton/off to testBit and setBits
- Clarified schema definition errors on emptyValueDelimiterPolicy and nilValueDelimiterPolicy
- Added prefixLength region to grammar
- Added 'none' to dfdl:textNumberRoundingMode
- Added testValueKind to assert and discriminator
- Added 'suppressedAtEndLax' and 'suppressedAtEndStrict' to separatorPolicy

Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell(a)uk.ibm.com

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
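Treating UTF-16 as variable width matters for characters outside the Basic Multilingual Plane. UTF-16 encodes those as a 4-byte surrogate pair. The Python sketch below counts characters in big-endian UTF-16 data under both interpretations. The function name and the 'fixed'/'variable' argument values are illustrative assumptions, not property values taken from the draft.

```python
def utf16_char_count(data: bytes, width: str = "variable") -> int:
    """Count characters in big-endian UTF-16 data (illustrative sketch).

    width="variable": a surrogate pair (4 bytes) counts as one character.
    width="fixed": every 2-byte code unit counts as one character (UCS-2 style).
    The width values are assumptions, not taken from the DFDL draft.
    """
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must be a whole number of 2-byte code units")
    if width == "fixed":
        return len(data) // 2
    if width == "variable":
        # Decoding joins each surrogate pair into a single code point.
        return len(data.decode("utf-16-be"))
    raise ValueError(f"unknown width: {width!r}")


# 'A' takes one code unit; U+1F600 needs a surrogate pair, 6 bytes in total.
data = "A\U0001F600".encode("utf-16-be")
print(utf16_char_count(data, "variable"), utf16_char_count(data, "fixed"))
```

The two interpretations differ only when the data contains surrogate pairs. For BMP-only text, both give the same count.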

Minutes for OGF DFDL Working Group Call, September 29 2010
by Alan Powell 30 Sep '10

Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, September 29 2010

Attendees:
Stephanie Fetzer (IBM)
Steve Hanson (IBM)
Alan Powell (IBM)
Tim Kimber (IBM)

Apologies:
Mike Beckerle (Oco)
Suman Kalia (IBM)
Bob McGrath (National Center for Supercomputing Applications)
Alejandro Rodriguez (National Center for Supercomputing Applications)

1. Current Actions
Updated below.

2. textNumberRoundingMode
29/09: The interaction between textNumberRoundingMode and the rounding number in the numberPattern is not clearly described. It was agreed to make textNumberRoundingMode the controlling switch and add 'none' to its enumerations.

3. Syntax of assert/discriminator
29/09: Alan suggested that the value form of assert/discriminator be made the same as the element form of representation properties. Steve felt that they were not the same as format properties (e.g. they can have defaults) so should not have the same syntax. Agreed there will be a new property dfdl:testKind with values 'expression' and 'pattern'.

Meeting closed 16:35.
Next call: Wednesday 06 October 2010 15:00 UK (10:00 ET)
Next action: 123

Actions raised at this meeting:

121 textNumberRoundingMode
29/09: The interaction between textNumberRoundingMode and the rounding number in the numberPattern is not clearly described. It was agreed to make textNumberRoundingMode the controlling switch and add 'none' to its enumerations.
Closed.

122 Syntax of assert/discriminator
29/09: Alan suggested that the value form of assert/discriminator be made the same as the element form of representation properties. Steve felt that the assert attributes were not the same as format properties (e.g. they can have defaults) so should not have the same syntax. Agreed there will be a new property dfdl:testKind with values 'expression' and 'pattern'.
Closed.

Current actions:

066 Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: No update ...
17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'.
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will be provided.
10/03: Work is progressing
17/03: Work is progressing
31/03: Work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations ...
25/08: Will chase to allow Daffodil access to test cases. The WG should define how implementations confirm that they 'conform to DFDL v1'.
01/09: IBM still progressing the legal aspect. Intends to publish 100 or so tests as soon as it can, ahead of a full compliance suite.
08/09: IBM still progressing
15/09: IBM still progressing, expect tests to be available within a few weeks
22/09: IBM still progressing, expect tests to be available within a few weeks
29/09: Test cases are being prepared.

085 ALL: publicise the public comments phase to ensure a good review.
14/04: See minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on the spec.
15/05: Still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan had posted changes made in draft 041. Steve suggested sending a note to the WG highlighting these changes. Steve also suggested requesting an extension as other IBM groups may review. We discussed whether this was necessary as changes will need to be made during the implementation phase anyway. Alan to ask OGF what the process is for changes post public comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of the process.
30/06: Alan has emailed Joel asking what the process is now that the public comment period is over, and whether we can update the published version with WG updates. No response yet.
07/07: No response. Alan will chase up.
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes are significant enough to need additional review. Alan to contact David Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a second public review is necessary before becoming a 'proposed recommendation'. Alan responded that the WG agreed that a re-review was not necessary. The next stage is for the OGF review committee to approve publication.
11/08: Specification is now 'awaiting author changes' before being submitted to the OGF technical committee for approval as a 'proposed specification'. Alan would like to have the updated specification complete by Sept 10th. The WG needs to complete all actions by then or decide that they do not need to be included in this phase of the process.
01/09: Alan and Steve have discussed and propose Sept 30th for completion of draft 43 and closure of all actions.
08/09: Target for completion September 30.
15/09: As above
22/09: As above
29/09: Draft 43 will be published this week for WG review prior to submitting to OGF.

111 Daffodil DFDL parser
11/08: Bob and Alejandro described the new implementation that they have developed. It is a new code base and is not based on the Defuddle prototype. It is written in Scala and implements approximately 80% of the features in the public comments draft of DFDL V1. Alejandro will send a list of the features not implemented. We discussed the scenarios that motivated the development, which were to extract data from various sources and transform it into canonical formats. Bob offered to make Daffodil available for the WG to assess the functionality. IBM WG members will get approval from the company to allow them to receive Daffodil. Bob raised the question that if Daffodil becomes the public implementation of DFDL then we will need to work out how that would be funded and managed. It would be helpful if IBM test cases were available to Daffodil. IBM will investigate.
25/08: Alejandro had sent a list of the functions that he has implemented and Steve had responded indicating the extra functions he thought were essential. Since then Alejandro has implemented some of the missing functions, such as escape schemes, pre-defined variables, binary decimal numbers, etc., and will update his list. Bob is planning to make the parser available on the internet to allow testing. His organisation is being reorganised and he doesn't know what the priority of Daffodil will be, so it is essential that we move quickly. It would help if IBM could indicate its support for Daffodil in some semi-formal way.
01/09: Alejandro updating Daffodil to include escape schemes, unordered sequences and ignoreCase. Daffodil being placed under formal source control in anticipation of external release. Bob has an early-October deadline to create a report for his sponsors on what has been done. It would be great if we could get Daffodil on the web and have run some IBM tests so it could be highlighted at OGF 30 at the end of October.
08/09: Alejandro is marking up spec draft 42 to indicate which features Daffodil implements. Bob expects Daffodil to be available on the web soon.
15/09: Alejandro had indicated in the specification which functions were implemented in Daffodil. Steve had reviewed and identified which functions need to be implemented and which could be considered optional (see action 099). Alejandro is implementing the missing core functions. There was some discussion about the limitations on unordered groups (stop value and expression not supported). It was agreed that it should be a schema definition error if dfdl:occursCountKind is 'stopValue' on any element within an unordered sequence and a floating element.
22/09: Not discussed
29/09: Not discussed

112 DFDL certification process
25/08: Discussed how to certify DFDL implementations. Alan to investigate if OGF has a defined process.
01/09: In progress. As part of this work the spec needs to state what conformance means.
08/09: Discussed what needs to be said in the spec and agreed that details of a conformance test suite should be in another document. Alan to draft conformance section.
15/09: Alan had looked at the conformance sections in the XML and XML Schema specifications, both of which indicate sections which must be implemented. Neither just says 'execute the test suite'. They talk in terms of conformance of documents, schemas and processors.
22/09: No progress
22/09: Alan has added short Conformance and Optional Features sections to the spec, which were briefly discussed. Discussed naming for processors that don't implement optional features and those that implement all features.

114 OGF 30
25/08: OGF 30 takes place on October 25-29 in Brussels. Should we have a WG session?
01/09: Given the emergence of the NCSA implementation and the spec completion target of 30th Sept, it makes sense to host a session at OGF 30.
08/09: Steve to request permission to go.
15/09: Travel request has been submitted.
22/09: DFDL session is scheduled at 11:00am Monday Oct 25th.

Closed actions:

107 teston/testoff DFDL expression functions
Are these functions still needed? They were introduced to allow individual bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed
01/09: Steve to progress by Sept 30th
08/09: Steve to progress by Sept 30th
15/09: The ISO 8583 standard has existence flags at the beginning that are encoded, so they cannot be defined as an array of bits. Therefore DFDL needs the ability to set individual bits within an unsigned int. However the functions, particularly SetOn/Off, as currently defined are not correct. SetOn returns a byte with the relevant bit set on. This must then be combined with other bytes, which isn't very usable. Steve to circulate an example of use and suggested improvements.
22/09: Steve had documented why the functions were required to parse ISO 8583 messages. He had suggested the following improvements:
xs:boolean dfdl:testBit(xs:unsignedByte, xs:unsignedByte)
Returns Boolean true if the bit number given by arg #2 is set on in the byte given by arg #1, otherwise returns Boolean false.
xs:unsignedByte dfdl:setBits(xs:boolean+)
Returns a byte being the value of the bit positions provided by the Boolean arguments, where true=1, false=0. The number of args must be 8.
Note that the bit numbering goes from left to right, in accordance with section 12.3.7.2 of the spec. The type was changed from unsignedLong to unsignedByte to avoid problems with padding when not enough bits are provided.
29/09: Syntax agreed. Some errors in existing function descriptions need fixing.
Closed.

118 Rules for 'missing' elements
- lengthKind='implicit' and xs:maxLength or xs:length is "0": element is missing
- lengthKind='implicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='explicit' and length is an expression: element is missing if the expression evaluates to zero
- lengthKind='explicit' and length is "0": element is missing
- lengthKind='explicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='pattern': element is missing if the length of the pattern match is zero
- lengthKind='prefixed': element is missing if the prefixed length region parses as zero
- lengthKind='delimited' and delimiterPolicy='suppressed' and the end of the group has not been encountered: element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='delimited' and delimiterPolicy='suppressed' or 'suppressedAtEnd' and the end of the group has been encountered: element is missing
- lengthKind='delimited', all other cases: element is missing if its scanned length is zero
- lengthKind='endOfParent': element is missing if its scanned length is zero
It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' for lengthKind 'explicit' or 'implicit'.
22/09: We discussed if these rules could be simplified to say an element is missing if its initiator is missing or its content region is empty. Need further discussion.
22/09: Tim suggested simplifying the above rules as follows. An element is missing:
1. if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data
2. else if the content region is empty.
Also: It is a schema definition error if a sequence has initiatedContent and one of its children has emptyValueDelimiterPolicy or nilValueDelimiterPolicy set to 'none' or 'terminator'. It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' or 'both' for lengthKind 'explicit' or 'implicit'.
Closed.

119 In passing we noted that the grammar did not define the position of the prefix length relative to the initiator, nor whether the prefix could have an initiator and terminator. Need further discussion.
22/09: The prefix length region is between the initiator and the content region (leftFraming prefixLengthRegion simple/complexContent). The simpleType for the length prefix can specify any DFDL property with the exception of lengthKind 'prefixed' and 'endOfParent'.
Closed.

Work items:

005 Improvements on property descriptions. Status: not started
012 Reordering the properties discussion: move representation earlier, improve flow of topics. Status: not started
036 Update DFDL schema with changed properties. Status: ongoing
042 Mapping of the DFDL infoset to XDM. Target version: none. Status: not required for V1 specification
070 Write DFDL primer
071 Write test cases
083 Implement RFC2116
109 Add 'message' attribute to dfdl:discriminator.
01/09: Closed. Conclusion was that this is genuinely useful and has low implementation cost. Will add a 'message' attribute to dfdl:discriminator. Target version: 43
110 Clarify expression limitations for defineVariable, newVariableInstance and setVariable.
01/09: Closed. The spec should distinguish newVariableInstance defaultValue from setVariable value. For newVariableInstance defaultValue, disallow downward references and references to self (it must be usable from the point of declaration). For setVariable, allow downward references and references to self, and always evaluate at the end of the component. (defineVariable defaultValue should be the same as newVariableInstance.) Target version: 43
113 Be specific about regular expression syntax. Target version: 43
108 Updates to hidden mechanism. Target version: 43
99 Updates to reflect subsetting and unparser optionality. Target version: 43
112 Define what conformance to the spec means. Target version: 43
115 Clarify allowed lengths for signed binary integers. Target version: 43
116 xs:minLength
The spec currently states: "When an element declaration specifies a default value, and has type xs:string, then xs:minLength must be specified and must be 1 or greater. It is a schema definition error otherwise." The process for defaults and nils means this restriction is no longer needed. Agreed. Target version: 43
117 Is UTF-16 a fixed width or variable width encoding?
Proposal:
- UCS-2 is a fixed length encoding.
- UTF-16 is a variable width encoding.
- A new property dfdl:UTF16Fixed 'yes | no' treats UTF-16 as a fixed width encoding.
15/09: Closed. Target version: 43
118 Document that an empty sequence that is the content of a complex type is ignored even when it has annotations.
It is a schema definition error if an empty sequence is the content of a complex type. Target version: 43
099 Splitting the specification into simpler sections.
22/09: We reviewed the proposed list of optional features and approved it. These will be documented by adding a section that lists these features rather than making them inline. It will be closely related to the conformance section. Closed. Target version: 43
101 Semantics of 'fixed'
Proposal:
- xs:fixed will not be used for parsing but only for validation and for providing a default value on unparsing.
- A new DFDL function will be defined that applies only to simple elements and tests whether the element exists, including applying all the schema facets and other constraints.
22/09: Discussed whether dfdl:checkConstraints should include an exists function. It isn't obvious what the return code should be for elements that don't exist. checkConstraints will check that the element does exist: 'true' means the element exists and is valid, 'false' means it doesn't exist or exists but doesn't meet constraints. The parameter is a path to a simple or complex element. If the element is complex and it exists, return 'true'. Closed. Target version: 43
108 dfdl:hidden
Global group approach. Summary:
- The particle to hide can be a local element, element ref, local sequence, local choice or group ref.
- The particle is removed from its parent into a dedicated global group of composition sequence and replaced in the parent by a new empty local sequence.
- The new empty local sequence carries a dfdl:hiddenGroupRef property; other DFDL properties are not allowed.
Pros: Removing all DFDL annotations and using the resultant pure XSD results in the same infoset. The global group can be reused.
Cons: Making something hidden is a refactor operation. The global group sequence needs its DFDL properties set correctly.
Alejandro had implemented extensions to the hidden function:
1. Allow a hidden sequence to reference a global element. Decided against, as Suman had identified some problems with namespaces.
2. Allow the referenced global group to contain a choice in addition to a sequence. It was agreed this was a useful extension.
22/09: Closed. Target version: 43
113 Regular expressions
15/09: Agreed that the spec should just say that either Java or Perl regular expressions can be used, and for portability the common subset of functions should be used.
22/09: Closed. Target version: 43
113b Regular expressions for assert/discriminator
Allowed as an alternative to an expression on dfdl:assert and dfdl:discriminator. The pattern may be specified as an attribute or as the element value. Attribute: new testPattern attribute. Element value: braces ( ) indicate a pattern instead of an expression.
15/09: Do not need the braces as expressions start with '{'. Need to state rules for where the pattern matching starts in the data stream.
22/09: Closed. Target version: 43
115 Clarify allowed lengths for signed integer types when rep is binary integer (i.e. two's complement)
01/09: No technical reason to restrict lengths to 2^x bytes; they could be odd, could be bits. But this is rare in practice, so if we do relax, limit any core subset to 2^x bytes.
22/09: Agreed that there should be no restrictions on lengths. Closed. Target version: 43
107 teston/testoff DFDL expression functions
22/09: Steve had documented why the functions were required to parse ISO 8583 messages. He had suggested the following improvements:
xs:boolean dfdl:testBit(xs:unsignedByte, xs:unsignedByte)
Returns Boolean true if the bit number given by arg #2 is set on in the byte given by arg #1, otherwise returns Boolean false.
xs:unsignedByte dfdl:setBits(xs:boolean+)
Returns a byte being the value of the bit positions provided by the Boolean arguments, where true=1, false=0. The number of args must be 8.
Note that the bit numbering goes from left to right, in accordance with section 12.3.7.2 of the spec. The type was changed from unsignedLong to unsignedByte to avoid problems with padding when not enough bits are provided.
29/09: Syntax agreed. Some errors in existing function descriptions need fixing. Closed.
118 Rules for 'missing' elements
22/09: Tim suggested simplifying the above rules as follows. An element is missing:
1. if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data
2. else if the content region is empty.
Also: It is a schema definition error if a sequence has initiatedContent and one of its children has emptyValueDelimiterPolicy or nilValueDelimiterPolicy set to 'none' or 'terminator'. It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' or 'both' for lengthKind 'explicit' or 'implicit'. Closed.
119 In passing we noted that the grammar did not define the position of the prefix length relative to the initiator, nor whether the prefix could have an initiator and terminator. Need further discussion.
22/09: The prefix length region is between the initiator and the content region (leftFraming prefixLengthRegion simple/complexContent). The simpleType for the length prefix can specify any DFDL property with the exception of lengthKind 'prefixed' and 'endOfParent'. Closed.
121 textNumberRoundingMode
29/09: The interaction between textNumberRoundingMode and the rounding number in the numberPattern is not clearly described. It was agreed to make textNumberRoundingMode the controlling switch and add 'none' to its enumerations. Target version: 43
122 Syntax of assert/discriminator
29/09: Alan suggested that the value form of assert/discriminator be made the same as the element form of representation properties. Steve felt that the assert attributes were not the same as format properties (e.g. they can have defaults) so should not have the same syntax. Agreed there will be a new property dfdl:testKind with values 'expression' and 'pattern'.

Regards
Alan Powell

Detailed questions re: emptyValueDelimiterPolicy
by Tim Kimber 29 Sep '10

1) Must an implementation verify that emptyValueDelimiterPolicy = 'none' for all suppressed elements that have initiators/terminators?
2) Must an implementation verify that emptyValueDelimiterPolicy = 'none' for all children of a missing required element that have initiators/terminators?
3) Must an implementation verify that emptyValueDelimiterPolicy != 'required' for all groups within the scope of a missing required complex element?

I think the answer to all three questions will be 'yes'.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert(a)uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
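Tim's questions ask which emptyValueDelimiterPolicy checks an implementation must make. As a related illustration, the two schema definition errors recorded under action 118 in the minutes can be written as a static check over one sequence's children. This is a minimal sketch that does not attempt the runtime checks Tim asks about. The ElementDecl class, its fields and the error messages are hypothetical. It also assumes the nil-value property meant is dfdl:nilValueDelimiterPolicy.

```python
from dataclasses import dataclass
from typing import List

# emptyValueDelimiterPolicy / nilValueDelimiterPolicy values that exclude the initiator.
EXCLUDES_INITIATOR = ("none", "terminator")


@dataclass
class ElementDecl:
    """Hypothetical summary of a child element's relevant DFDL properties."""
    name: str
    length_kind: str                   # e.g. "explicit", "implicit", "delimited"
    empty_value_delimiter_policy: str  # "none", "initiator", "terminator" or "both"
    nil_value_delimiter_policy: str    # same value set (assumed property)


def schema_definition_errors(initiated_content: bool,
                             children: List[ElementDecl]) -> List[str]:
    """Return messages for the action 118 schema definition errors in one sequence."""
    errors: List[str] = []
    for child in children:
        # 'initiator' or 'both' is disallowed with lengthKind 'explicit' or 'implicit'.
        if (child.length_kind in ("explicit", "implicit")
                and child.empty_value_delimiter_policy in ("initiator", "both")):
            errors.append(f"{child.name}: emptyValueDelimiterPolicy "
                          f"'{child.empty_value_delimiter_policy}' with lengthKind "
                          f"'{child.length_kind}'")
        # In an initiatedContent sequence, no child may let its initiator be omitted.
        if initiated_content:
            for prop, value in (("emptyValueDelimiterPolicy", child.empty_value_delimiter_policy),
                                ("nilValueDelimiterPolicy", child.nil_value_delimiter_policy)):
                if value in EXCLUDES_INITIATOR:
                    errors.append(f"{child.name}: {prop} '{value}' in an initiatedContent sequence")
    return errors
```

Because these checks depend only on property values, a processor could run them once when the schema is loaded rather than while parsing data.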

Agenda for OGF DFDL WG call 29 September 2010 15:00 UK (10:00 ET)
by Alan Powell 28 Sep '10

1. Current Actions Current Actions: No Action 066 Investigate format for defining test cases 25/11:IBM to see if it is possible to publish its test case format. 04/12: no update ... 17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite' 24/03: No progress 03/03: Discussions have been taking place on the subset of tests that will be provided. 10/03: work is progressing 17/03: work is progressing 31/03: work is progressing 14/04: And XML test case format has been defined and is being tested. 21/04. Schema for TDML defined. Need to define how this and the test cases will be made public 05/05: Work still progressing 12/05: Work still progressing 02/06: Work still progressing on technical and legal considerations ... 25/08: Will chase to allow Daffodil access to test cases. The WG should define how implementation confirm that they 'conform to DFDL v1' 01/09: IBM still progressing the legal aspect. Intends to publish 100 or so tests as soon as it can, ahead of a full compliance suite. 08/09: IBM still progressing 15/09: IBM still progressing, expect tests to be available within a few weeks 22/09: IBM still progressing, expect tests to be available within a few weeks 085 ALL: publicise Public comments phase to ensure a good review.. 14/04: see minutes 21/04: Press release, OMG and other standards bodies. 05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on spec 15/05: still no public comments 02/06: No public comments 16/06: Public comments period has ended with no external comments. Alan had posted changes made in draft 041. Steve suggested send a note to the WG highlighting these changes. Steve also suggested requesting an extension as other IBM groups may review. We discussed whether this was necessary as changes will need to be made during the implementation phase anyway. 
Alan to ask OGF what the process is for changes post public comment. 23/06: Still no comments. Alan will contact OGF to understand the rest of the process. 30/06: Alan has emailed Joel asking what the process is now public comment period is over and can we update the published version with WG updates. No response yet. 07/07: No response. Alan will chase up 14/07: No response from Joel. Sent email to Greg Newby by no response. 21/07: Still no response. 04/08: Joel has responded that it is up to the WG to decide if the changes are significant enough to need additional review. Alan to contact David Martin and Erwin Laure for guidance if we split the specification. 11/08: Received a response from Joel that the WG can decide if a re- public review is necessary before becoming a 'proposed recommendation'. Alan responded that the WG agreed that a re-review was not necessary. The next stage is for OGF review committee to approve publication. 11/08: Specification is now 'awaiting author changes' before being submitted to the OGF technical committee for approval as a 'proposed specification'. Alan would like to have the updated specification complete by Sept 10th. The WG needs to complete all actions by then or decide that they do not need to be included in this phase of the process. 01/09: Alan and Steve have discussed and propose Sept 30th for completion of draft 43 and closure of all actions. 08/09: Target for completion September 30. 15/09: as above 22/09: as above 107 teston/testoff dfdl expression functions. Are these functions still needed. They were introduced to allow individual bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that use existence flags to see if they are still required. 04/08: Not discussed 11/08: Not discussed 25/08: Not discussed 01/09: Steve to progress by Sept 30th 08/09: Steve to progress by Sept 30th 15/09: The ISO 8583 standard has existence flags at the beginning that are encoded so cannot be defined as an array of bits. 
Therefore DFDL needs the ability to set individual bits within an unsigned int. However the functions, particualry SetOn/Off, as currently defined are not correct. SetOn returns a byte with the relevant bit set on. This must then be combined with other bytes which isn't very usable. Steve to circulate example of use and suggested improvements. 22/09: Steve had documented the why the functions were required to parse ISO 8583 messages. He had suggested the following improvements xs:boolean dfdl:testBit(xs:unsignedByte, xs:unsignedByte) Returns Boolean true if the bit number given by arg #2 is set on in the byte given by arg #1, otherwise returns Boolean false. xs:unsignedByte dfdl:setBits(xs:boolean+) Returns an byte being the value of the bit positions provided by the Boolean arguments, where true=1, false=0. The # of args must 8. Note that the bit numbering goes from left to right, in accordance with section 12.3.7.2 of the spec. The type was changed from unSignedLong to unSignedByte to avoid problems with padding when not enough bits are provided. 111 Daffodil DFDL parser 11/08: Bob and Alejandro described the new implementation that they have developed. It is a new code base and is not based on the Deffudle prototype. It is written in scala and implements approximately 80% of the features in the public comments draft of DFDL V1. Alejandro will send a list of the features not implemented. We discussed the scenarios that motivated the development which was to extract data from various sources and transform into canonical formats. Bob offered to make Daffodil available for the WG to assess the functionality. IBM WG members will get approval the company to allow them to receive Daffodil. Bob raised the question that if Daffodil becomes the public implementation of DFDL then we will need to work out how that would be funded and managed. It would be helpful if IBM test cases were available to Daffodil. 
IBM will investigate 25/08: Alejandro had sent a list of the functions that he has implemented and Steve ahd responding indicating the extra functions he thought were essential. Since then Alejandro has implemented some of the missing functions, such as escape schemes, pre-defined variables, binary decimal numbers, etc, and will update his list. Bob is planning to make the parser available on the internet to allow testing. His organisation is being reorganised and he doesn't know what the priority of Daffodill will be so it is essential that we move quickly. It would help if IBM could indicate its support for Daffodil in some semi-formal way. 01/09: Alejandro updating Daffodil to include escape schemes, unordered sequences and ignoreCase. Daffodil being placed under formal source control in anticipation of external release. Bob has a start October deadline to create a report on what has been done for his sponsors. It would be great if we could get Daffodil on the web and have run some IBM tests so it could be highlighted at OGF 30 at end October. 08/09: Alejandro is marking up Spec draft 42 to indicate which features Daffodil implement. Bob expects Daffodil to be available on the web soon. 15/09: Alejandro had indicated in the specification which functions were implemented in Daffodill. Steve had reviewed and identified which function need to be implemented and which could be considered optional (see action 099). Alejandro is implementing the missing core functions. There was some discussion about the limitations on unordered groups. (stop value and expression not supported). It was agreed that it should be a schema definition error if dfdl:occursCountKind is 'stopValue' on any element within an unordered sequence and a floating element. 22/09: not discussed 112 DFDL certification process 25/08: Discussed how to certify DFDL implementations. Alan to investigate if OGF have a defined process. 
01/09: In progress; as part of this work the spec needs to state what conformance means.
08/09: Discussed what needs to be said in the spec and agreed that details of a conformance test suite should be in another document. Alan to draft a conformance section.
15/09: Alan had looked at the conformance sections in the XML and XML Schema specifications, both of which indicate sections which must be implemented. Neither just says 'execute the test suite'. They talk in terms of conformance of documents, schemas and processors.
22/09: No progress

114 OGF 30
25/08: OGF 30 takes place on October 25-29 in Brussels. Should we have a WG session?
01/09: Given the emergence of the NCSA implementation and the spec completion target of 30th Sept, it makes sense to host a session at OGF 30.
08/09: Steve to request permission to go.
15/09: Travel request has been submitted.

118 2. Rules for 'missing' elements
- lengthKind='implicit' and xs:maxLength or xs:length is "0": element is missing
- lengthKind='implicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='explicit' and length is an expression: element is missing if the expression evaluates to zero
- lengthKind='explicit' and length is "0": element is missing
- lengthKind='explicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='pattern': element is missing if the length of the pattern match is zero
- lengthKind='prefixed': element is missing if the prefixed length region parses as zero
- lengthKind='delimited' and delimiterPolicy='suppressed' and the end of the group has not been encountered: element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='delimited' and delimiterPolicy='suppressed' or 'suppressedAtEnd' and the end of the group has been encountered: element is missing
- lengthKind='delimited', all other cases: element is missing if its scanned length is zero
- lengthKind='endOfParent': element is missing if its scanned length is zero
It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' for lengthKind 'explicit' or 'implicit'.
22/09: We discussed whether these rules could be simplified to say that an element is missing if its initiator is missing or its content region is empty. Needs further discussion.

119 In passing we noted that the position of the prefix length relative to the initiator was not defined in the grammar, nor was it defined whether the prefix could have an initiator and terminator.
Need further discussion.

Regards
Alan Powell

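The rules for 'missing' elements under action 118 form a decision table. A hypothetical Python transcription of the rules as minuted on 22/09 is sketched below as a reading aid, not as normative behaviour, since the rules were still under discussion. ElementState, its fields and is_missing are invented names, delimiterPolicy is spelled as in the minutes, and the schema definition error check on emptyValueDelimiterPolicy is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementState:
    """Hypothetical summary of what a parser knows about one element occurrence."""
    length_kind: str                        # 'implicit', 'explicit', 'pattern', 'prefixed', 'delimited', 'endOfParent'
    length: Optional[int] = None            # resolved length for 'implicit' / 'explicit'
    length_is_expression: bool = False      # 'explicit' length given as an expression
    has_initiator: bool = False
    evdp_excludes_initiator: bool = False   # emptyValueDelimiterPolicy excludes the initiator
    initiator_found: bool = False
    pattern_match_length: int = 0
    prefixed_length: int = 0                # value parsed from the prefix length region
    scanned_length: int = 0
    delimiter_policy: str = ""              # as named in the minutes
    end_of_group_reached: bool = False


def initiator_absent(e: ElementState) -> bool:
    # Shared clause: has an initiator AND emptyValueDelimiterPolicy excludes it
    # AND it is not found in the data (discriminators/initiatedContent ignored).
    return e.has_initiator and e.evdp_excludes_initiator and not e.initiator_found


def is_missing(e: ElementState) -> bool:
    k = e.length_kind
    if k == "implicit":
        # Length "0" means missing; otherwise apply the initiator clause.
        return e.length == 0 or initiator_absent(e)
    if k == "explicit":
        if e.length_is_expression:
            return e.length == 0  # missing if the expression evaluates to zero
        return e.length == 0 or initiator_absent(e)
    if k == "pattern":
        return e.pattern_match_length == 0
    if k == "prefixed":
        return e.prefixed_length == 0
    if k == "delimited":
        if e.end_of_group_reached and e.delimiter_policy in ("suppressed", "suppressedAtEnd"):
            return True
        if e.delimiter_policy == "suppressed":  # end of group not yet reached
            return initiator_absent(e)
        return e.scanned_length == 0  # all other delimited cases
    if k == "endOfParent":
        return e.scanned_length == 0
    raise ValueError(f"unknown lengthKind: {k}")
```

Writing the table out this way makes visible how often the shared initiator clause recurs, which is the observation behind the 22/09 suggestion to simplify the rules.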

Minutes for OGF DFDL Working Group Call, September 22 2010
by Alan Powell 23 Sep '10

Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, September 22 2010

Attendees
Stephanie Fetzer (IBM)
Steve Hanson (IBM)
Alan Powell (IBM)
Tim Kimber (IBM)

Apologies
Mike Beckerle (Oco)
Suman Kalia (IBM)
Bob McGrath (National Center for Supercomputing Applications)
Alejandro Rodriguez (National Center for Supercomputing Applications)

1. Current Actions
Updated below.

2. Rules for 'missing' elements
- lengthKind='implicit' and xs:maxLength or xs:length is "0": element is missing
- lengthKind='implicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='explicit' and length is an expression: element is missing if the expression evaluates to zero
- lengthKind='explicit' and length is "0": element is missing
- lengthKind='explicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='pattern': element is missing if the length of the pattern match is zero
- lengthKind='prefixed': element is missing if the prefixed length region parses as zero
- lengthKind='delimited' and delimiterPolicy='suppressed' and the end of the group has not been encountered: element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='delimited' and delimiterPolicy='suppressed' or 'suppressedAtEnd' and the end of the group has been encountered: element is missing
- lengthKind='delimited', all other cases: element is missing if its scanned length is zero
- lengthKind='endOfParent': element is missing if its scanned length is zero
It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' for lengthKind 'explicit' or 'implicit'.
22/09: We discussed whether these rules could be simplified to say that an element is missing if its initiator is missing or its content region is empty. Needs further discussion.
In passing we noted that the position of the prefix length relative to the initiator was not defined in the grammar, nor was it defined whether the prefix could have an initiator and terminator. Needs further discussion.

Meeting closed, 16:35
Next call Wednesday 29 September 2010 15:00 UK (10:00 ET)
Next action: 120

Actions raised at this meeting
No  Action
118 2. Rules for 'missing' elements
- lengthKind='implicit' and xs:maxLength or xs:length is "0": element is missing
- lengthKind='implicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='explicit' and length is an expression: element is missing if the expression evaluates to zero
- lengthKind='explicit' and length is "0": element is missing
- lengthKind='explicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='pattern': element is missing if the length of the pattern match is zero
- lengthKind='prefixed': element is missing if the prefixed length region parses as zero
- lengthKind='delimited' and delimiterPolicy='suppressed' and the end of the group has not been encountered: element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='delimited' and delimiterPolicy='suppressed' or 'suppressedAtEnd' and the end of the group has been encountered: element is missing
- lengthKind='delimited', all other cases: element is missing if its scanned length is zero
- lengthKind='endOfParent': element is missing if its scanned length is zero
It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' for lengthKind 'explicit' or 'implicit'.
22/09: We discussed whether these rules could be simplified to say that an element is missing if its initiator is missing or its content region is empty. Needs further discussion.
119 Clarify position and restriction on prefix length element.
In passing we noted that the position of the prefix length relative to the initiator was not defined in the grammar, nor was it defined whether the prefix could have an initiator and terminator. Needs further discussion.

Current Actions:
No  Action
066 Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: No update
...
17/02: IBM is willing in principle to publish the test case format and some of the test cases.
May need some time to build a 'compliance suite'.
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will be provided.
10/03: Work is progressing
17/03: Work is progressing
31/03: Work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
25/08: Will chase to allow Daffodil access to test cases. The WG should define how implementations confirm that they 'conform to DFDL v1'.
01/09: IBM still progressing the legal aspect. Intends to publish 100 or so tests as soon as it can, ahead of a full compliance suite.
08/09: IBM still progressing
15/09: IBM still progressing, expect tests to be available within a few weeks
22/09: IBM still progressing, expect tests to be available within a few weeks

085 ALL: publicise the public comments phase to ensure a good review.
14/04: See minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on the spec.
15/05: Still no public comments
02/06: No public comments
16/06: The public comments period has ended with no external comments. Alan had posted the changes made in draft 041. Steve suggested sending a note to the WG highlighting these changes. Steve also suggested requesting an extension, as other IBM groups may review. We discussed whether this was necessary, as changes will need to be made during the implementation phase anyway. Alan to ask OGF what the process is for changes post public comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of the process.
30/06: Alan has emailed Joel asking what the process is now that the public comment period is over, and whether we can update the published version with WG updates. No response yet.
07/07: No response.
Alan will chase up.
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes are significant enough to need additional review. Alan to contact David Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide whether another public review is necessary before the specification becomes a 'proposed recommendation'. Alan responded that the WG agreed that a re-review was not necessary. The next stage is for the OGF review committee to approve publication.
11/08: The specification is now 'awaiting author changes' before being submitted to the OGF technical committee for approval as a 'proposed specification'. Alan would like to have the updated specification complete by Sept 10th. The WG needs to complete all actions by then or decide that they do not need to be included in this phase of the process.
01/09: Alan and Steve have discussed this and propose Sept 30th for completion of draft 43 and closure of all actions.
08/09: Target for completion is September 30.
15/09: As above
22/09: As above

107 teston/testoff dfdl expression functions
Are these functions still needed? They were introduced to allow individual bits to be set in a byte. Steve to look at the TLog and ISO 8583 formats, which use existence flags, to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed
01/09: Steve to progress by Sept 30th
08/09: Steve to progress by Sept 30th
15/09: The ISO 8583 standard has existence flags at the beginning that are encoded, so they cannot be defined as an array of bits. Therefore DFDL needs the ability to set individual bits within an unsigned int. However, the functions as currently defined, particularly SetOn/Off, are not correct. SetOn returns a byte with the relevant bit set on, which must then be combined with other bytes; this isn't very usable.
Steve to circulate an example of use and suggested improvements.
22/09: Steve had documented why the functions were required to parse ISO 8583 messages. He suggested the following improvements:
xs:boolean dfdl:testBit(xs:unsignedByte, xs:unsignedByte)
Returns Boolean true if the bit number given by arg #2 is set on in the byte given by arg #1, otherwise returns Boolean false.
xs:unsignedByte dfdl:setBits(xs:boolean+)
Returns a byte whose value is given by the bit positions provided by the Boolean arguments, where true=1 and false=0. The number of arguments must be 8.
Note that the bit numbering goes from left to right, in accordance with section 12.3.7.2 of the spec. The type was changed from unsignedLong to unsignedByte to avoid problems with padding when not enough bits are provided.

111 Daffodil DFDL parser
11/08: Bob and Alejandro described the new implementation that they have developed. It is a new code base and is not based on the Defuddle prototype. It is written in Scala and implements approximately 80% of the features in the public comments draft of DFDL V1. Alejandro will send a list of the features not implemented. We discussed the scenarios that motivated the development, which was to extract data from various sources and transform it into canonical formats. Bob offered to make Daffodil available for the WG to assess the functionality. IBM WG members will get approval from the company to allow them to receive Daffodil. Bob raised the question that if Daffodil becomes the public implementation of DFDL then we will need to work out how that would be funded and managed. It would be helpful if IBM test cases were available to Daffodil. IBM will investigate.
25/08: Alejandro had sent a list of the functions that he has implemented and Steve had responded indicating the extra functions he thought were essential. Since then Alejandro has implemented some of the missing functions, such as escape schemes, pre-defined variables and binary decimal numbers, and will update his list.
Bob is planning to make the parser available on the internet to allow testing. His organisation is being reorganised and he doesn't know what the priority of Daffodil will be, so it is essential that we move quickly. It would help if IBM could indicate its support for Daffodil in some semi-formal way.
01/09: Alejandro is updating Daffodil to include escape schemes, unordered sequences and ignoreCase. Daffodil is being placed under formal source control in anticipation of external release. Bob has a start-of-October deadline to create a report for his sponsors on what has been done. It would be great if we could get Daffodil on the web and run some IBM tests so it could be highlighted at OGF 30 at the end of October.
08/09: Alejandro is marking up spec draft 42 to indicate which features Daffodil implements. Bob expects Daffodil to be available on the web soon.
15/09: Alejandro had indicated in the specification which functions were implemented in Daffodil. Steve had reviewed this and identified which functions need to be implemented and which could be considered optional (see action 099). Alejandro is implementing the missing core functions. There was some discussion about the limitations on unordered groups (stop value and expression are not supported). It was agreed that it should be a schema definition error if dfdl:occursCountKind is 'stopValue' on any element within an unordered sequence and a floating element.
22/09: Not discussed

112 DFDL certification process
25/08: Discussed how to certify DFDL implementations. Alan to investigate if OGF have a defined process.
01/09: In progress; as part of this work the spec needs to state what conformance means.
08/09: Discussed what needs to be said in the spec and agreed that details of a conformance test suite should be in another document. Alan to draft a conformance section.
15/09: Alan had looked at the conformance sections in the XML and XML Schema specifications, both of which indicate sections which must be implemented.
Neither just says 'execute the test suite'. They talk in terms of conformance of documents, schemas and processors.
22/09: No progress

114 OGF 30
25/08: OGF 30 takes place on October 25-29 in Brussels. Should we have a WG session?
01/09: Given the emergence of the NCSA implementation and the spec completion target of 30th Sept, it makes sense to host a session at OGF 30.
08/09: Steve to request permission to go.
15/09: Travel request has been submitted.

115 Clarify allowed lengths for signed integer types when rep is binary integer (i.e., two's complement)
01/09: There is no technical reason to restrict lengths to 2^x bytes; a length could be an odd number of bytes, or could be in bits. But this is rare in practice, so if we do relax the rule, limit any core subset to 2^x bytes.
08/09: Not discussed
15/09: Not discussed
22/09: Agreed that there should be no restrictions on lengths. Closed

118 2. Rules for 'missing' elements
- lengthKind='implicit' and xs:maxLength or xs:length is "0": element is missing
- lengthKind='implicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='explicit' and length is an expression: element is missing if the expression evaluates to zero
- lengthKind='explicit' and length is "0": element is missing
- lengthKind='explicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='pattern': element is missing if the length of the pattern match is zero
- lengthKind='prefixed': element is missing if the prefixed length region parses as zero
- lengthKind='delimited' and delimiterPolicy='suppressed' and the end of the group has not been encountered: element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='delimited' and delimiterPolicy='suppressed' or 'suppressedAtEnd' and the end of the group has been encountered: element is missing
- lengthKind='delimited', all other cases: element is missing if its scanned length is zero
- lengthKind='endOfParent': element is missing if its scanned length is zero
It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' for lengthKind 'explicit' or 'implicit'.
22/09: We discussed whether these rules could be simplified to say that an element is missing if its initiator is missing or its content region is empty. Needs further discussion.

119 In passing we noted that the position of the prefix length relative to the initiator was not defined in the grammar, nor was it defined whether the prefix could have an initiator and terminator. Needs further discussion.

Closed actions
No  Action
117 3. Is UTF-16 a fixed width or variable width encoding?
Appendix A: About UTF-16 and Unicode Character Codes above 0xFFFF
When we define UTF-16 to be a fixed-width double-byte character set, we say that each UTF-16 codepoint is represented by 2 bytes. Notice the careful use of the term 'codepoint' here.
Unicode/ISO 10646 characters can have character codes as large as 0x10FFFF, which requires 3 bytes to store (21 bits actually); however, in UTF-16, characters with codes above 0xFFFF are encoded as two codepoints, called a surrogate pair. Hence UTF-16 is fixed-width at 2 bytes per codepoint; it is not 2 bytes per Unicode character. UTF-16 is really a variable-width encoding, but the characters that require the surrogate-pair treatment are so infrequently used that UTF-16 is most often treated like a 16-bit fixed-width character set. It is the acknowledgement of the existence of surrogate pairs that leads to the 'codepoint' vs. 'character code' distinction. UTF-32 is a fixed-width encoding with a full 4 bytes per character code; it represents all of Unicode with the same width per character. Hence, when we refer to lengths in character strings we will often refer to length in characters, but we qualify that it means 2-byte codepoints when the character set encoding is UTF-16. That is, when the property lengthUnitKind is 'characters' and the charset is 'UTF-16', the units are actually 16-bit codepoints, not Unicode characters.
Proposal
- UCS-2 is a fixed length encoding
- UTF-16 is a variable width encoding
- A new property dfdl:UTF16Fixed 'yes | no' to treat UTF-16 as a fixed width encoding
15/09: Proposal agreed. Closed

099 Splitting the specification into simpler sections
07/07: Steve sent a proposal but it was not discussed. Alan will arrange a separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments. Need to add choice, validation and facets. Also, how does an implementation declare which subsets it supports? Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL schema from an implementation of just the core functions is moved to a full DFDL implementation, what should happen about the missing properties? Does the full implementation need to be aware of subsets of functions?
Should it raise a schema definition error for use of a function not in the subset?
21/07: No progress
04/08: Steve had updated the proposed groups of function (Subset_proposal_v2.ppt). We discussed whether it is better to have discrete sets of functions or expanding levels of function. The purpose of subsetting is to:
1. Allow simpler implementations (main purpose)
2. Simplify tooling
3. Simplify the specification
Steve to contact previous members of the WG to check if we have the correct subsets.
11/08: Steve sent an email to previous members of the WG asking for opinions on splitting the specification. Bob McGrath from the National Center for Supercomputing Applications responded that they had implemented about 80% of the function. Alejandro will send a description of the function they have implemented. An action will be raised to track the Daffodil implementation.
11/08: Not discussed
01/09: NCSA implementation description received. Making the unparser optional is a good idea (NCSA do not need one). Work will progress on the subsets.
08/09: No progress
15/09: Steve proposed making 'obscure' properties optional rather than subsetting parts of the specification. See minutes for the proposed list of optional properties.
22/09: We reviewed the proposed list of optional features and approved it. These will be documented by adding a section that lists these features rather than documenting them inline. It will be closely related to the conformance section. Closed

101 Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a validation error or a processing error. Decided that for consistency it should be a validation error. It would be useful, however, to avoid having to duplicate facet information in an assert, which could become unwieldy for, say, a large enumeration. Suggestions:
- a parser option that 'converted all validation errors to processing errors'
- a dfdl expression function that 'applied all facets' or 'applied specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to lack of time.
11/08: We started to discuss Stephanie's HIPAA example but ran out of time.
25/08: Not discussed
01/09: Discuss next week
08/09: Stephanie sent an example of an X12 document showing how an element with the same name was defined in different groups with different enumerations. Proposal:
- xs:fixed will not be used for parsing but only for validation and for providing a default value on unparsing.
- A new dfdl function will be defined that applies only to simple elements and tests whether the element exists, including applying all the schema facets. (Need to check with Tim why he wanted to apply only enumerations.)
dfdl:exists(xpath, true | false): true means apply facets, false means don't apply facets.
    <xs:element ref="REF_BillingProviderTaxIdentification_2010AA">
      <xs:annotation>
        <xs:documentation>Discrimination needed to distinguish REF segments</xs:documentation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:discriminator test="{dfdl:exists(./REF01__ReferenceIdentificationQualifier, true)}"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
15/09: Decided that a separate dfdl:checkConstraints (or other suitable name) function, which checks all constraints and not just facets, was simpler than extending fn:exists. It applies to both simple and complex elements and to groups.
22/09: Discussed whether dfdl:checkConstraints should include the exists function. It isn't obvious what the return code should be for elements that don't exist. checkConstraints will check that the element does exist: 'true' means the element exists and is valid; 'false' means it doesn't exist, or exists but doesn't meet the constraints. The parameter is a path to a simple or complex element. If the element is complex and exists, return 'true'. Closed

108 dfdl:hidden
There has been some discussion on whether the 'hidden' global group should be indicated in some way.
04/08: A lively discussion. The specification works as currently defined, so the question is whether changes need to be made to make tooling easier. There shouldn't be 'conventions' in particular tooling, as tools must be able to properly deal with schemas from other tools that would not obey those conventions. Steve stated that it is often dangerous to hide too much from users when they can see the underlying schema. To be continued.
25/08: There have been some offline discussions about simplifying how hidden elements are implemented. The proposal is:
- dfdl:hidden property on xs:element only
- xs:minOccurs and xs:maxOccurs MUST be 0 when hidden
- dfdl:minOccurs and dfdl:maxOccurs for hidden elements only. An element is 'required' when dfdl:minOccurs > 0 and normal default processing occurs.
The schema, without dfdl annotations, must match the infoset, so the assumption is that non-DFDL tools, such as mappers, will ignore/not show elements with xs:minOccurs and xs:maxOccurs = '0'.
01/09: The above proposal is flawed due to the use of maxOccurs = 0 (this was identified back in 2008, hence the current spec). Bob confirmed that NCSA models use hidden in a big way, so punting hidden beyond 1.0 is not an option. Two candidates:
- As per spec but with syntactic improvements to make it clear that the two xs:sequences do not take any dfdl:sequence properties
- Place a flag directly on a local element and force minOccurs to be 0. Simpler syntax, but the semantics change, as the element *could* legally be in the infoset, although a DFDL parser would never put it there.
Steve will circulate the two proposals for next week. Bob to talk to Alejandro, as the NCSA implementation is currently more flexible than the spec, allowing the groupref to point to a choice and to an elementref. Are these really needed?
08/09: Discussed the Global Group and Hidden Flag approaches. Decided to stay with the Global Group with dfdl:sequence properties rather than the dfdl:hidden annotation.
It was agreed that there would be no extra properties on the 'hidden' global group, because the syntax was messy (the properties should really be on the sequence) and there are currently no dfdl properties on global groups.
Global group approach
Summary:
- The particle to hide can be a local element, element ref, local sequence, local choice or group ref.
- The particle is removed from its parent into a dedicated global group of composition sequence, and is replaced in the parent by a new empty local sequence.
- The new empty local sequence carries a dfdl:hiddenGroupRef property; other DFDL properties are not allowed.
Pros:
- Removal of all DFDL annotations and use of the resultant pure XSD results in the same infoset
- The global group can be reused
Cons:
- Making something hidden is a refactor operation
- The global group sequence needs its DFDL properties set correctly
The Daffodil parser allows the hidden annotation to reference global elements in addition to global groups. It was noted that this loses the particle properties, but we need to discuss with Alejandro.
15/09: Alan circulated a section describing Hidden Sequence Groups. Noted that hidden is now described only in the sequence sections of the specification. Noted some editorial changes. Alejandro had implemented extensions to the hidden function:
1. Allow a hidden sequence to reference a global element. Decided against, as Suman had identified some problems with namespaces.
2. Allow the referenced global group to contain a choice in addition to a sequence. It was agreed this was a useful extension.
22/09: Closed

113 Regular Expressions
25/08: The DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient?
a. Is the XML regular expression language the correct one to use? Tim asked if DFDL needs to specify a language at all, or should leave it to implementers to pick one. That would inhibit portability of schemas.
01/09: There are many variations of regexp language; it seems wise to specify one that we know contains features like lookaround, which makes it easy to say things like 'give me everything up to but not including x'. This rules out XML Schema and POSIX; it needs Perl 5 or Java.
08/09: Agreed that the specification should define the regular expression language (if only by referring to other specifications). It should allow a common subset of the Perl and Java expression languages. Alan to update the regular expression section.
15/09: Agreed that the spec should just say that either Java or Perl regular expressions can be used and that, for portability, the common subset of functions should be used.
22/09: Closed

113b Regular Expressions for Assert/Discriminator
25/08: The DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient?
b. A regular expression property on an assert/discriminator as an alternative to the test expression. Either a DFDL expression or a regular expression could be specified, but not both.
01/09: Tim to convince Steve (via example) that use of regexp in asserts is needed in 1.0.
08/09: Agreed that this is a useful function.
- Allowed as an alternative to an expression on dfdl:assert and dfdl:discriminator
- Pattern may be specified as an attribute or element value
- Attribute: new testPattern attribute
- Element value: braces ( ) indicate a pattern instead of an expression
15/09: Do not need the braces, as expressions start with '{'. Need to state rules for where the pattern matching starts in the data stream.
22/09: Closed

Work items:
No  Item  Target version  Status
005 Improvements on property descriptions  not started
012 Reordering the properties discussion: move representation earlier, improve flow of topics  not started
036 Update dfdl schema with change properties  ongoing
042 Mapping of the DFDL infoset to XDM  none  not required for V1 specification
070 Write DFDL primer
071 Write test cases
083 Implement RFC2116
109 Add 'message' attribute to dfdl:discriminator. 01/09: Closed: The conclusion was that this is genuinely useful and has low implementation cost. Will add a 'message' attribute to dfdl:discriminator.  43  not started
110 Clarify expression limitations for defineVariable, newVariableInstance and setVariable. 01/09: Closed: The spec should distinguish the newVariableInstance defaultValue from the setVariable value. For the newVariableInstance defaultValue, disallow downward references and references to self (it must be usable from the point of declaration). For setVariable, allow downward references and references to self, and always evaluate at the end of the component. (The defineVariable defaultValue should be the same as for newVariableInstance.)  43  not started
113 Be specific about regular expression syntax  43  not started
108 Updates to hidden mechanism  43  not started
99 Updates to reflect subsetting and unparser optionality  43  not started
112 Define what conformance to spec means  43  not started
115 Clarify allowed lengths for signed binary integers  43  not started
116 2. xs:minLength. The spec currently states: "When an element declaration specifies a default value, and has type xs:string, then xs:minLength must be specified and must be 1 or greater. It is a schema definition error otherwise." The process for defaults and nils means this restriction is no longer needed. Agreed
117 3. Is UTF-16 a fixed width or variable width encoding? Proposal:
- UCS-2 is a fixed length encoding
- UTF-16 is a variable width encoding
- A new property dfdl:UTF16Fixed 'yes | no' to treat UTF-16 as a fixed width encoding
15/09: Closed
118 2. Document that an empty sequence that is the content of a complex type is ignored even when it has annotations. It is a schema definition error if an empty sequence is the content of a complex type.
099 Splitting the specification into simpler sections
22/09: We reviewed the proposed list of optional features and approved it.
These will be documented by adding a section that lists these features rather than making them inline. It will be closely related to the conformance section.. Closed 101 Semantics of 'fixed' Proposal: - xs:fixed will not be used for parsing but only for validation and for providing a default value on unparsing. - A new dfdl function will be defined that applies only to simple element and tests whether the element exists including applying all the schema facets and other constraints. 22/09: Discussed whether dfdl:checkConstaints should included exists function. It isn't obvious what the return code should be for elements that don't exist. checkConstarints will check that element does exist. 'true' means the element exists and is valid, 'false' means doesn't exist or exists but doesn't meet constraints. The parameter is a path to a simple or complex element. If complex and it exists return 'true' Closed 108 dfdl:hidden Global group approach Summary: Particle to hide can be a local element, element ref, local sequence, local choice or group ref Particle is removed from its parent into a dedicated global group of composition sequence and replaced in the parent by a new empty local sequence The new empty local sequence carries a dfdl:hiddenGroupRef property, other DFDL properties are not allowed Pros: Removal of all DFDL annotations and use of the resultant pure XSD results in same infoset Global group can be reused Cons: Making something hidden is a refactor operation Global group sequence needs DFDL properties setting correctly Alejandro had implemented extensions to the hidden function. 1. Allow hidden sequence to reference a global element. Decided against as Suman had identified some problems with namesapces. 2. Allow the reference global group to contain a choice in addition to a sequence. It was agreed this was a useful extension. 22/09: Closed 113 Regular Expressions. 
15/09: Agreed that should just say that either JAva or PERL regular expressions can be used and for portability the common subset of functions should be used. 22/09: Closed 113b Regular Expressions for Assert/Discriminator. Allowed as alternative to expression on dfdl:assert and dfdl:discriminator Pattern may be specified as attribute or element value Attribute: new testPattern attribute Element value: braces ( ) indicate pattern instead of expression 15/09: Do not need the braces as expressions start with '{'. Need to state rules for where the patter matching starts in the data stream. 22/09: Closed 115 Clarify allowed lengths for signed integer types when rep is binary integer (ie, two's complement) 01/09: No technical reason to restrict lengths to 2^x bytes, could be odd, could be bits. But rare in practise so if we do relax, limit any core subset to 2^x bytes. 22/09: Agreed that there should be not restrictions on lengths. Closed Regards Alan Powell Development - MQSeries, Message Broker, ESB IBM Software Group, Application and Integration Middleware Software ------------------------------------------------------------------------------------------------------------------------------------------- IBM MP211, Hursley Park Hursley, SO21 2JN United Kingdom Phone: +44-1962-815073 e-mail: alan_powell(a)uk.ibm.com Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
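Action 113 settles on Java or PERL regular expressions largely because they support lookaround, which makes 'give me everything up to but not including x' easy to express. The sketch below is illustrative only: it uses Python's `re` module, which shares the `(?=...)` lookahead syntax with Java and Perl 5 but is not one of the flavours the WG named, the helper name is made up, and it assumes (the minutes still leave this rule to be stated) that matching is anchored at the current position in the data stream.

```python
import re
from typing import Optional

# Lazy ".*?" followed by the zero-width lookahead "(?=x)": the match stops just
# before the first "x" and the "x" itself is not consumed. The same syntax works
# in java.util.regex and Perl 5; Python's re is used here only for illustration.
UP_TO_X = re.compile(r".*?(?=x)", re.DOTALL)


def up_to_but_not_including_x(data: str) -> Optional[str]:
    """Return the text before the first 'x', or None if no 'x' follows.

    re.match anchors at position 0, standing in for the assumed rule that
    matching starts at the current position in the data stream.
    """
    m = UP_TO_X.match(data)
    return m.group(0) if m else None


print(up_to_but_not_including_x("abcxdefx"))  # abc
print(up_to_but_not_including_x("abc"))       # None
```

Because the lookahead is zero-width, a parser using such a pattern would leave the 'x' in the data for whatever comes next.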

Agenda for OGF DFDL WG call 22 September 2010 15:00 UK (10:00 ET)
by Alan Powell 21 Sep '10

1. Current Actions
2. Rules for 'missing' elements:
- lengthKind='implicit' and xs:maxLength or xs:length is "0": element is missing
- lengthKind='implicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='explicit' and length is an expression: element is missing if the expression evaluates to zero
- lengthKind='explicit' and length is "0": element is missing
- lengthKind='explicit' and length is not "0": element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='pattern': element is missing if the length of the pattern match is zero
- lengthKind='prefixed': element is missing if the prefixed length region parses as zero
- lengthKind='delimited' and delimiterPolicy='suppressed' and the end of the group has not been encountered: element is missing if it has an initiator AND emptyValueDelimiterPolicy excludes the initiator AND the initiator is not found in the data (regardless of discriminators or initiatedContent)
- lengthKind='delimited' and delimiterPolicy='suppressed' or 'suppressedAtEnd' and the end of the group has been encountered: element is missing
- lengthKind='delimited', all other cases: element is missing if its scanned length is zero
- lengthKind='endOfParent': element is missing if its scanned length is zero
It is a schema definition error to specify emptyValueDelimiterPolicy 'initiator' for lengthKind 'explicit' or 'implicit'.

Current Actions:

066 Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and some of the test cases.
May need some time to build a 'compliance suite'.
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
25/08: Will chase to allow Daffodil access to test cases. The WG should define how implementations confirm that they 'conform to DFDL v1'.
01/09: IBM still progressing the legal aspect. Intends to publish 100 or so tests as soon as it can, ahead of a full compliance suite.
08/09: IBM still progressing
15/09: IBM still progressing, expect tests to be available within a few weeks

085 ALL: publicise Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan had posted changes made in draft 041. Steve suggested sending a note to the WG highlighting these changes. Steve also suggested requesting an extension as other IBM groups may review. We discussed whether this was necessary as changes will need to be made during the implementation phase anyway. Alan to ask OGF what the process is for changes post public comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of the process.
30/06: Alan has emailed Joel asking what the process is now that the public comment period is over, and whether we can update the published version with WG updates. No response yet.
07/07: No response. Alan will chase up
14/07: No response from Joel.
Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes are significant enough to need additional review. Alan to contact David Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a further public review is necessary before becoming a 'proposed recommendation'. Alan responded that the WG agreed that a re-review was not necessary. The next stage is for the OGF review committee to approve publication.
11/08: Specification is now 'awaiting author changes' before being submitted to the OGF technical committee for approval as a 'proposed specification'. Alan would like to have the updated specification complete by Sept 10th. The WG needs to complete all actions by then or decide that they do not need to be included in this phase of the process.
01/09: Alan and Steve have discussed and propose Sept 30th for completion of draft 43 and closure of all actions.
08/09: Target for completion September 30.
15/09: as above

099 Splitting the specification in simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments. Need to add choice, validation, facets. Also, how does an implementation declare which subsets it supports? Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL schema from an implementation of just the core functions is moved to a full DFDL implementation, what should happen about the missing properties? Does the full implementation need to be aware of subsets of functions? Should it raise a schema definition error for use of a function not in the subset?
21/07: no progress
04/08: Steve had updated the proposed groups of functions (Subset_proposal_v2.ppt). We discussed whether it is better to have discrete sets of functions or expanding levels of function. Purpose of subsetting is:
1. Allow simpler implementations (main purpose)
2. Simplify tooling
3. Simplify specification
Steve to contact previous members of WG to check if we have the correct subsets.
11/08: Steve sent an email to previous members of the WG asking for opinions on splitting the specification. Bob McGrath from National Center For Supercomputing responded that they had implemented about 80% of the function. Alejandro will send a description of the function they have implemented. Action will be raised to track the Daffodil implementation.
11/08: not discussed
01/09: NCSA implementation description received. Making the unparser optional is a good idea (NCSA do not need one). Work will progress on the subsets.
08/09: No progress
15/09: Steve proposed making 'obscure' properties optional rather than subsetting parts of the specification. See minutes for proposed list of optional properties.

101 Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a validation error or a processing error. Decided that for consistency it should be a validation error. It would be useful, however, to avoid having to duplicate facet information in an assert, which could become unwieldy for, say, a large enumeration. Suggestions:
- a parser option that 'converted all validation errors to processing errors'
- a dfdl expression function that 'applied all facets' or 'applied specific facet' to a particular element
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to lack of time
11/08: We started to discuss Stephanie's HIPAA example but ran out of time.
25/08: Not discussed
01/09: Discuss next week
08/09: Stephanie sent an example of an X12 document showing how an element with the same name was defined in different groups with different enumerations. Proposal:
- xs:fixed will not be used for parsing but only for validation and for providing a default value on unparsing.
- A new dfdl function will be defined that applies only to simple elements and tests whether the element exists, including applying all the schema facets. (Need to check with Tim why he wanted to only apply enumerations.)
dfdl:exists(xpath, true | false): true means apply facets, false means don't apply facets.
<xs:element ref="REF_BillingProviderTaxIdentification_2010AA">
  <xs:annotation>
    <xs:documentation>Discrimination needed to distinguish REF segments</xs:documentation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator test="{dfdl:exists(./REF01__ReferenceIdentificationQualifier, true)}"/>
    </xs:appinfo>
  </xs:annotation>
</xs:element>
15/09: Decided that a separate dfdl:checkConstraints (or other suitable name) function, which checks all constraints and not just facets, was simpler than extending fn:exists. It applies to both simple and complex elements and to groups.

107 teston/testoff dfdl expression functions. Are these functions still needed? They were introduced to allow individual bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed
01/09: Steve to progress by Sept 30th
08/09: Steve to progress by Sept 30th
15/09: The ISO 8583 standard has existence flags at the beginning that are encoded, so they cannot be defined as an array of bits. Therefore DFDL needs the ability to set individual bits within an unsigned int. However, the functions, particularly SetOn/Off, as currently defined are not correct. SetOn returns a byte with the relevant bit set on. This must then be combined with other bytes, which isn't very usable. Steve to circulate example of use and suggested improvements.

108 dfdl:hidden
There has been some discussion on whether the 'hidden' global group should be indicated in some way.
04/08: A lively discussion. The specification works as currently defined, so the question is whether changes need to be made to make tooling easier.
There shouldn't be 'conventions' in particular tooling, as tools must be able to deal properly with schemas from other tools that would not obey those conventions. Steve stated that it is often dangerous to hide too much from users when they can see the underlying schema. To be continued.
25/08: There have been some offline discussions about simplifying how hidden elements are implemented. The proposal is:
- dfdl:hidden property on xs:element only
- xs:minOccurs and xs:maxOccurs MUST be 0 when hidden
- dfdl:minOccurs and dfdl:maxOccurs for hidden elements only. An element is 'required' when dfdl:minOccurs > 0, and normal default processing occurs.
The schema, without dfdl annotations, must match the infoset, so the assumption is that non-DFDL tools, such as mappers, will ignore/not show elements with xs:minOccurs and xs:maxOccurs = '0'.
01/09: The above proposal is flawed due to use of maxOccurs = 0 (this was identified back in 2008, hence the current spec). Bob confirmed that NCSA models use hidden in a big way, so punting hidden beyond 1.0 is not an option. Two candidates:
- As per spec but with syntactic improvements to make it clear that the two xs:sequences do not take any dfdl:sequence properties
- Place a flag directly on a local element and force minOccurs to be 0. Simpler syntax but the semantics change, as the element *could* be legally in the infoset, although a DFDL parser would never put it there.
Steve will circulate the two proposals for next week. Bob to talk to Alejandro, as the NCSA implementation is currently more flexible than the spec, allowing the groupref to point to a choice and to an elementref. Are these really needed?
08/09: Discussed the Global Group and Hidden Flag approaches. Decided to stay with Global Group with dfdl:sequence properties rather than the dfdl:hidden annotation.
It was agreed that there would be no extra properties on the 'hidden' global group, as the syntax was messy: they should really be on the sequence, and there are currently no dfdl properties on global groups.
Global group approach summary:
- Particle to hide can be a local element, element ref, local sequence, local choice or group ref
- Particle is removed from its parent into a dedicated global group of composition sequence and replaced in the parent by a new empty local sequence
- The new empty local sequence carries a dfdl:hiddenGroupRef property; other DFDL properties are not allowed
Pros:
- Removal of all DFDL annotations and use of the resultant pure XSD results in the same infoset
- Global group can be reused
Cons:
- Making something hidden is a refactor operation
- Global group sequence needs DFDL properties setting correctly
The Daffodil parser allows the hidden annotation to reference global elements in addition to global groups. It was noted that this lost the particle properties, but we need to discuss with Alejandro.
15/09: Alan circulated a section describing Hidden Sequence Groups. Noted that hidden is now only described in the sequence sections of the specification. Noted some editorial changes. Alejandro had implemented extensions to the hidden function:
1. Allow hidden sequence to reference a global element. Decided against, as Suman had identified some problems with namespaces.
2. Allow the referenced global group to contain a choice in addition to a sequence. It was agreed this was a useful extension.
22/09: to be closed

111 Daffodil DFDL parser
11/08: Bob and Alejandro described the new implementation that they have developed. It is a new code base and is not based on the Defuddle prototype. It is written in Scala and implements approximately 80% of the features in the public comments draft of DFDL V1. Alejandro will send a list of the features not implemented.
We discussed the scenarios that motivated the development, which was to extract data from various sources and transform it into canonical formats. Bob offered to make Daffodil available for the WG to assess the functionality. IBM WG members will get approval from the company to allow them to receive Daffodil. Bob raised the question that, if Daffodil becomes the public implementation of DFDL, we will need to work out how that would be funded and managed. It would be helpful if IBM test cases were available to Daffodil. IBM will investigate.
25/08: Alejandro had sent a list of the functions that he has implemented, and Steve had responded indicating the extra functions he thought were essential. Since then Alejandro has implemented some of the missing functions, such as escape schemes, pre-defined variables, binary decimal numbers, etc., and will update his list. Bob is planning to make the parser available on the internet to allow testing. His organisation is being reorganised and he doesn't know what the priority of Daffodil will be, so it is essential that we move quickly. It would help if IBM could indicate its support for Daffodil in some semi-formal way.
01/09: Alejandro updating Daffodil to include escape schemes, unordered sequences and ignoreCase. Daffodil being placed under formal source control in anticipation of external release. Bob has a start-of-October deadline to create a report for his sponsors on what has been done. It would be great if we could get Daffodil on the web and have run some IBM tests, so it could be highlighted at OGF 30 at the end of October.
08/09: Alejandro is marking up Spec draft 42 to indicate which features Daffodil implements. Bob expects Daffodil to be available on the web soon.
15/09: Alejandro had indicated in the specification which functions were implemented in Daffodil. Steve had reviewed and identified which functions need to be implemented and which could be considered optional (see action 099).
Alejandro is implementing the missing core functions. There was some discussion about the limitations on unordered groups (stop value and expression not supported). It was agreed that it should be a schema definition error if dfdl:occursCountKind is 'stopValue' on any element within an unordered sequence and a floating element.

112 DFDL certification process
25/08: Discussed how to certify DFDL implementations. Alan to investigate if OGF have a defined process.
01/09: In progress; the spec needs to state what conformance means, as part of this work.
08/09: Discussed what needs to be said in the spec and agreed that details of a conformance test suite should be in another document. Alan to draft conformance section.
15/09: Alan had looked at the conformance sections in the XML and XML Schema specifications, both of which indicate sections which must be implemented. Neither just says 'execute the test suite'. They talk in terms of conformance of documents, schemas and processors.

113 Regular Expressions.
25/08: The DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient?
a. Is the XML regular expression language the correct one to use? Tim asked if DFDL needs to specify a language at all, or should leave it to implementers to pick one. That would inhibit portability of schemas.
01/09: There are many variations of regexp language; it seems wise to specify one that we know contains functions like lookaround, which makes it easy to say things like 'give me everything up to but not including x'. This rules out XML Schema and POSIX; it needs Perl 5 or Java.
08/09: Agreed that the specification should define the regular expression language (if only by referring to other specifications). Should allow a common subset of the PERL and Java expression languages. Alan to update regular expression section.
15/09: Agreed that the specification should just say that either Java or PERL regular expressions can be used and, for portability, the common subset of functions should be used.
22/09: to be closed

113b Regular Expressions for Assert/Discriminator.
25/08: The DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient?
b. A regular expression property on an assert/discriminator as an alternative to the test expression. Either a DFDL expression or a regular expression could be specified but not both.
01/09: Tim to convince Steve (via example) that use of regexp in asserts is needed in 1.0.
08/09: Agreed that this is a useful function:
- Allowed as alternative to expression on dfdl:assert and dfdl:discriminator
- Pattern may be specified as attribute or element value
- Attribute: new testPattern attribute
- Element value: braces ( ) indicate pattern instead of expression
15/09: Do not need the braces as expressions start with '{'. Need to state rules for where the pattern matching starts in the data stream.
22/09: to be closed

114 OGF 30
25/08: OGF30 takes place on October 25-29 in Brussels. Should we have a WG session?
01/09: Given the emergence of the NCSA implementation and the spec completion target of 30th Sept, it makes sense to host a session at OGF 30.
08/09: Steve to request permission to go
15/09: Travel request has been submitted

115 Clarify allowed lengths for signed integer types when rep is binary integer (i.e., two's complement)
01/09: No technical reason to restrict lengths to 2^x bytes; could be odd, could be bits. But rare in practice, so if we do relax, limit any core subset to 2^x bytes.
08/09: not discussed
15/09: not discussed

Regards
Alan Powell
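Action 115 notes that there is no technical reason to restrict signed binary integers to 2^x bytes. As a rough illustration (not taken from the specification), the Python sketch below decodes a two's complement value of any bit or byte length. It assumes big-endian byte order and a length already known from the schema, for example via dfdl:length and dfdl:lengthUnits; the helper names are made up.

```python
def decode_twos_complement(raw: int, nbits: int) -> int:
    """Interpret the low `nbits` bits of `raw` as a signed two's complement value.

    Works for any width, e.g. 3, 12 or 24 bits, not just 8/16/32/64.
    """
    if nbits < 1:
        raise ValueError("nbits must be at least 1")
    raw &= (1 << nbits) - 1       # keep only the field's bits
    sign_bit = 1 << (nbits - 1)   # most significant bit of the field
    return raw - (1 << nbits) if raw & sign_bit else raw


def decode_signed_bytes(data: bytes) -> int:
    """Two's complement of any byte length, assuming big-endian byte order."""
    return int.from_bytes(data, byteorder="big", signed=True)


print(decode_twos_complement(0b101, 3))      # -3 (a 3-bit field)
print(decode_signed_bytes(b"\xff\xff\xfe"))  # -2 (a 3-byte field)
```

Nothing in the arithmetic depends on the width being a power of two, which is consistent with the 22/09 decision to drop the restriction.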

Fw: DFDL functions testOn/Off and setOn/Off - action 107 - updated
by Steve Hanson 20 Sep '10

Two corrections:
a) Use correct prefix for XPath integer() function
b) Can't use dfdl:outputValueCalc (or any other property) with dfdl:inputValueCalc

Regards
Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK, smh(a)uk.ibm.com, tel +44-(0)1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 20/09/2010 17:29 -----
From: Alan Powell/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Date: 20/09/2010 15:57
Subject: Re: Fw: DFDL functions testOn/Off and setOn/Off - action 107

Steve

The spec doesn't allow inputValueCalc and outputValueCalc on the same element.

Regards
Alan Powell

----- Forwarded by Steve Hanson/UK/IBM on 20/09/2010 11:19 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg(a)ogf.org
Date: 20/09/2010 11:18
Subject: DFDL functions testOn/Off and setOn/Off - action 107

The expression language section of the spec (21.5.3) defines some functions that make it easier to test/set individual bits in a byte. Action 107 was raised to question whether they were needed and, if so, to complete the definitions. In particular the spec for setOn/setOff seems odd. Here's an example of where they are useful, although upon investigation the functions need changing a bit.

A financial standard exists called ISO8583. It is widely used with debit/credit card transactions. An ISO8583 data stream consists of 128 optional fixed length fields with no tags or delimiters, preceded by a mandatory bitmap that indicates the presence of all the other fields.
In DFDL a nice approach is to model each optional field as having dfdl:occursCountKind="expression", with dfdl:occursCount referencing the corresponding bit in the bitmap and casting its value to an integer to give 0 or 1. Each bit in the bitmap would have its output value set using dfdl:outputValueCalc: true if the corresponding field exists in the infoset. The representation of the bitmap is the interesting part. It can be either packed or unpacked.

Packed

In packed format the bitmap is a real array of bits (8 bytes' worth), which must be modelled as individual booleans (i.e., not an array) because each needs to carry a distinct dfdl:outputValueCalc expression. For example:

<complexType name="ISO8583_Packed">
  <sequence>
    <element name="BitMap">
      <complexType>
        <sequence>
          <element name="Bit001" type="boolean" dfdl:representation="binary"
                   dfdl:binaryBooleanTrueRep="1" dfdl:binaryBooleanFalseRep="0"
                   dfdl:lengthKind="explicit" dfdl:length="1" dfdl:lengthUnits="bits"
                   dfdl:outputValueCalc="{if (fn:exists(../../Field001)) then fn:true() else fn:false()}" />
          ...
          <element name="Bit064" type="boolean" dfdl:representation="binary"
                   dfdl:binaryBooleanTrueRep="1" dfdl:binaryBooleanFalseRep="0"
                   dfdl:lengthKind="explicit" dfdl:length="1" dfdl:lengthUnits="bits"
                   dfdl:outputValueCalc="{if (fn:exists(../../Field064)) then fn:true() else fn:false()}" />
        </sequence>
      </complexType>
    </element>
    <element name="Field001" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(../BitMap/Bit001)}" />
    ...
    <element name="Field064" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(../BitMap/Bit064)}" />
  </sequence>
</complexType>

Unpacked

However, in the unpacked format the value of each nibble of the bitmap is taken and unpacked as text (ASCII or EBCDIC). E.g., bitmap '0000000100100011010001010110011110001001101010111100110111101111' (64 bits) = x01x23x45x67x89xABxCDxEF (8 bytes) = x30x31x32x33x34x35x36x37x38x39x41x42x43x44x45x46 (16 bytes) in ASCII.
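The packed-to-unpacked conversion in the bitmap example can be checked mechanically. This Python sketch is illustrative only: it assumes ASCII output with uppercase hex digits, as in the example, and the function names are hypothetical. It also confirms that reading the 16 characters as a base-16 number gives back the original 64-bit value, which is what allows the unpacked bitmap to be treated as an unsigned long.

```python
def pack_bitmap(bits: str) -> bytes:
    """'0'/'1' string (length a multiple of 8) -> raw bytes, first bit = MSB."""
    return int(bits, 2).to_bytes(len(bits) // 8, byteorder="big")


def unpack_bitmap(packed: bytes) -> bytes:
    """Each nibble becomes one uppercase hex digit, encoded as ASCII."""
    return packed.hex().upper().encode("ascii")


bits = "0000000100100011010001010110011110001001101010111100110111101111"
packed = pack_bitmap(bits)
unpacked = unpack_bitmap(packed)

print(packed.hex())  # 0123456789abcdef
print(unpacked)      # b'0123456789ABCDEF'
# Reading the 16 characters as a base-16 number recovers the 64-bit value.
print(int(unpacked.decode("ascii"), 16) == int(bits, 2))  # True
```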
The unpacked bitmap can be modelled as an unsigned long with representation as a base 16 standard text number. The problem is then how to test and set the individual bits in the unsigned long. That's where the DFDL functions come in useful, because there are no XPath functions for bitwise operations. Suppose the functions are defined as follows:

xs:boolean dfdl:testBit(xs:unsignedLong, xs:unsignedShort)
Returns Boolean true if the bit number given by arg #2 is set on in the integer given by arg #1, otherwise returns Boolean false.

xs:unsignedLong dfdl:setBits(xs:boolean+)
Returns an integer being the value of the bit positions provided by the Boolean arguments, where true=1, false=0. The number of args must be a multiple of 8.

Note that the bit numbering goes from left to right, in accordance with section 12.3.7.2 of the spec. Applying these functions to the ISO8583 example gives:

<complexType name="ISO8583_Unpacked">
  <sequence>
    <element name="BitMapInt" type="unsignedLong" dfdl:representation="text"
             dfdl:textNumberRep="standard" dfdl:textNumberBase="16" dfdl:encoding="ascii"
             dfdl:lengthKind="explicit" dfdl:length="16" dfdl:lengthUnits="characters"
             dfdl:outputValueCalc="{dfdl:setBits(fn:exists(../Field001), ..., fn:exists(../Field064))}" />
    <element name="Field001" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(dfdl:testBit(../BitMapInt, 1))}" />
    ...
    <element name="Field064" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(dfdl:testBit(../BitMapInt, 64))}" />
  </sequence>
</complexType>

To keep the unpacked model as close to the packed model as possible, we can incorporate the logical BitMap from the packed model and hide BitMapInt. This means the Fieldxxx elements are the same for each.

<complexType name="ISO8583_Unpacked">
  <sequence>
    <sequence dfdl:hiddenGroupRef="HiddenBitMap" />
    <element name="BitMap">
      <complexType>
        <sequence>
          <element name="Bit001" type="boolean" dfdl:inputValueCalc="{dfdl:testBit(../../BitMapInt, 1)}" />
          ...
          <element name="Bit064" type="boolean" dfdl:inputValueCalc="{dfdl:testBit(../../BitMapInt, 64)}" />
        </sequence>
      </complexType>
    </element>
    <element name="Field001" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(../BitMap/Bit001)}" />
    ...
    <element name="Field064" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(../BitMap/Bit064)}" />
  </sequence>
</complexType>

<group name="HiddenBitMap">
  <sequence>
    <element name="BitMapInt" type="unsignedLong" dfdl:representation="text"
             dfdl:textNumberRep="standard" dfdl:textNumberBase="16" dfdl:encoding="ascii"
             dfdl:lengthKind="explicit" dfdl:length="16" dfdl:lengthUnits="characters"
             dfdl:outputValueCalc="{dfdl:setBits(fn:exists(../Field001), ..., fn:exists(../Field064))}" />
  </sequence>
</group>

For both packed and unpacked, the BitMap element could be hidden as well, as there is no need for it in the infoset.

Regards
Steve Hanson

DFDL functions testOn/Off and setOn/Off - action 107
by Steve Hanson 20 Sep '10

The expression language section of the spec (21.5.3) defines some functions that make it easier to test/set individual bits in a byte. Action 107 was raised to question whether they were needed and, if so, to complete the definitions. In particular the spec for setOn/setOff seems odd. Here's an example of where they are useful, although upon investigation the functions need changing a bit.

A financial standard exists called ISO8583. It is widely used with debit/credit card transactions. An ISO8583 data stream consists of 128 optional fixed length fields with no tags or delimiters, preceded by a mandatory bitmap that indicates the presence of all the other fields.

In DFDL a nice approach is to model each optional field as having dfdl:occursCountKind="expression", with dfdl:occursCount referencing the corresponding bit in the bitmap and casting its value to an integer to give 0 or 1. Each bit in the bitmap would have its output value set using dfdl:outputValueCalc: true if the corresponding field exists in the infoset. The representation of the bitmap is the interesting part. It can be either packed or unpacked.

Packed

In packed format the bitmap is a real array of bits (8 bytes' worth), which must be modelled as individual booleans (i.e., not an array) because each needs to carry a distinct dfdl:outputValueCalc expression. For example:

<complexType name="ISO8583_Packed">
  <sequence>
    <element name="BitMap">
      <complexType>
        <sequence>
          <element name="Bit001" type="boolean" dfdl:representation="binary"
                   dfdl:binaryBooleanTrueRep="1" dfdl:binaryBooleanFalseRep="0"
                   dfdl:lengthKind="explicit" dfdl:length="1" dfdl:lengthUnits="bits"
                   dfdl:outputValueCalc="{if (fn:exists(../../Field001)) then fn:true() else fn:false()}" />
          ...
          <element name="Bit064" type="boolean" dfdl:representation="binary"
                   dfdl:binaryBooleanTrueRep="1" dfdl:binaryBooleanFalseRep="0"
                   dfdl:lengthKind="explicit" dfdl:length="1" dfdl:lengthUnits="bits"
                   dfdl:outputValueCalc="{if (fn:exists(../../Field064)) then fn:true() else fn:false()}" />
        </sequence>
      </complexType>
    </element>
    <element name="Field001" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{fn:integer(../BitMap/Bit001)}" />
    ...
    <element name="Field064" type="string" minOccurs="0"
             dfdl:occursCountKind="expression" dfdl:occursCount="{fn:integer(../BitMap/Bit064)}" />
  </sequence>
</complexType>

Unpacked

However, in the unpacked format the value of each nibble of the bitmap is taken and unpacked as text (ASCII or EBCDIC). E.g., bitmap '0000000100100011010001010110011110001001101010111100110111101111' (64 bits) = x01x23x45x67x89xABxCDxEF (8 bytes) = x30x31x32x33x34x35x36x37x38x39x41x42x43x44x45x46 (16 bytes) in ASCII. This can be modelled as an unsigned long with representation as a base 16 standard text number. The problem is then how to test and set the individual bits in the unsigned long. That's where the DFDL functions come in useful, because there are no XPath functions for bitwise operations. Suppose the functions are defined as follows:

xs:boolean dfdl:testBit(xs:unsignedLong, xs:unsignedShort)
Returns Boolean true if the bit number given by arg #2 is set on in the integer given by arg #1, otherwise returns Boolean false.

xs:unsignedLong dfdl:setBits(xs:boolean+)
Returns an integer being the value of the bit positions provided by the Boolean arguments, where true=1, false=0. The number of args must be a multiple of 8.

Note that the bit numbering goes from left to right, in accordance with section 12.3.7.2 of the spec.
Applying these functions to the unpacked example gives:

<complexType name="ISO8583_Unpacked">
  <sequence>
    <element name="BitMap" type="unsignedLong" dfdl:representation="text" dfdl:textNumberRep="standard" dfdl:textNumberBase="16" dfdl:encoding="ascii" dfdl:lengthKind="explicit" dfdl:length="16" dfdl:lengthUnits="characters" dfdl:outputValueCalc="{dfdl:setBits(fn:exists(../Field001), ... fn:exists(../Field064))}" />
    <element name="Field001" type="string" minOccurs="0" dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(dfdl:testBit(../BitMap, 1))}" />
    ...
    <element name="Field064" type="string" minOccurs="0" dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(dfdl:testBit(../BitMap, 64))}" />
  </sequence>
</complexType>

To keep the packed and unpacked models as close as possible, we can incorporate the logical bitmap from the packed model and hide the unsignedLong. Note that this results in a dfdl:outputValueCalc that depends on another dfdl:outputValueCalc.

<complexType name="ISO8583_Unpacked">
  <sequence>
    <sequence dfdl:hiddenGroupRef="HiddenBitMap" />
    <element name="BitMap">
      <complexType>
        <sequence>
          <element name="Bit001" type="boolean" dfdl:inputValueCalc="{dfdl:testBit(../../UnpackedBitMap, 1)}" dfdl:outputValueCalc="{fn:exists(../../Field001)}" />
          ...
          <element name="Bit064" type="boolean" dfdl:inputValueCalc="{dfdl:testBit(../../UnpackedBitMap, 64)}" dfdl:outputValueCalc="{fn:exists(../../Field064)}" />
        </sequence>
      </complexType>
    </element>
    <element name="Field001" type="string" minOccurs="0" dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(../BitMap/Bit001)}" />
    ...
    <element name="Field064" type="string" minOccurs="0" dfdl:occursCountKind="expression" dfdl:occursCount="{xs:integer(../BitMap/Bit064)}" />
  </sequence>
</complexType>

<group name="HiddenBitMap">
  <sequence>
    <element name="UnpackedBitMap" type="unsignedLong" dfdl:representation="text" dfdl:textNumberRep="standard" dfdl:textNumberBase="16" dfdl:encoding="ascii" dfdl:lengthKind="explicit" dfdl:length="16" dfdl:lengthUnits="characters" dfdl:outputValueCalc="{dfdl:setBits(../BitMap/Bit001, ... ../BitMap/Bit064)}" />
  </sequence>
</group>

For both packed and unpacked, the derived bitmap can be hidden as well, as there is no need for it in the infoset.

Regards
Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK
smh(a)uk.ibm.com, tel +44-(0)1962-815848

Minutes for OGF DFDL Working Group Call, September 15-2010
by Alan Powell 17 Sep '10

Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, September 15-2010

Attendees
Stephanie Fetzer (IBM)
Steve Hanson (IBM)
Bob McGrath (National Center for Supercomputing Applications)
Alan Powell (IBM)
Tim Kimber (IBM)
Alejandro Rodriguez (National Center for Supercomputing Applications)

Apologies
Mike Beckerle (Oco)
Suman Kalia (IBM)

1. Current actions (updated below)

Action 099: proposed optional properties

Feature | Detection | Pre-reqs | NCSA? | Comments
Validation | External switch | None, but of limited value without simple type restrictions | No | -
Simple type restrictions | xs:simpleType in xsd | None | No | -
Nils | xs:nillable='true' in xsd | None | No | -
Defaults | xs:default or xs:fixed in xsd | None | No | -
Bi-di | dfdl:textBiDi='yes' | None | No | -
Bits | dfdl:alignmentUnits='bits' or dfdl:lengthUnits='bits' | None | No | -
Delimited binary | dfdl:representation='binary' (or implied binary) and dfdl:lengthKind='delimited' | None | No | -
Patterns | dfdl:lengthKind='pattern' or dfdl:assert testPattern | None | Yes | -
Zoned numbers | dfdl:textNumberRep='zoned' | None | No | z/OS
Packed numbers | dfdl:binaryNumberRep='packed' | None | Yes | z/OS
Packed calendars | dfdl:binaryCalendarRep='packed' | None | - | z/OS
S/390 floats | dfdl:binaryFloatRep='ibm390Hex' | None | No | z/OS
Unordered | dfdl:sequenceKind='unordered' | None | Not clear | -
Floating | dfdl:floating='yes' | None | No | -
dfdl functions in expression language | dfdl: functions in expression | None | No | Enables use of off-the-shelf XPath 2.0 package
Hidden | dfdl:hiddenRef <> '' | None | Yes | -
Calculated values | dfdl:inputValueCalc <> '' or dfdl:outputValueCalc <> '' | None | Yes | -
Escape schemes | dfdl:defineEscapeScheme in xsd | None | Yes | -
Encodings | Any dfdl:encoding value beyond the core list | None | Not clear | -
Asserts | dfdl:assert in xsd | None | Yes | -
Discriminators | dfdl:discriminator in xsd | None | Yes | -
Prefixed lengths | dfdl:lengthKind='prefixed' | Simple type restrictions | Not clear | Requires simple types
Variables | Variables in expression language | None | Yes | -

2.
Document that an empty sequence that is the content of a complex type is ignored even when it has annotations.
One thing to point out is that authors should avoid:

<xs:complexType>
  <xs:sequence dfdl:hiddenGroupRef="..."/>
</xs:complexType>

(The same applies to other annotations on sequences, long- or short-form.) The schema spec will discard that sequence (see [1], definition of "effective content", clause 2.1.2). The following works:

<xs:complexType>
  <xs:sequence>
    <xs:sequence dfdl:hiddenGroupRef="..."/>
  </xs:sequence>
</xs:complexType>

[1] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/#key-exg

It is a schema definition error if an empty sequence is the content of a complex type.

Meeting closed, 16:35
Next call: Wednesday 22 September 2010, 15:00 UK (10:00 ET)
Next action: 118

Actions raised at this meeting: none

Current actions:

066 Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: No update
...
17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'.
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will be provided.
10/03: Work is progressing
17/03: Work is progressing
31/03: Work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
25/08: Will chase to allow Daffodil access to test cases. The WG should define how implementations confirm that they 'conform to DFDL v1'.
01/09: IBM still progressing the legal aspect. Intends to publish 100 or so tests as soon as it can, ahead of a full compliance suite.
08/09: IBM still progressing
15/09: IBM still progressing; expects tests to be available within a few weeks

085 ALL: Publicise the public comments phase to ensure a good review.
14/04: See minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on the spec.
15/05: Still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan had posted changes made in draft 041. Steve suggested sending a note to the WG highlighting these changes. Steve also suggested requesting an extension, as other IBM groups may review. We discussed whether this was necessary, as changes will need to be made during the implementation phase anyway. Alan to ask OGF what the process is for changes post public comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of the process.
30/06: Alan has emailed Joel asking what the process is now that the public comment period is over, and whether we can update the published version with WG updates. No response yet.
07/07: No response. Alan will chase up.
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes are significant enough to need additional review. Alan to contact David Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if another public review is necessary before becoming a 'proposed recommendation'. Alan responded that the WG agreed that a re-review was not necessary. The next stage is for the OGF review committee to approve publication.
11/08: Specification is now 'awaiting author changes' before being submitted to the OGF technical committee for approval as a 'proposed specification'. Alan would like to have the updated specification complete by Sept 10th.
The WG needs to complete all actions by then or decide that they do not need to be included in this phase of the process.
01/09: Alan and Steve have discussed and propose Sept 30th for completion of draft 43 and closure of all actions.
08/09: Target for completion September 30.
15/09: As above

099 Splitting the specification into simpler sections
07/07: Steve sent a proposal but it was not discussed. Alan will arrange a separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments. Need to add choice, validation, facets. Also, how does an implementation declare which subsets it supports? Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL schema from an implementation of just the core functions is moved to a full DFDL implementation, what should happen about the missing properties? Does the full implementation need to be aware of subsets of functions? Should it raise a schema definition error for use of a function not in the subset?
21/07: No progress
04/08: Steve had updated the proposed groups of function (Subset_proposal_v2.ppt). We discussed whether it is better to have discrete sets of functions or expanding levels of function. The purpose of subsetting is:
1. Allow simpler implementations (main purpose)
2. Simplify tooling
3. Simplify the specification
Steve to contact previous members of the WG to check if we have the correct subsets.
11/08: Steve sent an email to previous members of the WG asking for opinions on splitting the specification. Bob McGrath from the National Center for Supercomputing Applications responded that they had implemented about 80% of the function. Alejandro will send a description of the function they have implemented. An action will be raised to track the Daffodil implementation.
11/08: Not discussed
01/09: NCSA implementation description received. Making the unparser optional is a good idea (NCSA do not need one). Work will progress on the subsets.
08/09: No progress
15/09: Steve proposed making 'obscure' properties optional rather than subsetting parts of the specification. See minutes for the proposed list of optional properties.

101 Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a validation error or a processing error. Decided that for consistency it should be a validation error. It would be useful, however, to avoid having to duplicate facet information in an assert, which could become unwieldy for, say, a large enumeration. Suggestions:
- a parser option that 'converts all validation errors to processing errors'
- a DFDL expression function that 'applies all facets' or 'applies a specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to lack of time.
11/08: We started to discuss Stephanie's HIPAA example but ran out of time.
25/08: Not discussed
01/09: Discuss next week
08/09: Stephanie sent an example of an X12 document showing how an element with the same name was defined in different groups with different enumerations. Proposal:
- xs:fixed will not be used for parsing but only for validation and for providing a default value on unparsing.
- A new DFDL function will be defined that applies only to simple elements and tests whether the element exists, including applying all the schema facets. (Need to check with Tim why he wanted to only apply enumerations.)
dfdl:exists(xpath, true | false), where true means apply facets and false means don't apply facets.
<xs:element ref="REF_BillingProviderTaxIdentification_2010AA">
  <xs:annotation>
    <xs:documentation>Discrimination needed to distinguish REF segments</xs:documentation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator test="{dfdl:exists(./REF01__ReferenceIdentificationQualifier, true)}"/>
    </xs:appinfo>
  </xs:annotation>
</xs:element>

15/09: Decided that a separate dfdl:checkConstraints function (or other suitable name), which checks all constraints and not just facets, was simpler than extending fn:exists. It applies to both simple and complex elements and to groups.

107 teston/testoff DFDL expression functions
Are these functions still needed? They were introduced to allow individual bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed
01/09: Steve to progress by Sept 30th
08/09: Steve to progress by Sept 30th
15/09: The ISO 8583 standard has existence flags at the beginning that are encoded, so they cannot be defined as an array of bits. Therefore DFDL needs the ability to set individual bits within an unsigned int. However, the functions as currently defined, particularly setOn/Off, are not correct. setOn returns a byte with the relevant bit set on; this must then be combined with other bytes, which isn't very usable. Steve to circulate an example of use and suggested improvements.

108 dfdl:hidden
There has been some discussion on whether the 'hidden' global group should be indicated in some way.
04/08: A lively discussion. The specification works as currently defined, so the question is whether changes need to be made to make tooling easier. There shouldn't be 'conventions' in particular tooling, as tools must be able to properly deal with schemas from other tools that would not obey those conventions. Steve stated that it is often dangerous to hide too much from users when they can see the underlying schema. To be continued.
25/08: There have been some offline discussions about simplifying how hidden elements are implemented. The proposal is:
- dfdl:hidden property on xs:element only
- xs:minOccurs and xs:maxOccurs MUST be 0 when hidden
- dfdl:minOccurs and dfdl:maxOccurs for hidden elements only. An element is 'required' when dfdl:minOccurs > 0 and normal default processing occurs.
The schema, without DFDL annotations, must match the infoset, so the assumption is that non-DFDL tools, such as mappers, will ignore (not show) elements with xs:minOccurs and xs:maxOccurs = '0'.
01/09: The above proposal is flawed due to the use of maxOccurs = 0 (this was identified back in 2008, hence the current spec). Bob confirmed that NCSA models use hidden in a big way, so punting hidden beyond 1.0 is not an option. Two candidates:
- As per the spec, but with syntactic improvements to make it clear that the two xs:sequences do not take any dfdl:sequence properties
- Place a flag directly on a local element and force minOccurs to be 0. Simpler syntax, but the semantics change, as the element *could* legally be in the infoset, although a DFDL parser would never put it there.
Steve will circulate the two proposals for next week. Bob to talk to Alejandro, as the NCSA implementation is currently more flexible than the spec, allowing the groupref to point to a choice and an element ref. Are these really needed?
08/09: Discussed the Global Group and Hidden Flag approaches. Decided to stay with Global Group with dfdl:sequence properties rather than the dfdl:hidden annotation. It was agreed that there would be no extra properties on the 'hidden' global group, as the syntax was messy: such properties really belong on the sequence, and there are currently no DFDL properties on global groups.
Global group approach summary:
- The particle to hide can be a local element, element ref, local sequence, local choice or group ref.
- The particle is removed from its parent into a dedicated global group of composition sequence, and replaced in the parent by a new empty local sequence.
- The new empty local sequence carries a dfdl:hiddenGroupRef property; other DFDL properties are not allowed.
Pros:
- Removing all DFDL annotations and using the resultant pure XSD results in the same infoset.
- The global group can be reused.
Cons:
- Making something hidden is a refactor operation.
- The global group sequence needs its DFDL properties setting correctly.
The Daffodil parser allows the hidden annotation to reference global elements in addition to global groups. It was noted that this loses the particle properties, but we need to discuss with Alejandro.
15/09: Alan circulated a section describing Hidden Sequence Groups. Noted that hidden is now only described in the sequence sections of the specification. Noted some editorial changes. Alejandro had implemented extensions to the hidden function:
1. Allow a hidden sequence to reference a global element. Decided against, as Suman had identified some problems with namespaces.
2. Allow the referenced global group to contain a choice in addition to a sequence. It was agreed this was a useful extension.

111 Daffodil DFDL parser
11/08: Bob and Alejandro described the new implementation that they have developed. It is a new code base and is not based on the Defuddle prototype. It is written in Scala and implements approximately 80% of the features in the public comments draft of DFDL V1. Alejandro will send a list of the features not implemented. We discussed the scenarios that motivated the development, which was to extract data from various sources and transform it into canonical formats. Bob offered to make Daffodil available for the WG to assess the functionality. IBM WG members will get approval from the company to allow them to receive Daffodil.
Bob raised the question that, if Daffodil becomes the public implementation of DFDL, then we will need to work out how it would be funded and managed. It would be helpful if IBM test cases were available to Daffodil. IBM will investigate.
25/08: Alejandro had sent a list of the functions that he has implemented, and Steve had responded indicating the extra functions he thought were essential. Since then Alejandro has implemented some of the missing functions, such as escape schemes, pre-defined variables, binary decimal numbers, etc., and will update his list. Bob is planning to make the parser available on the internet to allow testing. His organisation is being reorganised and he doesn't know what the priority of Daffodil will be, so it is essential that we move quickly. It would help if IBM could indicate its support for Daffodil in some semi-formal way.
01/09: Alejandro updating Daffodil to include escape schemes, unordered sequences and ignoreCase. Daffodil being placed under formal source control in anticipation of external release. Bob has a deadline at the start of October to create a report for his sponsors on what has been done. It would be great if we could get Daffodil on the web and run some IBM tests on it, so it could be highlighted at OGF 30 at the end of October.
08/09: Alejandro is marking up spec draft 42 to indicate which features Daffodil implements. Bob expects Daffodil to be available on the web soon.
15/09: Alejandro had indicated in the specification which functions were implemented in Daffodil. Steve had reviewed this and identified which functions need to be implemented and which could be considered optional (see action 099). Alejandro is implementing the missing core functions. There was some discussion about the limitations on unordered groups (stop value and expression not supported). It was agreed that it should be a schema definition error if dfdl:occursCountKind is 'stopValue' on any element within an unordered sequence, or on a floating element.
112 DFDL certification process
25/08: Discussed how to certify DFDL implementations. Alan to investigate if OGF has a defined process.
01/09: In progress; the spec needs to state what conformance means, as part of this work.
08/09: Discussed what needs to be said in the spec and agreed that details of a conformance test suite should be in another document. Alan to draft a conformance section.
15/09: Alan had looked at the conformance sections in the XML and XML Schema specifications, both of which indicate sections which must be implemented. Neither just says 'execute the test suite'. They talk in terms of conformance of documents, schemas and processors.

113 Regular expressions
25/08: DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient?
a. Is the XML regular expression language the correct one to use? Tim asked if DFDL needs to specify a language at all, or should leave it to implementers to pick one. That would inhibit portability of schemas.
01/09: There are many variations of regexp language; it seems wise to specify one that we know contains functions like lookaround, which makes it easy to say things like 'give me everything up to but not including x'. This rules out XML Schema and POSIX; it needs Perl 5 or Java.
08/09: Agreed that the specification should define the regular expression language (if only by referring to other specifications). Should allow a common subset of the Perl and Java expression languages. Alan to update the regular expression section.
15/09: Agreed that it should just say that either Java or Perl regular expressions can be used, and for portability the common subset of functions should be used.

113b Regular expressions for assert/discriminator
25/08: DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient?
b. A regular expression property on an assert/discriminator as an alternative to the test expression.
Either a DFDL expression or a regular expression could be specified, but not both.
01/09: Tim to convince Steve (via example) that use of regexp in asserts is needed in 1.0.
08/09: Agreed that this is a useful function.
- Allowed as an alternative to an expression on dfdl:assert and dfdl:discriminator
- Pattern may be specified as an attribute or element value
- Attribute: new testPattern attribute
- Element value: braces ( ) indicate a pattern instead of an expression
15/09: Do not need the braces, as expressions start with '{'. Need to state rules for where the pattern matching starts in the data stream.

114 OGF 30
25/08: OGF 30 takes place on October 25-29 in Brussels. Should we have a WG session?
01/09: Given the emergence of the NCSA implementation and the spec completion target of 30th Sept, it makes sense to host a session at OGF 30.
08/09: Steve to request permission to go
15/09: Travel request has been submitted

115 Clarify allowed lengths for signed integer types when rep is binary integer (i.e., two's complement)
01/09: No technical reason to restrict lengths to 2^x bytes; could be odd, could be bits. But rare in practice, so if we do relax, limit any core subset to 2^x bytes.
08/09: Not discussed
15/09: Not discussed

Closed actions:

117 3. Is UTF-16 a fixed width or variable width encoding?
Appendix A: About UTF-16 and Unicode Character Codes above 0xFFFF
When we define UTF-16 to be a fixed-width double-byte wide character set, we say that each UTF-16 codepoint is represented by 2 bytes. Notice the careful use of the term 'codepoint' here. Unicode/ISO 10646 characters can have character codes as large as 0x10FFFF, which requires 3 bytes to store (21 bits actually); however, in UTF-16 characters with more than 2 bytes of code are encoded as two codepoints, called a surrogate pair; hence, UTF-16 is fixed-width, 2 bytes per codepoint. It is not 2 bytes per Unicode character.
UTF-16 is really a variable-width encoding, but the characters that require the surrogate-pair treatment are so infrequently used that UTF-16 is most often treated like a 16-bit fixed-width character set. It is the acknowledgement of the existence of surrogate pairs that leads to the 'codepoint' vs. 'character code' distinction. UTF-32 is a fixed-width encoding with a full 4 bytes per character code. It represents all of Unicode with the same width per character. Hence, when we refer to lengths in character strings we will often refer to length in characters, but we qualify that it means 2-byte codepoints when the character set encoding is UTF-16. Hence, when the property lengthUnitKind is 'characters' and the charset is 'UTF-16', the units are actually 16-bit codepoints, not Unicode characters.
Proposal:
- UCS-2 is a fixed-length encoding.
- UTF-16 is a variable-width encoding.
- A new property dfdl:UTF16Fixed 'yes | no' treats UTF-16 as a fixed-width encoding.
15/09: Closed

118 2. Document that an empty sequence that is the content of a complex type is ignored even when it has annotations
One thing to point out is that authors should avoid:

<xs:complexType>
  <xs:sequence dfdl:hiddenGroupRef="..."/>
</xs:complexType>

(The same applies to other annotations on sequences, long- or short-form.) The schema spec will discard that sequence (see [1], definition of "effective content", clause 2.1.2).
The following works:

<xs:complexType>
  <xs:sequence>
    <xs:sequence dfdl:hiddenGroupRef="..."/>
  </xs:sequence>
</xs:complexType>

[1] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/#key-exg

It is a schema definition error if an empty sequence is the content of a complex type.

Work items:
005 Improvements on property descriptions (status: not started)
012 Reordering the properties discussion: move representation earlier, improve flow of topics (status: not started)
036 Update DFDL schema with changed properties (status: ongoing)
042 Mapping of the DFDL infoset to XDM (target: none; status: not required for V1 specification)
070 Write DFDL primer
071 Write test cases
083 Implement RFC2116
109 Add 'message' attribute to dfdl:discriminator. 01/09: Closed: Conclusion was that this is genuinely useful, and has low implementation cost. Will add a 'message' attribute to dfdl:discriminator. (target: 43; status: not started)
110 Clarify expression limitations for defineVariable, newVariableInstance and setVariable. 01/09: Closed: The spec should distinguish the newVariableInstance defaultValue from the setVariable value. For the newVariableInstance defaultValue, disallow downward references and references to self (it must be usable from the point of declaration). For setVariable, allow downward references and references to self, and always evaluate at the end of the component. (The defineVariable defaultValue should be the same as for newVariableInstance.) (target: 43; status: not started)
113 Be specific about regular expression syntax (target: 43; status: not started)
108 Updates to hidden mechanism (target: 43; status: not started)
99 Updates to reflect subsetting and unparser optionality (target: 43; status: not started)
112 Define what conformance to spec means (target: 43; status: not started)
115 Clarify allowed lengths for signed binary integers (target: 43; status: not started)
116 2. xs:minLength: The spec currently states: 'When an element declaration specifies a default value, and has type xs:string, then xs:minLength must be specified and must be 1 or greater. It is a schema definition error otherwise.'
The process for defaults and nils means this restriction is no longer needed. Agreed.
117 3. Is UTF-16 a fixed width or variable width encoding? Proposal:
- UCS-2 is a fixed-length encoding.
- UTF-16 is a variable-width encoding.
- A new property dfdl:UTF16Fixed 'yes | no' treats UTF-16 as a fixed-width encoding.
15/09: Closed
118 2. Document that an empty sequence that is the content of a complex type is ignored even when it has annotations. It is a schema definition error if an empty sequence is the content of a complex type.

Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell(a)uk.ibm.com
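Action 117 in these minutes turns on the difference between 16-bit UTF-16 units (which Appendix A calls codepoints) and Unicode characters. The minimal Python sketch below is illustrative only: the function name is hypothetical, and it assumes that 'fixed width' means each 16-bit unit is counted as one character, as for UCS-2. How a DFDL processor actually behaves is governed by the spec property (proposed in these minutes as dfdl:UTF16Fixed and recorded in the draft 43 change history as dfdl:UTF16Width), not by this code.

```python
# Sketch for action 117: counting "characters" in UTF-16 data two ways.
# Assumption: fixed width means every 16-bit unit counts as one character
# (UCS-2 style); variable width means a surrogate pair counts as one character.


def length_in_characters(data: bytes, fixed_width: bool) -> int:
    """Length of big-endian UTF-16 data (no byte order mark) in characters."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must consist of whole 16-bit units")
    if fixed_width:
        return len(data) // 2  # each 16-bit unit, including each surrogate, counts once
    return len(data.decode("utf-16-be"))  # a surrogate pair decodes to one character


if __name__ == "__main__":
    # 'A' (U+0041) followed by U+1D11E MUSICAL SYMBOL G CLEF, which needs a surrogate pair
    data = "A\U0001D11E".encode("utf-16-be")
    print(data.hex())  # 0041d834dd1e
    print(length_in_characters(data, fixed_width=True))  # 3
    print(length_in_characters(data, fixed_width=False))  # 2
```

The same 6 bytes are 3 characters under the fixed-width reading and 2 under the variable-width one, which is why the choice needs to be stated explicitly rather than left to each implementation.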

Making it easier to create DFDL implementations
by Steve Hanson 15 Sep '10

In order to make it easier to create powerful, conformant implementations of DFDL parsers, here are some proposals:

1) A DFDL unparser is optional. This recognises that many users of DFDL are using it to understand existing files of data, and not to update and rewrite that data in its original form.

2) Some advanced DFDL features are optional. The spec defines an additional kind of error, called an 'unsupported error', which must be generated if an unsupported optional feature is encountered during processing. The table below is a proposed list of optional features, and how the use of each such feature is detected by a DFDL processor. Most are detected either by a DFDL property enum value, or by a DFDL property being other than the empty string. This proposal needs no extra properties to define or control optional features.

Optional features may have an implied ordering; for example, without simple type restrictions, prefixed lengths are not possible.

It is extremely desirable for a valid DFDL schema used with implementation X to work with any other implementation that implements the same or a wider subset. This means that all implementations must check for all properties that apply to a DFDL annotation point, including properties for optional features, even if it is just to ensure the property is set to the empty string. This is a corollary of having no defaults: you cannot be silent about a DFDL property.

I do not want to dilute the portability of DFDL schemas by expanding the list of optional features such that large swathes of the spec are removed, for example by making all binary representation optional. DFDL is not a format. People who have some data they wish to parse will not write a DFDL parser; they will write a custom parser for their format(s). It's only people who have a wide range of formats, or who want to make money, who will write DFDL parsers. The core DFDL features must still provide a powerful binary and text modelling capability.
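The detection rule in proposal 2 (a property enum value, or a property that is not the empty string, triggers an 'unsupported error') can be sketched as below. This is a hypothetical Python model, not an API from the spec: properties are a plain dict, only a handful of simplified rows from the proposed optional-feature table in the 15 September minutes are encoded, and the names OPTIONAL_FEATURES, UnsupportedError and check_annotation are invented for illustration.

```python
# Hypothetical model of the proposed 'unsupported error' for optional DFDL features.
# Assumptions (not from the spec): an annotation point is a dict of property
# name -> value, and a processor advertises the set of optional features it supports.
# Only a few rows of the proposed optional-feature table are encoded here.

from typing import Callable, Dict, Set


class UnsupportedError(Exception):
    """Raised when a schema uses an optional feature the processor does not implement."""


def _nonempty(props: Dict[str, str], name: str) -> bool:
    # Missing properties are treated as empty for brevity; the proposal expects
    # a processor to see every applicable property, even if it is set to ''.
    return props.get(name, "") != ""


# Feature name -> rule that detects use of the feature from the properties.
OPTIONAL_FEATURES: Dict[str, Callable[[Dict[str, str]], bool]] = {
    "Bits": lambda p: p.get("alignmentUnits") == "bits" or p.get("lengthUnits") == "bits",
    "Patterns": lambda p: p.get("lengthKind") == "pattern",
    "Packed numbers": lambda p: p.get("binaryNumberRep") == "packed",
    "Unordered": lambda p: p.get("sequenceKind") == "unordered",
    "Calculated values": lambda p: _nonempty(p, "inputValueCalc") or _nonempty(p, "outputValueCalc"),
    "Prefixed lengths": lambda p: p.get("lengthKind") == "prefixed",
}


def check_annotation(props: Dict[str, str], supported: Set[str]) -> None:
    """Raise UnsupportedError for the first detected optional feature not in `supported`."""
    for feature, is_used in OPTIONAL_FEATURES.items():
        if is_used(props) and feature not in supported:
            raise UnsupportedError(f"optional feature '{feature}' is not supported")


if __name__ == "__main__":
    element_props = {"lengthKind": "pattern", "lengthUnits": "characters"}
    try:
        check_annotation(element_props, supported={"Bits"})
    except UnsupportedError as err:
        print(err)  # optional feature 'Patterns' is not supported
```

Keeping the rules as data means that a processor which later implements a feature only adds its name to the supported set; the schema itself needs no extra properties, in line with the proposal.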
The list of optional features in no way dictates the usability features of DFDL editor tools. Such tools can offer tailored usability for different data formats if desired. This is orthogonal to optional features.

Regards
Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK
smh(a)uk.ibm.com, tel +44-(0)1962-815848
