
From Tim: I remembered the reason why I thought this was a good idea. Consider the situation where someone is generating their DFDL schema from
Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, August 25, 2010

Attendees
Alan Powell (IBM)
Suman Kalia (IBM)
Tim Kimber (IBM)
Bob McGrath (National Center for Supercomputing Applications)
Alejandro Rodriguez (National Center for Supercomputing Applications)

Apologies
Mike Beckerle (Oco)
Stephanie Fetzer (IBM)
Steve Hanson (IBM)

1. Current Actions
Updated below.

2. Regular Expressions
The DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient? Discussed two aspects:
a. Is the XML regular expression language the correct one to use? Tim asked whether DFDL needs to specify a language at all, or should leave it to implementers to pick one. That would inhibit portability of schemas.
b. A regular expression property on an assert/discriminator as an alternative to the test expression. Either a DFDL expression or a regular expression could be specified, but not both.

3. OGF 30
OGF30 takes place on October 25-29 in Brussels. Should we have a WG session?

Meeting closed, 16:30

Next call: Wednesday 1 September 2010, 15:00 UK (10:00 ET)

Next action: 115

Actions raised at this meeting

No  Action
112 DFDL certification process
113 Regular Expressions (agenda item 2 above)
114 OGF 30 (agenda item 3 above)
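Note on agenda item 2: lookahead and backreferences are features of Perl-style regular expression engines and are not part of the XML Schema regular expression language, which is why the question of sufficiency arises. The sketch below illustrates the two features using Python's `re` module. Python is used purely for illustration and is not something DFDL specifies; the sample record and both patterns are hypothetical.

```python
import re

# Hypothetical delimited record; the field layout is invented for illustration.
record = "HDR|42|payload|HDR"

# Lookahead: match the digits only if a '|' separator follows,
# without consuming the separator itself.
length_field = re.search(r"\d+(?=\|)", record)

# Backreference: \1 must repeat exactly what group 1 captured,
# so the trailing tag has to equal the leading tag.
framed = re.fullmatch(r"(\w+)\|.*\|\1", record)

print(length_field.group(0))
print(framed.group(1))
```

A schema that relied on either construct would only be portable between DFDL processors whose regular expression engines support it, which is the portability concern recorded under 2a.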
Current Actions:

No  Action

066 Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'.
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
25/08: Will chase to allow Daffodil access to test cases. The WG should define how implementations confirm that they 'conform to DFDL v1'.

085 ALL: publicise the public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on the spec.
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan had posted changes made in draft 041. Steve suggested sending a note to the WG highlighting these changes. Steve also suggested requesting an extension as other IBM groups may review. We discussed whether this was necessary, as changes will need to be made during the implementation phase anyway. Alan to ask OGF what the process is for changes post public comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of the process.
30/06: Alan has emailed Joel asking what the process is now that the public comment period is over, and whether we can update the published version with WG updates. No response yet.
07/07: No response. Alan will chase up.
14/07: No response from Joel.
Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes are significant enough to need additional review. Alan to contact David Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a repeat public review is necessary before becoming a 'proposed recommendation'. Alan responded that the WG agreed that a re-review was not necessary. The next stage is for the OGF review committee to approve publication.
11/08: Specification is now 'awaiting author changes' before being submitted to the OGF technical committee for approval as a 'proposed specification'. Alan would like to have the updated specification complete by Sept 10th. The WG needs to complete all actions by then or decide that they do not need to be included in this phase of the process.

099 Splitting the specification into simpler sections
07/07: Steve sent a proposal but it was not discussed. Alan will arrange a separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments. Need to add choice, validation, facets. Also, how does an implementation declare which subsets it supports? Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL schema from an implementation of just the core functions is moved to a full DFDL implementation, what should happen about the missing properties? Does the full implementation need to be aware of subsets of functions? Should it raise a schema definition error for use of a function not in the subset?
21/07: no progress
04/08: Steve had updated the proposed groups of function (Subset_proposal_v2.ppt). We discussed whether it is better to have discrete sets of functions or expanding levels of function. Purpose of subsetting is:
1. Allow simpler implementations (main purpose).
2. Simplify tooling.
3. Simplify specification.
Steve to contact previous members of the WG to check if we have the correct subsets.
11/08: Steve sent an email to previous members of the WG asking for opinions on splitting the specification. Bob McGrath from the National Center for Supercomputing Applications responded that they had implemented about 80% of the function. Alejandro will send a description of the function they have implemented. An action will be raised to track the Daffodil implementation.
11/08: not discussed

101 Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a validation error or a processing error. Decided that for consistency it should be a validation error. It would be useful, however, to avoid having to duplicate facet information in an assert, which could become unwieldy for, say, a large enumeration. Suggestions:
- a parser option that 'converted all validation errors to processing errors'
- a DFDL expression function that 'applied all facets' or 'applied specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to lack of time.
11/08: We started to discuss Stephanie's HIPAA example but ran out of time.
25/08: Not discussed

107 teston/testoff DFDL expression functions
Are these functions still needed? They were introduced to allow individual bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed

108 dfdl:hidden
There has been some discussion on whether the 'hidden' global group should be indicated in some way.
04/08: A lively discussion. The specification works as currently defined, so the question is whether changes need to be made to make tooling easier. There shouldn't be 'conventions' in particular tooling, as tools must be able to deal properly with schemas from other tools that would not obey those conventions.
Steve stated that it is often dangerous to hide too much from users when they can see the underlying schema. To be continued.
25/08: There have been some offline discussions about simplifying how hidden elements are implemented. The proposal is:
- dfdl:hidden property on xs:element only
- xs:minOccurs and xs:maxOccurs MUST be 0 when hidden
- dfdl:minOccurs and dfdl:maxOccurs for hidden elements only. An element is 'required' when dfdl:minOccurs > 0, and normal default processing occurs.
The schema, without DFDL annotations, must match the infoset, so the assumption is that non-DFDL tools, such as mappers, will ignore/not show elements with xs:minOccurs and xs:maxOccurs = '0'.

109 dfdl:discriminator: the 'message' attribute
From Tim: I remembered the reason why I thought this was a good idea. Consider the situation where someone is generating their DFDL schema from meta-data. The model is large, and consists of many references to global structures. Each global structure (e.g. an HL7 segment) is identified in a particular way. Sometimes the segment is required, sometimes it is not. Sometimes it occurs as a child of a choice group, and sometimes not. Regardless, it is highly likely that the segment will be identified in the same way wherever it occurs. A natural decision for the modeler would be to create a dfdl:discriminator on all references to the segment, even if the ref is not under a point of uncertainty. It's harmless, and it carries no performance penalty. If we disallow the "message" attribute, it will force the modeler to put in extra logic to work out whether the ref is under a point of uncertainty, and generate an assert/discriminator as appropriate. I'd be interested to know what Steph thinks about this. I think I've heard her say that she sometimes uses discriminators where an assert would have done the job, just to maintain consistency throughout the model.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed

110 Semantics of newVariableInstance and setVariable
What should a DFDL processor (parser or serializer) do when it cannot evaluate the expression in a newVariableInstance or setVariable annotation?
Moving the setting of variable values until after the element has been parsed just creates other problems. A new instance must be available to other expressions on the same component, and to the children of a group/element, so it cannot be left until the end of the element. On the other hand, there are clearly some types of setVariable / newVariableInstance annotations which *cannot* be evaluated until the element has been parsed. For the parser, it might be OK to:
- evaluate the expression when the component (element or group) is started
- if it cannot be evaluated, add it to a list of annotations that must be processed at the end of the component
- if in the meantime any other expression attempts to access the variable that was being set/created, throw a processing error (because the result will be undefined). This will probably require the variable/instance to be placed into a 'not available' state until its expression is resolvable.
25/08: There was a brief discussion as IBM needs a resolution soon. Is it possible to restrict newVariableInstance to backward references only, and so remove the problem? setVariable must obviously be able to access the current value.

111 Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have developed. It is a new code base and is not based on the Defuddle prototype. It is written in Scala and implements approximately 80% of the features in the public comments draft of DFDL V1. Alejandro will send a list of the features not implemented. We discussed the scenarios that motivated the development, which were to extract data from various sources and transform it into canonical formats. Bob offered to make Daffodil available for the WG to assess the functionality. IBM WG members will get approval from the company to allow them to receive Daffodil. Bob raised the question that if Daffodil becomes the public implementation of DFDL then we will need to work out how that would be funded and managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will investigate.
25/08: Alejandro had sent a list of the functions that he has implemented, and Steve had responded indicating the extra functions he thought were essential. Since then Alejandro has implemented some of the missing functions, such as escape schemes, pre-defined variables, binary decimal numbers, etc., and will update his list. Bob is planning to make the parser available on the internet to allow testing. His organisation is being reorganised and he doesn't know what the priority of Daffodil will be, so it is essential that we move quickly. It would help if IBM could indicate its support for Daffodil in some semi-formal way. Discussed how to certify DFDL implementations. Alan to investigate if OGF have a defined process.

112 DFDL certification process

113 Regular Expressions
The DFDL regular expressions should provide lookahead and backreferences. Is the current regular expression language sufficient? Discussed two aspects:
a. Is the XML regular expression language the correct one to use? Tim asked whether DFDL needs to specify a language at all, or should leave it to implementers to pick one. That would inhibit portability of schemas.
b. A regular expression property on an assert/discriminator as an alternative to the test expression. Either a DFDL expression or a regular expression could be specified, but not both.

114 OGF 30
OGF30 takes place on October 25-29 in Brussels. Should we have a WG session?

Closed actions

No  Action

104 Expressions
Discuss error behaviour when evaluating an expression in various contexts.
- All properties:
  wrong type returned: schema definition error
  exception when evaluating expression: schema definition error
  referenced variables/paths not available: schema definition error
- Properties which allow a forward reference:
  referenced variables/paths not available: no error.
DFDL processor continues processing until the expression result is available, then acts on the result.
21/07: Steve stated the current definition: returning the incorrect type is a schema definition error and everything else is a processing error.
04/08: Not discussed
25/08: Closed

Work items:

No  Item (target version; status)
005 Improvements on property descriptions (status: not started)
012 Reordering the properties discussion: move representation earlier, improve flow of topics (status: not started)
036 Update DFDL schema with changed properties (status: ongoing)
042 Mapping of the DFDL infoset to XDM (target version: none; status: not required for V1 specification)
070 Write DFDL primer
071 Write test cases
083 Implement RFC2116

Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM MP211, Hursley Park, Hursley, SO21 2JN, United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com