Minutes for OGF DFDL Working Group Call, February 02 & 03-2010

Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, February 02 & 03-2010

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Suman Kalia (IBM)
Peter Lambros (IBM)
Tim Kimber (IBM)

Apologies
Stephanie Fetzer (IBM)
Steve Marting (Progeny)

1. Discriminators

Discussed the 'parent exists' and 'component exists' options for dfdl:discriminator semantics. The WG agreed that DFDL would adopt the 'component exists' semantics, where the discriminator indicates that the component it is on exists and says nothing about that component's parent. The examples need minor changes and full syntax. Arrays will be treated the same as optional components.

Discriminators are not allowed on:
- Global groups and the top-level sequence or choice of a global group.
- Global element declarations.
- The top-level group of a complex type.
- Anonymous groups, other than when the group is the top level of a choice branch.

The discriminator timing property was also discussed. It was agreed that a parser should be able to tell when it is possible to evaluate the expression, so the timing property will be removed. The following will be added instead: 'The expression will be evaluated when the referenced elements are known to exist or known not to exist.' Timing on asserts was also mentioned but deferred for the time being.

Alan will update the 'component exists' proposal.

2. Action 077: COBOL and numberFormats

Suman highlighted a problem using textNumberFormat for COBOL numbers: a separate textNumberFormat is required for each length, because each length needs a different numberPattern. It was suggested that numberPattern be moved from textNumberFormat to the standard properties, but it was decided instead to get rid of textNumberFormat altogether and move all of its properties. Tim pointed out that this means it is no longer possible to vary just the number properties, so there may be more dfdl:defineFormat definitions, but this was not felt to be a problem.
For consistency, it was agreed that textCalendarFormat would also be removed. EscapeScheme is different, so it will be retained.

3. Remaining 037 review issues

Review issue 2: "I agree with the existing comment that the RFC2119 key words should be upper case."
Agreed that the RFC2119 key words should be in upper case and that, wherever possible, the spec should be reworded to use the key words.

Review issue 16.2: scannability with lengthKind pattern
One use case is to find the end of a character element by looking for the 'binary' bytes of the following element. Discussed whether dfdl:lengthKind pattern should be allowed for binary elements. Pattern scanning inherently treats the data as characters, but for single-byte encodings a binary byte can be specified by its character code point. Agreed that dfdl:lengthKind pattern will be allowed for binary elements when the encoding is US-ASCII. (Why didn't we say any single-byte character set?) Section 16.2 says that the children of a complex type must not change the encoding; this will be relaxed when the encoding is US-ASCII. DFDL raw entities will not be allowed in a pattern.

Tracker Issue: illegal character encodings for parsing and unparsing
What should DFDL do when it finds illegal bytes when converting strings during parsing and unparsing? Discussed adding a new property to declare a substitution character. Discussed what existing products and ICU do, and decided to follow ICU. Conversions to Unicode (during parsing) will replace characters that cannot be converted with the Unicode replacement character (U+FFFD). Conversions from Unicode (during unparsing) will replace them with the substitution character for the target encoding, for example 0x1A (Control-Z) for ASCII.

Tracker Issue: Processing-time Schema Definition Errors
Mike will reword Section 2.3.1. Tim will supply rules for the order in which terminators and separators are looked for in the data stream.

Tracker Issue: "round trip" for infoset
Should we omit the whole point?
Rephrase the sentence 'It is possible to define a schema so that when an infoset is unparsed and the data stream reparsed, the same infoset will be produced.'

4. Go through actions

Meeting closed, 15:00.

Next call: Tuesday 10 February 2010, 13:00 UK
Next action: 079

Actions raised at this meeting
No   Action
078  MB: Reword section 2.3.1 incorporating markup order rules.

Current actions
No   Action
045  20/05 AP: Speculative Parsing
  27/05: Pseudo code has been circulated. Review for next call.
  03/06: Comments received and will be incorporated.
  09/06: Progress but not discussed.
  17/06: Discussed briefly.
  24/06: No progress.
  01/07: No progress.
  15/07: No progress. MB not happy with the way the algorithm is documented; need to find a better way.
  29/07: No progress.
  05/08: No progress. Will document behaviour as a set of rules.
  12/08: No progress.
  ...
  16/09: No progress.
  30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate updates and reissue.
  07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
  14/10: Alan to update proposal to include array scenario where minOccurs > 0.
  21/10: Updated proposal reviewed.
  28/10: Updated proposal reviewed, see minutes.
  04/11: Discussed semantics of discriminators on arrays. MB to produce examples.
  11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
  18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
  25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.
  04/12: Further discussion about arrays.
  09/12: Reviewed proposed discriminator semantic.
  16/12: Reviewed discriminator examples and WTX semantic.
  23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call.
  06/01: B Connolly not available. SF to provide more complete description.
  13/01: Stephanie took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.
  20/01: Mike will write up.
  27/01: Further discussion of discriminators.
  29/01: Alan had emailed both proposals but there was not enough time to discuss them.
  02/02: Agreed to adopt 'component exists' semantics for discriminators.
049  20/05 AP: Built-in specification description and schemas
  03/06: Not discussed.
  24/06: No progress.
  24/06: No progress (hope to get these from test cases).
  15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
  ...
  14/10: No progress.
  21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
  28/10: No progress.
  04/11: No progress.
  11/11: No update.
  18/11: No update.
  25/11: Agreed to try to produce for CSV and fixed formats.
  04/12: No update.
  09/12: No update.
  16/12: No update.
  23/12: No update.
  06/01: No progress. If there is no resource to complete this action, it can be deferred.
  13/01: No progress.
  20/01: No progress.
  27/01: No progress.
  29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it did not look as though the way text numbers are defined is very usable. He will document this for the next call.
  03/02: No progress.
066  Investigate format for defining test cases
  25/11: IBM to see if it is possible to publish its test case format.
  04/12: No update.
  09/12: No update.
  16/12: Reminder sent to project manager.
  23/12: SH will send another reminder.
  06/01: Another reminder will be sent.
  13/01: No update.
  20/01: No update.
  27/01: No progress.
  29/01: No progress.
  03/02: IBM is still investigating.
077  SKK: Mapping of COBOL numbers to textNumberFormats.
  03/02: Suman documented the problem. Agreed to remove textNumberFormat and textCalendarFormat.
078  MB: Reword section 2.3.1 incorporating markup order rules.

Closed actions
No   Action

Work items
No   Item (target version, status)
005  Improvements on property descriptions. Status: not started
012  Reordering the properties discussion: move representation earlier, improve flow of topics. Status: not started
036  Update DFDL schema with changed properties. Status: ongoing
042  Mapping of the DFDL infoset to XDM. Target version: none. Status: not required for V1 specification
069  ICU fractional seconds. Target version: 039
070  Write DFDL primer
071  Write test cases
072  It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix. Target version: 039
073  Rename dfdl:separatorPolicy="required" to "always". Target version: 039. Status: deferred until action 071 agreed
078  Document UPA checks. Target version: 039
079  Semantics of length=0, nil handling and defaults (A071). Target version: 039
080  Tlog: Allow lengthKind delimited for packed/BCD (A074). Target version: 039
081  Update empty sequence section (A075). Target version: 039
082  Semantics of minOccurs=0 on choice branches (A076). Target version: 039
083  Implement RFC2119
084  lengthKind pattern scannability rules
085  Invalid character substitution

Regards

Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU