
Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, February 24, 2010

Attendees
Mike Beckerle (Oco)
Suman Kalia (IBM)
Steve Hanson (IBM)
Alan Powell (IBM)
Steve Marting (Progeny)
Peter Lambros (IBM)
Stephanie Fetzer (IBM)

Apologies
Tim Kimber (IBM)

1. Remaining 037 review issues

A: 16.2 scannability with lengthKind pattern: Confirm that this is what we agreed

In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding. Where binary data is involved, this in practice means an 8-bit ASCII encoding.

Mike B: I found an official reference which has no "greyed out" codepoints. All 256 values are "mapped". The following ftp table (see URL below) officially defines the mapping for 8859-1 to unicode/iso10646. The table includes all 256 codepoints. Some are specified as just <control>, i.e., they have no specific meaning, but their 8859 codepoint maps one-to-one and onto a unicode/10646 codepoint with the same value.

Note that this property holds for 8859-1. It does not hold for 8859-2 to 8859-16, as these have character codes substituted into them that map to other places in the iso10646 codepoint space.

Here's the correspondence table:
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT

If we reference this mapping table in the references of the DFDL spec, then I believe we can say that using encoding="iso-8859-1", you can treat binary data as textual, use patterns, etc., and the relationship to/from the infoset always ensures preservation of the values of the bytes (parsing), and creation of bytes whose values exactly match the string codepoints (unparsing).

This language can be added to the section on lengthKind="pattern" and binary data:

Binary data can be handled using some of the conveniences of text by way of treating it as text with encoding="iso-8859-1".
In this case literal text, such as length patterns, is interpreted in the iso-8859-1 character encoding, and the correspondence of byte values in the data to a string in the DFDL infoset is one to one. That is, a byte with value N produces an infoset character with character code N. [reference to above FTP site].

B: Glossary

Variable-Occurrence Item - Optional elements have a variable number of occurrences (0 or 1), and arrays can also have a variable number of occurrences (when minOccurs < maxOccurs). So when we say an item with a variable number of occurrences, this can mean either an optional element, or an array where minOccurs < maxOccurs. For both arrays and optional elements, we have the additional constraint that the DFDL representation properties do not preclude a variable number of occurrences. When dfdl:occursCountKind='explicit' and dfdl:occursCount has a literal constant as its value, or an expression that statically evaluates to a constant, then the DFDL properties specify exactly the number of occurrences for all instances and so are said to preclude a variable number of occurrences. If dfdl:occursCount has a formula as its expressed value, then the DFDL properties do not preclude a variable number of occurrences.

MikeB Comment: This idea that you can have minOccurs < maxOccurs, but dfdl:occursCount is equal to a constant and dfdl:occursCountKind="explicit", is causing us a bunch of grief in these definitions. Can we be conservative and just say it is a schema definition error if minOccurs < maxOccurs but the length is static, i.e., an explicit constant-valued expression?

WG decided that the wording can remain as currently written.

C: DFDL Schema Component Model

Only changes needed are: remove wildcards.
Add the following to describe shading: The shaded boxes have a direct corresponding element syntax and therefore appear in DFDL schema.

D: Sequence Groups

Mike B: To correct syntax diagram for FinalUnused and suggest wording for the Sequence section.

E: Check other comments in document.

Please look at the remaining comments in draft 039 and suggest solutions.

2. Go through Actions

Updated below.

3. DFDL v1 Specification completion

Draft 039 will be published today.
WG review and comments by 3 March.
Draft 40 with updates for OGF submission - available 5 March.

Meeting closed, 14:00

Next call Wednesday 3 March 2010 13:00 UK

Next action: 084

Actions raised at this meeting

No   Action
083  MB: To correct syntax diagram for FinalUnused and suggest wording for the Sequence section

Current Actions:

No   Action
049  20/05 AP Built-in specification description and schemas
  03/06: not discussed
  24/06: No Progress
  24/06: No Progress (hope to get these from test cases)
  15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
  ...
  14/10: no progress
  21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
  28/10: no progress
  04/11: no progress
  11/11: no update
  18/11: no update
  25/11: Agreed to try to produce for CSV and fixed formats
  04/12: no update
  09/12: no update
  16/12: no update
  23/12: no update
  06/01: no progress. If there is no resource to complete this action it can be deferred
  13/01: no progress
  20/01: no progress
  27/01: no progress
  29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable.
  He will document this for the next call.
  03/02: No progress
  10/02: No progress
  17/03: No progress
  24/03: No progress

066  Investigate format for defining test cases
  25/11: IBM to see if it is possible to publish its test case format.
  04/12: no update
  09/12: no update
  16/12: reminder sent to project manager
  23/12: SH will send another reminder.
  06/01: Another reminder will be sent
  13/01: no update
  20/01: no update
  27/01: no progress
  29/01: no progress
  03/02: IBM is still investigating
  10/02: IBM is still investigating
  17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'
  24/03: No progress

079  MB: Encoding for binary fields when lengthKind is pattern
  17/02: Discussed but no conclusion
  24/03: Mike has found an encoding that matches the first 256 codepoints of ISO 10646. Will document its use for binary fields.

080  AP: Clarify semantics of fn:position and fn:count
  17/02: no progress
  24/03: No progress

081  AP: Inf and NaN
  The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values.
  17/02: XML allows NaN and INF for float and double. DFDL will do the same. Requires more investigation of ICU.
  24/03: Alan sent clarification that Inf and NaN are limited to float and double. Closed.

083  MB: To correct syntax diagram for FinalUnused and suggest wording for the Sequence section

Closed actions

No   Action
081  AP: Inf and NaN
  The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values.
  17/02: XML allows NaN and INF for float and double. DFDL will do the same. Requires more investigation of ICU.
  24/03: Alan sent clarification that Inf and NaN are limited to float and double. Closed.
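
For illustration only, the one-to-one byte/codepoint property discussed under item 1A and action 079 can be checked with a short Python sketch. This is not DFDL processor code: Python's "iso-8859-1" codec and its re module stand in for a DFDL implementation, Python's regular expression dialect is not the DFDL one, and the helper name field_length is hypothetical.

```python
import re
from typing import Optional

# All 256 possible byte values, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Parsing direction: under ISO-8859-1, byte N becomes the character with code N.
text = all_bytes.decode("iso-8859-1")
assert [ord(ch) for ch in text] == list(range(256))

# Unparsing direction: encoding the string recreates exactly the same bytes.
assert text.encode("iso-8859-1") == all_bytes


def field_length(data: bytes, pattern: str) -> Optional[int]:
    """Hypothetical helper: length in bytes of the field matched by
    `pattern` at the start of `data`, or None if there is no match.

    Because ISO-8859-1 maps one byte to one character, the character
    offset where the match ends is also the byte length of the field.
    """
    match = re.match(pattern, data.decode("iso-8859-1"), re.DOTALL)
    return None if match is None else match.end()


# A binary field terminated by a zero byte, followed by other data.
sample = b"\x01\x02\xff\x00rest"
print(field_length(sample, r"[^\x00]*\x00"))
```

The same check would not hold for a variable-width encoding such as UTF-8, where arbitrary byte sequences may be illegal and one character can occupy several bytes.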
Work items:

No | Item | Target version | Status
005 | Improvements on property descriptions | | not started
012 | Reordering the properties discussion: move representation earlier, improve flow of topics | | not started
036 | Update dfdl schema with change properties | | ongoing
042 | Mapping of the DFDL infoset to XDM | none | not required for V1 specification
069 | ICU fractional seconds | 039 |
070 | Write DFDL primer | |
071 | Write test cases. | |
072 | It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix | 039 |
073 | Rename dfdl:separatorPolicy="required" to "always". | 039 | Deferred until action 071 agreed
078 | Document UPA checks | 039 |
079 | Semantics of length=0, nil handling and defaults. (A071) | 039 |
080 | Tlog: Allow LengthKind delimited for packed/bcd (A074) | 039 |
081 | Update empty sequence section (A075) | 039 |
082 | Semantics of minOccurs=0 on choice branches (A076) | 039 |
083 | Implement RFC2116 | |
084 | LengthKind pattern scannability rules | 039 |
085 | Invalid character substitution | 039 |
086 | Infoset round trip: Rephrase sentence 'It is possible to define a schema so that when infoset unparsed and the datastream reparsed, the same infoset will be produced' | 039 |
087 | Clarify use of relative paths in global components. | 039 |
088 | 'DFDL expression' | 039 |
089 | Agreed that dfdl:representation 'text' is implied for strings and dfdl:representation 'binary' is implied for hexbinary | 039 |
091 | textStringPadCharacter textNumberPadCharacter must be a 1 byte character if the char set encoding is variable width? | 039 |
092 | finalDocumentTerminatorCanBeMissing and finalDocumentSeparatorCanBeMissing allowed only in 'default' format | 039 |
093 | Remove textNumberFormat and textCalendarFormat. | 039 |
094 | Alignment should be 1 based | 039 |
095 | AP: Inf and NaN | 039 |

Regards

Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU