
Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, February 10, 2010

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Suman Kalia (IBM)
Peter Lambros (IBM)
Tim Kimber (IBM)
Stephanie Fetzer (IBM)

Apologies
Steve Marting (Progeny)

1. Comments on latest discriminators doc

Stephanie has sent some editorial comments. The action will be closed next week, subject to comments from others.

2. Remaining 037 review issues

16.2 Scannability with lengthKind pattern: confirm that this is what we agreed.
In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding, which, where binary data is involved, in practice means an 8-bit ASCII encoding. By 8-bit ASCII we really mean an encoding where all the codepoints from 0-255 map to the equivalent value.
Subsequent investigation indicates that 'all' 8-bit ASCII encodings have gaps, so there is not a valid character for every byte value. Mike has suggested:
1) For all ASCII-based character sets, we say that bytes 0x00 to 0xFF all map to exactly those codepoints in ISO 10646 for the infoset, and vice versa.
2) Define dfdl:encoding="bytes" as a special character set name which has the above property.
Action raised.

· Tracker Issue: [schema] is an absolute or relative SCD. Why bother allowing absolute?
Both absolute and relative SCDs are allowed. No change needed.

· Tracker Issue: Glossary as the place for centralized definitions, or should they be repeated there, but also introduced at point of first use, or should we put the definitions only at the places where they are discussed, and xref from the glossary?
The glossary will be used for definitions.

· TBD: Issue - semantics of expressions containing relative paths that are inherited via ref to a dfdl:defineFormat (also section 10.3).
This issue applies to expressions used in any global component, not just defineFormats.
It was agreed that relative paths would be allowed in global components except for the dfdl:format annotation on an xs:schema. The relative paths are resolved where the global component is referenced, not where it is defined.

· TBD: Issue - XPath term - we are not consistent about using the term XPath, or "expression", when referring to our expression language. I prefer to call it our expression language, and then in the section that defines it state that it is a strict subset of XPath 2.0.
The term 'DFDL expression' will be used.

· TBD: Issue - fn:position is unclear given that we've just said we don't support sequences in the expression language.
Action: clarify semantics of fn:position and fn:count. Relax single sequence restriction.

· TBD: Issue - order of sections. Scoping rules section should come before variables section, which uses these concepts.
Expressions and regular expressions will be moved to the back of the spec as they are 'advanced' features.

· TBD: Issue - Case sensitivity of enum names - did we say whether this is case sensitive or not? I believe it should be case sensitive.
Already agreed that they are case sensitive.

· Issue: dfdl:representation - Strings in binary rep. I see no reason why elements of type xs:string should examine dfdl:representation. They shouldn't care what it is; they are always "text". I should be able to specify a bunch of inter-mixed binary number and string elements without having to specify dfdl:representation="text" just to avoid an error on the string type elements. I believe xs:string type ignores dfdl:representation (always behaves as if dfdl:representation is 'text'). (If we change this then the property precedence section for simple types changes slightly, as representation="text" is implied if type is string.)
That will make it impossible to introduce a binary representation of text later.
What is "a binary representation of text"? Is there a real issue here? This is a primary convenience and clarity issue for me.
I do not want to have to change to representation="text" for every string inside a COBOL structure, which is ultimately a binary representation object. To me type="string" is enough. I want to put representation="binary" at the file scope level of the schema, and then decorate the elements with the specifics of their types, but I do not expect to have to put representation="text" on anything. I do not understand what you are trying to achieve by requiring representation="text" for things that are already textual based on the type.
Agreed that dfdl:representation 'text' is implied for strings and dfdl:representation 'binary' is implied for hexBinary.

The rest of the issues below I think we need to discuss on calls.

textStringPadCharacter, textNumberPadCharacter - did we agree that this character must be a "minimum width" character if the char set encoding is variable width? (i.e., the pad char must be 1 byte if the encoding is UTF-8.)
textStringPadCharacter and textNumberPadCharacter must be a 1 byte character if the char set encoding is variable width?

numberInfinityRep, numberNanRep - Is this applicable only to xs:double and xs:float? Also, what I've seen requires a distinction of sign, i.e., there are positive and negative infinities, often printing as -inf and +inf.
The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values. Action raised.

· TBD: Issue - \n in regular expressions - clarify relationship of this to entities like the NL entity. Also, if I include an entity like WSP* in a regular expression (can I?) does it then match accordingly? It appears that some of our multi-valued entities like WSP+ create conditional "matching" behavior without having to use regular expressions, e.g., when WSP+ is used as a separator. But can you use entities like WSP+ in a regular expression?
It seems you should be able to use regular "single valued" entities in a regular expression; it's these multi-valued ones that have tricky semantics.
Added Unicode values to \n, \t, \r. Disallow DFDL general (multi-option) entities in regular expressions.

14.1 Alignment - TBD: Issue - zero-based thinking here. But all the bits stuff and everything else in DFDL uses 1-based reasoning. Need to revisit to make this sensible for a 1-based world.
Added implicit alignment table.
TBD zero-based: It was felt that it was more natural to have alignment 0, 2, 4 etc. rather than 1, 3, 5 etc. MB to rewrite section. Action raised.

finalTerminatorCanBeMissing - spec is not clear. Also, is there a finalSeparatorCanBeMissing?
Changed to finalDocumentTerminatorCanBeMissing and finalDocumentSeparatorCanBeMissing. Not sure where finalDocumentSeparatorCanBeMissing should be specified. Looks odd on 'distinguished root'. These properties operate differently from other properties as they are defined on the 'distinguished root' but affect some element lower down. Effectively they are put in scope by a different mechanism.
We discussed whether the properties should be on the distinguished root and its sequence, but decided that because these were really global properties they would only be allowed in the 'default' format block and be 'scoped' over the whole schema.

3. Go through Actions

Meeting closed, 14:40

Next call: Wednesday 17 February 2010, 13:00 UK

Next action: 083

Actions raised at this meeting

No   Action
079  AP: Encoding for binary fields when lengthKind is pattern
080  AP: Clarify semantics of fn:position and fn:count
081  AP: Inf and NaN. The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values.
082  MB: Should alignment be 0 or 1 based

Current Actions:

No   Action
045  20/05 AP: Speculative Parsing
27/05: Pseudo code has been circulated.
Review for next call.
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No progress
01/07: No progress
15/07: No progress. MB not happy with the way the algorithm is documented; need to find a better way.
29/07: No progress
05/08: No progress. Will document behaviour as a set of rules.
12/08: No progress
...
16/09: No progress
30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate updates and reissue.
07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
14/10: Alan to update proposal to include array scenario where minOccurs > 0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed, see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce examples.
11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.
04/11: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephanie took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.
20/01: Mike will write up
27/01: Further discussion of discriminators
29/01: Alan had emailed both proposals but not enough time to discuss
02/02: Agreed to adopt 'component exists' semantics for discriminators
10/02: 'Component exists' proposal updated. Comments by next call.
049  20/05 AP: Built-in specification description and schemas
03/06: Not discussed
24/06: No progress
24/06: No progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
...
14/10: No progress
21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
28/10: No progress
04/11: No progress
11/11: No update
18/11: No update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: No update
09/12: No update
16/12: No update
23/12: No update
06/01: No progress. If there is no resource to complete this action it can be deferred.
13/01: No progress
20/01: No progress
27/01: No progress
29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call.
03/02: No progress
10/02: No progress

066  Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: No update
09/12: No update
16/12: Reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: No update
20/01: No update
27/01: No progress
29/01: No progress
03/02: IBM is still investigating
10/02: IBM is still investigating

079  AP: Encoding for binary fields when lengthKind is pattern
080  AP: Clarify semantics of fn:position and fn:count
081  AP: Inf and NaN. The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset.
Need to investigate if XML allows these values.
082  MB: Should alignment be 0 or 1 based

Closed actions

No   Action
077  SKK: Mapping of COBOL numbers to textNumberFormats.
03/02: Suman documented the problem. Agreed to remove textNumberFormat and textCalendarFormat.
10/02: Closed
078  MB: Reword section 2.3.1 incorporating markup order rules.
10/02: Closed

Work items:

No   Item (target version; status)
005  Improvements on property descriptions (status: not started)
012  Reordering the properties discussion: move representation earlier, improve flow of topics (status: not started)
036  Update DFDL schema with changed properties (status: ongoing)
042  Mapping of the DFDL infoset to XDM (target version: none; status: not required for V1 specification)
069  ICU fractional seconds (target version: 039)
070  Write DFDL primer
071  Write test cases
072  It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix (target version: 039)
073  Rename dfdl:separatorPolicy="required" to "always" (target version: 039; status: deferred until action 071 agreed)
078  Document UPA checks (target version: 039)
079  Semantics of length=0, nil handling and defaults (A071) (target version: 039)
080  Tlog: Allow lengthKind delimited for packed/bcd (A074) (target version: 039)
081  Update empty sequence section (A075) (target version: 039)
082  Semantics of minOccurs=0 on choice branches (A076) (target version: 039)
083  Implement RFC2116
084  lengthKind pattern scannability rules
085  Invalid character substitution
086  Infoset round trip: rephrase sentence 'It is possible to define a schema so that when infoset unparsed and the datastream reparsed, the same infoset will be produced'
087  Clarify use of relative paths in global components
088  'DFDL expression'
089  Agreed that dfdl:representation 'text' is implied for strings and dfdl:representation 'binary' is implied for hexBinary
091  textStringPadCharacter and textNumberPadCharacter must be a 1 byte character if the char set encoding is variable width?
092  finalDocumentTerminatorCanBeMissing and finalDocumentSeparatorCanBeMissing allowed only in 'default' format
093  Remove textNumberFormat and textCalendarFormat

Regards

Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
----------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU