
Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, June 23, 2010

Attendees
Steve Hanson (IBM)
Alan Powell (IBM)
Stephanie Fetzer (IBM)
Tim Kimber (IBM)

Apologies
Suman Kalia (IBM)
Mike Beckerle (Oco)

1. Current Actions
Updated below.

2. Nils and Defaults
Reviewed Steve's updates to the parsing and unparsing rules.
- It might be possible to simplify the nil processing in the rules by having a definition up front.
- 'nil indicator set' needs a definition.
- Discussion about whether 'useNilForDefaults' could be replaced by some way of defining a literal, out-of-band value for the default.
- 'value' is used where it should be 'output value'.
- The dfdl:emptyValueDelimiterPolicy description should be moved from the defaulting section to the initiator section.
- 'text markup' is used in section 14.2 of the spec.

3. DFDL property types and other issues
Tim has proposed more specific types for some properties, in particular separating the different kinds of entities.
- Modify the meaning and usage of the type 'DFDL String Literal' and modify the remainder of the specification accordingly.
- Improve the description of DFDL entities to avoid confusion over the intended usage of raw byte values.
- Clarify the standard sentence about forward references in DFDL expressions; the current text implied that the restrictions only applied to the unparser.

1.1 DFDL Properties
Not discussed.
Properties on DFDL annotations may be one or more of the following types:
· DFDL string literal
  The property value is a string that describes a sequence of literal bytes and/or characters which appear in the data stream.
· List of DFDL string literals
  The property value is a space-separated list of DFDL string literals. When parsing, if more than one string literal in the list matches the portion of the data stream being evaluated then the longest matching string literal in the list must be used. When unparsing, the first string literal in the list must be used.
· DFDL expression
  The property value is an XPath 2.0 expression that evaluates to a value derived from other property values and/or from the DFDL infoset. DFDL expressions can be used to calculate property values, and to calculate logical values for simple elements.
· DFDL regular expression
  The property value is a regular expression that can be used as a pattern to calculate the length of an element by comparing that pattern to the sequence of literal bytes or characters which appear in the data stream.
· Enum
  The property value is a string literal that must be one of the allowed values listed in the property description.
· QName
  The property value is a string literal that also conforms to an XML Qualified Name as specified in 'Namespaces in XML'.
· DFDL Simple Type
  The property value is a string that describes a logical value. The type of the logical value is one of the XML Schema simple types in the DFDL allowed subset.
· Non-negative integer
  The property is a non-negative integer value. (What does this add?)
· Integer
  The property is an integer value. (What does this add?)

1.1.1.1 DFDL Entities in String Literals
1.1.1.1 Character classes in DFDL String literals
1.1.1.1 Raw byte values in DFDL String Literals

4. Using textStringPadCharacter with charRef '%#r' on multi-byte encoding
Not discussed.

textStringPadCharacter
DFDL String literal
The padding character or byte value that is used when justifying or trimming text elements.
A pad character can be specified using DFDL entities.
A pad byte value must be specified using the %#r entity.
DFDL validation rules
- If a pad byte value is specified when lengthUnits='characters' then the encoding must be a fixed-width encoding.
- If a pad character is specified when lengthUnits='bytes' then the pad character must be a single-byte character.
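As a rough illustration only, the two validation rules above could be checked along the following lines. The function name check_pad, the FIXED_WIDTH table, and the convention of passing a pad byte value as Python bytes and a pad character as str are all assumptions made for this sketch; none of them come from the DFDL specification or from any implementation.

```python
# Hypothetical table of fixed-width encodings and their character width in bytes.
# The entries are illustrative; a real processor would define its own list.
FIXED_WIDTH = {"ascii": 1, "latin-1": 1, "utf-32-be": 4}


def check_pad(pad, length_units, encoding):
    """Return a list of rule violations (empty if the combination is allowed).

    pad is bytes for a raw pad byte value (the %#r case) or a
    one-character str for a pad character. This split is an assumption
    of the sketch, not DFDL syntax.
    """
    problems = []
    if isinstance(pad, bytes):
        # Rule 1: pad byte value + lengthUnits='characters' needs a fixed-width encoding.
        if length_units == "characters" and encoding not in FIXED_WIDTH:
            problems.append("pad byte value requires a fixed-width encoding")
    else:
        # Rule 2: pad character + lengthUnits='bytes' must encode to exactly one byte.
        if length_units == "bytes" and len(pad.encode(encoding)) != 1:
            problems.append("pad character must be a single-byte character")
    return problems
```

For example, check_pad(b"\x20", "characters", "utf-8") reports a violation because utf-8 is not in the fixed-width table, while check_pad(" ", "bytes", "utf-8") passes because a space encodes to one byte.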
If a pad byte value is specified when lengthUnits='characters' then padding and trimming must be applied using an array of N pad byte values, where N is the width of a character in the fixed-width encoding.
Annotation: dfdl:element, dfdl:simpleType

Would adding fillByte to the dfdl:padChar enumeration make it clearer?
padChar is a pad character (%#r not allowed)
fillByte is a fill byte

5. nilIndicatorPath and nilIndicatorIndex properties
These properties seem a bit of an anomaly. Tim has suggested they can be simplified.
Not discussed.

Meeting closed, 14:30

Next call Wednesday 30 June 2010 15:00 UK (10:00 ET)

Next action: 096

Actions raised at this meeting
No   Action

Current Actions:
No   Action
066  Investigate format for defining test cases
     25/11: IBM to see if it is possible to publish its test case format.
     04/12: no update
     ...
     17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'
     24/03: No progress
     03/03: Discussions have been taking place on the subset of tests that will be provided.
     10/03: work is progressing
     17/03: work is progressing
     31/03: work is progressing
     14/04: An XML test case format has been defined and is being tested.
     21/04: Schema for TDML defined. Need to define how this and the test cases will be made public
     05/05: Work still progressing
     12/05: Work still progressing
     02/06: Work still progressing on technical and legal considerations
     16/06: work continues
     23/06: work continues
085  ALL: publicize Public comments phase to ensure a good review.
     14/04: see minutes
     21/04: Press release, OMG and other standards bodies.
     05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on spec
     15/05: still no public comments
     02/06: No public comments
     16/06: Public comments period has ended with no external comments. Alan had posted changes made in draft 041. Steve suggested sending a note to the WG highlighting these changes.
     Steve also suggested requesting an extension as other IBM groups may review. We discussed whether this was necessary as changes will need to be made during the implementation phase anyway. Alan to ask OGF what the process is for changes post public comment.
     23/06: Still no comments. Alan will contact OGF to understand the rest of the process.
086  AP: Nils and Defaults during unparsing - update table
     31/03: TK to document use cases for parsing
     14/04: Investigate new property to control empty string behaviour.
     21/04: After investigation a new property is not required. New rules developed and tables updated. Need examples of complexTypes to confirm tables apply. Review Nils, defaulting spec section.
     05/05: Discussed defaulting complex elements. Tables updated but need to add terminator. SH to confirm WMB behaviour when infoset item has no value on unparsing. Need to describe defaulting choices.
     15/05: More discussion. Alan updating sections
     26/05: Discussed draft updates. Stephanie to confirm asserts do not make an element required. Alan will update draft. All: review rest of draft.
     02/06: Alan updated description. Please review. Discussed Stephanie's example using discriminators. Decided no changes needed.
     16/05: Went through Steve's comments. Steve to update draft.
     23/06: Steve's updates to the rules discussed. See minutes. Rest of document needs updating.
088  Define semantics of choiceKind 'fixedLength'
     31/03: TK to provide definition of calculable length. Investigate PL/I varchars and COBOL OCCURS DEPENDING ON.
     14/04: Tim had distributed a document starting the definition of calculable length for the longest choice member. Alan had done some investigation of COBOL OCCURS DEPENDING ON: when it is used in the working-storage section of a program the maximum storage was reserved, but when it is used in the linkage section the dependent number was used. We need to understand how the WMB COBOL importer deals with ODO.
     21/04: Need to define 'calculable length' and WMB importer ODO behaviour.
     05/05: TK: Still need definition of calculable length. SKK: WMB COBOL importer behaviour with ODO
     15/05: Suman sent an example of an imported COBOL ODO which suggested that the maximum space was reserved. He will extend the example.
     02/06: no progress
     16/06: no progress
     23/06: no progress
092  AP: Confirm behaviour of defaulting with various occursCountKinds and separator policies.
     16/06: no progress
     23/06: discussed
     - When the number of instances doesn't match the specified number of occurrences, is it an error or should missing instances be defaulted? Decided it is an error.
     - Defaulting occurs up to minOccurs, unless the separator policy is 'required', in which case defaulting occurs up to maxOccurs and unbounded is an error.

Closed actions
No   Action

Work items:
No   Item (followed, where given, by target version and status)
005  Improvements on property descriptions
     Status: not started
012  Reordering the properties discussion: move representation earlier, improve flow of topics
     Status: not started
036  Update dfdl schema with change properties
     Status: ongoing
042  Mapping of the DFDL infoset to XDM
     Target version: none
     Status: not required for V1 specification
070  Write DFDL primer
071  Write test cases.
083  Implement RFC2116
097  Remove functions that return duration
     Target version: 041
098  occursCountKind="expression" is parsing only
     Target version: 041
099  nilValue and occursStopValue cannot have an expression. On unparsing only outputValueCalc can have a forward reference.
     Target version: 041
100  Need to state in 4.1.2 Infoset that value is optional.
     Target version: 041
101  When dfdl:textNumberRep is 'zoned' only the pattern for positive numbers is used.
     Only the following pattern characters may be used:
     - '+' MUST BE present at the beginning or end of the pattern to indicate whether the leading or trailing digit carries the overpunched sign, if the logical type is signed
     - '+' MAY BE present at the beginning or end of the pattern to indicate whether the leading or trailing digit carries the overpunched sign, if the logical type is unsigned
     - 'V' MAY BE used to indicate the location of an implied decimal point
     - '0' indicates the number of required digits (including overpunched).
     - '#' indicates the number of optional digits.
     Target version: 041
102  Also textNumberPolicy implies it applies to zoned, but doesn't state what zoned behaviour it covers. I think it should be consistent with binaryNumberCheckPolicy for packed, and control whether positive punched data is accepted or rejected when parsing an unsigned type, and whether unpunched data is accepted or rejected when parsing a signed type.
     Target version: 041
103  - textStandardBase
     - textStandardGroupSeparator
     - textStandardExponentCharacter
     - textStandardInfinityRep
     - textStandardNanRep
     - textStandardZeroRep
     - textZonedSignStyle
     Target version: 041
104  4 Unsigned decimal
     Proposed solution:
     - Allow xs:nonNegativeInteger which enables unsigned unbounded integers to be modelled. The problem to solve is then just for xs:decimal.
     - Call the new property dfdl:decimalSigned. It only applies to xs:decimal or user defined restrictions thereof. It applies to all physical decimals, as its name implies (not just zoned or packed).
     Target version: 041
105  AP: Describe trailingSkipBytes for delimited formats.
     Alan suggested 'dfdl:terminator must be specified and not empty if dfdl:lengthKind is delimited or endOfParent.'
106  AP: Skip Bytes should allow bits
     Agreed that it should be possible to specify bits.
     - LSB and TSB renamed to dfdl:leadingSkip, dfdl:trailingSkip
     - units are specified by dfdl:alignmentUnits.
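
The pattern restrictions listed under work item 101 can be sketched as a small checker. This is an illustration only: the function name check_zoned_pattern and the signed flag are hypothetical, and the limits of one '+' and one 'V' per pattern are assumptions read into the wording rather than rules stated in the item.

```python
# Pattern characters permitted for zoned text numbers, per work item 101.
ALLOWED = set("+V0#")


def check_zoned_pattern(pattern, signed):
    """Return a list of problems found in a zoned number pattern.

    signed says whether the logical type is signed. An empty list
    means the pattern satisfies the rules as read in this sketch.
    """
    problems = []
    bad = sorted(set(pattern) - ALLOWED)
    if bad:
        problems.append("characters not allowed: " + "".join(bad))
    plus_count = pattern.count("+")
    at_edge = pattern.startswith("+") or pattern.endswith("+")
    # Assumption: a single '+', and only at the beginning or the end.
    if plus_count > 1 or (plus_count == 1 and not at_edge):
        problems.append("'+' may appear only once, at the beginning or end")
    # '+' is mandatory for signed types and optional for unsigned types.
    if signed and plus_count == 0:
        problems.append("signed type requires '+' at the beginning or end")
    # Assumption: at most one implied decimal point.
    if pattern.count("V") > 1:
        problems.append("more than one 'V'")
    return problems
```

Under these assumptions, check_zoned_pattern("+000V00", signed=True) returns an empty list, while check_zoned_pattern("000", signed=True) reports the missing '+'.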
Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU