February 2010 - dfdl-wg - lists.ogf.org

DFDL V1 Draft 039 is available
by Alan Powell 02 Mar '10


All,

DFDL V1 draft 039 is available at https://forge.gridforum.org/sf/docman/do/downloadDocument/projects.dfdl-wg/…

There have again been many changes, listed below, including many editorial updates to make the specification more readable. The DFDL expression language and DFDL regular expression language sections have been moved towards the end as they are for more advanced users. The highlighted changes are those made in this draft only and do not include changes made in previous drafts.

Unfortunately I have had significant problems with MS Word crashes while creating this draft, which have resulted in the formatting going haywire in some places. I have corrected the major problems but many minor ones still exist. In particular, all the cross references have lost the number of the section they reference. This will be fixed in the next draft.

Please send comments to me by March 3rd.

Change history (latest entry at the top, please)

Version: 039
Author/Contributor: Alan Powell
Date (yyyy-mm-dd): 2010-02-24
History:
- Added numberPattern section.
- Added defaulting of complex types.
- Changed missingValueInitiatorPolicy to 'required' and 'prohibited'.
- Changed separatorPolicy to 'required', 'suppressed', 'suppressedAtEnd'.
- Added defaulting choices.
- Many editorial changes.
- Added lengthKind delimited for 'binary' 'packed/bcd'.
- Clarified empty sequence wording.
- Reworded infoset round tripping.
- Moved DFDL Expression and regular expression section to near back of spec.
- Added treatment of unrepresentable characters to infoset section.
- Clarified relative path expressions on global declarations.
- Representation assumed to be text for string and binary for hexBinary.
- Changed alignment to be 1 based.
- documentFinalTerminatorCanBeMissing and documentFinalSeparatorCanBeMissing changed to dfdl:format only.
- Added UPA checks.
- Added ambiguity checks.
- Removed defineTextNumberFormat, TextNumberFormat and textNumberFormatRef and moved properties in line.
- Revised 'Resolving points of uncertainty' and discriminators.
- Clarified NaN and infinity values.

Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM MP211, Hursley Park, Hursley, SO21 2JN, United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell(a)uk.ibm.com
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Minutes for OGF DFDL Working Group Call, February 24-2010
by Alan Powell 24 Feb '10


Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, February 24-2010

Attendees
Mike Beckerle (Oco)
Suman Kalia (IBM)
Steve Hanson (IBM)
Alan Powell (IBM)
Steve Marting (Progeny)
Peter Lambros (IBM)
Stephanie Fetzer (IBM)

Apologies
Tim Kimber (IBM)

1. Remaining 037 review issues

A: 16.2 scannability with lengthKind pattern: Confirm that this is what we agreed

In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding, which, where binary data is involved, in practice means an 8-bit ASCII encoding.

Mike B: I found an official reference which has no "greyed out" codepoints. All 256 values are "mapped". The following ftp table (see URL below) officially defines the mapping for 8859-1 to unicode/iso10646. The table includes all 256 codepoints - some are specified as just <control>, i.e., have no specific meaning, but their 8859 codepoint maps one-to-one and onto a unicode/10646 codepoint with the same value.

Note that this property holds for 8859-1. It does not hold for 8859-2 to 8859-16, as these have character codes substituted into them that map to other places in the iso10646 codepoint space.

Here's the correspondence table: ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT

If we reference this mapping table in the references of the DFDL spec, then I believe we can say that using encoding="iso-8859-1", you can treat binary data as textual, use patterns, etc., and the relationship to/from the infoset always ensures preservation of the values of the bytes (parsing), and creation of bytes whose values exactly match the string codepoints (unparsing).

This language can be added to the section on lengthKind="pattern" and binary data:

Binary data can be handled using some of the conveniences of text by way of treating it as text with encoding="iso-8859-1". In this case literal text, such as length patterns, is interpreted as in the iso-8859-1 character encoding, and the correspondence of byte values in the data to a string in the DFDL infoset is one to one. That is, a byte with value N produces an infoset character with character code N. [reference to above FTP site].

B: Glossary

Variable-Occurrence Item - Optional elements have a variable number of occurrences (0 or 1) and arrays also can have a variable number of occurrences (when minOccurs < maxOccurs). So when we say an item with a variable number of occurrences, this can mean either an optional element, or an array where minOccurs < maxOccurs. For both arrays and optional elements, we have the additional constraint that the DFDL representation properties do not preclude a variable number of occurrences. When dfdl:occursCountKind='explicit' and dfdl:occursCount has a literal constant as its value, or an expression that statically evaluates to a constant, then the DFDL properties are specifying exactly the number of occurrences for all instances and so are said to preclude a variable number of occurrences. If dfdl:occursCount has a formula as its expressed value, then the DFDL properties do not preclude a variable number of occurrences.

MikeB comment: This idea that you can have minOccurs < maxOccurs, but dfdl:occursCount is equal to a constant and dfdl:occursCountKind="explicit", is causing us a bunch of grief in these definitions. Can we be conservative and just say it is a schema definition error if minOccurs < maxOccurs but the length is static, i.e., an explicit constant-valued expression?

WG decided that the wording can remain as currently written.

C: DFDL Schema Component Model

Only changes needed are: remove wildcards. Add the following to describe shading: "The shaded boxes have a direct corresponding element syntax and therefore appear in DFDL schema."

D: Sequence Groups

Mike B: To correct syntax diagram for FinalUnused and suggest wording for the Sequence section.

E: Check other comments in document.

Please look at the remaining comments in draft 039 and suggest solutions.

2. Go through Actions

Updated below.

3. DFDL v1 Specification completion

Draft 039 will be published today.
WG review and comments by 3 March.
Draft 40 with updates for OGF submission - available 5 March.

Meeting closed, 14:00

Next call Wednesday 3 March 2010 13:00 UK

Next action: 084

Actions raised at this meeting

No   Action
083  MB: To correct syntax diagram for FinalUnused and suggest wording for the Sequence section

Current Actions:

No   Action
049  20/05 AP Built-in specification description and schemas
     03/06: not discussed
     24/06: No Progress
     24/06: No Progress (hope to get these from test cases)
     15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
     ...
     14/10: no progress
     21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
     28/10: no progress
     04/11: no progress
     11/11: no update
     18/11: no update
     25/11: Agreed to try to produce for CSV and fixed formats
     04/12: no update
     09/12: no update
     16/12: no update
     23/12: no update
     06/01: no progress. If there is no resource to complete this action it can be deferred
     13/01: no progress
     20/01: no progress
     27/01: no progress
     29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call
     03/02: No progress
     10/02: No progress
     17/02: No progress
     24/02: No progress
066  Investigate format for defining test cases
     25/11: IBM to see if it is possible to publish its test case format.
     04/12: no update
     09/12: no update
     16/12: reminder sent to project manager
     23/12: SH will send another reminder.
     06/01: Another reminder will be sent
     13/01: no update
     20/01: no update
     27/01: no progress
     29/01: no progress
     03/02: IBM is still investigating
     10/02: IBM is still investigating
     17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'
     24/02: No progress
079  MB: Encoding for binary fields when lengthKind is pattern
     17/02: Discussed but no conclusion
     24/02: Mike has found an encoding that matches the first 256 codepoints of ISO 10646. Will document its use for binary fields.
080  AP: Clarify semantics of fn:position and fn:count
     17/02: no progress
     24/02: No progress
081  AP: Inf and NaN
     The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values
     17/02: XML allows NaN and Inf for float and double. DFDL will do the same. Requires more investigation of ICU.
     24/02: Alan sent clarification that Inf and NaN are limited to float and double. Closed.
083  MB: To correct syntax diagram for FinalUnused and suggest wording for the Sequence section

Closed actions

No   Action
081  AP: Inf and NaN
     The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values
     17/02: XML allows NaN and Inf for float and double. DFDL will do the same. Requires more investigation of ICU.
     24/02: Alan sent clarification that Inf and NaN are limited to float and double. Closed.

Work items:

No   Item (target version; status)
005  Improvements on property descriptions (status: not started)
012  Reordering the properties discussion: move representation earlier, improve flow of topics (status: not started)
036  Update dfdl schema with change properties (status: ongoing)
042  Mapping of the DFDL infoset to XDM (target version: none; status: not required for V1 specification)
069  ICU fractional seconds (target version: 039)
070  Write DFDL primer
071  Write test cases.
072  It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix (target version: 039)
073  Rename dfdl:separatorPolicy="required" to "always". (target version: 039; status: deferred until action 071 agreed)
078  Document UPA checks (target version: 039)
079  Semantics of length=0, nil handling and defaults. (A071) (target version: 039)
080  Tlog: Allow lengthKind delimited for packed/bcd (A074) (target version: 039)
081  Update empty sequence section (A075) (target version: 039)
082  Semantics of minOccurs=0 on choice branches (A076) (target version: 039)
083  Implement RFC2116
084  lengthKind pattern scannability rules (target version: 039)
085  Invalid character substitution (target version: 039)
086  Infoset round trip: Rephrase sentence 'It is possible to define a schema so that when infoset unparsed and the datastream reparsed, the same infoset will be produced' (target version: 039)
087  Clarify use of relative paths in global components. (target version: 039)
088  'DFDL expression' (target version: 039)
089  Agreed that dfdl:representation 'text' is implied for strings and dfdl:representation 'binary' is implied for hexBinary (target version: 039)
091  textStringPadCharacter textNumberPadCharacter must be a 1 byte character if the char set encoding is variable width? (target version: 039)
092  finalDocumentTerminatorCanBeMissing and finalDocumentSeparatorCanBeMissing allowed only in 'default' format (target version: 039)
093  Remove textNumberFormat and textCalendarFormat. (target version: 039)
094  Alignment should be 1 based (target version: 039)
095  AP: Inf and NaN (target version: 039)

Regards
Alan Powell
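
The glossary wording for a variable-occurrence item under item B above can be read as a simple predicate. The following is a minimal Python sketch of that reading, not text from the specification. The function name, its parameters, and the use of None for an unbounded maxOccurs are assumptions made for illustration; only the 'explicit' value of dfdl:occursCountKind comes from the minutes.

```python
from typing import Optional


def is_variable_occurrence_item(
    min_occurs: int,
    max_occurs: Optional[int],       # None stands for an unbounded maxOccurs (assumption)
    occurs_count_kind: str,          # value of dfdl:occursCountKind
    occurs_count_is_constant: bool,  # True if dfdl:occursCount statically evaluates to a constant
) -> bool:
    """Hypothetical helper: apply the glossary definition discussed in the minutes."""
    # Optional elements (0..1) and arrays with minOccurs < maxOccurs may vary.
    schema_allows_variation = max_occurs is None or min_occurs < max_occurs
    # An explicit, constant occurs count fixes the number of occurrences,
    # so the DFDL properties "preclude" a variable number of occurrences.
    precluded = occurs_count_kind == "explicit" and occurs_count_is_constant
    return schema_allows_variation and not precluded
```

Under this reading, an optional element whose dfdl:occursCount is an expression over other data is a variable-occurrence item, while the same element with a literal constant count is not.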


Re: [DFDL-WG] 8-bit-ascii for dealing with binary data in text-like manner - problematic
by Mike Beckerle 24 Feb '10


I think we've got a fix for this.

I found an official reference which has no "greyed out" codepoints. All 256 values are "mapped". The following ftp table (see URL below) officially defines the mapping for 8859-1 to unicode/iso10646. The table includes all 256 codepoints - some are specified as just <control>, i.e., have no specific meaning, but their 8859 codepoint maps one-to-one and onto a unicode/10646 codepoint with the same value.

Note that this property holds for 8859-1. It does not hold for 8859-2 to 8859-16, as these have character codes substituted into them that map to other places in the iso10646 codepoint space.

Here's the correspondence table: ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT

If we reference this mapping table in the references of the DFDL spec, then I believe we can say that using encoding="iso-8859-1", you can treat binary data as textual, use patterns, etc., and the relationship to/from the infoset always ensures preservation of the values of the bytes (parsing), and creation of bytes whose values exactly match the string codepoints (unparsing).

This language can be added to the section on lengthKind="pattern" and binary data:

Binary data can be handled using some of the conveniences of text by way of treating it as text with encoding="iso-8859-1". In this case literal text, such as length patterns, is interpreted as in the iso-8859-1 character encoding, and the correspondence of byte values in the data to a string in the DFDL infoset is one to one. That is, a byte with value N produces an infoset character with character code N. [reference to above FTP site].

On Thu, Feb 11, 2010 at 5:32 AM, Steve Hanson <smh(a)uk.ibm.com> wrote:
> Mike
>
> In the wikipedia entry for ISO 10646 it says "The system deliberately
> leaves many code points not assigned to characters, even in the BMP. It does
> this to allow for future expansion or to minimize conflicts with other
> encoding forms." If those code points are below 256 then we have the same
> problem as 8859? I can't find an actual map of the 10646 code points - you
> have to buy it from ISO.
>
> Regards
>
> Steve Hanson
> Programming Model Architect, WebSphere Message Broker,
> OGF DFDL WG Co-Chair,
> Hursley, UK,
> Internet: smh(a)uk.ibm.com,
> Phone (+44)/(0) 1962-815848
>
> From: Mike Beckerle <mbeckerle.dfdl(a)gmail.com>
> To: dfdl-wg(a)ogf.org
> Date: 11/02/2010 00:38
> Subject: [DFDL-WG] 8-bit-ascii for dealing with binary data in text-like manner - problematic
> Sent by: dfdl-wg-bounces(a)ogf.org
>
> Every "8-bit-ascii" encoding I can find has holes in the code page. That
> is, values that don't have a corresponding character codepoint assigned.
>
> Example: iso-8859-X are a bunch of 8-bit ascii-based encodings that are
> popular.
>
> If you lookup iso-8859-1 it has this language:
>
> Code values 00–1F, 7F–9F are not assigned to characters by ISO/IEC 8859-1.
>
> The lower range 20 to 7E (the G0 subset) maps exactly to the same coded G0
> subset of the ISO 646 US variant (commonly known as ASCII
> <http://en.wikipedia.org/wiki/ASCII>), ...
>
> They're saying 7-bit ascii is included, and some other codes are there, but
> they don't assign a codepoint generally.
>
> So, to me suggesting use of any particular code page for this purpose is
> somewhat ambiguous. E.g., what does &#x01 mean in a string if the encoding
> is iso-8859-1? There appears to be a set of translation tables that assign
> this to unicode in standard ways that one can find on the web. But the
> codepoint doesn't have an assigned meaning in iso-8859-X standards.
>
> Two possible clarifications:
> 1) for all ascii-based character sets, we say that bytes 0x00 to 0xFF all
> map to exactly those codepoints in ISO 10646 for the infoset, and vice
> versa.
>
> 2) define dfdl:encoding="bytes" as a special character set name which has
> the above property.
>
> Personally, I prefer 2. It is simpler to explain what is going on, and when
> people are depending on bytes it will be clearer that they are.
>
> ...mike
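
The one-to-one property claimed above for iso-8859-1 can be checked with any codec library that follows the Unicode mapping table. Here is a minimal Python sketch using the standard library codecs. The helper names (bytes_to_infoset_string, infoset_string_to_bytes, pattern_length) are hypothetical and only illustrate the idea of applying a text pattern to binary data; they are not part of DFDL or of any DFDL processor.

```python
import re


def bytes_to_infoset_string(data: bytes) -> str:
    """Parse direction: a byte with value N becomes the character with code N."""
    return data.decode("iso-8859-1")


def infoset_string_to_bytes(text: str) -> bytes:
    """Unparse direction: a character with code N becomes the byte with value N."""
    return text.encode("iso-8859-1")


def pattern_length(data: bytes, pattern: str) -> int:
    """Hypothetical helper: length in bytes of the prefix matched by a text regex.

    Because iso-8859-1 maps one byte to one character, the number of matched
    characters equals the number of matched bytes.
    """
    match = re.match(pattern, bytes_to_infoset_string(data), re.DOTALL)
    return len(match.group(0)) if match else 0


# All 256 byte values survive a round trip with their values unchanged.
ALL_BYTES = bytes(range(256))
ROUND_TRIP_TEXT = bytes_to_infoset_string(ALL_BYTES)
assert [ord(ch) for ch in ROUND_TRIP_TEXT] == list(range(256))
assert infoset_string_to_bytes(ROUND_TRIP_TEXT) == ALL_BYTES

# The property does not hold for iso-8859-2: byte 0xA1 maps to U+0104.
assert ord(bytes([0xA1]).decode("iso-8859-2")) == 0x0104
```

For example, pattern_length(b"\x01\x02\x80\xff\x10", r"[^\xff]*\xff") measures a binary field that runs up to and including the first 0xFF byte.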


Agenda for OGF DFDL WG call 24 February 2010- 13:00 UK (8:00 ET)
by Alan Powell 24 Feb '10


1. Remaining 037 review issues

A: 16.2 scannability with lengthKind pattern: Confirm that this is what we agreed

In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding, which, where binary data is involved, in practice means an 8-bit ASCII encoding.

By 8-bit ASCII we really mean an encoding where all the codepoints from 0-255 map to the equivalent value. Subsequent investigation indicates that 'all' 8-bit ASCII encodings have gaps, that is, byte values with no valid character assigned.

Mike has suggested:
1) for all ascii-based character sets, we say that bytes 0x00 to 0xFF all map to exactly those codepoints in ISO 10646 for the infoset, and vice versa.
2) define dfdl:encoding="bytes" as a special character set name which has the above property.

Briefly discussed but no conclusion.

B: Glossary

Variable-Occurrence Item - Optional elements have a variable number of occurrences (0 or 1) and arrays also can have a variable number of occurrences (when minOccurs < maxOccurs). So when we say an item with a variable number of occurrences, this can mean either an optional element, or an array where minOccurs < maxOccurs. For both arrays and optional elements, we have the additional constraint that the DFDL representation properties do not preclude a variable number of occurrences. When dfdl:occursCountKind='explicit' and dfdl:occursCount has a literal constant as its value, or an expression that statically evaluates to a constant, then the DFDL properties are specifying exactly the number of occurrences for all instances and so are said to preclude a variable number of occurrences. If dfdl:occursCount has a formula as its expressed value, then the DFDL properties do not preclude a variable number of occurrences.

MikeB comment: This idea that you can have minOccurs < maxOccurs, but dfdl:occursCount is equal to a constant and dfdl:occursCountKind="explicit", is causing us a bunch of grief in these definitions. Can we be conservative and just say it is a schema definition error if minOccurs < maxOccurs but the length is static, i.e., an explicit constant-valued expression?

C: DFDL Schema Component Model

What needs to be changed in the UML diagram?

D: Sequence Groups

Mike B: TBD: rewrite these property descriptions in terms of the grammar for sequences. Specifically, this is where the FinalUnused Region must be described.

E: Check other comments in document.

2. Go through Actions

Current Actions:

No   Action
049  20/05 AP Built-in specification description and schemas
     03/06: not discussed
     24/06: No Progress
     24/06: No Progress (hope to get these from test cases)
     15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
     ...
     14/10: no progress
     21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
     28/10: no progress
     04/11: no progress
     11/11: no update
     18/11: no update
     25/11: Agreed to try to produce for CSV and fixed formats
     04/12: no update
     09/12: no update
     16/12: no update
     23/12: no update
     06/01: no progress. If there is no resource to complete this action it can be deferred
     13/01: no progress
     20/01: no progress
     27/01: no progress
     29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call
     03/02: No progress
     10/02: No progress
     17/02: No progress
066  Investigate format for defining test cases
     25/11: IBM to see if it is possible to publish its test case format.
     04/12: no update
     09/12: no update
     16/12: reminder sent to project manager
     23/12: SH will send another reminder.
     06/01: Another reminder will be sent
     13/01: no update
     20/01: no update
     27/01: no progress
     29/01: no progress
     03/02: IBM is still investigating
     10/02: IBM is still investigating
     17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'
079  MB: Encoding for binary fields when lengthKind is pattern
     17/02: Discussed but no conclusion
080  AP: Clarify semantics of fn:position and fn:count
     17/02: no progress
081  AP: Inf and NaN
     The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values
     17/02: XML allows NaN and Inf for float and double. DFDL will do the same. Requires more investigation of ICU.

3. DFDL v1 Specification completion

Draft 039 will be published today.
WG review and comments by 3 March.
Draft 40 with updates for OGF submission - available 5 March.

Regards
Alan Powell


Minutes for OGF DFDL Working Group Call, February 17-2010
by Alan Powell 18 Feb '10


Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, February 17-2010

Attendees
Steve Hanson (IBM)
Alan Powell (IBM)
Steve Marting (Progeny)
Peter Lambros (IBM)
Stephanie Fetzer (IBM)

Apologies
Mike Beckerle (Oco)
Suman Kalia (IBM)
Tim Kimber (IBM)

1. Comments on latest discriminators doc v5

Steve questioned the rule "It is a processing error if none of the choice branches are known to exist".

Steve: I know we agreed on this but I'm concerned that I am now unable to model the situation where I have a header, a footer, and in between either nothing or exactly 1 of a choice of n records.

It was agreed that the body could be an optional complex element with the choice as its content.

Also confirmed that dfdl:discriminator can be an annotation on
· a local xs:element declaration
· an xs:element reference
· an xs:group reference (when the top level of a choice branch)
· an xs:sequence (when the top level of a choice branch)
· an xs:choice (when the top level of a choice branch)

Action 045 will be closed.

2. Remaining 037 review issues

16.2 scannability with lengthKind pattern: Confirm that this is what we agreed

In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding, which, where binary data is involved, in practice means an 8-bit ASCII encoding.

By 8-bit ASCII we really mean an encoding where all the codepoints from 0-255 map to the equivalent value. Subsequent investigation indicates that 'all' 8-bit ASCII encodings have gaps, that is, byte values with no valid character assigned.

Mike has suggested:
1) for all ascii-based character sets, we say that bytes 0x00 to 0xFF all map to exactly those codepoints in ISO 10646 for the infoset, and vice versa.
2) define dfdl:encoding="bytes" as a special character set name which has the above property.

Briefly discussed but no conclusion.

3. Go through Actions

Updated below.

4. Biztalk comparison

Short discussion which noted that DFDL could match Biztalk representation properties but was missing support for attributes and some simple types. Decided that we don't need to do anything in DFDL v1.

5. DFDL v1 Specification completion

It was agreed that the spec should be in the OGF review process by OGF 28 in March. Alan stated it needed 2 more drafts.
Draft 39 with all the updates completed for a final WG review - available 24 Feb.
Comments by 3 March.
Draft 40 with updates for OGF submission - available 5 March.

Meeting closed, 14:00

Next call Wednesday 24 February 2010 13:00 UK

Next action: 083

Actions raised at this meeting

No   Action

Current Actions:

No   Action
049  20/05 AP Built-in specification description and schemas
     03/06: not discussed
     24/06: No Progress
     24/06: No Progress (hope to get these from test cases)
     15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
     ...
     14/10: no progress
     21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
     28/10: no progress
     04/11: no progress
     11/11: no update
     18/11: no update
     25/11: Agreed to try to produce for CSV and fixed formats
     04/12: no update
     09/12: no update
     16/12: no update
     23/12: no update
     06/01: no progress. If there is no resource to complete this action it can be deferred
     13/01: no progress
     20/01: no progress
     27/01: no progress
     29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call
     03/02: No progress
     10/02: No progress
     17/02: No progress
066  Investigate format for defining test cases
     25/11: IBM to see if it is possible to publish its test case format.
     04/12: no update
     09/12: no update
     16/12: reminder sent to project manager
     23/12: SH will send another reminder.
     06/01: Another reminder will be sent
     13/01: no update
     20/01: no update
     27/01: no progress
     29/01: no progress
     03/02: IBM is still investigating
     10/02: IBM is still investigating
     17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'
079  MB: Encoding for binary fields when lengthKind is pattern
     17/02: Discussed but no conclusion
080  AP: Clarify semantics of fn:position and fn:count
     17/02: no progress
081  AP: Inf and NaN
     The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values
     17/02: XML allows NaN and Inf for float and double. DFDL will do the same. Requires more investigation of ICU.

Closed actions

No   Action
045  20/05 AP: Speculative Parsing
     27/05: Pseudocode has been circulated. Review for next call
     03/06: Comments received and will be incorporated
     09/06: Progress but not discussed
     17/06: Discussed briefly
     24/06: No Progress
     01/07: No Progress
     15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way.
     29/07: No Progress
     05/08: No Progress. Will document behaviour as a set of rules.
     12/08: No Progress
     ...
     16/09: no progress
     30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate update and reissue
     07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
     14/10: Alan to update proposal to include array scenario where minOccurs > 0
     21/10: Updated proposal reviewed
     28/10: Updated proposal reviewed, see minutes
     04/11: Discussed semantics of discriminators on arrays. MB to produce examples
     11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
     18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
     25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to Resolving Uncertainty doc.
     04/11: Further discussion about arrays.
     09/12: Reviewed proposed discriminator semantic.
     16/12: Reviewed discriminator examples and WTX semantic.
     23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call
     06/01: B Connolly not available. SF to provide more complete description.
     13/01: Stephanie took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.
     20/01: Mike will write up
     27/01: further discussion of discriminators
     29/01: Alan had emailed both proposals but not enough time to discuss
     02/02: Agreed to adopt 'component exists' semantics for discriminators
     10/02: 'component exists' proposal updated. Comments by next call.
     17/02: Reviewed, needs minor updates. Closed
082  MB: Should alignment be 0 or 1 based
     17/02: Mike has reworded the section to be 1 based and it was agreed this was OK. Closed

Work items:

No   Item (target version; status)
005  Improvements on property descriptions (status: not started)
012  Reordering the properties discussion: move representation earlier, improve flow of topics (status: not started)
036  Update dfdl schema with change properties (status: ongoing)
042  Mapping of the DFDL infoset to XDM (target version: none; status: not required for V1 specification)
069  ICU fractional seconds (target version: 039)
070  Write DFDL primer
071  Write test cases.
072  It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix (target version: 039)
073  Rename dfdl:separatorPolicy="required" to "always". (target version: 039; status: deferred until action 071 agreed)
078  Document UPA checks (target version: 039)
079  Semantics of length=0, nil handling and defaults. (A071) (target version: 039)
080  Tlog: Allow lengthKind delimited for packed/bcd (A074) (target version: 039)
081  Update empty sequence section (A075) (target version: 039)
082  Semantics of minOccurs=0 on choice branches (A076) (target version: 039)
083  Implement RFC2116
084  lengthKind pattern scannability rules
085  Invalid character substitution
086  Infoset round trip: Rephrase sentence 'It is possible to define a schema so that when infoset unparsed and the datastream reparsed, the same infoset will be produced'
087  Clarify use of relative paths in global components.
088  'DFDL expression'
089  Agreed that dfdl:representation 'text' is implied for strings and dfdl:representation 'binary' is implied for hexBinary
091  textStringPadCharacter textNumberPadCharacter must be a 1 byte character if the char set encoding is variable width?
092  finalDocumentTerminatorCanBeMissing and finalDocumentSeparatorCanBeMissing allowed only in 'default' format
093  Remove textNumberFormat and textCalendarFormat.
094  Alignment should be 1 based

Regards
Alan Powell


Action 081: Clarification of NaN and Inf
by Alan Powell 18 Feb '10


DFDL support for Infinity and Nan need clarification. There are two aspects Representation of infinity and Nan in the infoset Representation of Infinity and Nan in the datastream Representation of infinity and Nan in the infoset DFDL should follow XML which allows +inf, -inf and Nan for Float and Double >From XML Schema Part 2: Datatypes Second Edition [Definition:] float is patterned after the IEEE single-precision 32-bit floating point type [IEEE 754-1985]. The basic ·value space· of float consists of the values m × 2^e, where m is an integer whose absolute value is less than 2^24, and e is an integer between -149 and 104, inclusive. In addition to the basic ·value space· described above, the ·value space· of float also contains the following three special values: positive and negative infinity and not-a-number (NaN). The ·order-relation· on float is: x < y iff y - x is positive for x and y in the value space. Positive infinity is greater than all other non-NaN values. NaN equals itself but is ·incomparable· with (neither greater than nor less than) any other value in the ·value space·. The special values positive and negative infinity and not-a-number have lexical representations INF, -INF and NaN, respectively. Lexical representations for zero may take a positive or negative sign. Similarly description for Double Minor changes to infoset [dataValue] description 1. [dataValue] The value in the value space (as defined by XML Schema Part 2: Datatypes [XSDLV1] ) of the [datatype] member or special value nil. In a complex element information item this member has no value. For information items of datatype xs:string, the value will be the ISO10646 character codes of the string and 'implicit' (also known as logical), left-to-right bidirectional ordering and orientation. Is it sufficient just to refer to XML schema rather than spell out the value spaces in this specification? 
Representation of infinity and NaN in the data stream

ICU allows a single character for NaN and infinity. The infinity character can be prefixed by the positive or negative prefix (i.e. the prefix part of the pattern). This is too restrictive for DFDL, as we need to be able to support at least 'INF', '-INF' and 'NaN'.

Only minor changes to the current description are needed.

numberInfinityRep
Type: String
The value used to represent infinity. Infinity is represented as a string with the positive or negative prefixes and suffixes from the numberPattern applied. This property is applicable when dfdl:textNumberRepresentation is 'standard' and the simple type is float or double.
Annotation: dfdl:textNumberFormat

numberNaNRep
Type: String
The value used to represent NaN. NaN is represented as a string, and the positive or negative prefixes and suffixes from the numberPattern are not used. This property is applicable when dfdl:textNumberRepresentation is 'standard' and the simple type is float or double.
Annotation: dfdl:textNumberFormat

Regards
Alan Powell
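The difference between the two proposed properties (prefixes and suffixes apply to the infinity representation but not to the NaN representation) can be sketched in Python as follows. This is only an illustration under stated assumptions: the class, its field names and its default values are hypothetical, and a real numberPattern would supply the prefixes and suffixes.

```python
import math
from dataclasses import dataclass


@dataclass
class TextNumberSpecials:
    # All names and defaults below are illustrative assumptions.
    infinity_rep: str = "INF"   # stands in for numberInfinityRep
    nan_rep: str = "NaN"        # stands in for numberNaNRep
    positive_prefix: str = ""   # would come from the numberPattern
    positive_suffix: str = ""
    negative_prefix: str = "-"
    negative_suffix: str = ""


def parse_special(text: str, fmt: TextNumberSpecials):
    """Return inf, -inf or nan if text is a special value, otherwise None."""
    # NaN: prefixes and suffixes from the pattern are not used.
    if text == fmt.nan_rep:
        return math.nan
    # Infinity: the positive or negative prefix and suffix are applied.
    if text == fmt.negative_prefix + fmt.infinity_rep + fmt.negative_suffix:
        return -math.inf
    if text == fmt.positive_prefix + fmt.infinity_rep + fmt.positive_suffix:
        return math.inf
    return None


def unparse_special(value: float, fmt: TextNumberSpecials) -> str:
    """Inverse of parse_special for the three special values."""
    if math.isnan(value):
        return fmt.nan_rep
    if value == math.inf:
        return fmt.positive_prefix + fmt.infinity_rep + fmt.positive_suffix
    if value == -math.inf:
        return fmt.negative_prefix + fmt.infinity_rep + fmt.negative_suffix
    raise ValueError("not a special value")
```

With the defaults assumed here, '-NaN' is not recognised as a special value, while '-INF' is, which matches the wording of the two property descriptions.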

Agenda for OGF DFDL WG call 17 February 2010 - 13:00 UK (8:00 ET)
by Alan Powell 16 Feb '10

1. Comments on latest discriminators doc v5.

2. Remaining 037 review issues

16.2 Scannability with lengthKind pattern: confirm that this is what we agreed.

In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding, which, where binary data is involved, in practice means an 8-bit ASCII encoding. By 8-bit ASCII we really mean an encoding where all the codepoints from 0-255 map to the equivalent value.

Subsequent investigation indicates that 'all' 8-bit ASCII encodings have gaps where there isn't a valid character. Mike has suggested:
1) For all ASCII-based character sets, we say that bytes 0x00 to 0xFF all map to exactly those codepoints in ISO 10646 for the infoset, and vice versa.
2) Define dfdl:encoding="bytes" as a special character set name which has the above property.
Action raised.

3. Go through actions (see below).

4. Biztalk comparison: discuss analysis of Biztalk function.

Current Actions:

No  Action

045 20/05 AP: Speculative Parsing
    27/05: Pseudo code has been circulated. Review for next call
    03/06: Comments received and will be incorporated
    09/06: Progress but not discussed
    17/06: Discussed briefly
    24/06: No progress
    01/07: No progress
    15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way.
    29/07: No progress
    05/08: No progress. Will document behaviour as a set of rules.
    12/08: No progress
    ...
    16/09: No progress
    30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate update and reissue
    07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
    14/10: Alan to update proposal to include array scenario where minOccurs > 0
    21/10: Updated proposal reviewed
    28/10: Updated proposal reviewed, see minutes
    04/11: Discussed semantics of discriminators on arrays. MB to produce examples
    11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
    18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
    25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.
    04/11: Further discussion about arrays.
    09/12: Reviewed proposed discriminator semantic.
    16/12: Reviewed discriminator examples and WTX semantic.
    23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call
    06/01: B Connolly not available. SF to provide more complete description.
    13/01: Stephanie took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.
    20/01: Mike will write up
    27/01: Further discussion of discriminators
    29/01: Alan had emailed both proposals but not enough time to discuss
    02/02: Agreed to adopt 'component exists' semantics for discriminators
    10/02: 'component exists' proposal updated. Comments by next call.

049 20/05 AP: Built-in specification description and schemas
    03/06: Not discussed
    24/06: No progress
    24/06: No progress (hope to get these from test cases)
    15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
    ...
    14/10: No progress
    21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
    28/10: No progress
    04/11: No progress
    11/11: No update
    18/11: No update
    25/11: Agreed to try to produce for CSV and fixed formats
    04/12: No update
    09/12: No update
    16/12: No update
    23/12: No update
    06/01: No progress. If there is no resource to complete this action it can be deferred
    13/01: No progress
    20/01: No progress
    27/01: No progress
    29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call
    03/02: No progress
    10/02: No progress

066 Investigate format for defining test cases
    25/11: IBM to see if it is possible to publish its test case format.
    04/12: No update
    09/12: No update
    16/12: Reminder sent to project manager
    23/12: SH will send another reminder.
    06/01: Another reminder will be sent
    13/01: No update
    20/01: No update
    27/01: No progress
    29/01: No progress
    03/02: IBM is still investigating
    10/02: IBM is still investigating

079 AP: Encoding for binary fields when lengthKind is pattern

080 AP: Clarify semantics of fn:position and fn:count

081 AP: Inf and NaN
    The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values

082 MB: Should alignment be 0 or 1 based

Regards
Alan Powell
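The property discussed under agenda item 2 (every byte 0x00 to 0xFF mapping to the ISO 10646 codepoint with the same value, and back) can be illustrated with a short Python sketch. Python's built-in "latin-1" codec happens to have this property, so it is used here purely as a stand-in for the proposed dfdl:encoding="bytes"; that substitution, and both function names, are assumptions made for this example.

```python
import re


def maps_bytes_to_same_codepoints(codec: str) -> bool:
    """True if every byte 0x00-0xFF decodes to the codepoint of the same value."""
    try:
        decoded = bytes(range(256)).decode(codec)
    except UnicodeDecodeError:
        return False  # the codec has a gap: some byte has no character
    return [ord(c) for c in decoded] == list(range(256))


def pattern_length_in_bytes(data: bytes, pattern: str, codec: str = "latin-1"):
    """Length in bytes of the regex match at the start of data, or None.

    With a one-byte-per-character identity codec, the match length in
    characters equals the length in bytes, so binary data can be scanned.
    """
    match = re.match(pattern, data.decode(codec), re.DOTALL)
    return len(match.group(0)) if match else None


if __name__ == "__main__":
    print(maps_bytes_to_same_codepoints("latin-1"))
    print(maps_bytes_to_same_codepoints("cp1252"))
    print(pattern_length_in_bytes(b"\x00\x05\xffhello;rest", r"[^;]*;"))
```

In Python, "cp1252" is an example of an 8-bit codec with gaps (byte 0x81, for instance, has no character assigned), which is the kind of problem that motivated the "bytes" suggestion.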

DFDL discriminators v5
by Alan Powell 16 Feb '10

The latest, and hopefully the last, discriminators write-up, which incorporates comments from Steve H and Stephanie.

Steve questioned:

It is a processing error if none of the choice branches are known to exist.

Steve: I know we agreed on this but I'm concerned that I am now unable to model the situation where I have a header, a footer, and in between either nothing or exactly 1 of a choice of n records.

Also the WG agreed that discriminators are not allowed on:
- Global groups and the top level sequence or choice of a global group.
- Global element declarations.
- The top level group of a complex type.
- Anonymous groups other than when it is the top level of a choice branch.

Which I think reduces to: dfdl:discriminator can be an annotation on
· a local xs:element declaration
· an xs:element reference
· an xs:group reference
· an xs:sequence (when the top level of a choice branch)
· an xs:choice (when the top level of a choice branch)

Regards
Alan Powell

Comparison of DFDL and Biztalk 2009.
by Alan Powell 15 Feb '10

I have taken a brief look at Biztalk to get some idea of whether DFDL V1 is capable of supporting the same formats. I have more details, but to avoid contaminating developers I will only post a summary here.

Biztalk is an XML based integration suite but has 2 assemblers/disassemblers for non-XML data: Flat File and EDI. The purpose of these is to transform non-XML formats to and from XML as data enters/leaves the system. All data within the system is XML.

So far I have only looked at the Flat File assembler. The EDI assembler seems to be Covast.

Summary:
· A file is defined by an optional header schema, a body schema and an optional trailer schema.
· Schema consists of root node(s), records (complex element), field elements and field attributes. But underlying schema contains complex types, sequences, etc.
· Flat File properties are annotations and are allowed on schema, elements and attributes.
· Some properties set at schema level: codepage, lengthUnits 'bytes' (usually characters)
· Defaults set on schema. Local properties have value or 'use default'
· Flat files are always text
· The children of a record can all be fixed length or delimited
· Prefix, infix, postfix Child delimiter, Suppress Trailing Delimiters, Preserve Delimiter For Empty Data
· Repeating Delimiter
· Records can have Tags (initiators) but not terminators.
· No restrictions on logical types (inc. gDay, gTime, etc).
· Only dates/times have a formatting pattern. Other types seem to have a default representation.
· Alignment, padding, minLength
· Escape character, Wrap block (escape block), Restricted Characters
· Convert all fields to upper/lower case or leave as is

The DFDL deficiencies seem to be:
· Lack of support of attributes.
· Limited set of logical types.
· Upper/lower case conversion

I have only investigated the flat file specific properties as these should be the representation properties. There are other XML properties, such as restricting to parse only, that may apply.

Regards
Alan Powell

Minutes for OGF DFDL Working Group Call, February 10, 2010
by Alan Powell 11 Feb '10

Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, February 10, 2010

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Suman Kalia (IBM)
Peter Lambros (IBM)
Tim Kimber (IBM)
Stephanie Fetzer (IBM)

Apologies
Steve Marting (Progeny)

1. Comments on latest discriminators doc.

Stephanie has sent some editorial comments. Action will be closed next week subject to comments from others.

2. Remaining 037 review issues

16.2 Scannability with lengthKind pattern: confirm that this is what we agreed.

In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding, which, where binary data is involved, in practice means an 8-bit ASCII encoding. By 8-bit ASCII we really mean an encoding where all the codepoints from 0-255 map to the equivalent value.

Subsequent investigation indicates that 'all' 8-bit ASCII encodings have gaps where there isn't a valid character. Mike has suggested:
1) For all ASCII-based character sets, we say that bytes 0x00 to 0xFF all map to exactly those codepoints in ISO 10646 for the infoset, and vice versa.
2) Define dfdl:encoding="bytes" as a special character set name which has the above property.
Action raised.

· Tracker Issue: [schema] is an absolute or relative SCD. Why bother allowing absolute?

Both absolute and relative SCDs are allowed. No change needed.

· Tracker Issue: Glossary as the place for centralized definitions, or should they be repeated there, but also introduced at point of first use, or should we put the definitions only at the places where they are discussed, and xref from the glossary?

The glossary will be used for definitions.

· TBD: Issue - semantics of expressions containing relative paths that are inherited via ref to a dfdl:defineFormat. (also section 10.3)

This issue applies to expressions used in any global component, not just defineFormats. It was agreed that relative paths would be allowed in global components except for the dfdl:format annotation on an xs:schema. The relative paths are resolved where the global component is referenced, not where it is defined.

· TBD: Issue - XPath term - we are not consistent about using the term XPath, or "expression" when referring to our expression language. I prefer to call it our expression language, and then in the section that defines it state that it is a strict subset of XPath 2.0.

The term 'DFDL expression' will be used.

· TBD: Issue - fn:position is unclear given that we've just said we don't support sequences in the expression language.

Action: clarify semantics of fn:position and fn:count. Relax single sequence restriction.

· TBD: Issue - order of sections. Scoping rules section should come before variables section, which uses these concepts.

Expressions and regular expressions will be moved to the back of the spec as they are 'advanced' features.

· TBD: Issue - Case sensitivity of enum names - did we say whether this is case sensitive or not? I believe it should be case sensitive.

Already agreed that they are case sensitive.

· Issue: dfdl:representation - Strings in binary rep. I see no reason why elements of type xs:string will examine dfdl:representation. They shouldn't care what it is, they are always "text". I should be able to specify a bunch of inter-mixed binary number and string elements without having to specify dfdl:representation="text" just to avoid an error on the string type elements. I believe xs:string type ignores dfdl:representation (always behaves as if dfdl:representation is 'text'). (If we change this then the property precedence section for simple types changes slightly, as representation="text" is implied if type is string.)

That will make it impossible to introduce a binary representation of text later.

What is "a binary representation of text"? Is there a real issue here? This is a primary convenience and clarity issue for me. I do not want to have to change to representation="text" for every string inside a COBOL structure, which is ultimately a binary representation object. To me type="string" is enough. I want to put in the file scope level of the schema a representation="binary", and then decorate the elements with the specifics of their types, but I do not expect to have to put representation="text" on anything. I do not understand what you are trying to achieve by requiring representation="text" for things that are already textual based on the type.

Agreed that dfdl:representation 'text' is implied for strings and dfdl:representation 'binary' is implied for hexBinary.

The rest of the issues below I think we need to discuss on calls.

· textStringPadCharacter, textNumberPadCharacter - did we agree that this character must be a "minimum width" character if the char set encoding is variable width? (i.e., the pad char must be 1 byte if the encoding is UTF-8.)

textStringPadCharacter and textNumberPadCharacter must be a 1 byte character if the char set encoding is variable width?

· numberInfinityRep, numberNanRep - Is this applicable only to xs:double and xs:float? Also, what I've seen requires a distinction of sign, i.e., there are positive and negative infinities often printing as -inf and +inf.

The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values. Action raised.

· TBD: Issue - \n in regular expressions - clarify relationship of this to entities like NL entity. Also, if I include an entity like WSP* in a regular expression (can I?) does it then match accordingly? It appears that some of our multi-valued entities like WSP+ create conditional "matching" behavior without having to use regular expressions, e.g., when WSP+ is used as a separator. But can you use entities like WSP+ in a regular expression? It seems you should be able to use regular "single valued" entities in a regular expression; it's these multi-valued ones that have tricky semantics.

Added Unicode values to \n, \t, \r. Disallow DFDL general (multi-option) entities in regular expressions.

· 14.1 Alignment - TBD: Issue - zero-based thinking here. But all the bits stuff and everything else in DFDL uses 1-based reasoning. Need to revisit to make this sensible for a 1-based world.

Added implicit alignment table. TBD zero-based. It was felt that it was more natural to have alignment 0, 2, 4 etc. rather than 1, 3, 5 etc. MB to rewrite section. Action raised.

· finalTerminatorCanBeMissing - spec is not clear. Also is there a finalSeparatorCanBeMissing?

Changed to finalDocumentTerminatorCanBeMissing and finalDocumentSeparatorCanBeMissing. Not sure where finalDocumentSeparatorCanBeMissing should be specified. Looks odd on 'distinguished root'. These properties operate differently from other properties as they are defined on the 'distinguished root' but affect some lower down element. Effectively they are put in scope by a different mechanism.

We discussed whether the properties should be on the distinguished root and its sequence, but decided that, because these are really global properties, they would only be allowed in the 'default' format block and be 'scoped' over the whole schema.

3. Go through Actions

Meeting closed, 14:40

Next call Wednesday 17 February 2010 13:00 UK

Next action: 083

Actions raised at this meeting

No  Action
079 AP: Encoding for binary fields when lengthKind is pattern
080 AP: Clarify semantics of fn:position and fn:count
081 AP: Inf and NaN
    The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values
082 MB: Should alignment be 0 or 1 based

Current Actions:

No  Action

045 20/05 AP: Speculative Parsing
    27/05: Pseudo code has been circulated. Review for next call
    03/06: Comments received and will be incorporated
    09/06: Progress but not discussed
    17/06: Discussed briefly
    24/06: No progress
    01/07: No progress
    15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way.
    29/07: No progress
    05/08: No progress. Will document behaviour as a set of rules.
    12/08: No progress
    ...
    16/09: No progress
    30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate update and reissue
    07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
    14/10: Alan to update proposal to include array scenario where minOccurs > 0
    21/10: Updated proposal reviewed
    28/10: Updated proposal reviewed, see minutes
    04/11: Discussed semantics of discriminators on arrays. MB to produce examples
    11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
    18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
    25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.
    04/11: Further discussion about arrays.
    09/12: Reviewed proposed discriminator semantic.
    16/12: Reviewed discriminator examples and WTX semantic.
    23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call
    06/01: B Connolly not available. SF to provide more complete description.
    13/01: Stephanie took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.
    20/01: Mike will write up
    27/01: Further discussion of discriminators
    29/01: Alan had emailed both proposals but not enough time to discuss
    02/02: Agreed to adopt 'component exists' semantics for discriminators
    10/02: 'component exists' proposal updated. Comments by next call.

049 20/05 AP: Built-in specification description and schemas
    03/06: Not discussed
    24/06: No progress
    24/06: No progress (hope to get these from test cases)
    15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
    ...
    14/10: No progress
    21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
    28/10: No progress
    04/11: No progress
    11/11: No update
    18/11: No update
    25/11: Agreed to try to produce for CSV and fixed formats
    04/12: No update
    09/12: No update
    16/12: No update
    23/12: No update
    06/01: No progress. If there is no resource to complete this action it can be deferred
    13/01: No progress
    20/01: No progress
    27/01: No progress
    29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call
    03/02: No progress
    10/02: No progress

066 Investigate format for defining test cases
    25/11: IBM to see if it is possible to publish its test case format.
    04/12: No update
    09/12: No update
    16/12: Reminder sent to project manager
    23/12: SH will send another reminder.
    06/01: Another reminder will be sent
    13/01: No update
    20/01: No update
    27/01: No progress
    29/01: No progress
    03/02: IBM is still investigating
    10/02: IBM is still investigating

079 AP: Encoding for binary fields when lengthKind is pattern

080 AP: Clarify semantics of fn:position and fn:count

081 AP: Inf and NaN
    The description is the way ICU behaves but needs clarification. It isn't clear how Inf and NaN are represented in the infoset. Need to investigate if XML allows these values

082 MB: Should alignment be 0 or 1 based

Closed actions

No  Action
077 SKK: Mapping of COBOL numbers to textNumberFormats.
    03/02: Suman documented the problem. Agreed to remove textNumberFormat and textCalendarFormat.
    10/02: Closed
078 MB: Reword section 2.3.1 incorporating markup order rules.
    10/02: Closed

Work items:

No  Item  target version  status
005 Improvements on property descriptions  not started
012 Reordering the properties discussion: move representation earlier, improve flow of topics  not started
036 Update dfdl schema with change properties  ongoing
042 Mapping of the DFDL infoset to XDM  none  not required for V1 specification
069 ICU fractional seconds  039
070 Write DFDL primer
071 Write test cases.
072 It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix  039
073 Rename dfdl:separatorPolicy="required" to "always".  039  Deferred until action 071 agreed
078 Document UPA checks  039
079 Semantics of length=0, nil handling and defaults. (A071)  039
080 Tlog: Allow lengthKind delimited for packed/bcd (A074)  039
081 Update empty sequence section (A075)  039
082 Semantics of minOccurs=0 on choice branches (A076)  039
083 Implement RFC2116
084 lengthKind pattern scannability rules
085 Invalid character substitution
086 Infoset round trip: rephrase sentence 'It is possible to define a schema so that when infoset unparsed and the datastream reparsed, the same infoset will be produced'
087 Clarify use of relative paths in global components.
088 'DFDL expression'
089 Agreed that dfdl:representation 'text' is implied for strings and dfdl:representation 'binary' is implied for hexBinary
091 textStringPadCharacter textNumberPadCharacter must be a 1 byte character if the char set encoding is variable width?
092 finalDocumentTerminatorCanBeMissing and finalDocumentSeparatorCanBeMissing allowed only in 'default' format
093 Remove textNumberFormat and textCalendarFormat.

Regards
Alan Powell
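The pad character question recorded in these minutes (work item 091) can be made concrete with a small Python sketch. It assumes the rule is adopted as discussed, i.e. that the pad character must encode to exactly one byte when the encoding is variable width; the function names and the choice to pad on the right are assumptions made for this example only.

```python
def is_single_byte_pad(pad_char: str, encoding: str) -> bool:
    """True if pad_char encodes to exactly one byte in the given encoding."""
    if len(pad_char) != 1:
        raise ValueError("pad character must be exactly one character")
    return len(pad_char.encode(encoding)) == 1


def pad_to_byte_length(value: str, length: int, pad_char: str, encoding: str) -> bytes:
    """Pad value on the right so that its encoded form is exactly length bytes."""
    if not is_single_byte_pad(pad_char, encoding):
        raise ValueError("pad character must be 1 byte in this encoding")
    encoded = value.encode(encoding)
    if len(encoded) > length:
        raise ValueError("value is longer than the target length")
    # A 1 byte pad character can always fill the remaining bytes exactly,
    # however many bytes the value's own characters occupy.
    return encoded + pad_char.encode(encoding) * (length - len(encoded))


if __name__ == "__main__":
    print(is_single_byte_pad(" ", "utf-8"))
    print(is_single_byte_pad("\u3000", "utf-8"))
    print(pad_to_byte_length("\u00e9", 4, " ", "utf-8"))
```

The reason a 1 byte pad is attractive is visible in the last function: with a multi-byte pad character (such as U+3000 in UTF-8), a field measured in bytes could be left with a remainder that no whole number of pad characters fills.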
