Agenda for OGF DFDL WG call, 3 March 2010 - 13:00 UK (8:00 ET)

1. 16.2 scannability with lengthKind pattern
2. Current Actions
3. Steve H issues with draft 039
4. Tim's (major) issues with draft 039
5. Status of specification (for OGF28)

1. 16.2 scannability with lengthKind pattern:

In summary, you can use a data pattern on any element (complex, simple text, simple binary) as long as the bytes are legal in the stated encoding, which, where binary data is involved, in practice means an 8-bit ASCII encoding. Binary data can be handled using some of the conveniences of text by treating it as text with encoding="iso-8859-1". In this case literal text, such as length patterns, is interpreted as being in the iso-8859-1 character encoding, and the correspondence of byte values in the data to a string in the DFDL infoset is one to one. That is, a byte with value N produces an infoset character with character code N.

2. Current Actions:

No Action

049 20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call
03/02: No progress
10/02: No progress
17/03: No progress
24/03: No progress

066 Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
03/02: IBM is still investigating
10/02: IBM is still investigating
17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress

079 MB: Encoding for binary fields when lengthKind is pattern
17/02: Discussed but no conclusion
24/03: Mike has found an encoding that matches the first 255 codepoints of iso 10646. Will document its use for binary fields.

080 AP: Clarify semantics of fn:position and fn:count
17/02: no progress
24/03: No progress

083 MB: To correct syntax diagram for FinalUnused and suggest wording for the Sequence section

3. Steve H issues with draft 039

1) Name of property dfdl:textNumberRepresentation is not consistent with dfdl:binaryNumberRep, dfdl:binaryFloatRep, etc.

2) The dfdl:numberPattern etc properties that have been moved from the defunct dfdl:textNumberFormat annotation to dfdl:element etc should be called dfdl:textNumberPattern etc. Otherwise users will think they apply to binary numbers too.

3) In section 14.3 on sequences, there are several sub-sections that talk about parsing according to different ways of specifying length (i.e., lengthKind). But dfdl:sequence no longer carries dfdl:lengthKind, so I think these sub-sections are not in the right place. I think they should be in section 12, under the correct 12.3.x lengthKind sub-section.

4) Section 19 on built-in specifications. Given that we don't have any for the public comment phase, we should reword this section.
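The one-to-one byte-to-character correspondence described under item 1 (16.2) can be illustrated with a small Python sketch. It is not DFDL and not any DFDL processor's API: the helper `match_length` and the sample pattern are hypothetical, chosen only to show that decoding as iso-8859-1 maps byte N to character code N, so a text-style length pattern can be applied to binary data.

```python
import re

# Arbitrary binary data, including bytes that are not valid ASCII or UTF-8.
data = bytes([0x00, 0x7F, 0x80, 0xFF, 0x41, 0x0A])

# Decoding as iso-8859-1 never fails and maps byte N to character code N.
text = data.decode("iso-8859-1")
assert [ord(ch) for ch in text] == list(data)

# The mapping is reversible, so no information is lost.
assert text.encode("iso-8859-1") == data


def match_length(pattern: str, payload: bytes) -> int:
    """Hypothetical helper: bytes consumed by a text length pattern, 0 if no match."""
    m = re.match(pattern, payload.decode("iso-8859-1"), flags=re.DOTALL)
    # One character corresponds to exactly one byte in this encoding.
    return len(m.group(0)) if m else 0


# Illustrative pattern: everything up to and including the first 0xFF byte.
consumed = match_length(r"[^\xff]*\xff", data)
```

Because each character corresponds to exactly one byte, the length of the matched string is also the number of bytes consumed.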
4. Tim's (major) issues with draft 039

12.2 Delimiters: Text Markup
- The term 'Delimiters' is not accurate. Most readers will not think of an initiator as a 'delimiter'.
- It's not 'Text' markup any more - especially since v0.39 has allowed lengthKind="delimited" for elements with binary representation.
Title should be 'Markup' and explanation can then deal with what it really is, rather than justifying the inaccurate title :-)

Syntax for specifying markup:
It's not clear from this description that each item in the space-separated list is a DFDL string literal.

initiator ( and all other space-separated properties )
It is not clear whether the order of the space-separated properties matters. Must the parser test them in the order in which they are specified? ( Q: What if %ES; is the first in the list? )

terminator:
Is it OK if the final terminator is missing within the scope of a known-length parent? Seems like a reasonable extension of the rule ( in all other scenarios, the end of a known-length parent acts like the end of the data stream for items within its scope ).

documentFinalTerminatorCanBeMissing:
Let's try to avoid creating another property for the postfix separator scenario. I think this property provides a way of modelling the data naturally. We can recommend use of infix-with-a-terminator rather than 'postfix' if the final terminator can be missing.

outputNewLine
Should we validate that the 'characterOrCharacters' are all newline characters from the set described by the %NL; mnemonic? Otherwise the DFDL serializer will output data which cannot be parsed by the DFDL parser.

dfdl:lengthKind endOfParent
'endOfParent' has almost the same meaning as 'delimited' so should have the same semantics.
· the item's terminator (if specified)
· an enclosing construct's separator or terminator
· the end of an enclosing construct designated by its known length
· the end of the data stream
The effect would be that the element could be ended by the nearest known-length parent, not just the immediate parent. Also the immediate parent could have lengthKind 'implicit'.

choiceKind 'Fixed'
When lengthKind='implicit' all alternative branches of the choice are padded to the fixed length of the largest one so that overall the entire choice construct is fixed length
There must be a restriction that the length of at least one choice must be statically defined.

Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM MP211, Hursley Park, Hursley, SO21 2JN, United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com

I will be unable to make the call on 3/3, so here are my comments on Tim's issues. (None of which seem so major to me :-)
*4 Tim's (major) issues with draft 039*
*12.2 Delimiters: Text Markup* - The term 'Delimiters' is not accurate. Most readers will not think of an initiator as a 'delimiter'. - It's not 'Text' markup any more - especially since v0.39 has allowed lengthKind="delimited" for elements with binary representation. Title should be 'Markup' and explanation can then deal with what it really is, rather than justifying the inaccurate title :-)
I dislike the use of the term "markup" for something not written by people, and most data formats of the DFDL kind are written by computers, so nothing is getting "marked up" by anyone. Initiators certainly are delimiters in the situations where they are not tags, i.e., initiator="[" terminator="]". Only tags will not be thought of as delimiters. Even then I think it is a stretch to say that nobody will think of an introductory tag as a delimiter. These definitions were found online:
*de·lim·it·er* (dĭ-lĭm'ĭ-tər) n. *Computer Science* A character or sequence of characters marking the beginning or end of a unit of data.
Computing Dictionary, *delimiter* (character): A character or string used to separate, or mark the start and end of, items of data in, e.g., a database, source code, or text file. See also: record. (2001-03-16)
These definitions are consistent with our usage of the term. I suggest no change in our terminology here.
*Syntax for specifying markup:* It's not clear from this description that each item in the space-separated list is a DFDL string literal.
These have always bugged me. Any better solution is welcome. XML/XSD does tend to make space separated the standard way to specify more than one thing.
*initiator ( and all other space-separated properties )* It is not clear whether the order of the space-separated properties matters. Must the parser test them in the order in which they are specified? ( Q: What if %ES; is the first in the list? )
I think the order should not matter, and it should test them longest first.
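A minimal Python sketch of the longest-first idea, assuming the space-separated list has already been split into plain strings. The function name `match_initiator` is hypothetical, and the empty string merely stands in for %ES;. This illustrates the suggestion and is not specification wording.

```python
from typing import Optional, Sequence


def match_initiator(data: str, pos: int, candidates: Sequence[str]) -> Optional[str]:
    """Return the longest candidate that matches data at pos, or None."""
    # Sort by length, longest first, so the declared order is irrelevant.
    for cand in sorted(candidates, key=len, reverse=True):
        if data.startswith(cand, pos):
            return cand
    return None


# The same result whichever order the list is written in.
first = match_initiator("<<item", 0, ["<", "<<"])
second = match_initiator("<<item", 0, ["<<", "<"])

# An empty-string entry (standing in for %ES;) matches only when nothing longer does.
fallback = match_initiator("item", 0, ["", "<"])
```

Under this reading, an empty-string entry listed first would not shadow the longer alternatives, because it is always tried last.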
*terminator:* is it OK if the final terminator is missing within the scope of a known-length parent? Seems like a reasonable extension of the rule ( in all other scenarios, the end of a known-length parent acts like the end of the data stream for items within its scope ).
I believe this should be true. "Final" is relative in my mind.
*documentFinalTerminatorCanBeMissing:* Let's try to avoid creating another property for the postfix separator scenario. I think this property provides a way of modelling the data naturally. We can recommend use of infix-with-a-terminator rather than 'postfix' if the final terminator can be missing.
Copasetic.
*outputNewLine* Should we validate that the 'characterOrCharacters' are all newline characters from the set described by the %NL; mnemonic? Otherwise the DFDL serializer will output data which cannot be parsed by the DFDL parser.
Nice catch.
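The check Tim proposes could look roughly like the Python sketch below. The contents of `NL_SEQUENCES` are an assumption about what %NL; expands to (LF, CR, CRLF, NEL and LS); the draft remains the authority on the actual set, and `is_valid_output_newline` is a hypothetical name.

```python
# Assumed expansion of the %NL; mnemonic: LF, CR, CRLF, NEL and LS.
# This set is an assumption for illustration; check the draft for the normative list.
NL_SEQUENCES = {"\n", "\r", "\r\n", "\u0085", "\u2028"}


def is_valid_output_newline(value: str) -> bool:
    """True if value is one of the sequences a parser would accept as %NL;."""
    return value in NL_SEQUENCES
```

A schema whose outputNewLine fails this kind of check would be rejected up front, rather than producing output that the parser cannot read back.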
*dfdl:lengthKind endOfParent* 'endOfParent' has almost the same meaning as 'delimited' so should have the same semantics.
· the item’s terminator (if specified)
· an enclosing construct’s separator or terminator
· the end of an enclosing construct designated by its known length
· the end of the data stream
The effect would be that the element could be ended by the nearest known-length parent, not just the immediate parent. Also the immediate parent could have lengthKind 'implicit'.
Agreed.
*choiceKind 'Fixed'* *When lengthKind='implicit' all alternative branches of the choice are padded to the fixed length of the largest one so that overall the entire choice construct is fixed length*
There must be a restriction that the length of at least one choice must be statically defined.
Also good catch. ...mikeb
Participants (2): Alan Powell, Mike Beckerle