Re: [DFDL-WG] dfdl-wg Digest, Vol 43, Issue 2

I need to attend the sprint planning meeting from 2pm GMT, so I may not be around when these items are discussed. Comments added below:

12.2 Delimiters: Text Markup
- The term 'Delimiters' is not accurate. Most readers will not think of an initiator as a 'delimiter'.
- It's not 'Text' markup any more, especially since v0.39 has allowed lengthKind="delimited" for elements with binary representation.
Title should be 'Markup' and explanation can then deal with what it really is, rather than justifying the inaccurate title :-)

I dislike the use of the term "markup" for something not written by people, and most data formats of the DFDL kind are written by computers, so nothing is getting "marked up" by anyone.

Initiators certainly are delimiters in the situations where they are not tags, for example initiator="[" terminator="]". Only tags will not be thought of as delimiters. Even then I think it is a stretch to say that nobody will think of an introductory tag as a delimiter.

These definitions found online:

de·lim·it·er n. Computer Science
A character or sequence of characters marking the beginning or end of a unit of data.

Computing Dictionary: delimiter (character)
A character or string used to separate, or mark the start and end of, items of data in, e.g., a database, source code, or text file. See also: record. (2001-03-16)

These definitions are consistent with our usage of the term. I suggest no change in our terminology here.

<TK> Point taken re: the term 'delimiter'. I still have reservations about calling it 'Text Markup' in the title, though. I think the intro paragraph should explain the common usage ( initiators, separators, terminators for text formats ) and the exceptional usage ( handling delimited binary data and other non-text markup ) </TK>

Syntax for specifying markup: It's not clear from this description that each item in the space-separated list is a DFDL string literal.

These have always bugged me. Any better solution is welcome.
XML/XSD does tend to make a space-separated list the standard way to specify more than one thing.

<TK> In a future revision of the spec we need a list of property value types which can then be used consistently in the tables which describe properties.
- Enumeration
- DFDL string literal
- List of DFDL string literals
- DFDL expression
- DFDL regular expression
- Boolean
- Non-negative integer
- any more?
In some cases it will be necessary to place restrictions on the type of content allowed in the string literal ( disallow raw byte values / raw byte values must represent a character / etc ) </TK>

initiator ( and all other space-separated properties )
It is not clear whether the order of the space-separated values matters. Must the parser test them in the order in which they are specified? ( Q: What if %ES; is the first in the list? )

I think the order should not matter, and it should test them longest first.

<TK> Good idea. I have another related suggestion below. </TK>

terminator: is it OK if the final terminator is missing within the scope of a known-length parent? Seems like a reasonable extension of the rule ( in all other scenarios, the end of a known-length parent acts like the end of the data stream for items within its scope ).

I believe this should be true. "Final" is relative in my mind.

<TK> Good - it's much easier to implement if end of known length parent is always equivalent to end of data stream, from the point of view of enclosed elements. But see next point... </TK>

documentFinalTerminatorCanBeMissing: Let's try to avoid creating another property for the postfix separator scenario. I think this property provides a way of modelling the data naturally. We can recommend use of infix-with-a-terminator rather than 'postfix' if the final terminator can be missing.

Copasetic.

<TK> Had to look up 'copasetic'. I'm amazed that my Mum never came out with that one - she's a walking dictionary.
This property has caused problems with naming and interpretation all along the line. Last time we discussed it, I don't think we considered this option ( we did talk about something like it ):
- If %ES; is included in the list of values for separator or terminator then
  a) The parser ignores it while performing ordinary scanning ( otherwise it would always cause a zero-length string to be scanned ).
  b) The parser accepts 'end of data stream' as a match for the %ES; mnemonic. That makes this property ( and the equivalent one for separators ) redundant.
  c) Other usages of %ES; remain unchanged.
</TK>

outputNewLine
Should we validate that the 'characterOrCharacters' are all newline characters from the set described by the %NL; mnemonic? Otherwise the DFDL serializer will output data which cannot be parsed by the DFDL parser.

Nice catch.

dfdl:lengthKind endOfParent
'endOfParent' has almost the same meaning as 'delimited' so should have the same semantics.
- the item's terminator (if specified)
- an enclosing construct's separator or terminator
- the end of an enclosing construct designated by its known length
- the end of the data stream
The effect would be that the element could be ended by the nearest known-length parent, not just the immediate parent. Also the immediate parent could have lengthKind 'implicit'.

Agreed.

choiceKind 'Fixed'
When lengthKind='implicit' all alternative branches of the choice are padded to the fixed length of the largest one so that overall the entire choice construct is fixed length.
There must be a restriction that the length of at least one choice must be statically defined.

Also good catch.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
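
A footnote on the two matching suggestions in this thread (test the listed values longest first, and treat %ES; as matching only 'end of data stream'): a minimal sketch in Python of how the two might combine. It is an illustration of the proposal as discussed, not spec text or any implementation's behaviour. The names match_delimiter and scan_to_terminator are hypothetical, and the delimiter values are treated as plain strings, with %ES; as the only mnemonic recognised; real DFDL string literals allow other entities that this sketch does not handle.

```python
from typing import Optional, Sequence, Tuple

ES = "%ES;"  # the empty-string mnemonic discussed in this thread


def match_delimiter(
    data: str, pos: int, delimiters: Sequence[str]
) -> Optional[Tuple[str, int]]:
    """Return (delimiter, length) if one of the delimiters matches at pos, else None.

    Hypothetical helper: delimiters are plain strings, apart from %ES;.
    """
    # (a) Ordinary scanning ignores %ES;, otherwise a zero-length match
    #     would succeed at every position.
    literals = [d for d in delimiters if d != ES and d != ""]
    # Try the longest candidates first, so the declared order does not matter.
    for d in sorted(literals, key=len, reverse=True):
        if data.startswith(d, pos):
            return d, len(d)
    # (b) %ES; is accepted only as a match for 'end of data stream'.
    if ES in delimiters and pos >= len(data):
        return ES, 0
    return None


def scan_to_terminator(
    data: str, start: int, terminators: Sequence[str]
) -> Optional[Tuple[str, str]]:
    """Scan forward from start; return (content, terminator found) or None."""
    # The range includes len(data) so that end of stream is also tested.
    for pos in range(start, len(data) + 1):
        hit = match_delimiter(data, pos, terminators)
        if hit is not None:
            return data[start:pos], hit[0]
    return None
```

With terminators [";", "%ES;"], scanning "abc;def" from position 0 stops at the ";" as usual, while scanning "def" runs to end of stream and reports %ES; as the match. Without %ES; in the list the second scan fails, which is the case documentFinalTerminatorCanBeMissing was meant to cover.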