
I've been progressing the DFDL Properties tracker, and several issues have come up that need resolving before I can suggest a complete list of DFDL properties. Please have a think about these and we can discuss at the F2F.

1) When parsing, we want to be flexible and tolerate data appearing in different forms. For example, allow initiators and terminators to be in upper or lower case, allow the UTC timezone to be +00:00 or Z, allow thousands separators for decimal numbers, etc. But on output we must make a choice as to which form we use. We need some principles that we apply to all properties that exhibit this behaviour.

2) It looks like there are real-life examples of initiators and terminators being in a different character encoding to the accompanying data. This implies the use of initiatorEncoding etc. properties. We need to agree that this is the correct way to proceed.

3) Property precedence, and the use of 'notApplicable' enums to say that a property is not to be interpreted on this object. Without this it is not possible to override a dfdl:default setting from higher up the scope. For example, how do I switch off the use of a separator for a group when a dfdl:default has set it higher up? A value of 'notApplicable' would solve that. Another example: I have a sequence where variable-length fields have a terminator but fixed-length fields do not. My terminator and most of my lengths are common, so I set both length and terminator at the group level using dfdl:default. When property scoping rules are applied, I will have a length and a terminator for each field, so which do I use? I can't just say length takes precedence, because every field has been given a length. I need a way of explicitly saying which to use. One approach is to use a special value for length, 'notApplicable', which means ignore the property, and I would use that explicitly on all the variable-length fields. Alternatively, I can be more explicit and have a separate property to control which is used.
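
To make the length/terminator example in item 3 concrete, here is a rough Python sketch of how a 'notApplicable' value might interact with scoping. It is an illustration only, not a proposal for how a DFDL processor should be built: the names resolve, extent_rule and NOT_APPLICABLE are hypothetical, and the rule "length wins unless it has been switched off" is an assumption made for the sake of the example.

```python
# Hypothetical sentinel standing in for the proposed 'notApplicable' enum value.
NOT_APPLICABLE = "notApplicable"


def resolve(scopes):
    """Merge property sets from outermost to innermost scope (inner wins),
    then drop any property explicitly set to 'notApplicable'."""
    merged = {}
    for scope in scopes:
        merged.update(scope)
    return {name: value for name, value in merged.items()
            if value != NOT_APPLICABLE}


def extent_rule(props):
    """Assumed rule: use length if it survived resolution, else terminator."""
    if "length" in props:
        return "length"
    if "terminator" in props:
        return "terminator"
    raise ValueError("neither length nor terminator applies")


# Group-level defaults, as if set once with dfdl:default.
group_defaults = {"length": 10, "terminator": ";", "separator": ","}

# A fixed-length field inherits everything and sets nothing locally.
fixed = resolve([group_defaults, {}])

# A variable-length field explicitly switches length off.
variable = resolve([group_defaults, {"length": NOT_APPLICABLE}])

# An inner group switches off the separator set higher up.
inner_group = resolve([group_defaults, {"separator": NOT_APPLICABLE}])
```

With these inputs, extent_rule(fixed) gives "length" and extent_rule(variable) gives "terminator". The alternative mentioned in item 3, a separate property that names which of the two to use, would replace the implicit rule in extent_rule with an explicit lookup.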

4) Use of the 'Native' enum, meaning use the locally defined value. It is used for character encodings, time zones and endianness. The DFDL spec draft says: "There should be no ‘platform varying defaults’. For example, byteOrder should default to bigEndian, or littleEndian, or have no default at all (in which case leaving it unspecified will often cause an error except in all-text situations with non-endian character sets). What’s not acceptable is for byteOrder to default to some value based on the current platform or other environmental constraint. Similarly for locale-sensitive things". Does this imply that 'Native' can't be the default? I can see why we would not want something to default silently based on platform/locale, but being clear that the default is 'Native' avoids that.

5) Agree on what offset facilities are to be offered for establishing the position of an item: absolute and/or relative. The problem with absolute offsets is their fragility: a single variable-length field breaks the scheme unless the offset is given by an expression. Are relative offsets all we really need? And what are they relative to? The last field?

6) Binary versus text. At the moment the Schema for DFDL has the text model inheriting from the binary model. So text isn't really text, it's text or binary. I think this is wrong and that repType=text and repType=binary really imply separate semantics. It still means properties can be shared, but it clarifies things like the behaviour of built-in prefixed lengths: a length prefix for a binary string would be a physical integer in the data, while a length prefix for a text string would be a physical numeric string in the data.

7) Padding character. Does setting this property imply trim on parse as well as pad on output? Do we need a control for this (pad/trim/both)?

8) Justification, date/time format, and some other properties have a default that varies depending on the logical type of the element.
How does this default interact with setting a default via scoping rules? We have said that there are no hard-coded defaults, implying that a dfdl:defineFormat block must exist and must define values for all properties. This contradicts deriving the default from the logical type. For such properties, we could have an accompanying dfdl:defineFormat-only property that says how it defaults. Or we could exempt these from having to have a default set in dfdl:defineFormat.

9) There are many properties that are inherited from the CAM binary model. Suman is establishing with the IBM compiler people exactly which ones are needed and what their semantics are. I find some of the names unhelpful and suggest that renaming such properties is a good idea.

10) Decimal properties. There are a host of properties for controlling decimal formats, and it's not clear to me what some of them mean. I'd like us to agree on a finished set.

11) Group-level properties. We don't have many properties that define how the members of a group behave. We have separator, for instance, which says all child fields will be separated. Is there benefit in having other properties like this, for example 'initiated', meaning all child fields must have a unique initiator, or 'fixed', meaning all child fields must have a pre-computable length? Such properties have a secondary benefit: they communicate the nature of the group without recourse to examination of all children, and they enable DFDL editors to validate that all intended properties are present.

Regards, Steve

Steve Hanson
WebSphere Message Brokers, IBM United Kingdom Ltd, Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848