
As discussed on the call:

For case 1) the DFDL xsd always wins, and the context is ignored. If the user wants to use the encoding/byte order from the context, then he must be explicit about this and use case 2) above.

Will adopt suggestion a). One question: are there any other DFDL properties like dfdl:encoding and dfdl:byteOrder that are commonly provided by context? How about dfdl:binaryFloatRepresentation, or dfdl:outputNewLine?

Will not adopt suggestion b).

Regards

Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh@uk.ibm.com,
Phone (+44)/(0) 1962-815848

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 05/11/2009 14:30
Subject: Action 059: External specification of encoding, byte order

DFDL schemas can either:

1) specify fixed encoding(s)/byte order(s) for the data being described,
2) specify that the encoding/byte order is provided by the 'context' that invokes the DFDL processor (using the dfdl:defineVariable 'external' facility). **

For case 1), DFDL is faced with a problem: what happens when the 'context' provides an encoding/byte order for the data, but the DFDL xsd specifies a different encoding/byte order? I think DFDL must make a statement about this situation, as there are several common scenarios where this could occur (HTTP, MIME, MQ).

It is worth looking at the precedent set by XML in this regard. The analogous problem for XML is where the XML document itself specifies a different encoding (using the ?xml declaration) to the context. The recommendations for XML are stated in the appendix below - there is no universal rule.

It is more complicated with DFDL though. A DFDL xsd can set up the encoding(s)/byte order(s) to use in several different places. Which of those would the context override? All of them? Just the one associated with the top-level structure?

My conclusion is therefore that for case 1) the DFDL xsd always wins, and the context is ignored.
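
To make the rule above concrete, here is a minimal Python sketch of how a single property (encoding or byte order) might be resolved. It is illustrative only and does not describe any actual DFDL processor: the names FromContext and resolve are hypothetical, and FromContext simply stands in for a property whose value is taken from an 'external' variable (case 2), optionally with a default.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FromContext:
    """Hypothetical marker: the schema defers this property to the context (case 2)."""
    default: Optional[str] = None  # stands in for an optional 'defaultValue'


def resolve(schema_value: Union[str, FromContext],
            context_value: Optional[str]) -> str:
    """Return the effective value of one property, e.g. encoding or byte order."""
    if isinstance(schema_value, FromContext):
        # Case 2: the schema explicitly asks for the context value.
        if context_value is not None:
            return context_value
        if schema_value.default is not None:
            return schema_value.default
        raise ValueError("schema defers to the context, but no value was supplied")
    # Case 1: a fixed value in the schema always wins; the context is ignored.
    return schema_value
```

For example, resolve("UTF-8", "ISO-8859-1") yields "UTF-8" (case 1, context ignored), whereas resolve(FromContext(), "ISO-8859-1") yields "ISO-8859-1" (case 2). Raising an error when neither a context value nor a default is available is an assumption of this sketch, not something the proposal states.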
If the user wants to use the encoding/byte order from the context, then he must be explicit about this and use case 2) above.

There are two things that we could allow to be a bit more flexible:

a) Pre-define $encoding and $byteOrder variables in the DFDL namespace. These would implicitly have 'external' = 'true' and perhaps a 'defaultValue' as well. This simplifies the coding of a DFDL xsd for case 2).

b) State that it is an implementation decision to provide an option to use a context encoding/byte order for case 1) instead of the ones in the DFDL xsd. In such a case, the context MUST override all encodings/byte orders in the system of xsds used by the DFDL processor. (In practice this is invariably a single encoding/byte order).

** (Might be more than encoding & byte order - for example MQ also allows float format to be provided by context)

Appendix: XML

The equivalent situation for XML is where the XML document specifies its own encoding via the ?xml declaration, and the context also provides the encoding. There is no single rule; in summary:

- Basically, if there is a higher level protocol, then that defines the rules.
- E.g., for MIME content-type text/xml, the context encoding is used. If this is omitted, the xml is assumed to be US-ASCII. The ?xml declaration encoding is not used.
- E.g., for MIME content-type application/xml, the context encoding is used. If this is omitted, the ?xml declaration encoding is used.
- For files (where there is no context encoding) use of the ?xml declaration encoding is recommended.

Note that in Message Broker, we always use the context encoding, as it should always be present. We never use the ?xml declaration.

W3C XML 1.0 spec section F.2 Priorities in the Presence of External Encoding Information

The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols.
When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. In particular, please refer to [IETF RFC 3023] or its successor, which defines the text/xml and application/xml MIME types and provides some useful guidance. In the interests of interoperability, however, the following rule is recommended.

If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding.

IETF RFC 3023 3.6 Summary

The following list applies to text/xml, text/xml-external-parsed-entity, and XML-based media types under the top-level type "text" that define the charset parameter according to this specification:

o Charset parameter is strongly recommended.
o If the charset parameter is not specified, the default is "us-ascii". The default of "iso-8859-1" in HTTP is explicitly overridden.
o No error handling provisions.
o An encoding declaration, if present, is irrelevant, but when saving a received resource as a file, the correct encoding declaration SHOULD be inserted.

The next list applies to application/xml, application/xml-external-parsed-entity, application/xml-dtd, and XML-based media types under top-level types other than "text" that define the charset parameter according to this specification:

o Charset parameter is strongly recommended, and if present, it takes precedence.
o If the charset parameter is omitted, conforming XML processors MUST follow the requirements in section 4.3.3 of [XML].

Regards

Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh@uk.ibm.com,
Phone (+44)/(0) 1962-815848

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU