DFDL schemas can either:

1) specify fixed encoding(s)/byte order(s) for the data being described,
2) specify that the encoding/byte order is provided by the 'context' that invokes the DFDL processor (using the dfdl:defineVariable 'external' facility). **

For case 1), DFDL is faced with a problem: what happens when the 'context' provides an encoding/byte order for the data, but the DFDL xsd specifies a different encoding/byte order? I think DFDL must make a statement about this situation, as there are several common scenarios where this could occur (HTTP, MIME, MQ).

It is worth looking at the precedent set by XML in this regard. The analogous problem for XML is where the XML document itself specifies an encoding (using the ?xml declaration) that differs from the one provided by the context. The recommendations for XML are stated in the appendix below; there is no universal rule.

It is more complicated with DFDL, though. A DFDL xsd can set the encoding(s)/byte order(s) to use in several different places. Which of those would the context override? All of them? Just the one associated with the top-level structure?

My conclusion is therefore that for case 1) the DFDL xsd always wins, and the context is ignored. If the user wants to use the encoding/byte order from the context, then they must be explicit about this and use case 2) above.
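
To make the proposed rule concrete, here is a minimal Python sketch. The function name resolve_encoding and the EXTERNAL sentinel are hypothetical, invented purely for illustration; they are not part of DFDL or of any DFDL processor. The same logic would apply to byte order.

```python
from typing import Optional

# Hypothetical sentinel meaning "the schema defers to an external variable" (case 2).
EXTERNAL = object()


def resolve_encoding(schema_encoding, context_encoding: Optional[str]) -> str:
    """Pick the encoding under the rule proposed above.

    Case 1: the schema fixes an encoding, so the schema wins and the
            context value is ignored.
    Case 2: the schema explicitly defers to the context.
    """
    if schema_encoding is not EXTERNAL:
        return schema_encoding
    if context_encoding is None:
        raise ValueError("schema defers to the context, but no value was supplied")
    return context_encoding
```

With this sketch, resolve_encoding("UTF-8", "ISO-8859-1") yields "UTF-8" (case 1), while resolve_encoding(EXTERNAL, "ISO-8859-1") yields "ISO-8859-1" (case 2).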

There are two things that we could allow to be a bit more flexible:

a) Pre-define $encoding and $byteOrder variables in the DFDL namespace. These would implicitly have 'external' = 'true' and perhaps a 'defaultValue' as well. This simplifies the coding of a DFDL xsd for case 2).

b) State that it is an implementation decision to provide an option to use a context encoding/byte order for case 1) instead of the ones in the DFDL xsd. In such a case, the context MUST override all encodings/byte orders in the system of xsds used by the DFDL processor. (In practice this is invariably a single encoding/byte order.)
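
As a rough illustration of option b), the following Python sketch shows the "override everything" behaviour. The function effective_encodings, its allow_context_override flag, and the component names are all hypothetical; an actual implementation would be free to expose this option however it likes.

```python
from typing import Dict, Optional


def effective_encodings(
    schema_encodings: Dict[str, str],
    context_encoding: Optional[str],
    allow_context_override: bool = False,
) -> Dict[str, str]:
    """Return the encoding to use for each schema component.

    schema_encodings maps a component name to the encoding fixed in the xsd.
    With the (hypothetical) implementation option switched on, the context
    value replaces every one of them; otherwise the xsd values stand.
    """
    if allow_context_override and context_encoding is not None:
        return {name: context_encoding for name in schema_encodings}
    return dict(schema_encodings)
```

By default the xsd values are returned unchanged, which matches the "xsd always wins" rule; only when the option is enabled does the context replace all of them at once, never a subset.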

** (Might be more than encoding & byte order - for example MQ also allows float format to be provided by context)

Appendix: XML

The equivalent situation for XML is where the XML document specifies its own encoding via the ?xml declaration, and the context also provides the encoding. There is no single rule. In summary:
        - Basically, if there is a higher-level protocol, then that defines the rules.
        - E.g., for MIME content-type text/xml, the context encoding is used. If this is omitted, the XML is assumed to be US-ASCII. The ?xml declaration encoding is not used.
        - E.g., for MIME content-type application/xml, the context encoding is used. If this is omitted, the ?xml declaration encoding is used.
        - For files (where there is no context encoding) use of the ?xml declaration encoding is recommended.
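
The summary above can be sketched in Python as follows. This is only my reading of the rules, not a conforming implementation: the function name xml_encoding and its parameters are hypothetical, only the two MIME types named above are handled, and byte-order-mark autodetection is not modelled.

```python
from typing import Optional


def xml_encoding(
    media_type: Optional[str],
    charset: Optional[str],
    declared: Optional[str],
) -> Optional[str]:
    """Choose the encoding for an XML document.

    media_type: MIME type from the context, or None for a plain file.
    charset:    charset parameter supplied by the context, if any.
    declared:   encoding from the ?xml declaration, if any.
    """
    if media_type == "text/xml":
        # The declaration is ignored; a missing charset means US-ASCII.
        return charset or "us-ascii"
    if media_type == "application/xml":
        # The context wins when present, otherwise use the declaration.
        return charset or declared
    # No context (e.g. a file): the declaration is the recommended source.
    return declared
```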

Note that in Message Broker, we always use the context encoding, as it should always be present. We never use the ?xml declaration.


W3C XML 1.0 spec section F.2 Priorities in the Presence of External Encoding Information

The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. In particular, please refer to [IETF RFC 3023] or its successor, which defines the text/xml and application/xml MIME types and provides some useful guidance. In the interests of interoperability, however, the following rule is recommended.



IETF RFC 3023

3.6 Summary

  The following list applies to text/xml, text/xml-external-parsed-
  entity, and XML-based media types under the top-level type "text"
  that define the charset parameter according to this specification:

  o  Charset parameter is strongly recommended.

  o  If the charset parameter is not specified, the default is "us-
     ascii".  The default of "iso-8859-1" in HTTP is explicitly
     overridden.

  o  No error handling provisions.

  o  An encoding declaration, if present, is irrelevant, but when
     saving a received resource as a file, the correct encoding
     declaration SHOULD be inserted.

  The next list applies to application/xml, application/xml-external-
  parsed-entity, application/xml-dtd, and XML-based media types under
  top-level types other than "text" that define the charset parameter
  according to this specification:

  o  Charset parameter is strongly recommended, and if present, it
     takes precedence.

  o  If the charset parameter is omitted, conforming XML processors
     MUST follow the requirements in section 4.3.3 of [XML].


Regards

Steve Hanson
Programming Model Architect, WebSphere Message  Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh@uk.ibm.com,
Phone (+44)/(0) 1962-815848




