I reviewed the IBM messages in the validatordescriptions.properties and modeldescriptions.properties files with an eye toward what categories things fall into that might be universal for DFDL implementations.

There are some questions below. Search for ??

I was able to infer some elements of structure in the assignment of identifiers:

Letter Codes
S - Diagnostics about the subset of XML Schema constructs used in DFDL
V - Diagnostics about validation of the DFDL schema
X - Diagnostics about DFDL schema loading

Suffixes
E = error
W = warning
D = description - extended description of error/warning sharing numeric code
A = action - suggested corrective action

Ranges
1000-1100 - subset of XML Schema (uses letter code S, so range is redundant ??)

1100-1101 - schema loading (letter code X)
1102-1103 - ??? for schema validation errors and warnings (what are these for??) ??? (letter code X)
1104-1105 - schema loading (letter code X)

1106-1149 - DFDL "physical validation" (letter code V) ?? how are these different from 1200+ ??

1150-1159 - Implementation-specific unsupported features. (letter code V)

1160 - Internal Error of the implementation (letter code V)

1200+ - validate proper use of DFDL properties (letter code V) ??? How is this different from 1106-1149 ???

1409-1420 - escape scheme related (only 11 values here, not really enough)

1557 - Internal Error of the implementation (letter code V) - expression related.
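
The identifier structure inferred above can be sketched in code. This is only a minimal Python sketch and is not taken from the IBM or Daffodil code bases. The "CTD" prefix and the 4-digit number width are assumptions drawn from the two identifiers quoted in this message (CTDS1007E and CTDM2101E), and the names parse_message_id and MessageId are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Suffix meanings as listed in the "Suffixes" section above.
SUFFIXES = {
    "E": "error",
    "W": "warning",
    "D": "description",
    "A": "action",
}

# Assumed layout: "CTD" prefix, one letter code, a 4-digit number, one suffix.
# The prefix and digit count are inferred from the quoted identifiers only.
_ID_PATTERN = re.compile(r"^CTD([A-Z])(\d{4})([EWDA])$")


class MessageId(NamedTuple):
    letter_code: str
    number: int
    suffix: str


def parse_message_id(text: str) -> Optional[MessageId]:
    """Split an identifier into letter code, number, and suffix.

    Returns None when the text does not fit the assumed layout.
    """
    match = _ID_PATTERN.match(text)
    if match is None:
        return None
    letter, number, suffix = match.groups()
    return MessageId(letter, int(number), suffix)
```

With this, parse_message_id("CTDS1007E") gives letter code S, number 1007, and suffix E, and the E, D, and A messages that share a numeric code could be grouped on the number field.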

In the modeldescriptions.properties file, most messages seem indistinguishable from the 1200+ V messages in validatordescriptions.properties, with one exception:

CTDM2101E - DFDL namespace prefix - This is an implementation-specific limitation. That's a different category - not related to implementation-specific unsupported DFDL features, it's just an ad-hoc implementation-specific restriction.

Preliminary Conclusions:

The above suggests these categories into which Schema Definition Errors can be divided:

* Loading of DFDL schema - can't find file, file isn't a schema, etc.
* Validation of DFDL schema content with sub-categories:
** SUBSET: The subset of XML Schema allowed by DFDL
** UNSUPPORTED: Implementation-specific unsupported features of DFDL
** LIMITATION: Implementation-specific limitations/restrictions
** INTERNAL: Internal error of implementation
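
As a rough illustration, the categories above could be represented as a small enumeration. This is a hypothetical Python sketch; the names SDECategory and is_schema_content_validation are invented here and do not come from the DFDL specification or any implementation.

```python
from enum import Enum


class SDECategory(Enum):
    """Proposed categories of Schema Definition Error (illustrative names)."""

    LOADING = "loading"          # can't find file, file isn't a schema, etc.
    SUBSET = "subset"            # outside the subset of XML Schema allowed by DFDL
    UNSUPPORTED = "unsupported"  # DFDL feature the implementation does not support
    LIMITATION = "limitation"    # ad-hoc implementation-specific restriction
    INTERNAL = "internal"        # internal error of the implementation

    @property
    def is_schema_content_validation(self) -> bool:
        # Everything except LOADING is a sub-category of content validation.
        return self is not SDECategory.LOADING
```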

The individual messages seem too fine-grained for inclusion in the DFDL standard. In many cases these messages correspond 1-to-1 with requirements one might extract from the specification, but having more than one place where these must be maintained - the spec and also some standardized diagnostic message base - seems problematic. I would suggest that standardizing the messages is much the same task as extracting and uniquely identifying every requirement in the specification, and that is a very big job.

But what about the categorization of the errors? This too seems problematic. For example, in the Daffodil implementation, many of the Subset-of-XML-Schema errors will be detected simply by our use of Xerces to validate the DFDL schema against the XML Schema for DFDL schemas. Those are not separated out from any other sort of problem - there is no distinction made between something not being in the subset of XSD and something just being illegal. For example, both of these situations will produce a nearly identical error message:

<complexType ... mixed='true' ...>

<complexType ...  type="foo" ...>

Both will complain about an unknown attribute that isn't allowed. No distinction is drawn between 'mixed', which is not in the subset of XSD that DFDL uses, and 'type', which is simply a mistake for 'name'. Drawing this distinction is hard. We would effectively have to intercept and re-classify the diagnostic messages generated by Xerces. Given our arm's-length relationship to the Xerces code base, this would be fragile at best.

In addition, within the Daffodil code base proper, the term "subset" is not used to mean the subset of XML Schema constructs allowed in DFDL, but to refer to the DFDL features that the implementation does not support. That is, Daffodil uses the term Subset to mean what is called UNSUPPORTED above.

Ultimately, the question arises of what standardizing error/diagnostic messages is intended to achieve. Greater consistency between implementations is always desirable, but the cost of achieving this is very high given the effort already sunk into Daffodil. So long as the diagnostic messaging identifies the problems in the schema in a way that is useful to the user, what more is needed? In some sense, the quality of the diagnostic messaging is an important distinguishing characteristic of different implementations. Standardizing this behavior rigorously seems inconsistent with, say, Section 21 of the DFDL spec, which lists a large number of features of the DFDL language as entirely optional. I would go so far as to say that any diagnostic messaging at all should be considered optional. One can imagine a DFDL implementation consistent with Section 21 which, for all Schema Definition Errors, produces a diagnostic message saying "SDE" and a file name and line number, with no further information. This isn't as nice as a descriptive diagnostic, but it is conceptually aligned with the notion that very small DFDL implementations should be possible.

The Daffodil flavor of TDML allows negative tests to express sub-strings that must appear within the diagnostic messages. So instead of a test expecting exactly the code

CTDS1007E = CTDS1007E : Schema redefines are not allowed in DFDL schemas.

We would define the test to mention "redefine" and "Schema Definition Error". We would probably also include fragments of the schema/file name as required to be in the diagnostic message. Any diagnostic message or set of diagnostic messages that contains these strings (case-insensitively) would pass the test. This can lead to false-positive passes, but avoids over-specifying the exact diagnostic phrases used.
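
The matching rule described above can be sketched roughly as follows. This is a minimal Python illustration and not the actual Daffodil TDML runner code. The function name diagnostics_match is hypothetical, and it assumes that each expected string only needs to appear in at least one message of the set.

```python
from typing import Iterable


def diagnostics_match(expected: Iterable[str], diagnostics: Iterable[str]) -> bool:
    """True if every expected substring occurs, ignoring case,
    in at least one of the diagnostic messages."""
    lowered = [d.lower() for d in diagnostics]
    return all(any(e.lower() in d for d in lowered) for e in expected)


# Example using the wording of the IBM message quoted above, with an
# assumed "Schema Definition Error" lead-in added for illustration.
diagnostics = [
    "Schema Definition Error: Schema redefines are not allowed in DFDL schemas."
]
print(diagnostics_match(["redefine", "Schema Definition Error"], diagnostics))  # True
```

Any other message that happens to contain both strings would pass as well, which is the false-positive risk mentioned above.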

Conclusion:

I think we should not further attempt to standardize diagnostic messaging as part of the DFDL standardization process.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy