The DFDL spec isn't clear on when encodingErrorPolicy 'error' is allowed to cause an error, and when the error must be suppressed, in an implementation that pre-decodes data into characters.

Example:

Suppose you have what turns out to be 8 characters of text, followed by some binary data.

Suppose a DFDL implementation happens to always try to fill a buffer of 64 decoded characters, just for efficiency reasons.

Depending on what is in the binary data, the implementation may decode the 8 characters of text without error but then hit a decode error while filling the rest of the buffer, because it has strayed past the text into the binary data.

There is no actual decode error in the data stream, because parsing should determine there are only 8 characters of text, and then switch to parsing the binary data using binary means.
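
To make the scenario concrete, here is a minimal Python sketch. It is not taken from any DFDL implementation: the UTF-8 encoding, the sample bytes, and the function names eager_parse and lazy_parse are all assumptions made up for illustration. It only shows that filling a 64-unit buffer can raise a decode error that decoding just the 8 needed characters never sees.

```python
import codecs

# Hypothetical input: 8 ASCII characters of text followed by binary bytes
# that are not valid UTF-8.
DATA = b"ABCDEFGH" + bytes([0xFF, 0xFE, 0x00, 0x01])


def eager_parse(data, n_chars, buffer_size=64):
    """Pre-decode up to buffer_size bytes, then take the first n_chars characters."""
    decoded = data[:buffer_size].decode("utf-8", errors="strict")
    return decoded[:n_chars]


def lazy_parse(data, n_chars):
    """Feed one byte at a time and stop as soon as n_chars characters exist."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    chars = []
    for i in range(len(data)):
        if len(chars) == n_chars:
            break  # never touches the bytes after the text
        chars.extend(decoder.decode(data[i:i + 1]))
    return "".join(chars)


def eager_raises(data, n_chars):
    """Report whether the eager strategy hits a decode error."""
    try:
        eager_parse(data, n_chars)
    except UnicodeDecodeError:
        return True
    return False
```

With this input, lazy_parse(DATA, 8) returns the 8 characters of text, while eager_parse(DATA, 8) raises UnicodeDecodeError on the first binary byte, even though the text itself is well formed.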

The DFDL spec doesn't say that this must not cause a decode error. Perhaps it is implied somewhere, but I didn't find it.

The DFDL spec does point out that, for asserts/discriminators with testKind 'pattern', pattern matching may cause decode errors. But again, suppose the regex matching library an implementation uses happens to pre-fetch and pre-decode a bunch of characters, then finds a match that is quite short, stopping well before the pre-decoded characters that caused the decode error.

It would seem to me that this sort of pre-decoding should not cause decode errors, but the DFDL spec doesn't state that explicitly.
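
One way an implementation might get that behavior is to remember a decode error found during pre-decoding and raise it only if matching actually reaches the bad position. The Python sketch below is just an illustration under assumptions: UTF-8 data, Python's re module standing in for whatever regex library is used, and the hypothetical helper names predecode and match_pattern. The test "match ended before the end of the decoded text" is a conservative approximation, since the regex library does not report how far it looked ahead.

```python
import re


def predecode(data):
    """Decode as much as possible; return (text, pending_error_or_None)."""
    try:
        return data.decode("utf-8", errors="strict"), None
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first undecodable byte,
        # so everything before it is known to decode cleanly.
        return data[:err.start].decode("utf-8"), err


def match_pattern(data, pattern):
    """Match pattern at the start of data, deferring pre-decode errors."""
    text, pending_error = predecode(data)
    m = re.match(pattern, text)
    # A match that stops short of the end of the decoded text did not
    # need the characters at the error position, so the error is dropped.
    if m is not None and (pending_error is None or m.end() < len(text)):
        return m.group(0)
    # Otherwise the outcome may depend on the undecodable bytes.
    if pending_error is not None:
        raise pending_error
    return None
```

For example, match_pattern(b"AB12\xff\xfe", r"[A-Z]+") returns "AB" without error, while match_pattern(b"ABCD\xff", r"[A-Z]+") raises UnicodeDecodeError, because the match runs right up to the undecodable byte.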

Comments?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy