
The DFDL spec isn't clear about when encodingErrorPolicy 'error' is allowed to cause an error, and when one must be suppressed, if the implementation pre-decodes data into characters.

Example: Suppose you have what turns out to be 8 characters of text, followed by some binary data. Suppose a DFDL implementation always tries to fill a buffer of 64 decoded characters, purely for efficiency reasons. Depending on what is in the binary data, it may parse the 8 characters of text without error, but then hit a decode error because it has strayed past the text into the binary data. There is no actual decode error in the data stream: parsing should determine that there are only 8 characters of text, and then switch to parsing the binary data using binary means. The DFDL spec doesn't say this isn't allowed to cause a decode error. Perhaps it is implied somewhere, but I didn't find it.

The DFDL spec does point out that for asserts/discriminators with testKind 'pattern', pattern matching may cause decode errors. But again, suppose the regex library an implementation uses happens to pre-fetch and pre-decode a bunch of characters, and then finds a match that is quite short, stopping well before the pre-decoded characters that caused the decode error. It seems to me that this sort of pre-decoding should not cause decode errors, but the DFDL spec doesn't state that explicitly.

Comments?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
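
For concreteness, here is a minimal sketch of the first scenario using plain Java NIO, not any particular DFDL implementation. CodingErrorAction.REPORT stands in for encodingErrorPolicy="error", and the class name PreDecodeDemo and the sample bytes are made up for illustration. The decoder eagerly tries to fill a 64-character buffer and reports a malformed-input condition from the binary region, even though the 8 characters the parser actually needs decode cleanly.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PreDecodeDemo {
    public static void main(String[] args) {
        // 8 bytes of text ("ABCDEFGH") followed by binary data that
        // happens not to be valid UTF-8.
        byte[] text = "ABCDEFGH".getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[16];
        System.arraycopy(text, 0, data, 0, text.length);
        data[8] = (byte) 0xFF; // invalid UTF-8 lead byte inside the binary region
        data[9] = (byte) 0xFE;

        // REPORT plays the role of encodingErrorPolicy="error".
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(64); // eager 64-character fill

        // The eager decode walks past the 8 characters the parser needs
        // and reports malformed input from the binary region.
        CoderResult result = decoder.decode(in, out, true);
        out.flip();
        System.out.println("decoded so far: \"" + out + "\""); // ABCDEFGH
        System.out.println("decode result : " + result);       // MALFORMED[1]
    }
}

The question, restated in terms of this sketch: if the parse only ever consumes the first 8 decoded characters, must an implementation that decodes eagerly like this suppress the malformed-input result, or is it permitted to turn it into a processing error?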