We discussed lax number processing a while back. We have the same issue with lax calendar parsing.

The DFDL spec has this language:

  1. Values outside valid ranges are normalized (e.g., "March 32 1996" is treated as "April 1 1996")
  2. Ignoring a trailing dot after a non-numeric field
  3. Leading and trailing whitespace in the data but not in the pattern is accepted
  4. Whitespace in the pattern can be missing in the data
  5. Partial matching on literal strings. E.g., data "20130621d" allowed for pattern "yyyyMMdd'date' "

I suggest that the sentence introducing that list needs the word "may" added, as in "Additional lenient parsing behaviour when in 'lax' mode MAY include:"

This is because we've discovered that lax behavior in the ICU libraries we rely on varies from one ICU release to the next. So I think we have to make the spec consistent with the idea that "lax" parsing for numbers and calendars is implementation-dependent, and that only "strict" behavior can be relied upon to be durably meaningful, even across releases of the same DFDL implementation.
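
To make item 1 in the list above concrete, here is a minimal Python sketch of what "normalizing" an out-of-range value means, next to the strict behavior. This is only an illustration under stated assumptions: parse_ymd is a hypothetical helper, the "yyyy-MM-dd" input form is chosen for simplicity, and none of this is ICU's or any DFDL implementation's actual algorithm.

```python
from datetime import date, timedelta


def parse_ymd(text: str, lax: bool) -> date:
    """Parse 'yyyy-MM-dd' text. Hypothetical helper, not a DFDL or ICU API."""
    year, month, day = (int(part) for part in text.split("-"))
    if not lax:
        # Strict: an out-of-range field is an error (date() raises ValueError).
        return date(year, month, day)
    # Lax (assumed behavior): roll excess months into the year, then roll
    # excess days forward from the first day of the resulting month.
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)


print(parse_ymd("1996-03-32", lax=True))  # prints 1996-04-01

try:
    parse_ymd("1996-03-32", lax=False)
except ValueError as err:
    print("strict mode rejected the value:", err)
```

The strict branch has exactly one defensible outcome (reject), while the lax branch embodies a choice about how to roll values forward. That choice is the part that can differ between library releases.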

This doesn't make "lax" behavior entirely useless. Suppose you are just doing a one-time conversion of some data from a native format to JSON or XML, or loading it into your favorite data-integration tool. If you can get that to work once using "lax", that's OK, because you intend to discard the schema once the conversion is complete.

So it doesn't bother me to have lax behavior. I think we just want to say that you can't rely on it to be consistent, and you can't rely on it to actually be any different from 'strict' behavior.

I think the alternatives are:
1) fork the ICU libraries, carefully characterize lax behavior in that fork, and maintain it ourselves forever after. (I really don't like this option. I'm just mentioning it to point out the difficulty.)
2) deprecate and remove 'lax' behavior entirely, along with the properties associated with specifying it.
3) make 'lax' an optional DFDL feature, so implementations can choose not to implement it.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy