I think this behaviour was intended to ensure that we could get back a zero-length result, which then enables optional and default processing etc., without the hassle of having to provide an explicit zero-length match component in the regex. I've not heard any complaints about this from IBM DFDL users, though. For DFDL 2.0 I agree that we can improve things, but rather than adding a whole new lengthKind, we could just add a new property that says what to do when there is no match and the regex does not itself allow a zero-length match.

Apache Daffodil users have had quite a lot of trouble understanding and properly using dfdl:lengthKind 'pattern'.

This is because a failed match does *not* cause a parse error; instead, the parse succeeds with a length of zero. People generally find this unintuitive, since if they wanted a zero-length match they could have written their regex to allow one.
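
To make the point about the regex concrete, here is a small sketch in Python. Python's re module is only a stand-in for the DFDL regex engine, and the sample data and variable names are made up for illustration. It shows that a schema author can already ask for a zero-length match explicitly by choosing the quantifier:

```python
import re

# Made-up sample data: the element we want is a run of digits,
# but the data at the current position starts with letters.
data = "abc,123"

required = re.compile(r"[0-9]+")  # needs at least one digit
optional = re.compile(r"[0-9]*")  # explicitly allows a zero-length match

# re.match anchors at the start of the data, which roughly mirrors
# matching from the current parse position.
no_match = required.match(data)     # no match at all
empty_match = optional.match(data)  # a successful match of length zero

print(no_match)
print(len(empty_match.group()))
```

With lengthKind 'pattern' as currently specified, both regexes would give a zero-length element here, even though only the second one asks for that.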

I have made this mistake repeatedly myself when creating DFDL schemas, and supposedly I'm an expert in DFDL.

This has been so problematic that I suggest we add an additional enum value for lengthKind, 'patternMatch' or maybe 'patternMatchRequired' (I'm open to suggestions for the best name here), which is the same as 'pattern' except that failure to match results in a parse error instead of a zero-length success.
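
The difference between the two behaviors can be sketched roughly as follows. This is a toy model in Python, not Daffodil or IBM DFDL code; the names pattern_length, ParseError, and match_required are hypothetical and exist only for this illustration, and Python's re module again stands in for the DFDL regex engine.

```python
import re


class ParseError(Exception):
    """Hypothetical stand-in for a DFDL parse error."""


def pattern_length(regex, data, match_required=False):
    """Return the element length implied by matching regex at the start of data.

    match_required=False models the existing 'pattern' behavior:
    a failed match silently yields length zero.
    match_required=True models the suggested stricter behavior:
    a failed match is a parse error.
    """
    m = re.match(regex, data)
    if m is not None:
        # Length of the matched text. This can still be zero if the
        # regex itself allows an empty match.
        return m.end()
    if match_required:
        raise ParseError(f"pattern {regex!r} did not match")
    return 0


# Existing behavior: no match, yet the parse "succeeds" with length 0.
print(pattern_length(r"[0-9]+", "abc"))

# Suggested behavior: the same input is reported as a parse error.
try:
    pattern_length(r"[0-9]+", "abc", match_required=True)
except ParseError as e:
    print("parse error:", e)
```

Under the stricter behavior, a schema author who really wants to allow an empty element can still write a regex such as [0-9]*, which matches with length zero and so does not raise.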

I would argue that the existing 'pattern' behavior is badly designed, but it is too late to change it for DFDL v1.0.

Rather, for DFDL v2.0 we should add the new, correct behavior under the name 'patternMatch', and then deprecate the existing lengthKind 'pattern'.

Has anyone else had similarly difficult experiences with lengthKind 'pattern'?

Mike Beckerle

Apache Daffodil PMC | daffodil.apache.org

OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl

Owl Cyber Defense | www.owlcyberdefense.com