Suggest new lengthKind 'patternMatch' where failure to match is a parse error

Apache Daffodil users have had a lot of trouble understanding and properly using dfdl:lengthKind 'pattern'. The cause is that a failure to match does *not* cause a parse error; instead, the parse succeeds with a length of zero. People generally find this unintuitive: if they had wanted a zero-length match, they could have written their regex to allow one. I have made this mistake repeatedly myself when creating DFDL schemas, and supposedly I'm an expert in DFDL.

This has been problematic enough that I suggest we add an additional lengthKind enum value, 'patternMatch' or maybe 'patternMatchRequired' (I'm open to suggestions for the best name), which is the same as 'pattern' except that failure to match results in a parse error instead of a zero-length success.

I would argue that the existing 'pattern' behavior is badly designed, but it is too late to change it for DFDL v1.0. Rather, for DFDL v2.0 we should add the new, correct behavior as 'patternMatch', and then deprecate the existing lengthKind 'pattern'.

Has anyone else had similarly difficult experiences with lengthKind 'pattern'?

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
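To make the difference concrete, here is a minimal sketch of the two behaviors as described in the message above. It is only a model, not Daffodil code: the function names and the ParseError class are hypothetical, and Python's `re` module stands in for the regex dialect that DFDL actually specifies.

```python
import re


class ParseError(Exception):
    """Hypothetical stand-in for a DFDL parse error."""


def length_pattern(data: str, pattern: str) -> int:
    """Model of lengthKind 'pattern': a failed match yields length 0."""
    m = re.match(pattern, data)
    return m.end() if m is not None else 0


def length_pattern_match(data: str, pattern: str) -> int:
    """Model of the proposed 'patternMatch': a failed match is an error."""
    m = re.match(pattern, data)
    if m is None:
        raise ParseError(f"pattern {pattern!r} did not match")
    return m.end()
```

With the pattern `[0-9]+` and the data `abc`, the first function quietly returns 0, while the second raises. A schema author who really wants a zero-length result can still get one under the proposed behavior by writing `[0-9]*`, which matches the empty string.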

Mike

I think this behaviour was chosen to ensure that we could get back a zero-length result, which then enables optional and default processing etc., without the hassle of having to provide an explicit zero-length match component in the regex. I've not heard any complaints about this from IBM DFDL users, though.

For DFDL 2.0 I agree that we can improve things, but rather than a whole new lengthKind, just add a new property that says what to do when there is no match and the regex does not itself allow a zero-length match.

Regards

Steve Hanson
IBM Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-7717-378890
Note: I work Tuesday to Friday

-----Original Message-----
From: Mike Beckerle <mbeckerle@apache.org>
Reply-To: mbeckerle@apache.org
To: DFDL-WG <dfdl-wg@ogf.org>
Subject: [EXTERNAL] [DFDL-WG] Suggest new lengthKind 'patternMatch' where failure to match is a parse error
Date: Thu, 07 Apr 2022 18:39:11 -0400

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
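The alternative suggested in this reply, a property rather than a new lengthKind, could be modelled roughly as follows. This is a sketch under assumptions: the parameter name `on_no_match` and its two values are invented here for illustration and do not come from any DFDL draft, and Python's `re` module again stands in for the DFDL regex dialect.

```python
import re


class ParseError(Exception):
    """Hypothetical stand-in for a DFDL parse error."""


def pattern_length(data: str, pattern: str,
                   on_no_match: str = "zeroLength") -> int:
    """Length of the regex match at the start of data.

    on_no_match is a hypothetical property: "zeroLength" keeps the
    existing 'pattern' behavior, "parseError" makes a failed match fatal.
    """
    m = re.match(pattern, data)
    if m is not None:
        return m.end()
    if on_no_match == "zeroLength":
        return 0
    if on_no_match == "parseError":
        raise ParseError(f"pattern {pattern!r} did not match")
    raise ValueError(f"unknown on_no_match value: {on_no_match!r}")
```

Defaulting to "zeroLength" keeps existing schemas working unchanged, which is the main attraction of a property over a new enum value; the trade-off is that the surprising behavior remains the default.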