Clarification: priority of delimiters vs. escape chars

Consider two elements in a sequence, dfdl:separator="/ // ///" with escapeCharacter="/" and escapeEscapeCharacter="/"

I did not spot language in the spec that makes it clear what gets priority, interpreting a character as an escape char or escape-escape char, or interpreting it as a delimiter.

Consider data "foo///bar".

1. I could interpret that as escapeEscape, escape, and minimum length separator "/"
2. Or I could interpret that as "///" maximum length separator, with no escaping.
3. Or it could be an SDE.

To me, we'd be best off if the escapeCharacter was not allowed to be (SDE) the same as the first character of any in-scope terminating delimiter. We're not doing anyone any favors by allowing this.

Likely a similar restriction would be needed for escapeBlockEnd, that the value of this property could not be a prefix of any in-scope-terminating delimiter, and escapeEscapeCharacter could not be the same as the first character of the escapeBlockEnd.

E.g., dfdl:escapeBlockStart="/" escapeBlockEnd="/" dfdl:separator="/ // ///"

With data "/foo///bar"

Is that

1. escapeBlockStart, foo, escapeBlockEnd, separator "//" bar ?
2. Or escapeBlockStart, foo/, separator "/" bar ?
3. Or SDE?

Comments?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
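To make the first two readings of "foo///bar" concrete, here is a minimal Python sketch of a field splitter. It is only an illustration: the function name split_fields and, more importantly, the priority order it applies (escape-escape pair first, then longest separator, then escape character) are assumptions made for this sketch, not anything the DFDL spec, Daffodil, or IBM DFDL defines.

```python
def split_fields(data, separators, escape_char=None, escape_escape_char=None):
    """Split data into fields. `separators` is a whitespace-separated list,
    as in dfdl:separator. The priority order below is an ASSUMPTION made for
    illustration only; the thread's point is that the spec does not state one."""
    seps = sorted(separators.split(), key=len, reverse=True)  # longest match first
    fields, current, i = [], "", 0
    while i < len(data):
        # Assumed priority 1: escapeEscapeCharacter followed by escapeCharacter
        # yields one literal escapeCharacter.
        if (escape_char and escape_escape_char
                and data.startswith(escape_escape_char + escape_char, i)):
            current += escape_char
            i += len(escape_escape_char) + len(escape_char)
            continue
        # Assumed priority 2: the longest matching separator ends the field.
        sep = next((s for s in seps if data.startswith(s, i)), None)
        if sep is not None:
            fields.append(current)
            current = ""
            i += len(sep)
            continue
        # Assumed priority 3: escapeCharacter makes the next character literal.
        if (escape_char and data.startswith(escape_char, i)
                and i + len(escape_char) < len(data)):
            current += data[i + len(escape_char)]
            i += len(escape_char) + 1
            continue
        current += data[i]
        i += 1
    fields.append(current)
    return fields


# Reading 2: no escaping, maximum length separator "///".
print(split_fields("foo///bar", "/ // ///"))            # ['foo', 'bar']
# Reading 1: escapeEscape, escape, then minimum length separator "/".
print(split_fields("foo///bar", "/ // ///", "/", "/"))  # ['foo/', 'bar']
```

The same bytes yield either "foo" or "foo/" as the first field depending only on the order of the checks, which is the ambiguity being asked about.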

Mike

I ran a test with IBM DFDL using the dfdl:separator, dfdl:escapeCharacter and dfdl:escapeEscapeCharacter in your example. For each element in the sequence I received ...

CTDV1466E : DFDL properties 'separator' ('/') and 'escapeCharacter' ('/') cannot include the same value.
CTDV1467E : DFDL properties 'separator' ('/') and 'escapeEscapeCharacter' ('/') cannot include the same value.

Changing the escape scheme to be escapeBlock as per your example, I get for each element:

CTDV1467E : DFDL properties 'separator' ('/') and 'escapeEscapeCharacter' ('/') cannot include the same value.

So we must have discussed this in the past and concluded that it's an SDE.

I don't get an error for dfdl:escapeBlockEnd itself though, I assume because once inside an escape block we are no longer looking for delimiters.

Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com tel:+44-1962-815848 mob:+44-7717-378890
Note: I work Tuesday to Friday

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 03/05/2021 20:23
Subject: [EXTERNAL] [DFDL-WG] Clarification: priority of delimiters vs. escape chars
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>

> Consider two elements in a sequence, dfdl:separator="/ // ///" with escapeCharacter="/" and escapeEscapeCharacter="/"
> [...]

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
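The restrictions proposed in the first message go further than rejecting identical values: they look at first-character and prefix overlap, and they would also cover escapeBlockEnd. A rough Python sketch of such a schema-time check follows. The function name check_escape_scheme, its parameters, and the wording of the diagnostics are all hypothetical; this is not how IBM DFDL or Daffodil implements its checks.

```python
def check_escape_scheme(terminating_delimiters, escape_char=None,
                        escape_escape_char=None, escape_block_end=None):
    """Return a list of diagnostics for the restrictions proposed in this
    thread; an empty list means no conflict was found.
    `terminating_delimiters` is a whitespace-separated list, as in
    dfdl:separator. All names here are hypothetical."""
    delims = terminating_delimiters.split()
    problems = []
    # Proposed: escapeCharacter must not be the first character of any
    # in-scope terminating delimiter.
    if escape_char:
        for d in delims:
            if d.startswith(escape_char):
                problems.append(
                    f"escapeCharacter {escape_char!r} is the first "
                    f"character of delimiter {d!r}")
    # Proposed: escapeBlockEnd must not be a prefix of any in-scope
    # terminating delimiter.
    if escape_block_end:
        for d in delims:
            if d.startswith(escape_block_end):
                problems.append(
                    f"escapeBlockEnd {escape_block_end!r} is a prefix "
                    f"of delimiter {d!r}")
    # Proposed: escapeEscapeCharacter must not be the first character of
    # escapeBlockEnd.
    if (escape_escape_char and escape_block_end
            and escape_block_end.startswith(escape_escape_char)):
        problems.append(
            f"escapeEscapeCharacter {escape_escape_char!r} is the first "
            f"character of escapeBlockEnd {escape_block_end!r}")
    return problems


# The escapeCharacter example from the thread.
for p in check_escape_scheme("/ // ///", escape_char="/",
                             escape_escape_char="/"):
    print(p)

# The escapeBlock example from the thread.
for p in check_escape_scheme("/ // ///", escape_escape_char="/",
                             escape_block_end="/"):
    print(p)
```

Under these assumed rules the escapeBlock example is flagged for escapeBlockEnd as well, whereas the test reported above produced no error for dfdl:escapeBlockEnd itself.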

Ok, sanity restored :-)

So I would concur that this should be an SDE.

We need to specify this in the DFDL spec. It's an omission. I suppose this is our first real erratum since the v1.0 spec was finalized.

I don't think there is a lot of urgency to this, because well, no real format does anything this insane. It came up in corner case testing of Daffodil.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com

On Tue, May 4, 2021 at 5:12 AM Steve Hanson <smh@uk.ibm.com> wrote:
> So we must have discussed this in the past and concluded that it's an SDE.