Escaping %NL; and separators with same suffix

Assume we have a schema with separator=", %NL;" and escapeCharacter="\" and the following data:
abc,de\CRLFfg,hij
where CRLF is the Windows-style line ending.
How does the escape character escape the CRLF?
One interpretation is that the escape character only escapes the following character, which means CRLF will not match %NL;, but the LF does. So you might end up with an infoset like this:
<seq> <e>abc</e> <e>deCR</e> <e>fg</e> <e>hij</e> </seq>
Alternatively, one might think the escape character should escape the entire CRLF, so the resulting infoset might look like this:
<seq> <e>abc</e> <e>deCRLFfg</e> <e>hij</e> </seq>
More generally, what happens when one separator is a suffix of another? For example:
separator="XXYY YY" escapeCharacter="\"
data: abc,de\XXYYfg,hij
Does the escape character escape the entire XXYY, so that YY is not considered as a delimiter? Does this change at all if a separator is also a prefix of another, e.g. separator="XXYY XX YY", which is very similar to %NL;?
- Steve

Hi Steve
The spec says that a DFDL escape character escapes the character that follows it. So in your example only the CR is escaped. %NL; allows CRLF, CR, LF (plus others) so won't match CR or CRLF but will match LF as it is not escaped.
The use of %NL; is not compatible with the data - you need to say dfdl:separator =", %CR;%LF;"
Same for your more general case.
Regards
Steve Hanson Architect, IBM DFDL Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh@uk.ibm.com tel:+44-1962-815848
From: Steve Lawrence <slawrence@tresys.com> To: DFDL-WG <dfdl-wg@ogf.org> Date: 04/02/2015 16:35 Subject: [DFDL-WG] Escaping %NL; and separators with same suffix Sent by: dfdl-wg-bounces@ogf.org
-- dfdl-wg mailing list dfdl-wg@ogf.org https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
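The rule stated in the reply (the escape character protects only the single character that follows it) can be illustrated with a small Python sketch. This is a toy model, not a DFDL processor: the function name split_escaped is hypothetical, %NL; is approximated by just CRLF, CR and LF, and trying longer separators first is an assumption made for the illustration.

```python
def split_escaped(data, separators, escape="\\"):
    """Split data on any separator; the escape protects only the next character."""
    # Assumption: prefer the longest separator that matches at a position.
    ordered = sorted(separators, key=len, reverse=True)
    fields, current, i = [], [], 0
    while i < len(data):
        if data[i] == escape and i + 1 < len(data):
            current.append(data[i + 1])  # escaped character is kept as literal data
            i += 2
            continue
        match = next((s for s in ordered if data.startswith(s, i)), None)
        if match is not None:
            fields.append("".join(current))  # separator found: close the field
            current = []
            i += len(match)
        else:
            current.append(data[i])
            i += 1
    fields.append("".join(current))
    return fields


NL = ["\r\n", "\r", "\n"]         # simplified stand-in for %NL;
data = "abc,de\\\r\nfg,hij"       # abc,de\CRLFfg,hij

# separator=", %NL;": only the CR is escaped, the bare LF still separates.
print(split_escaped(data, [","] + NL))        # ['abc', 'de\r', 'fg', 'hij']

# separator=", %CR;%LF;": the escaped CR breaks the CRLF pair, LF alone is data.
print(split_escaped(data, [",", "\r\n"]))     # ['abc', 'de\r\nfg', 'hij']

# separator="XXYY YY": the first X is escaped, so XXYY cannot match, but YY does.
print(split_escaped("abc,de\\XXYYfg,hij", ["XXYY", "YY"]))  # ['abc,deXX', 'fg,hij']
```

In this model the first call reproduces the infoset with deCR and fg as separate elements, while the second shows why listing CRLF as a single literal separator keeps deCRLFfg together.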

Ok, thank you for the clarification. On 02/04/2015 12:12 PM, Steve Hanson wrote:
Hi Steve
The spec says that a DFDL escape character escapes the character that follows it. So in your example only the CR is escaped. %NL; allows CRLF, CR, LF (plus others) so won't match CR or CRLF but will match LF as it is not escaped.
The use of %NL; is not compatible with the data - you need to say dfdl:separator =", %CR;%LF;"
Same for your more general case.
participants (2)
- Steve Hanson
- Steve Lawrence