Hi Steve
The spec says that a DFDL escape character escapes only the single
character that follows it, so in your example only the CR is escaped.
%NL; matches CRLF, CR, or LF (plus others). Because the CR is escaped,
%NL; cannot match the CR or the CRLF, but it does match the LF, which
is not escaped.
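As a rough illustration of that reading, here is a small Python sketch of
the scan. It is a simplified model, not a DFDL implementation: the name
split_escaped is hypothetical, %NL; is approximated by only CRLF, CR and
LF (the other line endings it allows are left out), and longer separators
are assumed to be tried first.

```python
def split_escaped(data, separators, escape="\\"):
    """Split data on separators; an escape protects only the next character."""
    # Assumption: try longer separators first so CRLF wins over a bare CR or LF.
    ordered = sorted(separators, key=len, reverse=True)
    fields, current, i = [], [], 0
    while i < len(data):
        if data[i] == escape and i + 1 < len(data):
            # The escape is dropped and exactly one character becomes data.
            current.append(data[i + 1])
            i += 2
            continue
        match = next((s for s in ordered if data.startswith(s, i)), None)
        if match is not None:
            # An unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
            i += len(match)
        else:
            current.append(data[i])
            i += 1
    fields.append("".join(current))
    return fields


# Approximation of separator=", %NL;" with escapeCharacter="\"
NL_SUBSET = ["\r\n", "\r", "\n"]
print(split_escaped("abc,de\\\r\nfg,hij", [","] + NL_SUBSET))
# → ['abc', 'de\r', 'fg', 'hij']
```

Under these assumptions the CR ends up as data in the second field and the
unescaped LF still acts as a separator, which gives four fields.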
The use of %NL; is not compatible with the data. You need to say
dfdl:separator=", %CR;%LF;" instead.
The same applies to your more general case.
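To make the difference concrete, the Python sketch below models the scan
with a regular expression. It is only a simplified model under stated
assumptions, not a DFDL implementation: the name split_fields is
hypothetical, the escape is assumed to protect exactly one character, and
longer separators are assumed to be tried first. The separator lists are
taken literally from the examples, so in the XXYY case the commas remain
data.

```python
import re


def split_fields(data, separators, escape="\\"):
    """Split data on separators; an escape protects only the next character."""
    # Assumption: longer separators are listed first so they win the match.
    seps = "|".join(re.escape(s) for s in sorted(separators, key=len, reverse=True))
    # Group 1: the character after an escape (kept as data). Group 2: a separator.
    token = re.compile(f"{re.escape(escape)}(.)|({seps})", re.DOTALL)
    fields, current, pos = [], [], 0
    for m in token.finditer(data):
        current.append(data[pos:m.start()])  # literal text before this token
        if m.group(2) is not None:
            fields.append("".join(current))  # unescaped separator ends the field
            current = []
        else:
            current.append(m.group(1))  # escaped character becomes data
        pos = m.end()
    current.append(data[pos:])
    fields.append("".join(current))
    return fields


# separator=", %CR;%LF;": the LF on its own is no longer a separator.
print(split_fields("abc,de\\\r\nfg,hij", [",", "\r\n"]))
# → ['abc', 'de\r\nfg', 'hij']

# separator="XXYY YY": only the first X is escaped, so YY still matches.
print(split_fields("abc,de\\XXYYfg,hij", ["XXYY", "YY"]))
# → ['abc,deXX', 'fg,hij']

# With XXYY as the only separator, nothing matches after the escaped X.
print(split_fields("abc,de\\XXYYfg,hij", ["XXYY"]))
# → ['abc,deXXYYfg,hij']
```

In this model, dropping the shorter separator that is a suffix of the
longer one is what keeps the escaped sequence together as data.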
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Steve Lawrence <slawrence@tresys.com>
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 04/02/2015 16:35
Subject: [DFDL-WG] Escaping %NL; and separators with same suffix
Sent by: dfdl-wg-bounces@ogf.org
Assume we have a schema with separator=", %NL;"
and escapeCharacter="\"
and the following data:
abc,de\CRLFfg,hij
where CRLF is the Windows-style line ending.
How does the escape character escape the CRLF?
One interpretation is that the escape character only escapes the
following character, which means CRLF will not match %NL;, but the LF
does. So you might end up with an infoset like this:
<seq>
  <e>abc</e>
  <e>deCR</e>
  <e>fg</e>
  <e>hij</e>
</seq>
Alternatively, one might think the escape character should escape the
entire CRLF, so the resulting infoset might look like this:
<seq>
  <e>abc</e>
  <e>deCRLFfg</e>
  <e>hij</e>
</seq>
More generally, what happens when one separator is a suffix of another?
For example:
separator="XXYY YY" escapeCharacter="\"
data: abc,de\XXYYfg,hij
Does the escape character escape the entire XXYY, so that YY is not
considered as a delimiter? Does this change at all if a separator is
also a prefix of another, e.g. separator="XXYY XX YY", which is very
similar to %NL;?
- Steve