Mike,

The description of escapeCharacter in section 13.2.1 says "Specifies one character that escapes the subsequent character." It doesn't say that the next character has to be a delimiter or the start of a delimiter, so I would expect to see <x>foo/bar</x> in the infoset.
On output, the modeller can list whatever characters he likes to be escaped, using extraEscapedCharacters. These don't have to be delimiters. Re-parsing correctly therefore requires that the parser obeys the escape character wherever it finds it.
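The round-trip argument can be sketched in a few lines of Python. This is only an illustration, not DFDL processor code: the helper names `escape` and `unescape` are hypothetical, and it assumes the escape character is "/" and that the escape character escapes itself on output.

```python
ESC = "/"  # assumed escape character, as in the example under discussion


def escape(value, extra_escaped):
    """Unparse side: put ESC before each listed character and before ESC itself."""
    out = []
    for ch in value:
        if ch == ESC or ch in extra_escaped:
            out.append(ESC)
        out.append(ch)
    return "".join(out)


def unescape(data):
    """Parse side: obey ESC wherever it appears, whatever character follows it."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data):
            i += 1  # drop the escape character, keep the next character literally
        out.append(data[i])
        i += 1
    return "".join(out)


written = escape("a;b", {";"})   # ";" is not a delimiter here, just a listed character
recovered = unescape(written)
```

Because ";" is not a delimiter, a parser that honoured the escape character only in front of delimiters would leave the "/" in the value, and the round trip would not return the original string.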
Regards,
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 14/04/2014 17:23
Subject: [DFDL-WG] Clarification on escape schemes - lookahead distance past an escape character
Sent by: dfdl-wg-bounces@ogf.org
Consider this infix separator situation:

<sequence dfdl:separator="/%WSP*;/">
  <element name="x" type="xs:string"/>
  <element name="y" type="xs:string" minOccurs="0"/>
</sequence>
The length kind is delimited.
Suppose the escape character is "/".
Suppose the data is "foo//bar".
Should the above parse to
(a) <x>foo/bar</x> or
(b) <x>foobar</x>?
The problem is this. In order to produce <x>foobar</x> you
have to recognize that the second / isn't in fact the start of a delimiter,
and that requires lookahead for the entire possible length of the delimiter,
and that's unbounded because of the %WSP*; in it.
I believe the semantics of escape characters should not
require looking at more than the next character after the escape character,
but this will result in the escape character behaving as if it escapes
any single character that follows it, not only the first character of a
delimiter.
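A minimal Python sketch of the one-character-lookahead rule, to make its consequence concrete. It is not taken from any DFDL implementation: `scan_field` is a hypothetical helper, and the regular expression `/\s*/` is only a rough stand-in for the separator "/%WSP*;/".

```python
import re

# Rough stand-in for dfdl:separator="/%WSP*;/": slash, optional whitespace, slash.
SEPARATOR = re.compile(r"/\s*/")


def scan_field(data, esc="/"):
    """Scan one delimited field, looking only one character past an escape.

    Returns (field_value, remaining_data), or (field_value, None) when the
    data ends before any separator is found.
    """
    out = []
    i = 0
    while i < len(data):
        if data[i] == esc and i + 1 < len(data):
            # The escape applies to whatever single character comes next;
            # no check is made that it starts a delimiter.
            out.append(data[i + 1])
            i += 2
            continue
        m = SEPARATOR.match(data, i)
        if m:
            return "".join(out), data[m.end():]
        out.append(data[i])
        i += 1
    return "".join(out), None


value, rest = scan_field("foo//bar")
```

Under these assumptions "foo//bar" yields the value "foo/bar" with no separator found, which is reading (a). With a different escape character, for example `scan_field("foo\\/bar/ /baz", esc="\\")`, the escaped "/" stays in the value and the later "/ /" is taken as the separator.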
Is the right behavior here clear?
...mikeb
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy