Mike

The description of escapeCharacter in 13.2.1 says "Specifies one character that escapes the subsequent character." It doesn't say that the next character has to be a delimiter or the start of a delimiter, so I would expect to see <x>foo/bar</x> in the infoset.

On output, the modeller can list whatever characters he likes to be escaped, using extraEscapedCharacters. These don't have to be delimiters. Re-parsing correctly therefore requires that the parser obey the escape character wherever it finds it.
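
To illustrate the round-trip argument, here is a minimal Python sketch. It is not taken from the spec or from any DFDL implementation: the function names are hypothetical, and it assumes for simplicity that the escape character also escapes itself (the separate escapeEscapeCharacter property is ignored). It compares a parser that honours the escape character everywhere with one that honours it only in front of delimiter characters.

```python
def escape_for_output(value, escape, to_escape):
    """Unparse side: prefix the escape char before every listed character.

    Assumption: the escape char is also used to escape itself.
    """
    out = []
    for ch in value:
        if ch == escape or ch in to_escape:
            out.append(escape)
        out.append(ch)
    return "".join(out)


def unescape_any(data, escape):
    """Parse side: the escape char escapes whatever single character follows."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == escape and i + 1 < len(data):
            i += 1  # drop the escape char, keep the next char literally
        out.append(data[i])
        i += 1
    return "".join(out)


def unescape_delimiters_only(data, escape, delimiter_chars):
    """Parse side, alternative reading: escape only counts before a delimiter."""
    out = []
    i = 0
    while i < len(data):
        if (data[i] == escape and i + 1 < len(data)
                and data[i + 1] in delimiter_chars):
            i += 1  # drop the escape char only when a delimiter char follows
        out.append(data[i])
        i += 1
    return "".join(out)


value = "a#b,c"
# "," is the delimiter; "#" is an extra escaped character, not a delimiter.
written = escape_for_output(value, "\\", "#,")
print(written)
print(unescape_any(written, "\\") == value)
print(unescape_delimiters_only(written, "\\", ",") == value)
```

Under these assumptions only unescape_any recovers the original value. The delimiter-only variant leaves the escape character in front of "#" in the parsed result.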

Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From:        Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:        "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>,
Date:        14/04/2014 17:23
Subject:        [DFDL-WG] Clarification on escape schemes - lookahead distance past an escape character
Sent by:        dfdl-wg-bounces@ogf.org





Consider this infix separator situation:

<sequence dfdl:separator="/%WSP*;/">
   <element name="x" type="xs:string"/>
   <element name="y" type="xs:string" minOccurs="0"/>
</sequence>

The length kind is delimited.

Suppose the escape character is "/".
Suppose the data is "foo//bar".


Should the above parse as
(a) <x>foo/bar</x> or
(b) <x>foobar</x>?


The problem is this: to produce <x>foobar</x>, you have to recognize that the second / isn't in fact the start of a delimiter. That requires lookahead for the entire possible length of the delimiter, which is unbounded because of the %WSP*; in it.

I believe the semantics of escape characters should not require looking at more than the next character after the escape character, but this will result in the escape character behaving as if it escapes any single character that follows it, not only the first character of a delimiter.
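
As a rough sketch of the one-character-lookahead rule described above, the Python below scans a single delimited field. The function name scan_field is hypothetical, the separator is simplified to a literal string (no %WSP*; handling), and the escape character is assumed to be checked before the separator. None of this is taken from the spec text.

```python
def scan_field(data, escape, separator):
    """Scan one delimited field from the start of data.

    Returns (field_value, index_where_scanning_stopped).
    Assumptions: the separator is a plain literal string, and the escape
    character is tested before the separator at each position.
    """
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == escape and i + 1 < len(data):
            # Look at exactly one character past the escape and take it
            # literally, whether or not it could start a delimiter.
            out.append(data[i + 1])
            i += 2
            continue
        if data.startswith(separator, i):
            break  # unescaped separator ends the field
        # Ordinary character (or a trailing escape with nothing after it).
        out.append(ch)
        i += 1
    return "".join(out), i


# Escape character "/" and the data from the example; a "," separator is
# used here only to keep the sketch unambiguous.
print(scan_field("foo//bar", "/", ","))
print(scan_field("a/,b,c", "/", ","))
```

With this rule the scanner never needs to know how long a delimiter match could be, and "foo//bar" yields the value foo/bar, i.e. option (a).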

Is the right behavior here clear?

...mikeb

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy