
Mike,

The description of escapeCharacter in 13.2.1 says "Specifies one character that escapes the subsequent character." It doesn't say that the next character has to be a delimiter or be the start of a delimiter. So I would expect to see <x>foo/bar</x> in the infoset.

On output, the modeller can list whatever characters he likes to be escaped, using extraEscapedCharacters. These don't have to be delimiters. Re-parsing correctly therefore requires that the parser obeys the escape character wherever it finds it.

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>,
Date: 14/04/2014 17:23
Subject: [DFDL-WG] Clarification on escape schemes - lookahead distance past an escape character
Sent by: dfdl-wg-bounces@ogf.org

Consider this infix separator situation:

    <sequence dfdl:separator="/%WSP*;/">
      <element name="x" type="xs:string"/>
      <element name="y" type="xs:string" minOccurs="0"/>
    </sequence>

Length kind is delimited.
Suppose the escape character is "/".
Suppose the data is "foo//bar".

Should the above be

(a) <x>foo/bar</x>
or
(b) <x>foobar</x>

The problem is this. In order to produce <x>foobar</x> you have to recognize that the second / isn't in fact the start of a delimiter, and that requires lookahead for the entire possible length of the delimiter, and that's unbounded because of the %WSP*; in it.

I believe the semantics of escape characters should not require looking at more than the next character after the escape character, but this will result in the escape character behaving as if it escapes any single character that follows it, not only the first character of a delimiter.

Is the right behavior here clear?
...mikeb

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
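
For illustration only, here is a minimal Python sketch of the one-character-lookahead reading discussed in this thread, where the escape character escapes whatever single character follows it, delimiter or not. It is not taken from the DFDL specification or from any DFDL implementation. The function name scan_field is hypothetical, the regular expression /\s*/ is only an approximation of the separator "/%WSP*;/", and checking for the escape character before checking for a delimiter is an assumption of this sketch.

```python
import re


def scan_field(data, escape_char, delimiter_pattern):
    """Scan one delimited field using one-character escape lookahead.

    Returns (field_value, rest), where rest is the data after the
    terminating delimiter, or None if no unescaped delimiter was found.
    Hypothetical helper, not part of any DFDL processor.
    """
    delim = re.compile(delimiter_pattern)
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        # Assumption: the escape character is honoured wherever it appears,
        # and it consumes exactly one following character.
        if ch == escape_char and i + 1 < len(data):
            out.append(data[i + 1])
            i += 2
            continue
        # Only unescaped positions are tested against the delimiter.
        m = delim.match(data, i)
        if m and m.end() > i:
            return "".join(out), data[m.end():]
        # Ordinary character (or a trailing escape character with nothing
        # after it, which this sketch keeps literally).
        out.append(ch)
        i += 1
    return "".join(out), None


# The case from this thread: escape character "/" and data "foo//bar".
print(scan_field("foo//bar", "/", r"/\s*/"))
```

Under these assumptions the first "/" is consumed as an escape and the second is kept as data, so the field value is foo/bar, which is reading (a). No lookahead past the single escaped character is needed, regardless of how long the delimiter could be.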