I think I have come across a concrete example
of this. It's from a standard called vCard 3.0 (RFC 2426 - http://tools.ietf.org/html/rfc2426).
A backslash is used to a) escape itself; b) escape in-scope delimiters;
c) indicate an embedded linefeed. The backslash would need removing
for a) and b) but not for c).
ESCAPED-CHAR = "\\" / "\;" / "\," / "\n" / "\N")
     ; \\ encodes \, \n or \N encodes newline
     ; \; encodes ;, \, encodes ,
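To make the three cases concrete, here is a minimal Python sketch of the handling a vCard 3.0 text value seems to need. The function name and the decode_newlines flag are my own, for illustration only; they come from neither RFC 2426 nor DFDL. With the flag off, the backslash is removed for cases a) and b) and the \n or \N pair is left intact for case c). With the flag on, a later step turns that pair into a real line feed.

```python
def strip_vcard_escapes(value: str, decode_newlines: bool = False) -> str:
    """Remove the backslash for cases a) and b); keep or decode case c).

    Illustrative only: the name and the flag are hypothetical.
    """
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        nxt = value[i + 1] if i + 1 < len(value) else ""
        if ch == "\\" and nxt in ("\\", ";", ","):
            # a) escaped backslash, b) escaped delimiter: drop the backslash
            out.append(nxt)
            i += 2
        elif ch == "\\" and nxt in ("n", "N"):
            # c) embedded line feed: retain the pair, or decode it on request
            out.append("\n" if decode_newlines else ch + nxt)
            i += 2
        else:
            # Anything else is copied through unchanged (an assumption)
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    raw = "Smith\\; John\\nSecond line"
    print(strip_vcard_escapes(raw))
    print(strip_vcard_escapes(raw, decode_newlines=True))
```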
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: "Cranford, Jonathan W." <jcranford@mitre.org>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, dfdl-wg-bounces@ogf.org, Mike Beckerle <mbeckerle.dfdl@gmail.com>
Date: 04/06/2014 08:53
Subject: Re: [DFDL-WG] data where inactive escape character is to be retained
WG call 3rd June: The generalised use
case is perhaps speculative, so it was agreed not to change the DFDL spec
to handle this unless a concrete use case emerges.
Regards
Steve Hanson
From: "Cranford, Jonathan W." <jcranford@mitre.org>
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>, "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 20/05/2014 18:11
Subject: Re: [DFDL-WG] data where inactive escape character is to be retained
Sent by: dfdl-wg-bounces@ogf.org
All,
I’ll chime in with an observation
and a few more details on how Roger Costello got around a similar problem.
Observation
An escape block allows the
escapeBlockEnd to be escaped with the escapeEscapeCharacter, while allowing
the escapeEscapeCharacter itself to appear in the data without any special
semantics as long as it is NOT followed by escapeBlockEnd. (From section
13.2.1 of the spec: “On parsing the dfdl:escapeBlockStart is removed from
the beginning of the data and dfdl:escapeBlockEnd is removed from end of
the data and any dfdl:escapeEscapeCharacters are removed when they precede
a dfdl:escapeBlockEnd.”)
This is really similar to
the problem that Mike posed to the group below, where an escape character
is sometimes an escape character but sometimes isn’t.
While an escape block might
not be suitable in all circumstances, the original problem that sparked
Mike’s post was amenable to using an escape block, and that is how Roger
Costello got around the problem.
Some more details
Roger Costello was using
a quotation mark (“) as the initiator and terminator for quoted values
in a data format. In this format, quotation marks can be escaped
with a backslash (\); however, within a quoted string, the data could have
a backslash as a normal data character (e.g. \n, representing two characters,
not a single newline character).
Roger posed his challenge
to the Daffodil team, and then Mike created the example below to demonstrate
the problem to the DFDL WG. In contrast to Mike’s example, Roger
was having the problem with initiators and terminators, not a separator.
At the time, we thought that an escape block couldn’t be applied
to the data format in question, so Mike may have altered the problem in
order to prevent an escape block from clouding the issue as posed to the
WG.
It turns out, after more
analysis, that an escape block could be used, and that solved the problem:
escapeBlockStart='"' escapeBlockEnd='"' escapeEscapeCharacter="\"
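As a rough illustration of why these settings work, here is a small Python sketch of the escape-block behaviour quoted from section 13.2.1 above. It is a simplified model, not Daffodil's implementation, and the function name and parameter names are hypothetical. The escapeEscapeCharacter is removed only when it immediately precedes escapeBlockEnd. A backslash anywhere else, such as in \n, is ordinary data.

```python
def parse_escape_block(text: str, start: str = '"', end: str = '"',
                       esc_esc: str = "\\") -> tuple[str, int]:
    """Return (value, index just past the closing escapeBlockEnd).

    Simplified model of escapeKind="escapeBlock"; names are illustrative.
    """
    if not text.startswith(start):
        raise ValueError("text does not begin with escapeBlockStart")
    out = []
    i = len(start)
    while i < len(text):
        if text.startswith(esc_esc + end, i):
            # Escaped block end: drop escapeEscapeCharacter, keep the quote
            out.append(end)
            i += len(esc_esc) + len(end)
        elif text.startswith(end, i):
            # Unescaped block end: the value is complete
            return "".join(out), i + len(end)
        else:
            # Ordinary data, including a backslash not followed by the end
            out.append(text[i])
            i += 1
    raise ValueError("no unescaped escapeBlockEnd found")


if __name__ == "__main__":
    value, consumed = parse_escape_block('"abcd \\n efgh"')
    print(value, consumed)
```

One thing the sketch leaves open is a value whose last data character is a backslash, since that backslash would be read as escaping the closing quote.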
Closing Observations
In general, DFDL supports
two different escape schemes with different behavior for the escape character.
* When escapeKind=”escapeCharacter”,
the escape character is always an escape character.
* When escapeKind=”escapeBlock”,
the escape character (escapeEscapeCharacter) is only an escape character
in front of escapeBlockEnd.
In this case, we were able
to use an escape block to model the data format. While there may
be a data format that has a character that is sometimes an escape character
and sometimes isn’t, without a real world example, I echo Mike’s hesitance
to add this feature to DFDL.
HTH,
Jonathan Cranford
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org]
On Behalf Of Mike Beckerle
Sent: Friday, May 02, 2014 2:47 PM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] data where inactive escape character is to be retained
We have data that has ; as separator
and has fields that look like this:
abcd \; efgh
That's a single field.
The backslash escapes the ; so
that the data is abcd ; efgh.
This same data set also has
abcd \n efgh
Here the backslash precedes an
ordinary non-delimiter. The data is supposed to be abcd \n efgh. That is,
this data set requires the backslash to be retained in the data when it
is not preceding the start of a delimiter.
Am I missing something or is it
impossible to model this?
It would seem there needs to be
a flag to indicate whether the escape characters that don't actually escape
a delimiter are to be retained or not.
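To pin down the behaviour I am asking about, here is a minimal Python sketch. The retain_inactive flag is hypothetical and is not an existing DFDL property. It also assumes a single-character separator and escape character. With the flag on, a backslash that does not precede the separator stays in the data. With the flag off, every backslash is treated as an escape and removed.

```python
def split_fields(data: str, sep: str = ";", esc: str = "\\",
                 retain_inactive: bool = True) -> list[str]:
    """Split data on sep, honouring esc before sep.

    retain_inactive is a hypothetical flag: when True, an escape character
    that does not precede the separator is kept in the field value.
    """
    fields = []
    cur = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == esc and i + 1 < len(data):
            nxt = data[i + 1]
            if nxt == sep:
                cur.append(sep)          # active escape: drop the backslash
            elif retain_inactive:
                cur.append(esc + nxt)    # inactive escape: keep the backslash
            else:
                cur.append(nxt)          # inactive escape: drop the backslash
            i += 2
        elif ch == sep:
            fields.append("".join(cur))  # unescaped separator ends the field
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields


if __name__ == "__main__":
    record = "abcd \\; efgh;abcd \\n efgh"
    print(split_fields(record))
    print(split_fields(record, retain_inactive=False))
```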
Mike Beckerle | OGF DFDL Workgroup
Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the
DFDL Workgroup's email discussions are subject to the OGF
Intellectual Property Policy
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU