Proposal below accepted. Issue raised: http://redmine.ogf.org/issues/237

Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From:        Steve Hanson/UK/IBM
To:        DFDL-WG <dfdl-wg@ogf.org>
Date:        31/10/2014 11:20
Subject:        Fw: [DFDL-WG] Action 258 - data where inactive escape character is to be retained



On last WG call there was still some concern about the names used.  We will go with the proposal unless better names are suggested.

Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 31/10/2014 11:18 -----

From:        Steve Hanson/UK/IBM
To:        dfdl-wg@ogf.org
Date:        24/10/2014 13:22
Subject:        Re: Fw: [DFDL-WG] Action 258 - data where inactive escape character is to be retained



Proposal updated below to reflect recent WG calls - changes in red type.

Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848





From:        Steve Hanson/UK/IBM
To:        dfdl-wg@ogf.org
Date:        28/08/2014 15:47
Subject:        Fw: [DFDL-WG] Action 258 - data where inactive escape character is to be retained



Strawman proposal for action 258.
Action 258: Consider allowing more flexible escapeCharacter schemes (Mike)
6/5: Motivated by example of an escape character which is active when in front of an in-scope delimiter, but not when in front of another character.
20/5: Can't model Mike's example with current facilities, but Mike's example is a generalisation of a particular MITRE example. Do we really need this? Jonathan to follow up.
3/6: Closed. Jonathan has provided the background to the MITRE example which was really about initiators and terminators. The generalised use case is perhaps speculative, so it was agreed not to change the DFDL spec to handle this unless a concrete use case emerges.
17/6: Re-opened. vCard 3.0 (http://tools.ietf.org/html/rfc2426) is an example of a format that exhibits the need for this. Need a proposal that handles this case and fits in with the existing extraEscapedCharacters and escapeEscapeCharacter properties. Noted that using lengthKind 'pattern' is sometimes a way of working round this kind of thing.
...
15/7: No progress
22/7: Steve has started to write up a proposal.
...
26/8: No further progress
2/9: Strawman proposal sent by Steve for comment. Concern over name of new property. Review for next week.
...
16/9: No further progress
23/9: Discussed the naming issue. Steve to send revised proposal. Candidate for deferral to 1.1?
14/10: Agreed that it would be good to be able to handle vCard 3.0. With Steve to revise the proposal.


New property dfdl:escapeCharacterPolicy added. The description of dfdl:escapeKind is updated. There are no changes to dfdl:generateEscapeBlock, but I've included it below by way of comparison.

Property Name Description
escapeKind Enum

Valid values 'escapeCharacter', 'escapeBlock'

The type of escape mechanism defined in the escape scheme

When 'escapeCharacter': On unparsing, a single character of the data is escaped by adding a dfdl:escapeCharacter or dfdl:escapeEscapeCharacter immediately before it. The characters to escape are determined by property dfdl:escapeCharacterPolicy.

On parsing, any in-scope terminating delimiter encountered in the data is not interpreted as such when it is immediately preceded by the dfdl:escapeCharacter (when not itself preceded by the dfdl:escapeEscapeCharacter). Occurrences of the dfdl:escapeCharacter are removed from the data as determined by property dfdl:escapeCharacterPolicy, unless the dfdl:escapeCharacter is itself preceded by the dfdl:escapeEscapeCharacter. Occurrences of the dfdl:escapeEscapeCharacter are removed only when they precede the dfdl:escapeCharacter.

When 'escapeBlock': On unparsing, the entire data is escaped by adding dfdl:escapeBlockStart to the beginning and dfdl:escapeBlockEnd to the end of the data. The data is either always escaped or escaped when needed, as specified by dfdl:generateEscapeBlock. If the data is escaped and contains the dfdl:escapeBlockEnd, then the first character of each appearance of the dfdl:escapeBlockEnd is escaped by the dfdl:escapeEscapeCharacter.

On parsing, the dfdl:escapeBlockStart string must be the first characters in the (trimmed) data in order to activate the escape scheme. The dfdl:escapeBlockStart string is removed from the beginning of the data. Until a matching dfdl:escapeBlockEnd string (that is, one not preceded by the dfdl:escapeEscapeCharacter) is found in the data, any in-scope terminating delimiter encountered in the data is not interpreted as such, and any dfdl:escapeEscapeCharacters are removed when they precede a dfdl:escapeBlockEnd string. The matching dfdl:escapeBlockEnd string is removed from the data. The matching dfdl:escapeBlockEnd does not have to be the last characters in the (trimmed) data in order to de-activate the escape scheme. A dfdl:escapeBlockStart occurring anywhere in the data other than the first characters has no significance.

Annotation: dfdl:escapeScheme
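
To make the 'escapeBlock' parsing rules above concrete, here is a minimal Python sketch. It is an illustration only, not how any DFDL processor is implemented. The function name parse_escape_block_field and its parameters are hypothetical, it assumes a single in-scope terminating delimiter, and it ignores trimming.

```python
def parse_escape_block_field(data, delimiter, block_start='"', block_end='"',
                             escape_escape="\\"):
    """Return (value, remainder) for the first field found in data.

    Hypothetical helper: one in-scope terminating delimiter, no trimming.
    """
    out = []
    i = 0
    # The scheme is only activated when escapeBlockStart is at the very start.
    in_block = data.startswith(block_start)
    if in_block:
        i = len(block_start)  # escapeBlockStart is removed from the value
    while i < len(data):
        if in_block:
            if data.startswith(escape_escape + block_end, i):
                # escapeEscapeCharacter is removed only before escapeBlockEnd
                out.append(block_end)
                i += len(escape_escape) + len(block_end)
            elif data.startswith(block_end, i):
                # matching escapeBlockEnd: removed, scheme de-activated
                in_block = False
                i += len(block_end)
            else:
                # delimiters inside the block are ordinary data
                out.append(data[i])
                i += 1
        elif data.startswith(delimiter, i):
            return "".join(out), data[i + len(delimiter):]
        else:
            out.append(data[i])
            i += 1
    return "".join(out), ""
```

With ';' as the delimiter, the input "a;b\"c\n";next (where \n is two characters) would yield the value a;b"c\n and the remainder next: the delimiter inside the block is kept as data, the backslash before the quote is removed, and the backslash before n is retained.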

escapeCharacterPolicy Enum

Valid values 'all', 'delimiters'

Controls when escape characters are removed during parsing, and output during unparsing, when dfdl:escapeKind is 'escapeCharacter'.

When 'all':

During unparsing the following are escaped as described in dfdl:escapeKind when they are in the data.

·        Any in-scope terminating delimiter by escaping its first character.

·        dfdl:escapeCharacter (escaped by dfdl:escapeEscapeCharacter)

·        any dfdl:extraEscapedCharacters

During parsing, occurrences of dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are interpreted and removed from the data as described in dfdl:escapeKind.

When 'delimiters':

During unparsing the following are escaped as described in dfdl:escapeKind when they are in the data.

·        Any in-scope terminating delimiter by escaping its first character.

·        dfdl:escapeCharacter (escaped by dfdl:escapeEscapeCharacter)

During parsing, occurrences of dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are interpreted and removed from the data as described in dfdl:escapeKind, except that dfdl:escapeCharacter is only removed when it immediately precedes an in-scope terminating delimiter.

Annotation: dfdl:escapeScheme
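
The difference between 'all' and 'delimiters' on parsing can be sketched as follows. This is a rough Python illustration under stated assumptions, not a definitive implementation: the name parse_escape_char_field and its parameters are hypothetical, and the defaults use a backslash for both dfdl:escapeCharacter and dfdl:escapeEscapeCharacter, as in the vCard case.

```python
def parse_escape_char_field(data, delimiters, policy,
                            escape_char="\\", escape_escape="\\"):
    """Return (value, remainder) for the first field found in data.

    Hypothetical helper: policy is 'all' or 'delimiters'.
    """
    if policy not in ("all", "delimiters"):
        raise ValueError("policy must be 'all' or 'delimiters'")
    out = []
    i = 0
    while i < len(data):
        # escapeEscapeCharacter before escapeCharacter: the former is
        # removed and the escapeCharacter is kept as ordinary data.
        if data.startswith(escape_escape + escape_char, i):
            out.append(escape_char)
            i += len(escape_escape) + len(escape_char)
            continue
        # An unescaped in-scope terminating delimiter ends the field.
        found = next((d for d in delimiters if data.startswith(d, i)), None)
        if found is not None:
            return "".join(out), data[i + len(found):]
        if data.startswith(escape_char, i):
            after = i + len(escape_char)
            escapes_delimiter = any(data.startswith(d, after) for d in delimiters)
            if escapes_delimiter or policy == "all":
                # escapeCharacter removed; the next character is literal
                if after < len(data):
                    out.append(data[after])
                i = after + 1
            else:
                # 'delimiters': an inactive escapeCharacter is retained
                out.append(escape_char)
                i = after
            continue
        out.append(data[i])
        i += 1
    return "".join(out), ""
```

For Mike's original data with ';' as the separator, both policies turn abcd \; efgh into abcd ; efgh. For abcd \n efgh, 'delimiters' retains the backslash, while 'all' removes it and gives abcd n efgh.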

generateEscapeBlock Enum

Valid values 'always',  'whenNeeded'

Controls when escaping is used on unparsing when dfdl:escapeKind is 'escapeBlock'.

If 'always' then escaping always occurs as described in dfdl:escapeKind.  

If 'whenNeeded' then escaping occurs as described in dfdl:escapeKind when the data contains any of the following:

·        any in-scope terminating delimiter

·        dfdl:escapeBlockStart at the start of the data

·        any dfdl:extraEscapedCharacters

Annotation: dfdl:escapeScheme
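
For comparison, the unparsing behaviour of dfdl:generateEscapeBlock might be sketched like this in Python. Again, this is only an illustration of the rules as written above. The name unparse_escape_block_field and all of its parameters are hypothetical.

```python
def unparse_escape_block_field(value, delimiters, generate="whenNeeded",
                               block_start='"', block_end='"',
                               escape_escape="\\", extra_escaped=()):
    """Return the text written for value under an 'escapeBlock' scheme.

    Hypothetical helper: generate is 'always' or 'whenNeeded'.
    """
    if generate not in ("always", "whenNeeded"):
        raise ValueError("generate must be 'always' or 'whenNeeded'")
    # 'whenNeeded' escapes only if one of the three listed conditions holds.
    needed = (
        any(d in value for d in delimiters)          # in-scope delimiter
        or value.startswith(block_start)             # escapeBlockStart at start
        or any(c in value for c in extra_escaped)    # extraEscapedCharacters
    )
    if generate == "whenNeeded" and not needed:
        return value
    # When escaping, each escapeBlockEnd in the data gets an
    # escapeEscapeCharacter placed before its first character.
    body = value.replace(block_end, escape_escape + block_end)
    return block_start + body + block_end
```

With ';' in scope, the value plain is written unchanged under 'whenNeeded' and as "plain" under 'always', while a;b is written as "a;b" in both cases.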




Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 22/07/2014 15:10 -----

From:        Steve Hanson/UK/IBM
To:        "Cranford, Jonathan W." <jcranford@mitre.org>,
Cc:        "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date:        13/06/2014 17:28
Subject:        Re: [DFDL-WG] data where inactive escape character is to be retained




I think I have come across a concrete example of this. It's from a standard called vCard 3.0 (RFC 2426 - http://tools.ietf.org/html/rfc2426). A backslash is used to a) escape itself; b) escape in-scope delimiters;  c) indicate an embedded linefeed. The backslash would need removing for a) and b) but not for c).

ESCAPED-CHAR = "\\" / "\;" / "\," / "\n" / "\N")

        ; \\ encodes \, \n or \N encodes newline
       ; \; encodes ;, \, encodes ,

Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848





From:        Steve Hanson/UK/IBM
To:        "Cranford, Jonathan W." <jcranford@mitre.org>,
Cc:        "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, dfdl-wg-bounces@ogf.org, Mike Beckerle <mbeckerle.dfdl@gmail.com>
Date:        04/06/2014 08:53
Subject:        Re: [DFDL-WG] data where inactive escape character is to be retained



WG call 3rd June: The generalised use case is perhaps speculative, so it was agreed not to change the DFDL spec to handle this unless a concrete use case emerges.

Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848





From:        "Cranford, Jonathan W." <jcranford@mitre.org>
To:        Mike Beckerle <mbeckerle.dfdl@gmail.com>, "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>,
Date:        20/05/2014 18:11
Subject:        Re: [DFDL-WG] data where inactive escape character is to be retained
Sent by:        dfdl-wg-bounces@ogf.org




All,
 
I’ll chime in with an observation and a few more details on how Roger Costello got around a similar problem.
 
Observation
An escape block allows the escapeBlockEnd to be escaped with the escapeEscapeCharacter, while allowing the escapeEscapeCharacter itself to appear in the data without any special semantics as long as it is NOT followed by escapeBlockEnd. (From section 13.2.1 of the spec: “On parsing the dfdl:escapeBlockStart is removed from the beginning of the data and dfdl:escapeBlockEnd is removed from end of the data and any dfdl:escapeEscapeCharacters are removed when they precede a dfdl:escapeBlockEnd.”)
 
This is really similar to the problem that Mike posed to the group below, where an escape character is sometimes an escape character but sometimes isn’t.  
 
While an escape block might not be suitable in all circumstances, the original problem that sparked Mike’s post was amenable to using an escape block, and that is how Roger Costello got around the problem.
 
Some more details
Roger Costello was using a quotation mark (“) as the initiator and terminator for quoted values in a data format.  In this format, quotation marks can be escaped with a backslash (\); however, within a quoted string, the data could have a backslash as a normal data character (e.g. \n, representing two characters, not a single newline character).
 
Roger posed his challenge to the Daffodil team, and then Mike created the example below to demonstrate the problem to the DFDL WG.  In contrast to Mike’s example, Roger was having the problem with initiators and terminators, not a separator.  At the time, we thought that an escape block couldn’t be applied to the data format in question, so Mike may have altered the problem in order to prevent an escape block from clouding the issue as posed to the WG.
 
It turns out, after more analysis, that an escape block could be used, and that solved the problem:
escapeBlockStart="&quot;" escapeBlockEnd="&quot;" escapeEscapeCharacter="\"
 
Closing Observations
In general, DFDL supports two different escape schemes with different behavior for the escape character.
* When escapeKind=”escapeCharacter”, the escape character is always an escape character.
* When escapeKind=”escapeBlock”, the escape character (escapeEscapeCharacter) is only an escape character in front of escapeBlockEnd.
 
In this case, we were able to use an escape block to model the data format.  While there may be a data format that has a character that is sometimes an escape character and sometimes isn’t, without a real world example, I echo Mike’s hesitance to add this feature to DFDL.
 
HTH,
 
Jonathan Cranford
 
 
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf Of Mike Beckerle
Sent: Friday, May 02, 2014 2:47 PM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] data where inactive escape character is to be retained

 
 
We have data that has ; as the separator and has fields that look like this:

abcd \; efgh

 
That's a single field.
The backslash escapes the ; so that the data is abcd ; efgh.
This same data set also has

abcd \n efgh

Here the backslash precedes an ordinary non-delimiter. The data is supposed to be abcd \n efgh. That is, this data set requires the backslash to be retained in the data when it is not preceding the start of a delimiter.
Am I missing something or is it impossible to model this?
 
It would seem there needs to be a flag to indicate whether the escape characters that don't actually escape a delimiter are to be retained or not.
 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
 --
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
