Proposal updated below to reflect recent WG calls - changes in red type.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 28/08/2014 15:47
Subject: Fw: [DFDL-WG] Action 258 - data where inactive escape
character is to be retained
Strawman proposal for action 258.
Action 258: Consider allowing more flexible escapeCharacter schemes (Mike)
6/5: Motivated by example of an escape character which is active when in
front of an in-scope delimiter, but not when in front of another
character.
20/5: Can't model Mike's example with current facilities, but Mike's
example is a generalisation of a particular MITRE example. Do we really
need this? Jonathan to follow up.
3/6: Closed. Jonathan has provided the background to the MITRE example
which was really about initiators and terminators. The generalised use
case is perhaps speculative, so it was agreed not to change the DFDL spec
to handle this unless a concrete use case emerges.
17/6: Re-opened. vCard 3.0 (http://tools.ietf.org/html/rfc2426) is an
example of a format that exhibits the need for this. Need a proposal to
handle this case, and which fits in with the existing
extraEscapedCharacters and escapeEscapeCharacter properties. Noted that
using lengthKind 'pattern' is sometimes a way of working round this kind
of thing.
...
15/7: No progress
22/7: Steve has started to write up a proposal.
...
26/8: No further progress
2/9: Strawman proposal sent by Steve for comment. Concern over name of new
property. Review for next week.
...
16/9: No further progress
23/9: Discussed the naming issue, Steve to send revised proposal.
Candidate for deferral to 1.1?
14/10: Agreed that it would be good to be able to handle vCard 3.0. With
Steve to revise the proposal.
New property dfdl:escapeCharacterPolicy added. The description of
dfdl:escapeKind is updated. No changes to dfdl:generateEscapeBlock but
I've added it below by way of comparison.
Property Name
Description
escapeKind
Enum
Valid values 'escapeCharacter', 'escapeBlock'
The type of escape mechanism defined in the escape scheme
When 'escapeCharacter': On unparsing, a single character of the data is
escaped by adding a dfdl:escapeCharacter or dfdl:escapeEscapeCharacter
immediately before it. The characters to escape are determined by property
dfdl:escapeCharacterPolicy.
On parsing, any in-scope terminating delimiter encountered in the data is
not interpreted as such when it is immediately preceded by the
dfdl:escapeCharacter (when not itself preceded by the
dfdl:escapeEscapeCharacter). Occurrences of the dfdl:escapeCharacter and
dfdl:escapeEscapeCharacter are removed from the data as determined by
property dfdl:escapeCharacterPolicy, with two exceptions: a
dfdl:escapeCharacter that is preceded by the dfdl:escapeEscapeCharacter is
retained, and a dfdl:escapeEscapeCharacter that does not precede a
dfdl:escapeCharacter is retained.
When 'escapeBlock': On unparsing, the entire data is escaped by adding
dfdl:escapeBlockStart to the beginning and dfdl:escapeBlockEnd to the end
of the data. The data is either always escaped or escaped when needed, as
specified by dfdl:generateEscapeBlock. If the data is escaped and contains
the dfdl:escapeBlockEnd, then the first character of each appearance of the
dfdl:escapeBlockEnd is escaped by the dfdl:escapeEscapeCharacter.
On parsing, the dfdl:escapeBlockStart string must be the first characters
in the (trimmed) data in order to activate the escape scheme. The
dfdl:escapeBlockStart string is removed from the beginning of the data.
Until a matching dfdl:escapeBlockEnd string (that is, one not preceded by
the dfdl:escapeEscapeCharacter) is found in the data, any in-scope
terminating delimiter encountered in the data is not interpreted as such,
and any dfdl:escapeEscapeCharacters are removed when they precede a
dfdl:escapeBlockEnd string. The matching dfdl:escapeBlockEnd string is
removed from the data. The matching dfdl:escapeBlockEnd does not have to
be the last characters in the (trimmed) data in order to de-activate the
escape scheme. A dfdl:escapeBlockStart occurring anywhere in the data
other than the first characters has no significance.
Annotation: dfdl:escapeScheme
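To make the 'escapeBlock' parsing rules above concrete, here is a minimal
Python sketch of scanning one delimited field. It is only an illustration
of the wording in this proposal, not how any DFDL processor implements it.
The function name parse_block_field, the double-quote block strings and the
backslash dfdl:escapeEscapeCharacter are assumptions chosen for the
example, and trimming is assumed to have happened already.

```python
def parse_block_field(data, delimiter, block_start='"', block_end='"',
                      escape_escape='\\'):
    """Return (value, remainder) for the first field in `data`.

    Hypothetical helper: a single terminating delimiter is assumed, and
    `data` is assumed to be already trimmed.
    """
    out = []
    i = 0
    # The block is only activated when escapeBlockStart is at the very start.
    in_block = data.startswith(block_start)
    if in_block:
        i = len(block_start)  # escapeBlockStart is removed
    while i < len(data):
        if in_block:
            if escape_escape and data.startswith(escape_escape + block_end, i):
                # Escaped escapeBlockEnd: drop escapeEscapeCharacter, keep the rest
                out.append(block_end)
                i += len(escape_escape) + len(block_end)
            elif data.startswith(block_end, i):
                # Matching escapeBlockEnd: removed, escape scheme de-activated
                in_block = False
                i += len(block_end)
            else:
                # Delimiters inside the block are ordinary data
                out.append(data[i])
                i += 1
        else:
            if data.startswith(delimiter, i):
                return "".join(out), data[i + len(delimiter):]
            # An escapeBlockStart found here has no significance
            out.append(data[i])
            i += 1
    return "".join(out), ""
```

For example, parse_block_field('"a,b" tail,next', ',') treats the comma
inside the block as data and the comma after ' tail' as the delimiter.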
escapeCharacterPolicy
Enum
Valid values 'all', 'delimiters'
Controls when escape characters are removed during parsing, and output
during unparsing, when dfdl:escapeKind is 'escapeCharacter'.
When 'all':
During unparsing the following are escaped as described in dfdl:escapeKind
when they are in the data.
· Any in-scope terminating delimiter by escaping its first
character.
· dfdl:escapeCharacter (escaped by dfdl:escapeEscapeCharacter)
· any dfdl:extraEscapedCharacters
During parsing, occurrences of dfdl:escapeCharacter and
dfdl:escapeEscapeCharacter are interpreted and removed from the data as
described in dfdl:escapeKind.
When 'delimiters':
During unparsing the following are escaped as described in dfdl:escapeKind
when they are in the data.
· Any in-scope terminating delimiter by escaping its first
character.
· dfdl:escapeCharacter (escaped by dfdl:escapeEscapeCharacter)
During parsing, occurrences of dfdl:escapeCharacter and
dfdl:escapeEscapeCharacter are interpreted and removed from the data as
described in dfdl:escapeKind, except that dfdl:escapeCharacter is only
removed when it immediately precedes an in-scope terminating delimiter.
Annotation: dfdl:escapeScheme
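The difference between the 'all' and 'delimiters' policies can be sketched
in Python as below. This is a rough reading of the descriptions above, not
a definitive implementation. The names parse_field and unparse_field, the
backslash defaults for dfdl:escapeCharacter and dfdl:escapeEscapeCharacter,
and passing dfdl:extraEscapedCharacters as a string of single characters
are all assumptions made for the example.

```python
def parse_field(data, delimiters, policy, esc="\\", esc_esc="\\"):
    """Return (value, remainder) for the first field in `data` (hypothetical)."""
    out = []
    i = 0
    while i < len(data):
        if esc_esc and data.startswith(esc_esc + esc, i):
            # Escaped escape character: drop escapeEscapeCharacter, keep escapeCharacter
            out.append(esc)
            i += len(esc_esc) + len(esc)
            continue
        if data.startswith(esc, i):
            j = i + len(esc)
            hit = next((d for d in delimiters if data.startswith(d, j)), None)
            if hit is not None:
                # Escaped delimiter: escape removed, delimiter kept as data
                out.append(hit)
                i = j + len(hit)
            elif policy == "all":
                # 'all': escape removed in front of any character
                if j < len(data):
                    out.append(data[j])
                i = j + 1
            else:
                # 'delimiters': an inactive escape character is retained
                out.append(esc)
                i = j
            continue
        hit = next((d for d in delimiters if data.startswith(d, i)), None)
        if hit is not None:
            return "".join(out), data[i + len(hit):]
        out.append(data[i])
        i += 1
    return "".join(out), ""


def unparse_field(value, delimiters, policy, esc="\\", esc_esc="\\", extra=""):
    """Escape `value` for output under the given policy (hypothetical)."""
    out = []
    i = 0
    while i < len(value):
        if value.startswith(esc, i):
            # escapeCharacter in the data is escaped by escapeEscapeCharacter
            out.append(esc_esc + esc)
            i += len(esc)
        elif any(value.startswith(d, i) for d in delimiters):
            # Only the first character of a delimiter is escaped
            out.append(esc + value[i])
            i += 1
        elif policy == "all" and value[i] in extra:
            # extraEscapedCharacters are only escaped under 'all'
            out.append(esc + value[i])
            i += 1
        else:
            out.append(value[i])
            i += 1
    return "".join(out)
```

With ';' as the only delimiter, parsing the data a\;b\nc;rest under
'delimiters' keeps the backslash in front of 'n', whereas under 'all' it is
removed.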
generateEscapeBlock
Enum
Valid values 'always', 'whenNeeded'
Controls when escaping is used on unparsing when dfdl:escapeKind is
'escapeBlock'.
If 'always' then escaping always occurs as described in dfdl:escapeKind.
If 'whenNeeded' then escaping occurs as described in dfdl:escapeKind when
the data contains any of the following:
· any in-scope terminating delimiter
· dfdl:escapeBlockStart at the start of the data
· any dfdl:extraEscapedCharacters
Annotation: dfdl:escapeScheme
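The 'whenNeeded' test above can be sketched in Python as follows. This is
an illustration under stated assumptions rather than a reference
implementation: the function name unparse_block_field, the double-quote
block strings, the backslash dfdl:escapeEscapeCharacter and the
representation of dfdl:extraEscapedCharacters as a string of single
characters are all hypothetical.

```python
def unparse_block_field(value, delimiters, generate="whenNeeded",
                        block_start='"', block_end='"',
                        escape_escape='\\', extra=""):
    """Wrap `value` in an escape block according to generateEscapeBlock."""
    needed = (
        generate == "always"
        or any(d in value for d in delimiters)   # any in-scope terminating delimiter
        or value.startswith(block_start)         # escapeBlockStart at the start of the data
        or any(ch in value for ch in extra)      # any extraEscapedCharacters
    )
    if not needed:
        return value
    # Prefix each escapeBlockEnd in the data with escapeEscapeCharacter,
    # which escapes its first character.
    body = value.replace(block_end, escape_escape + block_end)
    return block_start + body + block_end
```

With ',' as the only delimiter, 'plain' is written unchanged under
'whenNeeded', while 'a,b' is wrapped in the escape block.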
----- Forwarded by Steve Hanson/UK/IBM on 22/07/2014 15:10 -----
From: Steve Hanson/UK/IBM
To: "Cranford, Jonathan W."