Some extra notes added by Steve to Mike's original answers.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 12/11/2013 12:14 -----

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, Tim Kimber/UK/IBM@IBMGB,
Date: 23/10/2013 17:02
Subject: Partial progress on action 235: Request clarifications of the Escape Schemes.

Questions from Taylor Wise:

1. Does any character effectively escape the block start, or is a block start inside the data a syntax error? (That is, is a block start valid only when it appears at the beginning of the data?)
2. Does a block end have to be followed by the delimiter (optionally preceded by padding), or does the absence of a delimiter mean that it is not a block end?
3. Without an escape block start, are escape escape characters still interpreted?
4. Do extra escaped characters (dfdl:extraEscapedCharacters) require escapeKind="escapeCharacter"?
5. What is the appropriate behavior for the following:

Assuming escapeBlockStart="Start", escapeBlockEnd="END", escapeEscapeCharacter="!"

,4StartStart1!!23END4!ENDEND, (comma is the delimiter)

----------------------------------------------------------------

Answers:

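1. Yes. A block start is recognized only at the beginning of the data (after any leading padding has been trimmed); elsewhere it is ordinary data, not a syntax error.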

2. Yes, and there is no lookahead. To be clear:

A delimiter need not be present. After a block start, the first block end that is not preceded by an escape escape character is always interpreted as ending the content region. It may be followed by a delimiter if the model expects one; however, the parser does not look ahead for the delimiter or anything else when deciding whether a block end is genuine.

For an element with dfdl:lengthKind='delimited', it is a processing error if the block end is not followed by optional padding and a delimiter.

3. No. Without a block start, nothing is interpreted as an escape escape character or as a block end.

4. No. For escapeKind="escapeBlock", the presence of any of the extra escaped characters in the data means that the data must be surrounded by the block start and block end when unparsing. The spec states this; see dfdl:generateEscapeBlock.

5. <SMH>Taking the example above: ,4StartStart1!!23END4!ENDEND, (comma is the delimiter).
a) If the leading '4' is not trimmed as a padding character, then 'Start' is not treated as an escape block start because it is not at the start of the data. No escaping is applied, so the infoset contains '4StartStart1!!23END4!ENDEND'.
b) If the leading '4' is trimmed as a padding character, then the first 'Start' is treated as the escape block start, and the first unescaped 'END' is treated as the escape block end. The '4' after 'END' may also be trimmed as a padding character if justification is 'center'. Even so, the '!' that follows causes a processing error, because the ',' delimiter is expected at that point.
c) If the data were instead ',4StartStart1!!23!END!ENDEND,' and the leading '4' is trimmed as in b), then the first two occurrences of 'END' are escaped by the preceding '!' and the last 'END' is treated as the escape block end. The infoset contains 'Start1!!23ENDEND', because the spec says the escape escape character is not removed when it does not precede the escape block end **.</SMH>
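A rough Python sketch of the parse-side rules in answers 1 to 3, checked against cases a), b) and c) above. It is not taken from the spec or from any DFDL implementation: the function name parse_delimited_field, the ProcessingError class and the simplifications (a single padding character trimmed only before the block start and after the block end, one delimiter string, the field starting at a given offset) are all hypothetical, for illustration only. It keeps the escape escape character unless it immediately precedes the block end, which is the reading in c) that the ** note questions.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL parse-time processing error."""


def parse_delimited_field(data, pos, block_start, block_end, esc_esc,
                          delimiter, pad_char=None):
    """Hypothetical sketch: parse one delimited field starting at data[pos].

    Returns (value, offset_of_delimiter). Simplified: padding is a single
    character, and is only trimmed before the block start / after the block end.
    """
    # Leading padding (if the model trims it) is skipped first, so a valid
    # block start must come immediately after it (answer 1).
    if pad_char is not None:
        while pos < len(data) and data[pos] == pad_char:
            pos += 1

    # Answers 1 and 3: no block start at the beginning means no escaping at
    # all; the field simply runs up to the next delimiter.
    if not data.startswith(block_start, pos):
        end = data.find(delimiter, pos)
        if end < 0:
            end = len(data)
        return data[pos:end], end

    pos += len(block_start)
    out = []
    while True:
        if pos >= len(data):
            raise ProcessingError("escape block end not found")
        if data.startswith(esc_esc, pos) and data.startswith(block_end, pos + len(esc_esc)):
            # Escaped block end: the escape escape character is removed.
            out.append(block_end)
            pos += len(esc_esc) + len(block_end)
        elif data.startswith(block_end, pos):
            # First unescaped block end closes the block; no lookahead (answer 2).
            pos += len(block_end)
            break
        else:
            # Everything else, including a second block start or an escape
            # escape character not followed by the block end, is kept verbatim.
            out.append(data[pos])
            pos += 1

    # Only optional padding may sit between the block end and the delimiter.
    if pad_char is not None:
        while pos < len(data) and data[pos] == pad_char:
            pos += 1
    if not data.startswith(delimiter, pos):
        raise ProcessingError(f"expected delimiter at offset {pos}")
    return "".join(out), pos


if __name__ == "__main__":
    # Case c): leading '4' trimmed as padding.
    print(parse_delimited_field(",4StartStart1!!23!END!ENDEND,", 1,
                                "Start", "END", "!", ",", pad_char="4"))
    # prints ('Start1!!23ENDEND', 28)
```

Calling it on the original data with pad_char=None reproduces case a), and with pad_char="4" it raises ProcessingError at the '!' as in case b).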

The definition of escapeKind for escapeBlock needs clarification, because it implies one can isolate the data without interpreting the block start and block end. For delimited formats, the block start and block end are integral to identification of the delimiter.

<SMH> Agreed. And it's not just delimited formats. The text needs to be processed from start to finish to handle the escape escape character. </SMH>

Need to clarify that the escape escape character never applies to the block start.

Consider expressing this with a small grammar.

** <SMH>Is this really correct, or should the escape escape character always be removed? </SMH>
