Assuming escapeBlockStart="Start", escapeBlockEnd="END", escapeEscapeCharacter="!"
,4StartStart1!!23END4!ENDEND, (comma is the delimiter)
----------------------------------------------------------------
Answers:
1. Yes
2. Yes - no lookahead. To be clear:
A delimiter need not be present. Once a block start has been found, the
first block end that is not preceded by an escape escape character is
always interpreted as ending the content region. It may be followed by a
delimiter if that is what the model expects; however, there is no
lookahead for the delimiter or for anything else.
For an element with dfdl:lengthKind='delimited', it is a processing error
if the block end is not followed by optional padding and then a delimiter.
3. No - without a block start nothing will be interpreted as an escape
escape character nor as a block end.
4. No - for escapeKind="escapeBlock", the presence of any of the extra
escaped characters in the data implies that the data must be surrounded
by the block start and block end when unparsing. This is stated in the
spec; see dfdl:generateEscapeBlock.
5. <SMH>Taking the example above: ,4StartStart1!!23END4!ENDEND, (comma is
the delimiter).
a) If the leading '4' is not trimmed as a padding character, then the
escape block start is not treated as such because it is not at the start
of the data, so the infoset contains '4StartStart1!!23END4!ENDEND' - no
escaping is applied.
b) If the leading '4' is trimmed as a padding character, then the first
'Start' is treated as the escape block start, and the first unescaped
'END' is treated as the escape block end. The '4' after 'END' may also be
trimmed as a padding character if justification is 'center'. But the
first '!' will cause a processing error, because the next character is
expected to be the ',' delimiter.
c) If the data were instead ',4StartStart1!!23!END!ENDEND,' and the
leading '4' is trimmed as per b), then the first two occurrences of 'END'
are escaped by the preceding '!' and the last 'END' is treated as the
escape block end. The infoset contains 'Start1!!23ENDEND' (because the
spec says the escape escape character is not removed when it does not
precede the escape block end **).</SMH>
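
To make cases a) to c) easier to check, here is a minimal Python sketch of
the scan as described in the answers above. It is not taken from the spec
or from any DFDL implementation: the function names scan_escape_block and
check_delimited are hypothetical, the pad character '4' and the ','
delimiter come from the example, and the leading pad is assumed to have
been trimmed (or not) before the call. It also assumes the reading in c),
i.e. that '!' is removed only when it immediately precedes the block end,
which is the point queried at **.

```python
def scan_escape_block(data, start="Start", end="END", esc_esc="!"):
    """Return (content, rest), or None when data does not begin with the block start.

    Assumption: the escape escape character is consumed only when it
    immediately precedes the block end; elsewhere it is kept as data.
    """
    if not data.startswith(start):
        # Answer 3 / case a): no block start, so nothing is interpreted.
        return None
    out = []
    i = len(start)
    while i < len(data):
        if data.startswith(esc_esc + end, i):
            # Escaped block end: drop the escape escape character, keep 'END'.
            out.append(end)
            i += len(esc_esc) + len(end)
        elif data.startswith(end, i):
            # First unescaped block end closes the region; no lookahead.
            return "".join(out), data[i + len(end):]
        else:
            # Everything else, including a nested 'Start', is plain data.
            out.append(data[i])
            i += 1
    raise ValueError("processing error: block end not found")


def check_delimited(rest, pad="4", delim=","):
    """True when rest is optional padding followed by the delimiter."""
    return rest.lstrip(pad).startswith(delim)
```

With the leading '4' left in place, scan_escape_block returns None, as in
a). With it trimmed, the original data yields content 'Start1!!23' and a
remainder of '4!ENDEND,', for which check_delimited is False (the
processing error in b). The modified data in c) yields content
'Start1!!23ENDEND' and a remainder of ','.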
The definition of escapeKind for escapeBlock needs clarification,
because it implies one can isolate the data without interpreting the block
start and block end. For delimited formats, the block start and block end
are integral to identification of the delimiter.
<SMH> Agree. And it's not just delimited formats. The text needs to be
processed from start to finish to handle the escape escape character.
</SMH>
We need to clarify that the escape escape character never applies to the
block start.
Consider expressing this with a small grammar.
** <SMH>Is this really correct, or should the escape escape character
always be removed?</SMH>