Section 17.3 Sequence groups with delimiters
The separator region contains one of the
strings specified by the dfdl:separator property. When this property has
"" (empty string) as its value then the separator region is of
length zero.
...
‘postfix’ means the separator occurs
after each element. On parsing the separator after the last item is optional.
On unparsing the final separator will always be written.
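As a rough illustration of the postfix rule quoted above (this is my own
sketch, not text from the spec), the Python below treats the separator
after the last item as optional when parsing and always writes it when
unparsing. The names parse_postfix and unparse_postfix are invented for
the example, and the separator is assumed to be one non-empty string.

```python
def parse_postfix(data: str, sep: str) -> list[str]:
    """Split items that are each followed by `sep`.

    The separator after the last item is optional, so "a*b*" and "a*b"
    both yield ["a", "b"]. `sep` must be a non-empty string.
    """
    if not data:
        return []
    if data.endswith(sep):
        # Drop the optional trailing separator before splitting.
        data = data[: -len(sep)]
    return data.split(sep)


def unparse_postfix(items: list[str], sep: str) -> str:
    """Write every item followed by `sep`, including the last one."""
    return "".join(item + sep for item in items)
```

Note the asymmetry: "a*b" parses successfully but is never produced by
unparse_postfix.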
Section 17.3.1 Sequence groups and separators
re: ordered/suppressAtEnd : All separators
must be found in the data except that when the sequence has trailing optional
items, the separators are suppressed for any final missing items.
My interpretation of the spec:
a) If an element's parent group defines
a separator, that separator might not appear after the element. Instead,
the group might be terminated early by the group's own terminator, or by
the separator/terminator of an enclosing element/group or by end-of-data.
b) On the other hand, if an element
defines a terminator, that terminator *must* appear after the element unless
finalTerminatorCanBeMissing (FTCBM below) is "true", in which case the
element and its parent group can be terminated early by enclosing markup
or by end-of-data.
c) separatorPosition="postfix"
is not enforced rigidly. The input document can always be constructed as
if separatorPosition="infix" and the parser will not complain.
This allows early termination of a separated group by enclosing markup,
as well as by end-of-data.
d) The FTCBM flag allows the terminator
of the final group member to be missing. This allows early termination
of the group by enclosing markup or by end-of-data.
I have reservations about these rules.
- It seems overly lax to unconditionally
allow 'postfix' to behave like 'infix'. The equivalent flexibility for
a terminator requires FTCBM to be set to "true".
- FTCBM is not as useful as it seems
because it only applies to the final group member. If the final group member
is optional, the user will be forced to use a postfix separator, and will
then lose the control afforded by FTCBM.
- DFDL needs to allow strict validation
of postfix separators/terminators. I can't see a way to achieve that with
the current rules (see example below).
Example: Lines are separated by <lf>.
Lines have up to 3 fields. Fields can be empty. Fields are always terminated
by a *.
line:field1*field2*field3*<lf>
line:field1*field2*<lf>
line:field1**field3*<lf>
With the current rules, this form of
the second line
line:field1*field2<lf>
...would also be allowed (assuming
that the * is defined as a postfix separator with
separatorPolicy="suppressAtEnd").
Note that the missing * after field2
is silently tolerated because postfix separators are allowed to be omitted.
To enforce the presence of the * after
field2 it would have to be defined as a terminator on every field. But
that would remove the flexibility afforded by the use of separators ( see
third line )
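To make the example concrete, here is a minimal Python sketch of the two
behaviours. It is not a DFDL processor. The names parse_line and
MAX_FIELDS and the strict flag are hypothetical, and each input string is
assumed to be the field content of one line with the <lf> already removed.
With strict=False it mimics my reading of the current rules (the final
postfix * may be omitted). With strict=True every field that is present
must be followed by a *.

```python
MAX_FIELDS = 3  # "Lines have up to 3 fields"


def parse_line(line: str, strict: bool) -> list[str]:
    """Parse one line into fields delimited by a postfix '*'."""
    parts = line.split("*")
    # If the line ends in '*', the piece after the last '*' is empty.
    tail = parts.pop()
    if tail:
        # Text after the last '*': the final field has no '*' of its own.
        if strict:
            raise ValueError(f"missing '*' after {tail!r}")
        parts.append(tail)
    if len(parts) > MAX_FIELDS:
        raise ValueError(f"more than {MAX_FIELDS} fields")
    return parts
```

Under these assumptions, "field1*field2" is accepted when strict=False and
rejected when strict=True, while "field1**field3*" is accepted either way,
so the empty middle field of the third line is still allowed.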
A possible solution:
- Strictly enforce separatorPosition="postfix".
- Make terminators mandatory
- Remove the FTCBM flag, and replace
it with a flag which tolerates end-of-data where any separator/terminator
was expected. The definition of end-of-data would include the end of a
defined-length parent element, but would specifically exclude end-of-parent
caused by enclosing markup ( because that would re-introduce the ambiguity
which I'm trying to avoid ).
These rules are considerably tighter
than the existing ones, but I don't think they make anything impossible.
I do think they make the meaning of the various settings a lot simpler.
Terminators would be less 'optional' than before, but I suspect that the
real-world scenarios would be catered for.
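For what it's worth, the proposed rules could be sketched roughly as
follows. This is only an illustration under my own assumptions: the name
parse_document and the tolerate_end_of_data parameter are invented (the
proposal does not name the new flag), <lf> is represented by "\n", and the
3-field limit and defined-length parents are left out to keep it short.
A field that runs into <lf> without its * is always an error. A missing
* or <lf> is accepted only at the true end of the data, and only when the
flag is set.

```python
def parse_document(data: str, tolerate_end_of_data: bool = False) -> list[list[str]]:
    """Parse lines ended by '\\n', each holding fields ended by '*'."""
    lines: list[list[str]] = []
    fields: list[str] = []
    start = 0  # offset where the current field begins
    for pos, ch in enumerate(data):
        if ch == "*":
            fields.append(data[start:pos])
            start = pos + 1
        elif ch == "\n":
            if pos != start:
                # Field ended by enclosing markup instead of its own '*':
                # never tolerated, whatever the flag says.
                raise ValueError(f"missing '*' before <lf> at offset {pos}")
            lines.append(fields)
            fields, start = [], pos + 1
    if start < len(data) or fields:
        # The data ran out where a '*' or an <lf> was expected.
        if not tolerate_end_of_data:
            raise ValueError("end of data where '*' or <lf> was expected")
        if start < len(data):
            fields.append(data[start:])
        lines.append(fields)
    return lines
```

With this sketch "a*b\n" fails regardless of the flag, whereas "a*b" is
accepted only when tolerate_end_of_data is True.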
Anyway - comments invited. ( invitation
unnecessary, I suspect )
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742