Any sequence, empty or otherwise, which
is a child of an outer sequence follows the separator rules for the outer
sequence.
The empty sequence in your example will
require a separator when parsing, and will cause a separator to be written when unparsing.
I tested this with IBM DFDL, and that is what
we have implemented. Data that is missing the leading separator (e.g. a,b,c,d) gives a processing
error.
I think that section 14.1 implies that,
but you can add extra words to clarify if you like.
We have not encountered the need for
the concept of a DFDL non-represented sequence.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Cc: dev@daffodil.apache.org
Date: 19/07/2018 18:37
Subject: [DFDL-WG] Clarification needed: separator for empty sequence
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
I believe Daffodil has an incorrect behavior in the way
it treats separators today, but I want to clarify things relative to the
DFDL spec before fixing it.
Consider this element:
<xs:element name="NS_13">
  <xs:complexType>
    <xs:sequence dfdl:separator=","
                 dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:element name="e2"
                  type="xs:int" minOccurs="1" maxOccurs="unbounded"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
The outermost sequence has an infix separator.
Its content begins with another sequence, but this sequence
is empty, having only assert statements in it.
Question: does this empty sequence cause a separator to
be inserted after it?
E.g., should the data ",1,2,3" parse, and is
that initial comma required?
An argument can be made that such empty sequences can
be detected by a DFDL implementation and treated as having "no representation"
akin to how an element with dfdl:inputValueCalc is treated. Such elements
are invisible as far as the representation is concerned. They cause no
separators in sequences, no alignment regions, etc.
In that case the data would neither require nor accept the
initial comma.
But the DFDL spec is not clear on whether empty sequences
are treated in this way, so I am assuming they are not treated specially,
so the comma is required because all model groups are treated as required
regardless of whether they have zero-length representations.
Is that correct? If so, then Daffodil has a bug, because
it does NOT put a separator in for this today.
What if the empty sequence carries a dfdl:initiator="A"
annotation?
Such a sequence is still empty in the XSDL sense of empty
sequence, but clearly has a representation that is not zero length. In
that case I think the data has to be "A,1,2,3" so that there
is a separator after the empty sequence's non-ZL representation. I think
this is not controversial.
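For illustration, the initiator variant of the sequence inside NS_13 might look like the fragment below. This is only a sketch: it assumes the same xs and dfdl prefix bindings as the declaration above, and the assert is left out for brevity.

```xml
<xs:sequence dfdl:separator=","
             dfdl:separatorPosition="infix"
             dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
  <!-- Still empty in the XSDL sense (no children), but the initiator
       gives it a representation that is not zero length. -->
  <xs:sequence dfdl:initiator="A" />
  <xs:element name="e2"
              type="xs:int" minOccurs="1" maxOccurs="unbounded"
              dfdl:textNumberPattern="#####"
              dfdl:occursCountKind="implicit" />
</xs:sequence>
```

Under the reading above, the data for this fragment would be "A,1,2,3".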
Other variations:
What if the sequence contains only elements with dfdl:inputValueCalc?
Then there is a content model, but it consists entirely of non-represented elements.
Would it still be a "DFDL empty sequence" or "DFDL non-represented
sequence"?
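A sketch of that case, as a fragment that would replace the inner sequence of NS_13. The element name "computed" and the constant expression are hypothetical, chosen only to show a child that has no representation in the data.

```xml
<xs:sequence>
  <!-- Hypothetical computed element: its value comes from the expression,
       so it consumes no data when parsing and writes none when unparsing. -->
  <xs:element name="computed" type="xs:int"
              dfdl:inputValueCalc="{ 1 }" />
</xs:sequence>
```

The sequence now has a non-empty content model, yet nothing in it occupies any data.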
What if the sequence isn't empty, but contains only "optional"
elements? In that case, is the whole sequence "optional", so that
the comma becomes sensitive to the dfdl:separatorSuppressionPolicy?
I guess this means a model group inherits the optionality/required-ness
of its contents, unless it has its own required framing.
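A sketch of the optional-only case, again as a replacement for the inner sequence of NS_13. The element name "opt1" is hypothetical, and the fragment makes no claim about which data would parse.

```xml
<xs:sequence>
  <!-- Hypothetical optional element: minOccurs="0" means the sequence
       may end up with a zero-length representation. -->
  <xs:element name="opt1" type="xs:int"
              minOccurs="0" maxOccurs="1"
              dfdl:textNumberPattern="#####"
              dfdl:occursCountKind="implicit" />
</xs:sequence>
```

The inner sequence has no framing of its own here, so whether the outer comma is required is exactly the open question.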
I believe the simplest thing is to require the comma here.
This is, however, a backward incompatible change to Daffodil to conform,
so I want to be sure this is correct.
The element declaration can be rewritten so that the empty
sequence comes before the separated sequence. Presumably the reason someone
would insert an empty sequence like this is to get the assert statement
executed at the beginning of the sequence, not afterwards. However, if
the sequence has separators, then you can't just insert an initial empty
sequence to carry that assert without requiring a separator.
The element can be rewritten:
<xs:element name="NS_13">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                   dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
        <xs:element name="e2" type="xs:int" maxOccurs="unbounded"
                    dfdl:textNumberPattern="#####" />
      </xs:sequence>
    </xs:sequence>
  </xs:complexType>
</xs:element>
This achieves the desired behavior where the assertion
executes first, but does not cause a separator to be needed, and does so
regardless of the treatment of empty sequences.
Ultimately, we need a clarification that states either:
1) Empty sequences are considered represented terms that
are required, and hence they imply framing, such as alignment and the presence
of separators in separated sequences, even if they have zero length.
or
2) The spec introduces the concept of a "DFDL non-represented sequence".
These are sequences that have no framing and no delimiters, and that have either an empty
content model or a content model containing only elements with dfdl:inputValueCalc or other DFDL
non-represented sequences (recursively). They have
no representation, so they imply no alignment, no need of separators, etc. A
group ref (hidden or not) to a DFDL non-represented sequence is also a
DFDL non-represented sequence.
Note that these could be generalized to choice groups
also.
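To make the group-ref case in option 2 concrete, here is a sketch. The group name "assertOnly" and the "tns" prefix are hypothetical, and the two references at the end are alternatives, not meant to appear together.

```xml
<!-- Hypothetical global group whose sequence has no framing and no
     children: only an assert. -->
<xs:group name="assertOnly">
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert>{fn:true()}</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:sequence>
</xs:group>

<!-- Alternative A: an ordinary group reference. -->
<xs:group ref="tns:assertOnly" />

<!-- Alternative B: a hidden group reference. -->
<xs:sequence dfdl:hiddenGroupRef="tns:assertOnly" />
```

Under option 2, either reference placed inside a separated sequence would imply no separator. Under option 1, each would require one.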
Comments?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF Intellectual Property Policy