Yes, we can add some clarification, although by implication any child that does not carry dfdl:inputValueCalc will give rise to a separator.
The same is therefore true for an empty choice, though a choice with no branches is a schema definition error, as is a choice branch that carries dfdl:inputValueCalc, so the choice case is clearer than the empty sequence case.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: dev@daffodil.apache.org, dfdl-wg@ogf.org, dfdl-wg <dfdl-wg-bounces@ogf.org>
Date: 01/08/2018 14:57
Subject: Re: [DFDL-WG] Clarification needed: separator for empty sequence
Added Daffodil bug https://issues.apache.org/jira/browse/DAFFODIL-1975.
I think we should add an explicit, affirmative statement such as: "All sequences, including empty sequences, are considered represented terms that are required, and hence they imply framing such as alignment and the presence of separators in separated sequences, even if they have zero length. Separator suppression based on dfdl:separatorSuppressionPolicy does not apply."
Does this go for empty choices as well? I think it should.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
On Wed, Aug 1, 2018 at 8:35 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Any sequence, empty or otherwise, that is a child of an outer sequence follows the separator rules of the outer sequence.
The empty sequence in your example will require a separator when parsing, and will output a separator when unparsing.
We have tested this with IBM DFDL, and that is what we have implemented. Missing the leading separator (e.g. a,b,c,d) gives a processing error.
I think that section 14.1 implies this, but you can add extra words to clarify it if you like.
We have not encountered a need for the concept of a DFDL non-represented sequence.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Cc: dev@daffodil.apache.org
Date: 19/07/2018 18:37
Subject: [DFDL-WG] Clarification needed: separator for empty sequence
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
I believe Daffodil's current treatment of separators is incorrect, but I want to clarify things relative to the DFDL spec before fixing it.
Consider this element:
<xs:element name="NS_13">
  <xs:complexType>
    <xs:sequence dfdl:separator=","
                 dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:element name="e2" type="xs:int" minOccurs="1" maxOccurs="unbounded"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
The outermost sequence has an infix separator.
Its content begins with another sequence, but this inner sequence is empty, containing only an assert annotation.
Question: does this empty sequence cause a separator to be inserted after it?
For example, should the data ",1,2,3" parse, and is that initial comma required?
An argument can be made that such empty sequences can be detected by a DFDL implementation and treated as having "no representation", akin to how an element with dfdl:inputValueCalc is treated. Such elements are invisible as far as the representation is concerned: they cause no separators in sequences, no alignment regions, etc.
In that case the data would neither require nor accept the initial comma.
But the DFDL spec is not clear on whether empty sequences are treated this way. I am therefore assuming they are not treated specially, so the comma is required, because all model groups are treated as required regardless of whether their representation is zero length.
Is that correct? If so, then Daffodil has a bug, because today it does NOT put a separator in for this case.
What if the empty sequence carries a dfdl:initiator="A" annotation?
Such a sequence is still empty in the XSDL sense, but clearly has a representation that is not zero length. In that case I think the data has to be "A,1,2,3", so that there is a separator after the empty sequence's non-zero-length (non-ZL) representation. I think this is not controversial.
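For concreteness, that initiator variant might be declared as below. This is only a sketch: the element name NS_13_init is hypothetical, and it assumes the same dfdl:format defaults that NS_13 relies on. The only change from NS_13 is the dfdl:initiator on the inner sequence.

```xml
<xs:element name="NS_13_init">  <!-- hypothetical name, for discussion only -->
  <xs:complexType>
    <xs:sequence dfdl:separator=","
                 dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <!-- Still empty in the XSDL sense, but the initiator gives it
           a non-zero-length representation ("A") -->
      <xs:sequence dfdl:initiator="A">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:element name="e2" type="xs:int" minOccurs="1" maxOccurs="unbounded"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```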
Other variations:
What if the inner sequence contains only elements with dfdl:inputValueCalc? Then there is a content model, but it consists entirely of non-represented elements. Would it still be a "DFDL empty sequence", or a "DFDL non-represented sequence"?
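A minimal sketch of that case, for discussion only: the names NS_13_ivc and ivc1 are hypothetical, and the same dfdl:format defaults as NS_13 are assumed. Whether this inner sequence implies the leading comma is exactly the open question.

```xml
<xs:element name="NS_13_ivc">  <!-- hypothetical name, for discussion only -->
  <xs:complexType>
    <xs:sequence dfdl:separator=","
                 dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <!-- Non-empty content model, but its only child is computed
           (dfdl:inputValueCalc), so nothing is consumed from the data -->
      <xs:sequence>
        <xs:element name="ivc1" type="xs:string"
                    dfdl:inputValueCalc='{ "computed" }' />
      </xs:sequence>
      <xs:element name="e2" type="xs:int" minOccurs="1" maxOccurs="unbounded"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```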
What if the sequence isn't empty, but contains only optional elements? In that case, is the whole sequence "optional", so that the comma becomes sensitive to dfdl:separatorSuppressionPolicy?
I guess this would mean a model group inherits the optionality/required-ness of its contents, unless it has its own required framing.
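A sketch of this variant, under the same assumptions (hypothetical names NS_13_opt and opt1, same dfdl:format defaults as NS_13). The inner sequence's only child is optional, so the question is whether the comma after it may be suppressed when opt1 is absent.

```xml
<xs:element name="NS_13_opt">  <!-- hypothetical name, for discussion only -->
  <xs:complexType>
    <xs:sequence dfdl:separator=","
                 dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <!-- Every child is optional (minOccurs="0"): does the group
           itself become optional for separator suppression? -->
      <xs:sequence>
        <xs:element name="opt1" type="xs:int" minOccurs="0"
                    dfdl:textNumberPattern="#####"
                    dfdl:occursCountKind="implicit" />
      </xs:sequence>
      <xs:element name="e2" type="xs:int" minOccurs="1" maxOccurs="unbounded"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```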
I believe the simplest thing is to require the comma here. This is, however, a backward-incompatible change for Daffodil to conform, so I want to be sure it is correct.
The element declaration can be rewritten so that the empty sequence comes
before the separated sequence. Presumably the reason someone would insert
an empty sequence like this is to get the assert statement executed at
the beginning of the sequence, not afterwards. However, if the sequence
has separators, then you can't just insert an initial empty sequence to
carry that assert without requiring a separator.
The element can be rewritten:
<xs:element name="NS_13">
  <xs:complexType>
    <xs:sequence>
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{fn:true()}</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                   dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
        <xs:element name="e2" type="xs:int" maxOccurs="unbounded"
                    dfdl:textNumberPattern="#####" />
      </xs:sequence>
    </xs:sequence>
  </xs:complexType>
</xs:element>
This achieves the desired behavior where the assertion executes first,
but does not cause a separator to be needed, and does so regardless of
the treatment of empty sequences.
Ultimately, we need a clarification that states either:
1) empty sequences are considered represented terms that are required, and hence they imply framing such as alignment and the presence of separators in separated sequences, even if they have zero length;
or
2) a new concept, the "DFDL non-represented sequence", is introduced. These are sequences that have no framing and no delimiters, and whose content model is either empty or contains only elements with dfdl:inputValueCalc or other DFDL non-represented sequences (recursively). They have no representation, so they imply no alignment, no need for separators, etc. A group ref (hidden or not) to a DFDL non-represented sequence is also a DFDL non-represented sequence.
Note that these could be generalized to choice groups also.
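As an illustration of the group-ref rule in option 2, a hidden group reference carrying only the assert might look like the sketch below. The group name assertOnly, the element name NS_13_hgr and the namespace prefix ex (assumed bound to the schema's target namespace) are all hypothetical. Under option 1 the hidden group ref would still need the leading comma; under option 2 it would be a DFDL non-represented sequence and would not.

```xml
<!-- Global group definition (hypothetical name) -->
<xs:group name="assertOnly">
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert>{fn:true()}</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:sequence>
</xs:group>

<xs:element name="NS_13_hgr">  <!-- hypothetical name, for discussion only -->
  <xs:complexType>
    <xs:sequence dfdl:separator=","
                 dfdl:separatorPosition="infix"
                 dfdl:separatorSuppressionPolicy="trailingEmptyStrict">
      <!-- Hidden group ref: itself an xs:sequence with no children -->
      <xs:sequence dfdl:hiddenGroupRef="ex:assertOnly" />
      <xs:element name="e2" type="xs:int" minOccurs="1" maxOccurs="unbounded"
                  dfdl:textNumberPattern="#####"
                  dfdl:occursCountKind="implicit" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Note that the dfdl:hiddenGroupRef sequence is itself empty in the XSDL sense, so the same question arises for it directly.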
Comments?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU