Action 278: Unparser maxOccurs issue

Please have a position on the below proposal from IBM for this week's WG call.

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Steve Hanson/UK/IBM
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 12/01/2015 16:56
Subject: Unparser maxOccurs issue

Possible change to spec needed, where it describes what happens when maxOccurs is exceeded during unparsing for occursCountKind 'fixed' and 'implicit' (and by implication scalar elements).

It currently says it is a processing error. I think it is better to say that the unparser moves on when maxOccurs is reached. This makes the behaviour analogous to parsing, where the parser does not try to parse beyond maxOccurs and moves on. The current unparser wording is based on the assumption that any next occurrence of the element in the infoset must be an error, but this is not true: the next occurrence could be an occurrence of a same-named element later in the schema.

An obvious example is:

    <xs:element name="data" minOccurs="2" maxOccurs="2" dfdl:occursCountKind="fixed" ... />
    <xs:element name="stuff" minOccurs="0" dfdl:occursCountKind="implicit" ... />
    <xs:element name="data" minOccurs="2" maxOccurs="2" dfdl:occursCountKind="fixed" ... />

with an infoset where 'stuff' is missing:

    message_data
      data - xx1
      data - xx2
      data - yy1
      data - yy2

A more interesting example is this, taken from the MIL-STD-2045 schema (my bold comments added):

    <xsd:sequence dfdl:separator="">
      <!-- Element Value1 -->
      <xsd:choice>
        <xsd:sequence dfdl:separator="">
          <xsd:group ref="FPI_true"/>
          <xsd:element dfdl:length="4" dfdl:lengthKind="explicit" name="Value1" type="xsd:string"/>
        </xsd:sequence>
        <xsd:sequence dfdl:separator="">
          <xsd:group ref="FPI_false"/>
        </xsd:sequence>
      </xsd:choice>
      <!-- Element Value2 -->
      <xsd:choice>
        <xsd:sequence dfdl:separator="">
          <xsd:group ref="FPI_true"/>
          <xsd:element dfdl:length="4" dfdl:lengthKind="explicit" name="Value2" type="xsd:string"/>
        </xsd:sequence>
        <xsd:sequence dfdl:separator="">
          <xsd:group ref="FPI_false"/>
        </xsd:sequence>
      </xsd:choice>
      <!-- Element Value3 -->
      <xsd:choice>
        <xsd:sequence dfdl:separator="">
          <xsd:group ref="FPI_true"/>
          <xsd:element dfdl:length="4" dfdl:lengthKind="explicit" name="Value3" type="xsd:string"/>
        </xsd:sequence>
        <xsd:sequence dfdl:separator="">
          <xsd:group ref="FPI_false"/>
        </xsd:sequence>
      </xsd:choice>
    </xsd:sequence>

...where the FPI_true and FPI_false elements are defined in their own global groups.

    <xsd:group name="FPI_true">
      <xsd:sequence dfdl:separator="">
        <xsd:element default="true" dfdl:length="1" dfdl:lengthKind="explicit"
                     dfdl:textBooleanFalseRep="0" dfdl:textBooleanTrueRep="1"
                     name="FPI_true" type="xsd:boolean">
          <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:discriminator>{. eq fn:true()}</dfdl:discriminator>
            </xsd:appinfo>
          </xsd:annotation>
        </xsd:element>
      </xsd:sequence>
    </xsd:group>

    <xsd:group name="FPI_false">
      <xsd:sequence dfdl:separator="">
        <xsd:element default="false" dfdl:length="1" dfdl:lengthKind="explicit"
                     dfdl:textBooleanFalseRep="0" dfdl:textBooleanTrueRep="1"
                     name="FPI_false" type="xsd:boolean">
        </xsd:element>
      </xsd:sequence>
    </xsd:group>

If the infoset looks like the following, an error is currently given, even though the infoset is valid: the second FPI_false belongs to the Value2 choice.

    message_prefixedOccurs
      FPI_false - false
      FPI_false - false
      FPI_true - true
      Value3 - 9999

Regards
Steve Hanson

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

I read this over. I agree that the current language is not symmetric between unparsing and parsing, and in this case it should be symmetric. If the parser is going to stop looking for more instances when maxOccurs is reached, then the unparser should stop outputting those instances when maxOccurs is reached.

There is a fair amount of complexity in assigning the proper schema component for use during unparsing, given an infoset, but counting occurrences is certainly not the most complex such thing, and parsing has to do this counting. So unparsing should as well.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>

On Mon, Jan 19, 2015 at 1:01 PM, Steve Hanson <smh@uk.ibm.com> wrote:
> Please have a position on the below proposal from IBM for this week's WG call. [...]

I wanted to follow up on this issue, as I am currently writing the unparser implementation for arrays in Daffodil.

Specifically, suppose I have an array with dfdl:occursCountKind="implicit" and maxOccurs="2", there are three occurrences in the infoset, and there is no element declaration following the array that could account for an additional element after the second. Is this a processing error or a validation error?

    <element name="root">
      <complexType>
        <sequence>
          <element name="a" type="string" minOccurs="0" maxOccurs="2"/>
        </sequence>
      </complexType>
    </element>

If the infoset contains

    <root><a>1</a><a>2</a><a>3</a></root>

is this a processing error all the time, irrespective of whether there is any following sibling element also named "a"? Or is it a validation error, because there is no other option for "a" except the array?

Mike Beckerle

On Thu, Jan 29, 2015 at 4:41 PM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
> I read this over. I agree that current language is not symmetric for unparsing with parsing, and in this case it should be symmetric. [...]

This is a processing error. The unparser stops matching against 'a' once it has reached maxOccurs. It then moves on to the next element in the sequence, but there isn't one. The infoset therefore contains an item that does not match anything in the model, hence a processing error.

Further, it is a processing error all the time. It is not possible to receive a 'maxOccurs exceeded' validation error when parsing or unparsing when occursCountKind is 'fixed' or 'implicit'.

I will add this to the agenda for today's call.

Regards
Steve Hanson

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 23/03/2015 18:36
Subject: Re: [DFDL-WG] Action 278: Unparser maxOccurs issue

> I wanted to follow up on this issue, as I am writing the unparser implementation for daffodil for arrays currently. [...]
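
The matching rule discussed in this thread can be illustrated with a minimal Python sketch. It is not taken from the DFDL specification or from any implementation: `Decl`, `ProcessingError`, and `assign_declarations` are hypothetical names, the model is assumed to be a flat sequence of element declarations, and minOccurs handling, defaulting, and choices (as in the MIL-STD-2045 example) are deliberately left out.

```python
from dataclasses import dataclass


class ProcessingError(Exception):
    """Hypothetical error: an infoset item matches nothing in the model."""


@dataclass(frozen=True)
class Decl:
    # Hypothetical stand-in for one element declaration in a flat sequence.
    name: str
    max_occurs: int


def assign_declarations(decls, infoset_names):
    """Return, for each infoset item, the index of the declaration it matches."""
    assignment = []
    pos = 0    # index of the declaration currently being matched
    count = 0  # occurrences already matched against decls[pos]
    for name in infoset_names:
        # Move on while the current declaration has reached maxOccurs
        # or has a different name than the infoset item.
        while pos < len(decls) and (
            count == decls[pos].max_occurs or decls[pos].name != name
        ):
            pos += 1
            count = 0
        if pos == len(decls):
            # Nothing left in the model that could account for this item.
            raise ProcessingError(f"{name!r} does not match anything in the model")
        assignment.append(pos)
        count += 1
    return assignment


# First example from the thread: 'stuff' is absent from the infoset.
data_model = [Decl("data", 2), Decl("stuff", 1), Decl("data", 2)]
print(assign_declarations(data_model, ["data"] * 4))  # [0, 0, 2, 2]

# Follow-up example: a third 'a' with no later declaration to match.
try:
    assign_declarations([Decl("a", 2)], ["a", "a", "a"])
except ProcessingError as err:
    print("processing error:", err)
```

In the first case the third and fourth 'data' items are assigned to the later declaration instead of being rejected; in the second case the third 'a' has nowhere to go, which corresponds to the processing error described above.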
participants (2)
- Mike Beckerle
- Steve Hanson