clarification: on suppressed ZL string/hexBinary - do we keep variable assignments?

In some situations we parse and get a successful zero-length parse for a string or hexBinary. But because the occurrence is optional, we do NOT add an element to the infoset.
In that case, what happens to side-effects that occurred during the successful parse? There are two possible kinds of side-effects: variables can be set, and a discriminator can be set to true.
It seems to me that if a discriminator is set, then that *must* be preserved, and in that case it would seem the variable settings should be retained as well.
Comments?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>

Whether to add a zero-length string or hexBinary to the infoset for an optional element depends on the setting of emptyValueDelimiterPolicy. A setting of 'none' stops it from being added.
Regardless, it does not give a processing error, so it is known-to-exist; it therefore does not cause backtracking, and discriminators and variables are preserved.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK Architect, IBM DFDL Co-Chair, OGF DFDL Working Group smh@uk.ibm.com tel:+44-1962-815848 mob:+44-7717-378890 Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com> To: dfdl-wg@ogf.org Date: 24/07/2018 15:15 Subject: [DFDL-WG] clarification: on suppressed ZL string/hexBinary - do we keep variable assignments? Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
-- dfdl-wg mailing list dfdl-wg@ogf.org https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
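Steve's answer can be pictured with a deliberately simplified model. All names here (`ParseState`, `try_occurrence`, `ProcessingError`, and the two parse functions) are hypothetical and do not come from any DFDL processor: state is snapshotted at a point of uncertainty and rolled back only if a processing error occurs. A zero-length optional string that parses successfully raises no error, so its variable and discriminator side effects survive even when EVDP is 'none' and no item is added to the infoset.

```python
import copy
from dataclasses import dataclass, field


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


@dataclass
class ParseState:
    # Hypothetical parser state: variable bindings, whether a discriminator
    # has evaluated to true, and the infoset items built so far.
    variables: dict = field(default_factory=dict)
    discriminated: bool = False
    infoset: list = field(default_factory=list)


def parse_optional_zl_string(state: ParseState, evdp: str) -> int:
    """Successfully parse an optional string occurrence of length zero."""
    state.variables["seen"] = True  # side effect, e.g. from dfdl:setVariable
    state.discriminated = True      # side effect, e.g. from dfdl:discriminator
    if evdp != "none":
        state.infoset.append("")    # item with an empty-string value
    return 0                        # number of bytes consumed


def parse_failing_occurrence(state: ParseState) -> int:
    """An occurrence that sets a variable and then hits a processing error."""
    state.variables["bad"] = True
    raise ProcessingError("content did not match")


def try_occurrence(state: ParseState, parse_fn, *args):
    """Run parse_fn at a point of uncertainty.

    On success every side effect is kept; on a processing error the state
    is rolled back to the snapshot (backtracking) and None is returned.
    """
    snapshot = copy.deepcopy(state)
    try:
        return parse_fn(state, *args)
    except ProcessingError:
        state.variables = snapshot.variables
        state.discriminated = snapshot.discriminated
        state.infoset = snapshot.infoset
        return None
```

For example, `try_occurrence(s, parse_optional_zl_string, "none")` leaves `s.infoset` empty while `s.variables` and `s.discriminated` keep their new values. The model ignores what a true discriminator does to the enclosing point of uncertainty; it only shows that "no processing error" means "no rollback".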

To follow up then,
I have assumed that dfdl:emptyValueDelimiterPolicy (EVDP) isn't even examined unless the element has default="...", i.e., a non-zero-length default value specified. That is, without a default value, there is no concept of emptiness as distinct from "normal" representation.
Are you suggesting that it is also used to control when an empty string (or empty hexBinary) is accepted as a normal representation value for an optional element, vs. treated as a missing value? That's a reasonable interpretation that I would support, but I don't know that the spec says that anywhere, so we need to add a sentence. (Unless I'm missing where this is stated.)
I have also thought that dfdl:emptyValueDelimiterPolicy must be combined with dfdl:initiator and dfdl:terminator. If the combination of these is such that the empty representation is zero-length, that is what creates the situation of interest here, where it is ambiguous whether the value is the official empty representation or is the normal representation that just so happens to be of zero length. That is, there's no special significance to the 'none' EVDP property value.
For example, if dfdl:emptyValueDelimiterPolicy is 'both', but dfdl:initiator="" and dfdl:terminator="", then that's just as good as dfdl:emptyValueDelimiterPolicy='none' in terms of whatever effect this has on a decision about normal vs. missing.
Does this match your understanding?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
On Wed, Aug 1, 2018 at 9:26 AM, Steve Hanson <smh@uk.ibm.com> wrote:
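Mike's observation that EVDP only matters in combination with the initiator and terminator can be sketched as a function that builds the delimiter text framing an empty value. `empty_rep` and `empty_rep_is_zero_length` are hypothetical helpers, and they simplify DFDL by treating dfdl:initiator and dfdl:terminator as single literal strings (real values are whitespace-separated lists of DFDL string literals and may contain character entities).

```python
VALID_EVDP = ("none", "initiator", "terminator", "both")


def empty_rep(evdp: str, initiator: str, terminator: str) -> str:
    """Return the text that frames a zero-length value under this EVDP.

    Simplification: initiator/terminator are single literal strings.
    """
    if evdp not in VALID_EVDP:
        raise ValueError(f"unknown emptyValueDelimiterPolicy: {evdp!r}")
    lead = initiator if evdp in ("initiator", "both") else ""
    trail = terminator if evdp in ("terminator", "both") else ""
    return lead + trail


def empty_rep_is_zero_length(evdp: str, initiator: str, terminator: str) -> bool:
    """True when the empty representation cannot be told apart from a
    normal representation that happens to be zero-length."""
    return empty_rep(evdp, initiator, terminator) == ""
```

With this model, `empty_rep("both", "[", "]")` is `"[]"`, while `("both", "", "")` and `("none", "[", "]")` both give a zero-length empty representation, which is the equivalence Mike describes.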

9.4.2.2 Simple element (xs:string or xs:hexBinary)
Required occurrence: If the element has a default value then an item is added to the infoset using the default value, otherwise an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value.
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is added to the Infoset.
Note: To prevent unwanted empty strings or empty hexBinary values from being added to the Infoset, use XSD minLength > '0' and a dfdl:assert that uses the dfdl:checkConstraints() function, to raise a processing error.
9.2.2 Empty Representation
An element occurrence has an empty representation if the occurrence does not have a nil representation and it conforms to the grammar for SimpleEmptyElementRep or ComplexEmptyElementRep. Specifically, the EmptyElementInitiator and EmptyElementTerminator regions must be conformant with dfdl:emptyValueDelimiterPolicy and the occurrence's content in the data stream is of length zero. (If non-conformant it is not a processing error and the representation is not empty.) LeadingAlignment, TrailingAlignment, PrefixLength regions may be present.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com> To: Steve Hanson <smh@uk.ibm.com> Cc: dfdl-wg@ogf.org Date: 01/08/2018 15:13 Subject: Re: [DFDL-WG] clarification: on suppressed ZL string/hexBinary - do we keep variable assignments?
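The two 9.4.2.2 rules Steve quotes can be read as a small decision function. This is a sketch with a hypothetical name (`empty_occurrence_item`): it returns the value of the item to add, or `None` when nothing is added, and it does not model the minLength / dfdl:checkConstraints() technique from the Note.

```python
from typing import Optional, Union

Value = Union[str, bytes]  # xs:string -> str, xs:hexBinary -> bytes


def empty_occurrence_item(required: bool, evdp: str, is_hex_binary: bool,
                          default: Optional[Value] = None) -> Optional[Value]:
    """Item value for a string/hexBinary occurrence with empty representation.

    Returns None when the rules say nothing is added to the Infoset.
    """
    empty: Value = b"" if is_hex_binary else ""
    if required:
        # Required: use the default value if there is one, else the empty value.
        return default if default is not None else empty
    # Optional: add the empty value unless EVDP is 'none'.
    return empty if evdp != "none" else None
```

Note that in this reading the default value only ever matters for required occurrences; an optional occurrence is decided by EVDP alone.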

OK, so here's the way I understand this now.
If EVDP is 'none', then empty strings have no representation syntax at all. So we don't create optional elements for them, period.
If EVDP is not 'none', then empty string requires some syntax to be there, and based on this syntax appearing an empty string value is added to the infoset for the optional element.
Seems simple in retrospect....
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
On Wed, Aug 1, 2018 at 11:07 AM, Steve Hanson <smh@uk.ibm.com> wrote:

I spoke too soon. I'm good with this:
- if EVDP is 'none', then empty strings have no representation syntax at all. So we don't create optional elements for them, period.
This statement I made is not correct:
- If EVDP is not 'none', then empty string requires some syntax to be there, and based on this syntax appearing an empty string value is added to the infoset for the optional element.
Suppose EVDP is not 'none', but initiator and terminator are both ""; then the empty string does *not* require syntax to be there. So the above statement is simply wrong.
When EVDP is 'both' but neither initiator nor terminator is defined, then since EVDP is not 'none', a zero-length string would still cause an optional element to be added to the infoset. In a separated sequence, this would mean one could control how many empty strings go into optional values by providing more separators. So "a,b,,,,,," would add 2 non-empty and 6 empty strings to the infoset, regardless of whether they are required or optional. If the element is named 'x', has minOccurs="3", default="c", maxOccurs="12" and occursCountKind 'implicit', then the first empty would trigger defaulting, and the data would be <x>a</x><x>b</x><x>c</x><x/><x/><x/><x/><x/>
So we would get defaulting for the required index locations, but creation of elements with empty string values for the optional index locations. This allows us to construct an XSD-invalid document. I suppose that is no big deal; there are many ways to construct data by parsing with DFDL where the data proves to be invalid per XSD rules.
So we really have 3 cases:
1. EVDP is 'none'
2. EVDP is not 'none', but initiator/terminator are such that the empty representation has no syntax
3. EVDP is not 'none', and initiator/terminator are such that the empty representation DOES have syntax
Case 1 = optional elements are never populated from ZL strings.
Case 2 = optional elements are always populated from ZL strings.
Case 3 = optional elements are populated from the non-ZL empty representation; optional elements are not populated from ZL strings.
Daffodil has heretofore been conflating Cases 1 and 2, giving both the behavior of Case 1. Does IBM DFDL distinguish all 3 cases?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
On Wed, Aug 1, 2018 at 2:33 PM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
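Mike's separated-sequence example can be reproduced with a toy model, assuming his Case 2 reading (EVDP not 'none', no initiator or terminator, so every zero-length field counts as empty). `parse_x` and `to_xml` are hypothetical; the model simply splits on the separator and ignores separator suppression, trailing-separator rules, and every other sequence property.

```python
def parse_x(data: str, sep: str, min_occurs: int, max_occurs: int,
            default: str) -> list:
    """Values of successive occurrences of element x (toy model)."""
    values = []
    for index, text in enumerate(data.split(sep)[:max_occurs], start=1):
        if text:
            values.append(text)      # normal representation
        elif index <= min_occurs:
            values.append(default)   # required occurrence: default applied
        else:
            values.append("")        # optional occurrence: empty string item
    return values


def to_xml(values: list) -> str:
    """Render the occurrences the way they are written in the thread."""
    return "".join(f"<x>{v}</x>" if v else "<x/>" for v in values)


values = parse_x("a,b,,,,,,", ",", min_occurs=3, max_occurs=12, default="c")
print(to_xml(values))
# <x>a</x><x>b</x><x>c</x><x/><x/><x/><x/><x/>
```

The seven separators yield eight fields: two non-empty, one required empty that takes the default "c", and five optional empties that become empty-string items.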

There are only 2 cases. Section 22 (property precedence) makes it clear that EVDP and NVDP are only ever examined if there is an initiator and/or terminator. Regards Steve Hanson IBM Hybrid Integration, Hursley, UK Architect, IBM DFDL Co-Chair, OGF DFDL Working Group smh@uk.ibm.com tel:+44-1962-815848 mob:+44-7717-378890 Note: I work Tuesday to Friday From: Mike Beckerle <mbeckerle.dfdl@gmail.com> To: Steve Hanson <smh@uk.ibm.com> Cc: dfdl-wg@ogf.org Date: 01/08/2018 20:53 Subject: Re: [DFDL-WG] clarification: on suppressed ZL string/hexBinary - do we keep variable assignments? I spoke too soon. I'm good with this: if EVDP is 'none', then empty strings have no representation syntax at all. So we don't create optional elements for them period. This statement I made is not correct. If EVDP is not 'none', then empty string requires some syntax to be there, and based on this syntax appearing an empty string value is added to the infoset for the optional element. Suppose EVDP is not none, but initiator and terminator are both "", then the empty string does *not* require syntax to be there. So the above statement is simply wrong. When EVDP is 'both' but neither initiator nor terminator are defined, then since EVDP is not 'none', a zero-length string would still cause an optional element to be added to the infoset. In a separated sequence, this would mean one could control how many empty strings go into optional values by providing more separators. So "a,b,,,,,," would add 2 non-empty and 6 empty strings to the infoset, regardless of whether they are required or optional. If the element is named 'x', has minOccurs "3", default="c" maxOccurs="12" and occursCountKind 'implicit', then the first empty would trigger defaulting, and the data would be <x>a</x><x>b</x><x>c</x><x/><x/><x/><x/><x/> So we would get defaulting the required index locations, but creation of elements with empty string values for optional index locations. 
This allows us to construct an XSD invalid document. I suppose that is no big deal, there are many ways to construct data by parsing with DFDL where the data proves to be invalid per XSD rules. So we really have 3 cases: 1. EVDP is 'none' 2. EVDP not 'none' but initiator/terminator such that empty representation has no syntax 3. EVDP not 'none' but initator/termiantor such that empty representation DOES have syntax. Case 1 = optional elements never populated from ZL strings Case 2 = optional elements are always populated from ZL strings Case 3 = optional elements are populated from the non-ZL empty representation, optional elements are not populated from ZL strings. Daffodil has heretofore been crushing Cases 1 and 2 together with behavior of Case 1. Does IBM DFDL distinguish all 3 cases? Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy On Wed, Aug 1, 2018 at 2:33 PM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote: ok, so here's the way I understand this now. if EVDP is 'none', then empty strings have no representation syntax at all. So we don't create optional elements for them period. If EVDP is not 'none', then empty string requires some syntax to be there, and based on this syntax appearing an empty string value is added to the infoset for the optional element. Seems simple in retrospect.... 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy On Wed, Aug 1, 2018 at 11:07 AM, Steve Hanson <smh@uk.ibm.com> wrote: 9.4.2.2 Simple element (xs:string or xs:hexBinary) Required occurrence: If the element has a default value then an item is added to the infoset using the default value, otherwise an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value. Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is added to the Infoset. Note: To prevent unwanted empty strings or empty hexBinary values from being added to the Infoset, use XSD minLength > '0' and a dfdl:assert that uses the dfdl:checkConstraints() function, to raise a processing error. 9.2.2 Empty Representation An element occurrence has an empty representation if the occurrence does not have a nil representation and it conforms to the grammar for SimpleEmptyElementRep or ComplexEmptyElementRep. Specifically, the EmptyElementInitiator and EmptyElementTerminator regions must be conformant with dfdl:emptyValueDelimiterPolicy and the occurrence's content in the data stream is of length zero. (If non-conformant it is not a processing error and the representation is not empty). LeadingAlignment, TrailingAlignment, PrefixLength regions may be present. 
Regards Steve Hanson IBM Hybrid Integration, Hursley, UK Architect, IBM DFDL Co-Chair, OGF DFDL Working Group smh@uk.ibm.com tel:+44-1962-815848 mob:+44-7717-378890 Note: I work Tuesday to Friday

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: dfdl-wg@ogf.org
Date: 01/08/2018 15:13
Subject: Re: [DFDL-WG] clarification: on suppressed ZL string/hexBinary - do we keep variable assignments?

To follow up, then: I have assumed that dfdl:emptyValueDelimiterPolicy isn't even examined unless the element has default="...", i.e., a non-zero-length default value is specified. That is, without a default value, there is no concept of emptiness as distinct from the "normal" representation.

Are you suggesting that it is also used to control when an empty string (or empty hexBinary) is accepted as a normal representation value for an optional element, vs. treated as a missing value? That's a reasonable interpretation that I would support, but I don't know that the spec says that anywhere, so we need to add a sentence. (Unless I'm missing where this is stated.)

I have also thought that dfdl:emptyValueDelimiterPolicy must be combined with dfdl:initiator and dfdl:terminator. If the combination of these is such that the empty representation is zero-length, that is what creates the situation of interest here, where it is ambiguous whether the value is the official empty representation or is the normal representation that just so happens to be of zero length. That is, there's no special significance to the 'none' EVDP property value. For example, if dfdl:emptyValueDelimiterPolicy is 'both', but dfdl:initiator="" and dfdl:terminator="", then that's just as good as dfdl:emptyValueDelimiterPolicy='none' in terms of whatever effect this has on a decision about normal vs. missing.

Does this match your understanding?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy

On Wed, Aug 1, 2018 at 9:26 AM, Steve Hanson <smh@uk.ibm.com> wrote:

Whether to add a zero-length string or hexBinary to the infoset for an optional element depends on the setting of emptyValueDelimiterPolicy. A setting of 'none' stops it from being added. Regardless, it does not give a processing error, so is therefore known-to-exist, and therefore does not cause backtracking, so preserving discriminators and variables.
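Steve's answer turns on backtracking: a zero-length parse that raises no processing error is known-to-exist, so its side effects are not rolled back even when no infoset item is added. A minimal Python sketch of that rule, assuming a parser that snapshots variable state before trying an optional occurrence and discards it only on a processing error. `ProcessingError`, `ParseState`, `try_optional` and the two demo bodies are hypothetical, not Daffodil or IBM DFDL APIs, and discriminators are left out because their effect on the enclosing point of uncertainty needs more machinery than fits here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class ProcessingError(Exception):
    """Stand-in for a DFDL processing error, which triggers backtracking."""


@dataclass
class ParseState:
    variables: Dict[str, object] = field(default_factory=dict)
    infoset: List[object] = field(default_factory=list)


def try_optional(
    state: ParseState,
    parse_body: Callable[[ParseState], Optional[object]],
) -> bool:
    """Attempt one optional occurrence; return True if it is known-to-exist.

    parse_body may set variables, then returns the value to add, or None when
    the parse succeeded but nothing is added (a suppressed zero-length value).
    """
    saved_vars = dict(state.variables)
    saved_len = len(state.infoset)
    try:
        value = parse_body(state)
    except ProcessingError:
        # Processing error: backtrack and discard this attempt's side effects.
        state.variables = saved_vars
        del state.infoset[saved_len:]
        return False
    # No processing error: known-to-exist, so variable settings are kept
    # even when the zero-length value is suppressed from the infoset.
    if value is not None:
        state.infoset.append(value)
    return True


def suppressed_zl(state: ParseState) -> Optional[object]:
    """Demo body: sets a variable, then parses a ZL string with EVDP 'none'."""
    state.variables["seen"] = True
    return None


def failing(state: ParseState) -> Optional[object]:
    """Demo body: sets a variable, then hits a processing error."""
    state.variables["bad"] = True
    raise ProcessingError("no match")


if __name__ == "__main__":
    s = ParseState()
    print(try_optional(s, suppressed_zl), s.variables, s.infoset)
    # True {'seen': True} []
    print(try_optional(s, failing), s.variables, s.infoset)
    # False {'seen': True} []
```

The design choice mirrors the reply: suppression of the infoset item and rollback of side effects are independent decisions, and only a processing error triggers the latter.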
participants (2)
- Mike Beckerle
- Steve Hanson