Empty element with initiator and terminator

I have an element that can be one of two things, either:

< string >

Or

<>

So I modeled this as a choice. For the first choice, I made it a string with < and > being the initiator and terminator. Works great.

For the second choice, I also made it a string with < and > being the initiator and terminator, and I set the length kind to explicit and the length to 0. That seems obvious enough. But, alas, that causes errors in parsing. Nothing I do for this seemingly simple construct works.

There must be some design pattern here that I am missing. Ideas?

TIA

James,

If you look in the trace viewer and work back to the element in question, I suspect that you will see an error like:

CTDP3138E: An unexpected initiator was found for an empty element

DFDL lets you specify what the syntax is for an empty element, using a property called dfdl:emptyValueDelimiterPolicy. It is probably set to 'none'. If you set it to 'both' then the parser will expect to find an initiator and terminator when the content is empty. Note that eliminates the need for the choice - you just have a single element with lengthKind 'delimited' and emptyValueDelimiterPolicy set appropriately.

The current behaviour in IBM DFDL when it finds an empty xs:string is to give an error for a required (minOccurs '1') element, or to not add anything to the infoset for an optional (minOccurs '0') element. However, there is an errata in this area which has just been concluded, which changes this behaviour for xs:string so that a zero-length xs:string is added to the infoset under certain circumstances.

So in order to guide you down the right path, I need to know a bit more info. Specifically, what do you want ideally to appear in the infoset for the <> case? A zero length string, or nothing at all, or the special value 'nil', or a default value? And what would you want to appear when the infoset was serialized: <> or is nothing also acceptable?

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 01/03/2013 20:10
Subject: [DFDL-WG] Empty element with initiator and terminator
Sent by: dfdl-wg-bounces@ogf.org

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
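As a rough illustration of the property Steve describes, the Python sketch below models which raw text counts as the empty representation of a delimited element under each dfdl:emptyValueDelimiterPolicy value. The function names and the plain-string model are assumptions made for illustration only; this is not how IBM DFDL or any other DFDL processor is implemented.

```python
def empty_representation(initiator: str, terminator: str, policy: str) -> str:
    """Return the text expected for an empty occurrence of an element.

    Toy model: `policy` mirrors the dfdl:emptyValueDelimiterPolicy values
    'none', 'initiator', 'terminator' and 'both'.
    """
    expected = {
        "none": "",                      # no delimiters around empty content
        "initiator": initiator,          # only the initiator is present
        "terminator": terminator,        # only the terminator is present
        "both": initiator + terminator,  # both delimiters, nothing between
    }
    if policy not in expected:
        raise ValueError(f"unknown emptyValueDelimiterPolicy: {policy!r}")
    return expected[policy]


def is_empty_occurrence(data: str, initiator: str, terminator: str, policy: str) -> bool:
    """True if `data` is exactly the empty representation under `policy`."""
    return data == empty_representation(initiator, terminator, policy)
```

In this model, `is_empty_occurrence("<>", "<", ">", "both")` is true, while the same input with policy 'none' is not recognised as empty, which is at least consistent with the "unexpected initiator" error Steve suspects.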

Thank you Steve and Tim, your answers are very helpful.
> The current behaviour in IBM DFDL when it finds an empty xs:string is to give an error for a required (minOccurs '1') element

That is the behavior I am experiencing now, even with the emptyValueDelimiterPolicy set to none.

Mar 5, 2013 9:25:17 AM error: CTDP3059E: Element 'PlainEmail' has minOccurs='1' and no default value but the input document contained only '0' occurrences.

That's not the behavior I expect or desire, as <> is completely valid input.

> there is an errata in this area which has just been concluded, which changes this behaviour for xs:string so that a zero-length xs:string is added to the infoset under certain circumstances

That would be more appropriate for my situation.

> what do you want ideally to appear in the infoset for the <> case? A zero length string, or nothing at all, or the special value 'nil', or a default value?

Hmmm, I think a zero length string gives the correct sense. Nil implies the value is unknown or not applicable, which is not true. Nothing indicates there was no input, which is not true, as <> is a legal value. Default value doesn't make sense.

> And what would you want to appear when the infoset was serialized: <> or is nothing also acceptable?
<>, because we need to reconstruct the input.

So I guess I know how to model this, I just need to wait for MTBK to be updated to reflect this errata.

From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Monday, March 04, 2013 4:40 AM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] Empty element with initiator and terminator
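James's round-trip requirement can be sketched in a few lines of Python. This is a toy model under stated assumptions, not DFDL processor behaviour: `parse` and `unparse` are hypothetical helpers, the infoset is reduced to a single value, and `None` stands for "nothing was added to the infoset".

```python
from typing import Optional


def parse(data: str, initiator: str = "<", terminator: str = ">") -> str:
    """Strip the delimiters and return the content, which may be empty."""
    too_short = len(data) < len(initiator) + len(terminator)
    if too_short or not (data.startswith(initiator) and data.endswith(terminator)):
        raise ValueError(f"not a delimited value: {data!r}")
    return data[len(initiator):len(data) - len(terminator)]


def unparse(value: Optional[str], initiator: str = "<", terminator: str = ">") -> str:
    """Write a value back out; None means no item was in the infoset."""
    if value is None:
        return ""  # nothing in the infoset, so nothing is written
    return initiator + value + terminator
```

With a zero-length string in the infoset, `unparse(parse("<>"))` gives back `<>`. If the parser instead records nothing (modelled here as `None`), `unparse(None)` yields an empty output and the original `<>` cannot be reconstructed.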

James,

From your answers, you need IBM DFDL to implement the errata that sets zero length string in the infoset. That gives you the round tripping you require. I'd like to add this to the next release of IBM DFDL but can't guarantee that it will get done.

Here are the specific rules from the errata, the relevant stuff in blue:

Empty representation when parsing

If empty representation is established when parsing, the possibility of applying a default value arises. Essentially, if a required occurrence of an element has empty representation, then a default value will be applied if present, though there are a couple of variations on this rule. Remember that in order to have established empty representation, the occurrence must be compliant with the emptyValueDelimiterPolicy for the element, and for a complex element the parser must have descended into the type and returned with no unsuppressed processing error.

There are three main cases to consider. In what follows the term 'string' encompasses both xs:string and xs:hexBinary as these are the two data types for which a zero length (empty) string is valid for the type. This behaviour is independent of occursCountKind.

Simple element (non-string)
  Required occurrence: If an XSD 'default' or 'fixed' property is specified then an item is added to the Infoset using the value of the property, otherwise nothing is added to the Infoset. (This may cause a subsequent processing error depending on occursCountKind.)
  Optional occurrence: Nothing is added to the Infoset.

Simple element (string)
  Required occurrence: If an XSD 'default' or 'fixed' property is specified then an item is added to the Infoset using the value of the property, otherwise an item is added to the Infoset using empty string as the value.
  Optional occurrence: If emptyValueDelimiterPolicy is not 'none' then an item is added to the Infoset using empty string as the value, otherwise nothing is added to the Infoset.
  (To prevent unwanted empty strings from being added to the Infoset, use minLength > '0' and a dfdl:assert that uses the dfdl:checkConstraints() function, to raise a processing error.)

Complex element
  Required occurrence: An item is added to the Infoset.
  Optional occurrence: If emptyValueDelimiterPolicy is not 'none' then an item is added to the Infoset, otherwise nothing is added to the Infoset.
  For both required and optional occurrences, the Infoset item may also have a child item.
    A) If the first child element of the complex type is a required simple element, then an empty string or default value will also be added to the Infoset.
    B) If the first child element of the complex type is a required complex element, then an item is added to the Infoset (which may itself have a child via A).

Regards

Steve Hanson

From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 05/03/2013 14:30
Subject: Re: [DFDL-WG] Empty element with initiator and terminator
Sent by: dfdl-wg-bounces@ogf.org
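The errata rules quoted in this message can be read as a small decision table. The Python sketch below encodes them as written, assuming empty representation has already been established for the occurrence. The function names and the use of `None` for "nothing is added to the Infoset" are illustrative assumptions, and the child-item rules A and B for complex elements are left out.

```python
from typing import Optional

# The two types for which a zero-length value is valid, per the errata text.
STRING_LIKE = {"xs:string", "xs:hexBinary"}


def simple_empty_result(
    xsd_type: str,
    required: bool,
    policy: str = "none",                    # emptyValueDelimiterPolicy
    default_or_fixed: Optional[str] = None,  # XSD 'default' or 'fixed' value
) -> Optional[str]:
    """Value added to the Infoset for an empty simple element, or None."""
    is_string = xsd_type in STRING_LIKE
    if required:
        if default_or_fixed is not None:
            return default_or_fixed
        # Required string: empty string; required non-string: nothing.
        return "" if is_string else None
    # Optional occurrence: only strings with a delimiter policy get an item.
    if is_string and policy != "none":
        return ""
    return None


def complex_empty_adds_item(required: bool, policy: str = "none") -> bool:
    """True if an item is added to the Infoset for an empty complex element."""
    return required or policy != "none"
```

For James's case, a required xs:string with no default, `simple_empty_result("xs:string", True, "both")` returns the empty string, which is the infoset content he asked for.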
participants (2)
- Garriss Jr., James P.
- Steve Hanson