DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)

I'll write this up like an erratum, but this is for discussion of whether we believe this is clear and complete.

-------------------------------------

*Glossary*: DFDL Statements are the annotation elements dfdl:assert, dfdl:discriminator, dfdl:setVariable, and dfdl:newVariableInstance.

*Errata*: Locations where DFDL Statements are allowed to appear are extended to also include global element declarations and simple type definitions.

*Errata*: Clarification about discriminators: Discriminators exclude assertions even when combining across references.

Beyond the stipulation that there can be only one dfdl:discriminator at any annotation point of the DFDL schema, there are further constraints.

A single dfdl:discriminator annotation may appear on an element reference, or on the global element declaration it refers to, or on the simple type appearing immediately within or referenced from the global element declaration, but in only one of those places. In addition, if a discriminator occupies one of those places, then no dfdl:assert annotations may appear in any of those locations.

A dfdl:discriminator annotation may appear on a group reference or on the model group within the global group definition it refers to, but in only one of those places. Similarly, if a discriminator appears in either of those places, then no dfdl:assert annotations may appear in any of those locations.

*Errata*: Clarification about the execution order of DFDL Statements when they appear on an element reference or element declaration.

DFDL Statement annotations for a given schema component are executed as follows:

1) All relevant DFDL statement annotations are gathered to form a single list which preserves schema-definition order.
   - For a simpleType definition, the DFDL statement annotations found immediately on it are kept in schema-definition order, and are appended to the end of the list of those from any base simpleType definition.
   - For an element declaration having simple type, the DFDL statement annotations found immediately on the declaration are appended to the end of the list of those from its simple type.
   - For an element reference, the DFDL statement annotations found immediately on the element reference are appended to the end of the list of those from the global element declaration it references.

2) Given the combined list, the annotations are executed as follows:
   1. Before any parsing of the element, a dfdl:discriminator with testKind="pattern" is executed.
   2. If there is no discriminator, then before any parsing of the element, all dfdl:asserts (there could be several) with testKind="pattern" are executed in the order they appear in the list of DFDL statements.
   3. The element itself is parsed, or its inputValueCalc property is evaluated to create its value.
   4. All newVariableInstance annotations are executed, and the new variables are placed into scope for the duration of the remaining steps. The statements are executed in the order they appear in the list of DFDL statements.
   5. All setVariable annotations are executed. The statements are executed in the order they appear in the list of DFDL statements.
   6. If a discriminator is present, it is executed.
   7. If no discriminator is present, then assert annotations can be present, and they are executed. If there are multiple assert annotations, they are executed in the order they appear in the list of DFDL statements.

If the element reference or local element declaration is an array, then this evaluation is repeated for each occurrence of the array.

*Discussion*:

The above allows the default expressions associated with any statement to refer to the value of the element itself as ".".

However, there's this anomaly of syntax where things don't seem right:

    <dfdl:newVariableInstance ref="foo:bar" default="{....some expression ...}"/>

creates a new variable for the scope of the entity it annotates. It's clear what this means if this annotation is placed on a sequence or choice: for the children of that sequence/choice the new variable instance is in effect. On a simpleType or an element having simple type, it is similarly clear (in that case it's a very local variable, just for the expressions in other newVariableInstance statements, setVariable statements, and discriminators and assertions).

The rub: on an element declaration (or reference) when there is a complex type, it's not clear.

    <element name="foo">
      <annotation><appinfo ....>
        <dfdl:newVariableInstance......> <!-- what can refer to this? -->
      </appinfo></annotation>
      <complexType>
        <sequence>
          <annotation><appinfo ....>
            <dfdl:assert>... according to rules above this cannot refer to the newVariableInstance...</dfdl:assert>
          </appinfo></annotation>
          ....
        </sequence>
      </complexType>
    </element>

The above rules say the newVariableInstance isn't evaluated until AFTER the element is parsed, so the assert down inside on the sequence will NOT see this new variable, even though textually the newVariableInstance annotation looks like it would be in scope over the sequence.

A possible way to avoid this oddity without messing up evaluation order: allow newVariableInstance only on simpleType, on element declarations and element references having simple type, on group references, and on sequence/choice. Disallow it on element declarations of complex type and on element references to those. The schema author can always introduce an extra tier of sequence to get the exact behavior they need, and this otherwise error-prone issue can be avoided.
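The gathering rule (step 1) and the execution order (step 2) proposed above can be sketched roughly as follows. This is only an illustrative model, not any implementation's code: the `Stmt` and `Component` classes, the `base` link, and the `"<parse element>"` marker are hypothetical names made up for the sketch. It also assumes that step 6 applies only to a non-pattern discriminator (a pattern discriminator already ran in step 1), and that breaking the discriminator/assert exclusion rule is reported as an error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stmt:
    kind: str              # "assert", "discriminator", "setVariable" or "newVariableInstance"
    name: str              # label used only to show the resulting order
    pattern: bool = False  # True for testKind="pattern"


@dataclass
class Component:
    stmts: List[Stmt]
    # Hypothetical link: a base simpleType, an element's simple type, or the
    # global element declaration that an element reference points to.
    base: Optional["Component"] = None


def gather(component: Component) -> List[Stmt]:
    """Step 1: inherited statements first, then the component's own."""
    inherited = gather(component.base) if component.base else []
    return inherited + list(component.stmts)


def execution_order(stmts: List[Stmt]) -> List[str]:
    """Step 2: names of the gathered statements in the order they would run."""
    discs = [s for s in stmts if s.kind == "discriminator"]
    asserts = [s for s in stmts if s.kind == "assert"]
    if len(discs) > 1 or (discs and asserts):
        # Assumption: the exclusion rule is enforced by raising an error.
        raise ValueError("at most one discriminator, and no asserts alongside it")

    order = [s.name for s in discs if s.pattern]          # step 1
    order += [s.name for s in asserts if s.pattern]       # step 2
    order.append("<parse element>")                       # step 3
    order += [s.name for s in stmts if s.kind == "newVariableInstance"]  # step 4
    order += [s.name for s in stmts if s.kind == "setVariable"]          # step 5
    order += [s.name for s in discs if not s.pattern]     # step 6
    order += [s.name for s in asserts if not s.pattern]   # step 7
    return order


simple_type = Component([Stmt("assert", "a_type_pattern", pattern=True),
                         Stmt("setVariable", "sv_type")])
decl = Component([Stmt("newVariableInstance", "nvi_decl"),
                  Stmt("assert", "a_decl")], base=simple_type)
ref = Component([Stmt("setVariable", "sv_ref")], base=decl)

print(execution_order(gather(ref)))
```

Note how nvi_decl only runs after the parse marker in this model, which is exactly the anomaly raised in the discussion for elements of complex type.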

Thanks for writing this up, I have a couple of comments from thinking about this in parallel, and reading the write-up.

- In 2), replace element with object as applies to sequence/choice as well
- newVariableInstance should be evaluated before the object is parsed (note: self-axis is not allowed)
- setVariable should be evaluated before the object is parsed unless it uses self-axis
- setVariable should be evaluated after the object is parsed if it uses self-axis
- consider only allowing self-axis in setVariable for simple elements/types
- add statement about early evaluation of non-pattern asserts/discriminators if that can be done (as per IBM implementation)

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 29/10/2012 22:54
Subject: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)
Sent by: dfdl-wg-bounces@ogf.org

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
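The timing suggested in this reply for the variable statements can be sketched as below. It is a rough model only: `VarStmt`, its `uses_self` flag, and the `"<parse object>"` marker are hypothetical, and the sketch simply assumes that something has already decided whether a setVariable expression uses the self-axis. It also assumes newVariableInstance still runs ahead of setVariable, as in the original write-up; the reply does not say so explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VarStmt:
    kind: str                # "newVariableInstance" or "setVariable"
    name: str                # label used only to show the resulting order
    uses_self: bool = False  # hypothetical flag: the expression refers to "."


def variable_statement_order(stmts: List[VarStmt]) -> List[str]:
    """Order the variable statements around the parse of the annotated object."""
    nvis = [s for s in stmts if s.kind == "newVariableInstance"]
    if any(s.uses_self for s in nvis):
        # The reply notes that self-axis is not allowed here.
        raise ValueError("self-axis is not allowed in newVariableInstance")

    before = [s.name for s in nvis]
    before += [s.name for s in stmts
               if s.kind == "setVariable" and not s.uses_self]
    after = [s.name for s in stmts
             if s.kind == "setVariable" and s.uses_self]
    return before + ["<parse object>"] + after


stmts = [VarStmt("setVariable", "sv_self", uses_self=True),
         VarStmt("newVariableInstance", "nvi"),
         VarStmt("setVariable", "sv_abs")]
print(variable_statement_order(stmts))
```

With newVariableInstance ahead of the parse, statements on the children of a complex element would be able to see the new variable.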

I agree with most of that. I certainly would want newVariableInstance to be evaluated before a complex element is parsed.

The complication with the setVariable rule is that an absolute path may be equivalent to a path which uses the self-axis, but it is not possible, in general, to determine this by static analysis of the xsd. Furthermore, one XPath expression can contain zero to many path refs, any or all of which might (or might not) start with the self-axis.

I can only see two possible solutions to this.

- Asserts/discriminators/setVariable are always executed after the component on which they are positioned has been fully parsed, with the exception of asserts/discriminators with testKind="pattern".
- Asserts/discriminators/setVariable have a 'timing' flag that defaults to 'after'. If it is set to 'before' and the expression does not evaluate successfully then it is a schema definition error. This would allow earlier rejection of the wrong branch in the model (and therefore more efficient parsing) in cases where the expression only refers to items that have already been parsed.

regards,

Tim Kimber, DFDL Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742

From: Steve Hanson/UK/IBM@IBMGB
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org
Date: 30/10/2012 13:12
Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)
Sent by: dfdl-wg-bounces@ogf.org
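The second option in this reply (a 'timing' flag defaulting to 'after') could behave roughly like the sketch below. Everything in it is hypothetical: `TimedStmt`, `process`, and `SchemaDefinitionError` are invented names, expressions are modelled as plain Python callables over a dict of already-parsed items, and a reference to an item that has not been parsed yet is modelled as a `KeyError`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL schema definition error."""


@dataclass
class TimedStmt:
    name: str
    expr: Callable[[Dict[str, Any]], Any]  # expression over the parsed items
    timing: str = "after"                  # proposed default


def process(component: str, parse: Callable[[], Any],
            stmts: List[TimedStmt], infoset: Dict[str, Any]) -> Dict[str, Any]:
    """Run 'before' statements, parse the component, then run 'after' ones."""
    results: Dict[str, Any] = {}
    for s in stmts:
        if s.timing == "before":
            try:
                results[s.name] = s.expr(infoset)
            except KeyError as missing:
                # 'before' was requested but the expression needs unparsed data.
                raise SchemaDefinitionError(
                    f"{s.name}: {missing} is not parsed yet") from missing
    infoset[component] = parse()
    for s in stmts:
        if s.timing == "after":
            results[s.name] = s.expr(infoset)
    return results


stmts = [TimedStmt("early", lambda i: i["tag"] == "A", timing="before"),
         TimedStmt("late", lambda i: i["body"] > 0)]
print(process("body", lambda: 5, stmts, {"tag": "A"}))
```

The early statement only touches data parsed before the component, so a wrong branch could be rejected without parsing the component at all; the same expression written against the component itself would fail fast instead of silently waiting.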

A single dfdl:discriminator annotation may appear on an element reference, or on the global element declaration it refers to, or on the simple type appearing immediately within or referenced from the global element declaration. But only one of those places. In addition, if a discriminator occupies one of those places, then no dfdl:assert annotations may appear in any of those locations.

I really have an issue with putting relative paths on global elements or types in any of the discriminators, asserts, etc., because there is no context available. The right place is the element reference. I don't have an issue with specifying value expressions or a concrete set of values (a glorified pattern facet), and that should be constrained to types. This would be in line with the XML Schema spec, and I would prefer not to move away from it.

Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia@ca.ibm.com
For info on Message broker http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.ht...

From: Tim Kimber <KIMBERT@uk.ibm.com>
To: Steve Hanson <smh@uk.ibm.com>, Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org
Date: 10/30/2012 10:03 AM
Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)
Sent by: dfdl-wg-bounces@ogf.org

Suman - we could make that a limitation, but why is that any different to putting a relative expression in a dfdl:xxxx property on a global object? We don't disallow that.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Suman Kalia <kalia@ca.ibm.com>
To: Tim Kimber/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org, Mike Beckerle <mbeckerle.dfdl@gmail.com>, Steve Hanson/UK/IBM@IBMGB
Date: 30/10/2012 14:46
Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)
Errata: Locations where DFDL Statements are allowed to appear are extended to also include Global Element Declarations, and on Simple Type Definitions.

Errata: Clarification about discriminators: Discriminators exclude Assertions even when combining across references.

Beyond the stipulation that there can be only one dfdl:discriminator at any annotation point of the DFDL schema, there are further constraints.

A single dfdl:discriminator annotation may appear on an element reference, or on the global element declaration it refers to, or on the simple type appearing immediately within or referenced from the global element declaration. But only one of those places. In addition, if a discriminator occupies one of those places, then no dfdl:assert annotations may appear in any of those locations.

A dfdl:discriminator annotation may appear on a group reference or on the model group within the global group definition it refers to. But only one of those places, and similarly, if a discriminator appears in any of those places, then no dfdl:assert annotations may appear in any of those locations.

Errata: Clarification about the execution order of DFDL Statements when they appear on an element reference or element declaration.

DFDL Statement annotations for a given schema component are executed as follows:

1) all relevant DFDL statement annotations are gathered to form a single list which preserves schema-definition order.
- For a simpleType definition, the DFDL statement annotations found immediately on it are kept in schema-definition order, and are appended to the end of a list of those from any base simpleType definition.
- For an element declaration having simple type, the DFDL statement annotations found immediately on the declaration are appended to the end of the list of those from its simple type.
- For an element reference, DFDL statement annotations found immediately on the element reference are appended to the end of the list of those from the global element declaration it references.

2) given the combined list, the annotations are executed as follows:
   1. before any parsing of the element, a dfdl:discriminator with testKind="pattern" is executed.
   2. if there is no discriminator, then before any parsing of the element, all dfdl:asserts (there could be several) with testKind="pattern" are executed in the order they appear in the list of DFDL statements.
   3. The element itself is parsed, or its inputValueCalc property is evaluated to create its value.
   4. all newVariableInstance annotations are executed and new variables are placed into scope for the duration of these remaining steps. The statements are executed in the order they appear in the list of DFDL statements.
   5. all setVariable annotations are executed. The statements are executed in the order they appear in the list of DFDL statements.
   6. if a discriminator is present it is executed
   7. if no discriminator is present, then assert annotations can be present, and they are executed. If there are multiple assert annotations the statements are executed in the order they appear in the list of DFDL statements.

If the element reference or local element declaration is an array, then this evaluation is repeated for each occurrence of the array.

Discussion: The above allows the default expressions associated with any statement to refer to the value of the element itself as "."

However, there's this anomaly of syntax where things don't seem right:

<dfdl:newVariableInstance ref="foo:bar" default="{....some expression ...}"/>

creates a new variable for the scope of the entity it annotates. It's clear what this means if this annotation is placed on a sequence or choice. For the children of that sequence/choice the new variable instance is in effect.
On a simpleType or element having simple type, similarly it is clear (in that case it's a very local variable, just for the expressions in other newVariableInstance statements, setVariable statements, and discriminators and assertions).

The rub: on an element declaration (or reference) when there is a complex type, it's not clear.

<element name="foo">
  <annotation><appinfo ....>
    <dfdl:newVariableInstance......> <!-- what can refer to this? -->
  </appinfo></annotation>
  <complexType>
    <sequence>
      <annotation><appinfo ....>
        <dfdl:assert>... according to rules above this cannot refer to the newVariableInstance...</dfdl:assert>
      </appinfo></annotation>
      ....
    </sequence>
  </complexType>
</element>

The above rules say the newVariableInstance isn't evaluated until AFTER the element is parsed, so the assert down inside on the sequence will NOT see this new variable even though, textually, the newVariableInstance annotation looks like it would be in scope over the sequence.

Possible ways to avoid this oddity without messing up evaluation order: allow newVariableInstance only on simpleType, element declarations and element references having simpleType, group references, and sequence/choice. Disallow them on element declarations of complex type or on element references to those. The schema author can always introduce an extra tier of sequence to provide the exact behavior they need, and this otherwise error-prone issue can be avoided.
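The gathering and execution order proposed in the quoted write-up above can be sketched in a few lines. This is only an illustrative model, not any implementation's behaviour: the `Statement` class, the labels, and the function names are hypothetical, and step 6 is read here as applying to a discriminator that is not a pattern test (the write-up does not say so explicitly).

```python
from dataclasses import dataclass

PARSE_STEP = "<parse element or evaluate inputValueCalc>"


@dataclass(frozen=True)
class Statement:
    kind: str                      # "assert", "discriminator", "setVariable", "newVariableInstance"
    label: str                     # hypothetical name, used only to show the ordering
    test_kind: str = "expression"  # "pattern" or "expression"; relevant to assert/discriminator


def gather(base_simple_type, simple_type, element_decl, element_ref):
    """Step 1: append the per-component lists, each already in schema-definition order."""
    return [*base_simple_type, *simple_type, *element_decl, *element_ref]


def execution_plan(statements):
    """Step 2: return the labels in the order the proposal would execute them."""
    discriminators = [s for s in statements if s.kind == "discriminator"]
    asserts = [s for s in statements if s.kind == "assert"]
    if len(discriminators) > 1:
        raise ValueError("only one dfdl:discriminator is allowed across the combined locations")
    if discriminators and asserts:
        raise ValueError("a dfdl:discriminator excludes dfdl:assert in the combined locations")

    def pick(kind, test_kind=None):
        return [s.label for s in statements
                if s.kind == kind and (test_kind is None or s.test_kind == test_kind)]

    plan = []
    plan += pick("discriminator", "pattern")     # 2.1: before any parsing
    plan += pick("assert", "pattern")            # 2.2: before any parsing, list order
    plan.append(PARSE_STEP)                      # 2.3
    plan += pick("newVariableInstance")          # 2.4
    plan += pick("setVariable")                  # 2.5
    plan += pick("discriminator", "expression")  # 2.6 (assumed: non-pattern discriminator)
    plan += pick("assert", "expression")         # 2.7
    return plan


statements = gather(
    [Statement("assert", "base-pattern", "pattern")],
    [Statement("setVariable", "set-x")],
    [Statement("newVariableInstance", "new-y"), Statement("assert", "decl-assert")],
    [Statement("assert", "ref-pattern", "pattern")],
)
print(execution_plan(statements))
```

Note how `new-y` lands after the parse step: that is exactly the ordering that makes a newVariableInstance on a complex-typed element invisible to statements inside its content.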

Steve - For the same reasons I mentioned, I think a relative expression in dfdl:xxx property on global object should not be supported.. we should flag a validation error.. Again there is no context available to build that expression.. Suman Kalia IBM Canada Lab WMB Toolkit Architect and Development Lead Tel: 905-413-3923 T/L 313-3923 Email: kalia@ca.ibm.com For info on Message broker http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.ht... From: Steve Hanson <smh@uk.ibm.com> To: Suman Kalia/Toronto/IBM@IBMCA, Cc: dfdl-wg@ogf.org, Mike Beckerle <mbeckerle.dfdl@gmail.com>, Tim Kimber <KIMBERT@uk.ibm.com> Date: 10/30/2012 11:01 AM Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance) Suman - we could make that a limitation - but why is that any different to putting a relative expression in a dfdl:xxxx property on a global object - we don't disallow that. Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh@uk.ibm.com tel:+44-1962-815848

That means I can try and refactor a local element into an element ref and global element and get new errors. I don't think that is acceptable. Plus your suggestion does not close the hole - because a local object can use a relative expression that reaches out of its global container. Plus IBM DFDL already allows relative expressions on global objects so we can't withdraw the behaviour. Regards Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh@uk.ibm.com tel:+44-1962-815848 From: Suman Kalia <kalia@ca.ibm.com> To: Steve Hanson/UK/IBM@IBMGB, Cc: dfdl-wg@ogf.org Date: 30/10/2012 15:09 Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance) Sent by: dfdl-wg-bounces@ogf.org Steve - For the same reasons I mentioned, I think a relative expression in dfdl:xxx property on global object should not be supported.. we should flag a validation error.. Again there is no context available to build that expression.. Suman Kalia IBM Canada Lab WMB Toolkit Architect and Development Lead Tel: 905-413-3923 T/L 313-3923 Email: kalia@ca.ibm.com For info on Message broker http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.ht...

Revision 1 (based on discussion on WG call 2012-10-30)
Principles: disallow statements except where their scope and timing are clear and where the timing is easy to understand from the way it appears textually in the schema document.
See changes in RED.
On Mon, Oct 29, 2012 at 6:54 PM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
I'll write this up like an errata, but this is for discussion of whether we believe this is clear and complete.
-------------------------------------
*Glossary*: DFDL Statements are the annotation elements dfdl:assert, dfdl:discriminator, dfdl:setVariable, and dfdl:newVariableInstance.
*Errata*: Locations where dfdl:assert, dfdl:discriminator and dfdl:setVariable are allowed to appear are extended to also include Global Element Declarations for elements of simple type, and on Simple Type Definitions.
*Errata*: dfdl:newVariableInstance may appear only as an annotation on a sequence or choice.
*Errata*: dfdl:setVariable, dfdl:assert, and dfdl:discriminator may appear only as an annotation on a sequence, choice, or simpleType definition, or on an element declaration/reference having simpleType. Note this removes the ability for complex-typed element decls/refs to carry any DFDL Statement annotations, including asserts/discriminators. (I'm trying to get really minimal here. Not even assert on complexType elements.)
*Errata*: Clarification about discriminators: Discriminators exclude Assertions even when combining across references.
Beyond the stipulation that there can be only one dfdl:discriminator at any annotation point of the DFDL schema, there are further constraints.
A single dfdl:discriminator annotation may appear on an element reference, or on the global element declaration it refers to, or on the simple type appearing immediately within or referenced from the global element declaration. But only one of those places. In addition, if a discriminator occupies one of those places, then no dfdl:assert annotations may appear in any of those locations.
A dfdl:discriminator annotation may appear on a group reference or on the model group within the global group definition it refers to. But only one of those places, and similarly, if a discriminator appears in any of those places, then no dfdl:assert annotations may appear in any of those locations.
(TBD: constraints that you can't have multiple setVariable statements of the same variable in these places either, just as you can't have multiple setVariables of the same variable at one annotation point.)
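To make the discriminator exclusivity rule concrete, here is a minimal sketch. The namespace http://example.com/msg and the element names tag, record, and other are hypothetical, and format properties are omitted, so this is not a complete DFDL schema. The element reference carries the single dfdl:discriminator; under the rule above, neither the global declaration of tag nor its simple type may then carry a discriminator or any dfdl:assert.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/msg"
           targetNamespace="http://example.com/msg">

  <!-- Hypothetical global element of simple type. It carries no
       dfdl:assert and no dfdl:discriminator, because the reference
       below already holds the one permitted discriminator. -->
  <xs:element name="tag" type="xs:string"/>

  <xs:element name="record">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="ex:tag">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <!-- The only discriminator across the reference, the
                   declaration, and the simple type. -->
              <dfdl:discriminator test="{ . eq 'A' }"/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="other" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Adding a dfdl:assert to the declaration of tag, while leaving the discriminator on the reference, would violate the rule as written above.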
*Errata*: Clarification about the execution order of DFDL Statements when they appear on an element reference or element declaration.
DFDL Statement annotations for a given element are executed as follows: (Keep in mind that this element will have simpleType, as complexType elements cannot carry statement annotations at all.)
1) all relevant DFDL statement annotations are gathered to form a single list which preserves schema-definition order.
- For a simpleType definition, the DFDL statement annotations found immediately on it are kept in schema-definition order, and are appended to the end of a list of those from any base simpleType definition.
- For an element declaration having simple type, the DFDL statement annotations found immediately on the declaration are appended to the end of the list of those from its simple type.
- For an element reference, DFDL statement annotations found immediately on the element reference are appended to the end of the list of those from the global element declaration it references.
2) given the combined list, the annotations are executed as follows:
   1. before any parsing of the element, a dfdl:discriminator with testKind="pattern" is executed.
   2. if there is no discriminator, then all dfdl:asserts (there could be several) with testKind="pattern" are executed in the order they appear in the list of DFDL statements.
   3. Any properties having runtime evaluation are evaluated. (e.g., delimiters with expressions)
   4. The element itself is parsed, or its inputValueCalc property is evaluated to create its value.
   5. *REMOVED: no longer allowed on elements at all: all newVariableInstance annotations are executed and new variables are placed into scope for the duration of these remaining steps. The statements are executed in the order they appear in the list of DFDL statements.*
   6. all setVariable annotations are executed. The statements are executed in the order they appear in the list of DFDL statements.
   7. if a discriminator is present it is executed.
   8. if no discriminator is present, then assert annotations can be present, and they are executed. If there are multiple assert annotations the statements are executed in the order they appear in the list of DFDL statements.
If the element reference or local element declaration is an array, then this evaluation is repeated for each occurrence of the array.
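As a rough illustration of the numbered steps above, the following sketch annotates a simple-typed element with three statements and notes, in comments, the step at which each would run. The namespace, the variable ex:lastLen, the element len, the pattern, and the limit 1024 are all hypothetical, and format properties are omitted for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/msg"
           targetNamespace="http://example.com/msg">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Hypothetical variable, defined only for this illustration. -->
      <dfdl:defineVariable name="lastLen" type="xs:int"/>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="len" type="xs:int">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Step 2: a pattern assert, run before the element is parsed. -->
        <dfdl:assert testKind="pattern" testPattern="[0-9]+"/>
        <!-- Step 6: run after the element is parsed (step 4),
             so "." is the parsed value. -->
        <dfdl:setVariable ref="ex:lastLen" value="{ . }"/>
        <!-- Step 8: an expression assert, run last because no
             discriminator is present. -->
        <dfdl:assert test="{ . le 1024 }"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

The textual order of the three statements here happens to match their execution order, but under the steps above the pattern assert would still run first even if it were written last.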
A DFDL implementation that wishes to optimize is free to analyze the expressions used, and evaluate them sooner so long as the behavior is equivalent to the above description.
*Discussion/Illustration:* (this is all revised, so I'm switching back to black ink)
Suppose you have this situation:
    <sequence>
      ...
      <element ref="foo"/> <!-- I want to add DFDL statement annotations before and after this -->
      ...
    </sequence>
To add them before, so they are scoped over or visible to the parsing of the entire foo element:
    <sequence>
      ...
      <sequence> <!-- inserted sequence -->
        <annotation><appinfo ...>
          <dfdl:newVariableInstance ref="myVar" defaultValue="{...}"/>
          <dfdl:setVariable ref="myOtherVar" value="{...}"/>
        </appinfo></annotation>
        <element ref="foo"/> <!-- I want to add DFDL statement annotations before this -->
      </sequence>
      ...
    </sequence>
To add them after so they can reference downward into that element:
    <sequence>
      ...
      <element ref="foo"/> <!-- I want to add DFDL statement annotations after this -->
      <sequence> <!-- inserted sequence -->
        <annotation><appinfo ...>
          <dfdl:assert>{ foo/bar/baz.... }</dfdl:assert>
          <dfdl:setVariable ref="yetAnotherVar" value="{ foo/bar[2]/baz + 1 }"/>
        </appinfo></annotation>
      </sequence>
      ...
    </sequence>
The above illustration is about complex type elements, but note that the timing issue really doesn't care whether the element/element-ref has simple or complex type. You can perfectly control when evaluation occurs relative to the element itself. The timing is then clear (to me anyway). The optimization opportunity to hoist the assert for earlier evaluation is clear, but it's also clear that the assert CAN look into foo however it wishes to make its decision.
If you put the annotation directly on the element (which then must be of simple type), then it is exactly equivalent to putting it in a sequence AFTER. (Modulo replacing "." in expressions with "foo")

Mike
IBM DFDL already supports asserts and discriminators on complex elements, so that must remain. There's a clear use case for this - a choice with branches that are element refs to complex global elements. Discrimination is possible at the time the choice is processed. You would put a discriminator on the element refs. Also, asserts and discriminators are intended to be the equivalent of WTX component rules which are allowed on complex elements.
The last point about WTX made me realise why we had disallowed asserts and discriminators on global elements. In xsd terms they are associated with a particle. If we are going to allow them on global elements then we need to be clear that this is no longer the case.
Agree that only one setVariable annotation for a given variable can exist when annotations are combined from multiple objects. Same for newVariableInstance.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org,
Date: 30/10/2012 18:26
Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)
Sent by: dfdl-wg-bounces@ogf.org
Revision 1 (based on discussion on WG call 2012-10-30)

Revision 2 per workgroup call on 2012-10-31
This is a rewrite, not a set of edits.
---------------------------------------------
*Clarification:* At any single annotation point of the schema, there can be only one format annotation (dfdl:format, dfdl:element, dfdl:sequence, dfdl:choice, dfdl:group, dfdl:simpleType).
*Glossary*: DFDL Statement annotations, or just DFDL Statements, are the annotation elements dfdl:assert, dfdl:discriminator, dfdl:setVariable, and dfdl:newVariableInstance.
*Glossary*: *Combined annotations*: When annotations are combined between a group reference and the sequence or choice of the referenced global group, or among an element reference, an element declaration, and its type definition, the combined set is referred to as the *combined annotations*.
*DFDL Statement Annotation Placement*
dfdl:assert and dfdl:discriminator can be placed as annotations on sequence, choice, group references, local and global element declarations, element references, and simple type definitions.
dfdl:setVariable may be placed as an annotation on sequence, choice, group references, local and global element declarations for elements of simple type, element references to elements of simple type, and simple type definitions.
dfdl:newVariableInstance can be placed as an annotation on sequence, choice, and group references.
The combined annotations for any schema component can contain only a single dfdl:discriminator, or any number of dfdl:assert statements, but not both asserts and a discriminator. It is a schema definition error otherwise.
The combined annotations for any schema component can contain multiple dfdl:setVariable annotations, but they must each refer to a different variable. It is a schema definition error otherwise.
The combined annotations for any schema component can contain multiple dfdl:newVariableInstance annotations, but they must each refer to a different variable. It is a schema definition error otherwise.
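A small sketch of the combined-annotations rule for dfdl:setVariable follows. The namespace, the variables ex:a and ex:b, and the elements count and msg are hypothetical, and format properties are omitted. The global declaration sets one variable and the reference to it sets another, so the combined annotations of the reference hold two dfdl:setVariable statements that name different variables.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/msg"
           targetNamespace="http://example.com/msg">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Hypothetical variables, defined only for this illustration. -->
      <dfdl:defineVariable name="a" type="xs:int"/>
      <dfdl:defineVariable name="b" type="xs:int"/>
    </xs:appinfo>
  </xs:annotation>

  <!-- Global declaration of simple type: sets ex:a. -->
  <xs:element name="count" type="xs:int">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:setVariable ref="ex:a" value="{ . }"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <xs:element name="msg">
    <xs:complexType>
      <xs:sequence>
        <!-- Reference: sets ex:b. The combined annotations are
             { setVariable ex:a, setVariable ex:b }, which is allowed. -->
        <xs:element ref="ex:count">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:setVariable ref="ex:b" value="{ . + 1 }"/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

If the reference instead named ex:a, the combined annotations would contain two dfdl:setVariable statements for the same variable, which the text above classes as a schema definition error.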
*Evaluation Order for Statement Annotations*
*Assertions Before*:
dfdl:discriminator or dfdl:assert with testKind='pattern' are executed before parsing the annotated construct.
Note that the pattern is used to match against the entire representation of the component; hence, the framing (including initiators, etc.) is all visible to the pattern. The dfdl:encoding property is used when decoding the data to characters before matching.
It is a schema definition error if alignment is not 1 and a dfdl:discriminator or dfdl:assert with testKind='pattern' is used. (TBD: restrictions on lengthKind='prefixed' as well? Any other framing-based incompatibilities where assertions with testKind='pattern' are really incompatible?)
If there are multiple dfdl:assert statements with testKind='pattern' the order of execution among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
*Assertions After:*
dfdl:discriminator or dfdl:assert with testKind='expression' (the default) are executed after parsing the annotated construct.
Furthermore, an attempt to evaluate a discriminator must be made even if the parse of the annotated construct ended in a parse error. This is because a discriminator could evaluate to true thereby resolving a point of uncertainty even if the complete parsing of the construct ultimately caused a parse error. Such discriminator evaluation has access to the DFDL Infoset of the attempted parse as it existed immediately before detecting the parse failure.
Implementations are free to optimize by recognizing and executing discriminators or assertions earlier so long as the resulting behavior is consistent with what results from the above description.
If there are multiple dfdl:assert statements with testKind='expression', then the order of execution among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
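The use case raised earlier in the thread, a choice whose branches are element references to complex global elements, can be sketched as follows. All element names, the namespace, and the 'H'/'T' values are hypothetical, and format properties are omitted. Each discriminator has the default testKind of expression, so under the wording above it is evaluated after the referenced element is parsed, with "." denoting that element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/msg"
           targetNamespace="http://example.com/msg">

  <!-- Hypothetical complex global elements used as choice branches. -->
  <xs:element name="header">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="kind" type="xs:string"/>
        <xs:element name="size" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="trailer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="kind" type="xs:string"/>
        <xs:element name="checksum" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="part">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="ex:header">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <!-- Evaluated after the attempt to parse header. -->
              <dfdl:discriminator test="{ ./kind eq 'H' }"/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element ref="ex:trailer">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:discriminator test="{ ./kind eq 'T' }"/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

If kind parses as 'H' but size then fails to parse, the rule above still requires an attempt to evaluate the first discriminator against the infoset as it stood just before the failure. It would evaluate to true and resolve the choice to the header branch, so the parse error would not be treated as a reason to try trailer.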
*The dfdl:newVariableInstance Statement*
These statements are evaluated before the parsing of the annotated construct. When there is more than one newVariableInstance statement the order of execution among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
All dfdl:newVariableInstance statements are executed before any dfdl:setVariable statements on the same annotated construct.
*The dfdl:setVariable Statement*
When a dfdl:setVariable annotation is found on an element reference, element declaration, or simple type definition, then it is executed after the parsing of the element, which implies after the evaluation of expressions corresponding to any computed format properties. That is, if an expression is used to provide the value of a format property such as dfdl:terminator, the evaluation of that expression occurs before any dfdl:setVariable annotation is executed; hence, the expression providing the value of the format property may not reference the variable.
When a dfdl:setVariable annotation is found in the combined set of annotations for a sequence, choice, or group reference, then it is executed after any dfdl:newVariableInstance statements in that same combined set, but it is executed before the parsing of the sequence, choice, or group reference.
If there are multiple dfdl:setVariable statements in one combined set of annotations, then the order of evaluation among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
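The sequence case described above might look like the following sketch. The namespace, the variable ex:depth, and the elements block and body are hypothetical, and format properties are omitted. Both statements sit on the same sequence, so under the text above the new instance is created first, the setVariable runs second, and both happen before any child of the sequence is parsed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/msg"
           targetNamespace="http://example.com/msg">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Hypothetical variable with no default value. -->
      <dfdl:defineVariable name="depth" type="xs:int"/>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="block">
    <xs:complexType>
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- Executed first: a fresh instance of ex:depth,
                 scoped to this sequence. -->
            <dfdl:newVariableInstance ref="ex:depth"/>
            <!-- Executed second, still before any child of the
                 sequence is parsed, so it cannot refer to body. -->
            <dfdl:setVariable ref="ex:depth" value="{ 1 }"/>
          </xs:appinfo>
        </xs:annotation>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the setVariable runs before the children are parsed, any expression on body, including one that supplies a format property, could read ex:depth.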

Mike
thanks for writing this up, I think we are close. Comments in-line.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org,
Date: 31/10/2012 19:08
Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)
Sent by: dfdl-wg-bounces@ogf.org
Revision 2 per workgroup call on 2012-10-31
Glossary: DFDL Statement annotations, or just DFDL Statements, are the annotation elements dfdl:assert, dfdl:discriminator, dfdl:setVariable, and dfdl:newVariableInstance.
SMH: Nice idea. What about dfdl:defineVariable, is that a statement annotation too? Where does that leave dfdl:defineFormat and dfdl:defineEscapeScheme - they are not format annotations (that's their content). Do we have 'global', 'statement' and 'format' annotations?
Assertions Before: dfdl:discriminator or dfdl:assert with testKind='pattern' are executed before parsing the annotated construct.
SMH: Wording needs to cater for combined annotations.
It is a schema definition error if alignment is not 1 and a dfdl:discriminator or dfdl:assert with testKind='pattern' is used. (TBD: restrictions on lengthKind='prefixed' as well? Any other framing-based incompatibilities? where assertions with testKind='pattern' are really incompatible?)
SMH: If alignment <> 1 is schema definition error then so should leadingSkip <> 0. I'd leave it there though. Also schema definition error if encoding not set.
Assertions After: dfdl:discriminator or dfdl:assert with testKind='expression' (the default) are executed after parsing the annotated construct.
SMH: Wording needs to cater for combined annotations.
The dfdl:newVariableInstance Statement: All dfdl:newVariableInstance statements are executed before any dfdl:setVariable statements on the same annotated construct.
SMH: Wording needs to cater for combined annotations.
The dfdl:setVariable Statement: If there are multiple dfdl:setVariable statements in one combined set of annotations, then the order of evaluation among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
SMH: Wording needs to cater for combined annotations.

*Clarification:* At any single annotation point of the schema, there can be only one format annotation (dfdl:format, dfdl:element, dfdl:sequence, dfdl:choice, dfdl:group, dfdl:simpleType).
*Glossary*: DFDL Statement annotations, or just DFDL Statements, are the annotation elements dfdl:assert, dfdl:discriminator, dfdl:setVariable, and dfdl:newVariableInstance.
SMH: Nice idea. What about dfdl:defineVariable, is that a statement annotation too? Where does that leave dfdl:defineFormat and dfdl:defineEscapeScheme - they are not format annotations (that's their content). Do we have 'global', 'statement' and 'format' annotations?
Yes, the idea is that there are "defining annotations", "format annotations", and "statement annotations" as the 3 distinct kinds.
*Glossary*: *Combined annotations*: When annotations are combined between a group reference and the sequence or choice of the referenced global group, or among an element reference, an element declaration, and its type definition, the combined set is referred to as the *combined annotations*.
*DFDL Statement Annotation Placement*
dfdl:assert and dfdl:discriminator can be placed as annotations on sequence, choice, group references, local and global element declarations, element references, and simple type definitions.
dfdl:setVariable may be placed as an annotation on sequence, choice, group references, local and global element declarations for elements of simple type, element references to elements of simple type, and simple type definitions.
dfdl:newVariableInstance can be placed as an annotation on sequence, choice, and group references.
The combined annotations for any schema component can contain only a single dfdl:discriminator, or any number of dfdl:assert statements, but not both asserts and a discriminator. It is a schema definition error otherwise.
The combined annotations for any schema component can contain multiple dfdl:setVariable annotations, but they must each refer to a different variable. It is a schema definition error otherwise.
The combined annotations for any schema component can contain multiple dfdl:newVariableInstance annotations, but they must each refer to a different variable. It is a schema definition error otherwise.
*Evaluation Order for Statement Annotations*
*Assertions Before*:
dfdl:discriminator or dfdl:assert with testKind='pattern' are executed before parsing the annotated construct.
SMH: Wording needs to cater for combined annotations.
Another problem is "the annotated construct". I want to say "the thing we're talking about parsing here." What is the right term for that?
Note that the pattern is used to match against the entire representation of the component; hence, the framing (including initiators, etc.) is all visible to the pattern. The dfdl:encoding property is used when decoding the data to characters before matching.
It is a schema definition error if alignment is not 1 and a dfdl:discriminator or dfdl:assert with testKind='pattern' is used. (TBD: restrictions on lengthKind='prefixed' as well? Any other framing-based incompatibilities? where assertions with testKind='pattern' are really incompatible?)
SMH: If alignment <> 1 is schema definition error then so should leadingSkip <> 0. I'd leave it there though. Also schema definition error if encoding not set.
Good. Those are improvements. I would like to just say cannot have lengthKind="prefixed" also. (We can add it back, we can't take it away.)
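Putting the restrictions discussed so far together, a pattern discriminator would only be usable on a component configured roughly as in the sketch below. The namespace, the element tagged, the delimiter values, and the pattern are hypothetical; the property settings reflect the constraints proposed in this thread (alignment 1, leadingSkip 0, encoding set, lengthKind not prefixed), not settled specification text, and other required format properties are omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/msg"
           targetNamespace="http://example.com/msg">

  <!-- Hypothetical element. The short-form properties satisfy the
       proposed restrictions for testKind="pattern". -->
  <xs:element name="tagged" type="xs:string"
              dfdl:alignment="1"
              dfdl:leadingSkip="0"
              dfdl:encoding="US-ASCII"
              dfdl:lengthKind="delimited"
              dfdl:initiator="TAG:"
              dfdl:terminator=";">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Executed before the element is parsed. The pattern is
             matched against the whole representation, so it names
             the initiator and terminator as well as the content. -->
        <dfdl:discriminator testKind="pattern" testPattern="TAG:[A-Z]+;"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

Changing dfdl:alignment to anything other than 1, or dfdl:lengthKind to prefixed, would make this element a candidate for the schema definition errors being proposed here.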
If there are multiple dfdl:assert statements with testKind='pattern' the order of execution among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
*Assertions After:*
dfdl:discriminator or dfdl:assert with testKind='expression' (the default) are executed after parsing the annotated construct.
SMH: Wording needs to cater for combined annotations.
Furthermore, an attempt to evaluate a discriminator must be made even if the parse of the annotated construct ended in a parse error. This is because a discriminator could evaluate to true thereby resolving a point of uncertainty even if the complete parsing of the construct ultimately caused a parse error. Such discriminator evaluation has access to the DFDL Infoset of the attempted parse as it existed immediately before detecting the parse failure.
Implementations are free to optimize by recognizing and executing discriminators or assertions earlier so long as the resulting behavior is consistent with what results from the above description.
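The rule that an expression discriminator is still attempted after a failed parse can be sketched as follows. Everything here is a hypothetical model (the ParseError class, the dict used as an infoset, and the callback signatures are assumptions), intended only to show the ordering and what the discriminator gets to see.

```python
class ParseError(Exception):
    """Raised by the hypothetical parser when the data does not match."""

def parse_with_discriminator(parse, discriminator):
    """Parse a construct, then evaluate its testKind='expression' discriminator.

    `parse(infoset)` fills `infoset` in place and may raise ParseError.
    `discriminator(infoset)` returns a truthy or falsy value.
    Returns (parse_succeeded, point_of_uncertainty_resolved).
    """
    infoset = {}
    try:
        parse(infoset)
        parsed_ok = True
    except ParseError:
        # The infoset keeps whatever was added before the failure was detected.
        parsed_ok = False
    # The discriminator is attempted in both cases.
    resolved = bool(discriminator(infoset))
    return parsed_ok, resolved

def parse_record(infoset):
    infoset["tag"] = "A"                 # first child parses successfully
    raise ParseError("body malformed")   # a later child fails

print(parse_with_discriminator(parse_record, lambda i: i.get("tag") == "A"))
# (False, True)
```

In this model the parse fails, yet the discriminator still evaluates to true against the partial infoset, which is the case the paragraph above is concerned with.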
If there are multiple dfdl:assert statements with testKind='expression', then the order of execution among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
*The dfdl:newVariableInstance Statement*
These statements are evaluated before the parsing of the annotated construct. When there is more than one newVariableInstance statement the order of execution among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
All dfdl:newVariableInstance statements are executed before any dfdl:setVariable statements on the same annotated construct.
SMH: Wording needs to cater for combined annotations.
*The dfdl:setVariable Statement*
When a dfdl:setVariable annotation is found on an element reference, element declaration, or simple type definition, then it is executed after the parsing of the element, which implies after the evaluation of expressions corresponding to any computed format properties. That is, if an expression is used to provide the value of a format property such as dfdl:terminator, the evaluation of that expression occurs before any dfdl:setVariable annotation is executed; hence, the expression providing the value of the format property may not reference the variable.
When a dfdl:setVariable annotation is found in the combined set of annotations for a sequence, choice, or group reference, then it is executed after any dfdl:newVariableInstance statements in that same combined set, but it is executed before the parsing of the sequence, choice, or group reference.
If there are multiple dfdl:setVariable statements in one combined set of annotations, then the order of evaluation among them is not specified. Schema authors can insert sequences to control the timing of evaluation of statements more precisely.
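As a summary of the timing rules in this draft, the following sketch maps each statement kind to whether it runs before or after parsing the component. The function name and the string labels for component kinds are assumptions made for illustration. Orderings the draft leaves unspecified (for example, among several statements of the same kind) are deliberately not modelled.

```python
ELEMENT_LIKE = {"element reference", "element declaration", "simple type definition"}
GROUP_LIKE = {"sequence", "choice", "group reference"}

def timing(statement, component, test_kind="expression"):
    """Return 'before' or 'after' relative to parsing `component`."""
    if statement in ("assert", "discriminator"):
        # Pattern tests run before the parse; expression tests run after it.
        return "before" if test_kind == "pattern" else "after"
    if statement == "newVariableInstance":
        # Always before the parse, and before any setVariable on the same construct.
        return "before"
    if statement == "setVariable":
        if component in ELEMENT_LIKE:
            return "after"
        if component in GROUP_LIKE:
            return "before"
    raise ValueError(f"no timing rule for {statement!r} on {component!r}")

print(timing("setVariable", "sequence"))           # before
print(timing("setVariable", "element reference"))  # after
print(timing("assert", "choice", "pattern"))       # before
```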
SMH: Wording needs to cater for combined annotations.
-- dfdl-wg mailing list dfdl-wg@ogf.org https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
-- Mike Beckerle | OGF DFDL WG Co-Chair Tel: 781-330-0412

I don't have a problem with lengthKind 'prefixed'. It's no worse to me than initiator, and in a text format the prefix will likely be text anyway, in which case it is easily consumed by a regex. Alignment and leadingSkip are the dodgy ones as they are almost incompatibilities.
Maybe we should use the term 'resolved statement annotations for a schema component' when referring to the combined set. Then we can say, e.g.: "Resolved dfdl:discriminator or dfdl:assert annotations with testKind='pattern' for a component are executed before parsing the component."
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB,
Cc: dfdl-wg@ogf.org
Date: 01/11/2012 17:19
Subject: Re: [DFDL-WG] DFDL Statement Evaluation Timing (Assert, Discriminator, SetVariable, NewVariableInstance)
Clarification: At any single annotation point of the schema, there can be only one format annotation (dfdl:format, dfdl:element, dfdl:sequence, dfdl:choice, dfdl:group, dfdl:simpleType).
Glossary: DFDL Statement annotations, or just DFDL Statements, are the annotation elements dfdl:assert, dfdl:discriminator, dfdl:setVariable, and dfdl:newVariableInstance.
SMH: Nice idea. What about dfdl:defineVariable, is that a statement annotation too? Where does that leave dfdl:defineFormat and dfdl:defineEscapeScheme - they are not format annotations (that's their content). Do we have 'global', 'statement' and 'format' annotations?
Yes, the idea is that there are "defining annotations", "format annotations", and "statement annotations" as the 3 distinct kinds.
Glossary: Combined annotations: When annotations are combined between a group reference and the sequence or choice of the referenced global group, or among an element reference, an element declaration, and its type definition, the combined set is referred to as the combined annotations.
participants (4): Mike Beckerle, Steve Hanson, Suman Kalia, Tim Kimber