New scoping rules

All,

Attached is the description of the new DFDL scoping rules. We did not discuss the rules for simpleType derivations, so I have assumed they use the same rules as simpleType references, namely that the properties are merged and there must not be any duplicate properties specified. I have removed most of the complicated examples as they no longer apply.

Alan Powell
IBM UK Labs, Hursley

Alan, I've done some thinking on the scoping, and I think we've talked ourselves into a bad position.
From the note on scoping:
The proposal currently under consideration is:

- Schema objects inherit DFDL properties from a lexically enclosing xs:complexType or xs:group declaration.
- DFDL properties on a referenced global schema object (except simpleTypes) cannot be overridden unless explicitly parameterized by the global object.

The above is problematic. It breaks referential transparency.

- DFDL properties explicitly defined on an element and its referenced simpleType are merged into a single set. It is an error if the same property is defined on both the element and the simpleType.
- It must be possible to validate all global objects, except simpleTypes.

This last bullet is an unreasonable requirement, depending on how you define validity. It was put in to simplify a tooling requirement of some sort that I believe is not a good goal for us to accept. Validity can mean "is consistent", but should not require property specifications to be "complete".

This is an area of some confusion in DFDL. We have stated that a schema must have "all required properties" specified, and that there is no defaulting of property values by implementations. The purpose of this is to prevent implementation-specific or platform-specific assumptions from creeping in, so that DFDL schemas are more likely to be portable. This statement has been misinterpreted: some have read it as meaning that every property defined in the DFDL spec must have a value set in order for a schema to be "valid". But when stating the "all required properties" rule (largely at my insistence), this was definitely not my intention.

Consider, for example, a format that is all text and uses a single-byte character set encoding. I claim that dfdl:byteOrder need not be specified, as it will never be needed to interpret the data. The point of saying there are no defaults for property values is NOT to require dfdl:byteOrder to always be specified. It is to say that if the format requires dfdl:byteOrder - because it has binary multi-byte representations in it, or wide characters which have endianness - then dfdl:byteOrder must be specified by the schema, either directly, by an included schema referenced by the schema, or explicitly via some external mechanism (section 21 of draft 035). The point is that the implementation cannot just say "there is an unstated default for dfdl:byteOrder in this implementation, based on the platform you are installed on". If an implementation were to do that, then the schemas usable with that implementation would not be portable to other implementations - something we are trying to avoid. The difference here is subtle but important.

Section 22 of draft 035 is a placeholder for some pre-defined include files whose inclusion will provide dfdl:defineFormat specifications for useful sets of properties. It is important for everyone to understand that including these in a DFDL schema is 100% optional, and is only for the convenience of obtaining consistent and meaningful sets of properties. Simple formats can be described without including them at all.

As another example: if a file contains only an array of binary floating point numbers, then no dfdl:encoding property is needed. Just a handful of properties are needed to parse/unparse such a file format - the ones about binary floating point numbers and, in the case of an array, about multiple occurrences.
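For concreteness, a minimal sketch of the all-text case in attribute-form DFDL annotations; the element names and the exact property set are illustrative, and draft-035 syntax may differ in detail from what is shown:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
      <!-- All text, single-byte encoding: dfdl:byteOrder is never consulted,
           so it is simply not stated anywhere in this schema -->
      <xs:element name="record" dfdl:lengthKind="implicit">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <xs:element name="name" type="xs:string"
                        dfdl:representation="text"
                        dfdl:encoding="US-ASCII"
                        dfdl:lengthKind="delimited"/>
            <xs:element name="city" type="xs:string"
                        dfdl:representation="text"
                        dfdl:encoding="US-ASCII"
                        dfdl:lengthKind="delimited"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>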
Getting back to scoping and the validation of a global decl/def: the upshot of all this is that, from the perspective of "validating" a global decl/def, one cannot have conflicting DFDL properties in a global type or element declaration, but properties can also be left unspecified, to be provided by the way that global decl/def is used.

If a top-level element declaration is incomplete in this style, then it is unsuitable for use as the document element of a data file/stream unless augmented by external information - something that is possible, and which we discuss in chapter 21 (version 035) of the spec without giving a specific mechanism. Such a declaration can, however, be made complete by being used by reference from another point in the schema which surrounds it with a scope providing the needed properties, or which provides the needed properties directly at the point of reference. This preserves referential transparency, and makes the semantics of reference be just plain textual substitution, which is the semantics of XML Schema in general.

I believe total validity (consistency AND completeness) for global decls/defs is not worth trying to achieve for the sake of a tooling goal. Tooling may have to be more sophisticated, but discarding referential transparency is not something we should do for the sake of simplifying a tooling goal that isn't even clearly a requirement. One such tooling "goal" might be to allow an interactive user to point anywhere in a schema and see a list of the properties in effect at that point. Total validity (consistency and completeness) is required for a concrete answer to this, but why do we think this goal should be a requirement? The answer presented back to the user could be that some properties are "unspecified" while others have specific values. I don't see this as problematic.

We carefully decided not to allow any lexical invocation of DFDL formats at top level, precisely to eliminate the issue of lexical closure for top-level objects. This allows ordinary textual referential transparency to work: reference semantics is exactly that of textual substitution. This is very desirable, as it allows ordinary refactoring of DFDL schemas to share common decls/defs in the expected manner, and it is a primary composition principle which will allow creation of complex schemas from simpler parts.
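A rough sketch of the point-of-reference completion described above (names are hypothetical, and whether properties stated on an element reference combine with the referenced global declaration in this way is precisely the scoping behaviour under discussion):

    <!-- Global declaration: internally consistent, but deliberately silent
         about encoding -->
    <xs:element name="accountName" type="xs:string"
                dfdl:representation="text"
                dfdl:lengthKind="delimited"/>

    <!-- A point of use supplies the missing property, so the combined set
         is complete here even though the global declaration alone is not -->
    <xs:element name="account">
      <xs:complexType>
        <xs:sequence dfdl:separator=";">
          <xs:element ref="accountName" dfdl:encoding="UTF-8"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>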

Mike, I agree with you that the new scoping rules are not workable. I also agree that 'all the required dfdl properties' does not mean 'all the dfdl properties'. Unfortunately, because the only way to turn some properties off is to set them to the empty string, you require a lot more properties than you might expect: 'initiator', 'terminator', 'inputValueCalc', 'outputValueCalc', etc. all fall into this category, so they must be set for every element.

While it may be acceptable to say that global components don't have to be complete, it must be possible to verify that a schema definition is complete and correct, so are we back to designating the starting points? We have discussed scoping a lot without finding an ideal solution, so I am beginning to wonder if we should give up and exclude it from DFDL V1.

Alan Powell
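A minimal sketch of the kind of declaration Alan is describing, with the 'off' properties spelled out explicitly; the exact set of properties draft 035 requires here is illustrative:

    <!-- Even properties that are logically "off" must be stated somewhere in scope -->
    <xs:element name="quantity" type="xs:int"
                dfdl:representation="text"
                dfdl:encoding="US-ASCII"
                dfdl:lengthKind="delimited"
                dfdl:initiator=""
                dfdl:terminator=""/>
    <!-- ...and the same empty-string settings repeated on every other element,
         unless a scoping or defaulting mechanism supplies them -->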

Mike - we have to be pragmatic and ensure that vendors can provide efficient implementations of the specification in order for DFDL to take off. All we required was to explicitly specify a ref property to a global format on global complex type and group definitions, so that we can validate the contents during the development phase rather than deferring to the runtime phase, where problem determination becomes more complex and time consuming. Even if we had the original proposal in place, we would have allowed the ref property to be specified on these global constructs, in which case it would have overridden all properties in scope from the element, or inherited from the parent in the case where the global element was included through an element reference. You can view the current proposal as a restricted form of the original proposal.

Also, when we say referential transparency, your reference is to element declarations and group definitions. I think it is there to a large extent; what we have restricted is that the properties from an element do not scope over its contents unless you explicitly model this using variables, which I think creates a lot more complexity, based on the examples seen last week. We may want to remove this from the V1 specification. In my personal opinion, this is a reasonable restriction for V1 of the specification, keeping in view the complexity of the initial implementation (tooling and runtime). We can relax these restrictions in a later version of the specification. We will have more discussion on this topic tomorrow.

Suman Kalia
IBM Toronto Lab
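A sketch of what Suman's requirement might look like, assuming draft-035-style annotations and assuming that a ref is permitted on a global complex type definition as proposed; the format name and placement are illustrative:

    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:defineFormat name="textDefaults">
          <dfdl:format representation="text" encoding="UTF-8"
                       lengthKind="delimited" initiator="" terminator=""/>
        </dfdl:defineFormat>
      </xs:appinfo>
    </xs:annotation>

    <!-- The global complex type explicitly names the format it assumes,
         so its contents can be checked at development time -->
    <xs:complexType name="AddressType" dfdl:ref="textDefaults">
      <xs:sequence dfdl:separator=",">
        <xs:element name="street" type="xs:string"/>
        <xs:element name="postcode" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>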

From an IBM point of view, if DFDL is not at least as easy to use as WTX type trees or WMB MRM, then something has gone wrong. It's the scoping rules that concern me most.
The IBM WTX type tree model today does not have scoping rules. There is no equivalent of a defineFormat block. Every property you need for an object must be set on that object. An object can 'inherit' properties from another object, but that is a static inheritance performed at creation time, not a dynamic one; in other words, the properties are copied once and once only. All properties have implicit model defaults. There is no concept of local v global - everything is global. When an object is used in a 'group', there is no overriding of properties at the point of use.

The IBM MRM model today is annotated schema, like DFDL, and has the equivalent of a defineFormat block, which must be referenced from all objects in the schema. The MRM Format block contains schema-wide properties of two kinds: a) ones that exist only at schema level and not on objects (e.g. escape scheme, timezone, encoding, byte order), and b) ones that exist at both schema and object level (e.g. separator, terminator) and therefore act as defaults. The majority of MRM properties are not in the Format block, exist only on objects, and therefore have implicit model defaults. There are no MRM properties on simple types, so there is no issue about merging element/type properties. There is local v global, but only a handful of properties can be overridden.

Both of the above models have issues over their flexibility. DFDL is intended to address these - but it should not do so at the expense of decreased usability.

I think we are all agreed that implicit defaults cause problems. We have a huge amount of code in the MRM model that sets defaults depending on the values of other properties, the simple type, and so on. It is a maintenance headache. It is much cleaner to push this back to the user, by saying that there are no defaults and you must supply a format block and ref it if you want defaults. The only problem this causes is the one Alan cites - that there are some properties where a default is obvious (e.g. the obvious default for initiator is that there is no initiator), yet we force the user to set dfdl:initiator="". The mitigation is that there will be a range of example defineFormat blocks available that people will invariably use. I am OK with that.

The requirements I get from MRM customers are that they want to set a default for a property for the whole schema, and for some properties to set a default that depends on logical type - for example, pad character. I have yet to see a requirement for setting defaults at the level of a complex type.

I think Suman is suggesting that dfdl:ref be made mandatory. I actually don't have a problem with that, as it's how MRM works today, but I think making it optional is better. At the end of the day, the user either specifies all properties needed by the object on the object, or uses dfdl:ref plus one or more properties on the object. If he wants to do the former, then let him. As a starter, I think we should consider dropping the use of dfdl:format on a complex type: if the user wants to pick up a set of defaults, he must use dfdl:ref.

I think the proposed merging of element/simple type properties makes sense, and is the only sensible rule we can apply. The sticking point is element/group references.

Regards,
Steve Hanson
Programming Model Architect, WebSphere Message Brokers; OGF DFDL WG Co-Chair, Hursley, UK
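A rough sketch of the two styles Steve describes, reusing the hypothetical textDefaults format from the earlier sketch; property choices are illustrative only:

    <!-- Option 1: all needed properties stated directly on the object -->
    <xs:element name="price" type="xs:decimal"
                dfdl:representation="text"
                dfdl:encoding="US-ASCII"
                dfdl:lengthKind="delimited"
                dfdl:initiator=""
                dfdl:terminator=""
                dfdl:textNumberPattern="#0.00"/>

    <!-- Option 2: pick up defaults by reference, then add or override locally -->
    <xs:element name="price" type="xs:decimal"
                dfdl:ref="textDefaults"
                dfdl:textNumberPattern="#0.00"/>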

All - this is the conclusion that I've come to as well, and I was going to make some of the same points Mike made (much better than I could) in tomorrow's weekly call. I'm still mostly looking at this from the perspective of a user of DFDL, not someone attempting to write a "reference implementation" or validator. While all these points of view need to be considered, I think the most important factor is to make sure the DFDL spec is useful, usable, and actually used by people. If the choice in scoping comes down to "ease of use" vs. "ease of writing a validator", ease of use should win unless there's a very good reason to do otherwise.

The major problem here seems to be validating a DFDL schema as "correct and complete". If I'm not mistaken, any top-level element could be verified as "correct" (there are no errors in the DFDL declarations or schema), but to verify an element as "complete", either:
a) every top-level element has all necessary DFDL properties defined (the current spec seems to require this), or
b) the validator needs to know which top-level element will be the "root" of the schema.
For usability's sake, I would much prefer b) over a). There already seems to be a decision that the indication of the top-level document element will not be part of the DFDL schema itself, which is consistent with the way XML Schema works, so I would suggest an alternative: a validator can verify that a DFDL schema is "correct" without any additional information, but to verify "completeness" it would need to be given the top-level document element as an argument. This way, DFDL properties can be inherited by reference instead of lexically scoped, which would make DFDL much more usable.

The other issue Mike has brought up is which properties are necessary to specify to have a "complete" definition. He does not seem to want default values for the properties, which is understandable when you consider the case of byteOrder or other platform-specific information. A default that is the opposite of the particular platform DFDL is being used on would be confusing at the least, and defaulting to "whatever the current platform uses" would make the DFDL schema ambiguous and far less useful for cross-platform communications. I would be in favor of the rule he proposes - that every property necessary for the implementation of the particular schema has to be declared - but under one condition: the spec needs to clearly state what that set of properties is. Anyone who uses DFDL will need to know this. If I am defining a text format, I need to know whether byteOrder is necessary in my dfdl:format. This is especially true if (as in Mike's example) it depends on the character encoding.

Please let me know if I've made any errors, or if I've left something unclear. Otherwise, I'm looking forward to discussing this tomorrow morning (for me).

Thanks,
Steve Marting, Progeny Systems Corp.
participants (5)
- Alan Powell
- Marting, Steve
- Mike Beckerle
- Steve Hanson
- Suman Kalia