Grammar issue - simple and complex asymmetry

The draft 034 grammar productions do not allow for a separate prefix/suffix for a simple type as distinguished from the element having that type. Draft 034 does allow for an element of complex type to have a separate prefix and suffix for the element itself and another one for the sequence or choice inside it.

I've come to believe this is a mistake and I suggest a fix below.

Right now the grammar is:

    Element        = SimpleElement | ComplexElement
    SimpleElement  = Prefix SimpleContent Suffix
    SimpleContent  = StringText   // terminal. No more prefixes/suffixes
    ComplexElement = Prefix ComplexContent Suffix
    ComplexContent = Sequence | Choice
    Sequence       = Prefix SequenceContent Suffix
    Choice         = Prefix ChoiceContent Suffix

So, if I do:

    <complexType dfdl:initiator="[" dfdl:terminator="]">
      ...
      <element name="y">
        <complexType>
          <sequence dfdl:separator=",">
            <element name="x" type="int"/>
            <element name="z" type="int"/>
          </sequence>
        </complexType>
      </element>
      ...
    </complexType>

I have two prefix opportunities. I can flatten the productions above to:

    ComplexElement = Prefix Prefix SequenceContent Suffix Suffix

An instance of this type would look like [[[5],[6]]]. That is, for complex types, there are separate prefix and suffix regions for the element, and for the model group which makes up its content.

    The first [ initiates element y.
    The second [ initiates the sequence.
    The third [ initiates element x.

This same behavior is not true for simple types:

    <complexType dfdl:initiator="[" dfdl:terminator="]">
      ...
      <element name="y">
        <simpleType>
          <restriction base="int"/>
        </simpleType>
      </element>
      ...
    </complexType>

This can only mean [5]. The grammar, as formulated in draft 034, does not allow for more than one prefix or suffix. The [ is the initiator of element y.

I believe we should fix this as follows.
New grammar:

    Element        = SimpleElement | ComplexElement
    SimpleElement  = Prefix SimpleContent Suffix
    SimpleContent  = StringText
    ComplexElement = ComplexContent   // Note: no more surrounding prefix/suffix.
    ComplexContent = Sequence | Choice
    Sequence       = Prefix SequenceContent Suffix
    Choice         = Prefix ChoiceContent Suffix

The above grammar arranges for an element of complex type and its model group, taken together, to specify a single prefix and suffix.

Revisiting our example (just repeating it here):

    <complexType dfdl:initiator="[" dfdl:terminator="]">
      ...
      <element name="y">
        <complexType>
          <sequence dfdl:separator=",">
            <element name="x" type="int"/>
            <element name="z" type="int"/>
          </sequence>
        </complexType>
      </element>
      ...
    </complexType>

An instance now would look like [[5],[6]].

    The first [ is the initiator of element y, which is the same as the initiator of the sequence that is its type.
    The second [ is the initiator of element x (which is the same as the initiator of the int that is its type).

I believe this is more sensible, as it makes the behavior of simple and complex types more similar.

It raises the question of how one combines conflicting properties on an element with the properties on the type, and even on the model group inside the type in the complex case, because all these properties describe the same syntax fields in the grammar. That's a separate topic in a subsequent email.
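
For anyone who wants to check the two instance strings mechanically, here is a minimal sketch in Python. It is not part of any DFDL implementation. It assumes that Prefix and Suffix reduce to just the initiator "[" and terminator "]", and that both are in scope for every element and model group, as in the example above. The class and function names (SimpleElement, ComplexElement, Sequence, render_element) are hypothetical, chosen only to mirror the productions.

```python
from dataclasses import dataclass
from typing import List, Union

# Assumption: Prefix/Suffix are only the initiator and terminator from the
# example, and they are in scope for every element and model group.
INITIATOR = "["
TERMINATOR = "]"


@dataclass
class SimpleElement:
    name: str
    value: str


@dataclass
class Sequence:
    children: List["Element"]
    separator: str = ","


@dataclass
class ComplexElement:
    name: str
    content: Sequence


Element = Union[SimpleElement, ComplexElement]


def wrap(text: str) -> str:
    """Apply one Prefix ... Suffix pair."""
    return INITIATOR + text + TERMINATOR


def render_sequence(seq: Sequence, proposed: bool) -> str:
    # Sequence = Prefix SequenceContent Suffix (same in both grammars)
    body = seq.separator.join(render_element(c, proposed) for c in seq.children)
    return wrap(body)


def render_element(elem: Element, proposed: bool) -> str:
    if isinstance(elem, SimpleElement):
        # SimpleElement = Prefix SimpleContent Suffix (same in both grammars)
        return wrap(elem.value)
    inner = render_sequence(elem.content, proposed)
    if proposed:
        # Proposed: ComplexElement = ComplexContent
        return inner
    # Draft 034: ComplexElement = Prefix ComplexContent Suffix
    return wrap(inner)


# Element y from the example, with x = 5 and z = 6.
y = ComplexElement("y", Sequence([SimpleElement("x", "5"), SimpleElement("z", "6")]))

print(render_element(y, proposed=False))  # [[[5],[6]]]
print(render_element(y, proposed=True))   # [[5],[6]]
```

The only difference between the two grammars in this sketch is the single branch in render_element for complex elements; simple elements render as [5] either way.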

Mike

That looks reasonable. However as you must still be able to specify dfdl:initiator/terminator on the complexType for scoping we need to somehow make it clear that the grammar describes where the properties APPLY not where they are SPECIFIED.

Do any properties APPLY to a complexType?

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM    email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073    Fax: +44 (0)1962 816898

From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
To: <dfdl-wg@ogf.org>
Date: 13/05/2009 20:09
Subject: [DFDL-WG] Grammar issue - simple and complex asymetry

--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

To me, no properties apply to a complex type, rather they apply to the model group (sequence or choice) which is the meaning of the complex type. That is, we don't have to distinguish a complex type from the model group that defines it.

...mike

Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2125 | 100 Fifth Ave., 4th Floor, Waltham MA 02451 | mbeckerle.dfdl@gmail.com

From: Alan Powell [mailto:alan_powell@uk.ibm.com]
Sent: Tuesday, May 19, 2009 11:50 AM
To: mbeckerle.dfdl@gmail.com
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] Grammar issue - simple and complex asymetry
participants (2)
- Alan Powell
- Mike Beckerle