
Hi Steve,
Yes, we're not really proposing that subset for the "final" language; rather, we're trying to be as minimalist as we can up front to facilitate the prototype. I think it's fine if the prototype is a subset of what we specify for v1.0. I just don't want it to be inconsistent with it. The prototype is "non-normative" in W3C lingo, so there will be two documents: one describing what the prototype does and implements, and the draft standard document, which can be different.
Per your point 2 below: let's split this into "single top level global element" and the others. The others cause no issue as far as I can tell. In fact, in our prototype it turns out that we don't even see any difference between an element with anonymous type and an element reference. The code just sees an element with name, type, etc. So those are easy to support. Attributes are easily supported also. We need to add a flag bit to our prototype, that's all.
The single top level issue is a tiny bit deeper. In XML you can get away with more than one global top level element because the documents are always tagged to make it unambiguous which one of the possible global top level elements describes the file. In DFDL we need specific information about which of the possible global element declarations applies to the actual file, since there may be nothing in the data which makes it clear. We could do this with an annotation that indicates "this is the one that actually applies to the file". What we did in the prototype is just require there be only one global element declaration to make this unambiguous.
The primary reason we left out element references is to be minimal. There's nothing you can do with them that you can't do with a type definition and an ordinary element declaration, so they seem simply unnecessary, and that solved our ambiguity problem too.
Reusable groups - yes, these are easily supported also, modulo that they can have separate minOccurs/maxOccurs at point of use. Again, I think our prototype never even sees them. The EMF XSD library essentially forward substitutes them for us, so our code never deals with them. Right now we'd miss any additional min/max occurs information though, so that is a bug.
Re: hexbinary - I don't understand your use of hexbinary. Can you clarify?
Other simple types: yes, we could put all of the date types in. We just chose to keep it minimal. I left out the obscure 'date fragment' types because I've never seen data containing things like that, but it's a very minor thing. If you think we need them, then we need them. However, I would argue against putting in things for the sake of having more of XSD "covered".
Substitution groups - I agree these could be a pseudo choice construct, but I prefer to make an XSD subset and be explicit about it being a subset rather than go for a way to assign meaning to everything in XSD.
-----Original Message-----
From: owner-dfdl-wg@ggf.org [mailto:owner-dfdl-wg@ggf.org] On Behalf Of Steve Hanson
Sent: Wednesday, April 13, 2005 6:59 AM
To: dfdl-wg@gridforum.org
Subject: [dfdl-wg] DFDL subset of XML schema
Mike, looking at your proposal working draft, I don't agree with the DFDL subset you are proposing. I think it is too restrictive. Specifically:
1) xsd:all - we have discussed this as part of the unordered mail exchange last week, so I think we now agree this is needed.
2) Single top-level global element, global attributes, element references, attribute references. This prevents re-use.
3) Reusable groups. Ditto.
4) Simple type hexBinary. This is the MRM model's default mapping for binary data.
5) Other simple types. Some of these could be discussed - eg, the date restrictions.
6) Substitution groups. We basically treat these as a choice in non-XML data. But I would be ok with deferring support post 1.0.
Regards, Steve
Steve Hanson
WebSphere Business Integration Brokers, IBM Hursley, England
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

Hi Mike
Replies in-lined below thus >>SMH>>.
Regards, Steve
Steve Hanson
WebSphere Business Integration Brokers, IBM Hursley, England
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
From: mike.beckerle@ascentialsoftware.com
Sent by: owner-dfdl-wg@ggf.org
Date: 13/04/2005 15:30
To: Steve Hanson/UK/IBM@IBMGB, dfdl-wg@gridforum.org
cc:
Subject: RE: [dfdl-wg] DFDL subset of XML schema
Hi Steve, Yes, we're not really proposing that subset for the "final" language, rather we're trying to be as minimalist as we can up front to facilitate the prototype. I think it's fine if the prototype is a subset of what we specify for v1.0. I just don't want it to be inconsistent with it. The prototype is "non-normative" in w3c lingo, so there will be two documents. One describing what the prototype does and implements, and the draft standard document can be different.
SMH>> OK - it was not clear that this applied to the prototype only.
Per your point 2 below: let's split this into "single top level global element" and the others. The others cause no issue as far as I can tell. In fact, in our prototype it turns out that we don't even see any difference between an element with anonymous type and an element reference. The code just sees an element with name, type, etc. So those are easy to support. Attributes are easily supported also. We need to add a flag bit to our prototype, that's all.
SMH>> Good.
The single top level issue is a tiny bit deeper. In XML you can get away with more than one global top level element because the documents are always tagged to make it unambiguous which one of the possible global top level elements describes the file. In DFDL we need specific information about which of the possible global element declarations applies to the actual file since there may be nothing in the data which makes it clear. We could do this with an annotation that indicates "this is the one that actually applies to the file". What we did in the prototype is just require there be only one global element declaration to make this unambiguous.
SMH>> I would expect that one of the inputs to a DFDL parser would be the name of a top-level global element as well as the name of a DFDL schema. That way a single schema can cope with multiple different files.
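A minimal sketch of the idea in the comment above, written in Python purely for illustration (it is not taken from the prototype or from any DFDL implementation). It assumes the parser is handed a schema and the name of one global element; the schema text, the element names `invoiceFile` and `paymentFile`, and the helper `select_root` are all hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema with two global element declarations. Nothing in a
# non-XML data file says which one applies, so the caller must name it.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="invoiceFile" type="xsd:string"/>
  <xsd:element name="paymentFile" type="xsd:string"/>
</xsd:schema>
"""


def select_root(schema_text, root_name):
    """Return the global element declaration called root_name."""
    schema = ET.fromstring(schema_text)
    # Only direct children of xsd:schema are global declarations.
    global_elements = {
        decl.get("name"): decl
        for decl in schema.findall(f"{{{XSD_NS}}}element")
    }
    if root_name not in global_elements:
        raise KeyError(f"no global element named {root_name!r}")
    return global_elements[root_name]


# The same schema can describe different files, one root name per parse.
root = select_root(SCHEMA, "paymentFile")
print(root.get("name"), root.get("type"))
```

With the root name supplied from outside, the schema no longer needs to be restricted to a single global element declaration to stay unambiguous.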
The primary reason we left out element references is to be minimal. There's nothing you can do with them that you can't do with a type definition and an ordinary element declaration, so they seem simply unnecessary, and that solved our ambiguity problem too.
SMH>> Not sure how this ties in with your answer re element references above?
Reusable groups - yes, these are easily supported also, modulo that they can have separate minOccurs/maxOccurs at point of use. Again, I think our prototype never even sees them. The EMF XSD library essentially forward substitutes them for us, so our code never deals with them. Right now we'd miss any additional min/max occurs information though, so that is a bug.
SMH>> minOccurs and maxOccurs are also allowed on local groups, not just group references. The fact that the EMF XSD library loses embedded groups, whether local or named, is a big problem. When folk model data, we have found that embedded groups are used a lot as a convenient way to change group-level rep properties, without the need to create elements. For our existing parser, this means the EMF XSD library is insufficient and an alternative is needed at runtime that preserves group structure. I would say the same would be true for any DFDL parser too.
Re: hexbinary - I don't understand your use of hexbinary. Can you clarify?
SMH>> A data structure includes a BLOB of known length. It is real binary data, not subject to code page conversion. We would model that as having a type of xsd:hexBinary.
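As a rough illustration of that mapping, the Python sketch below pulls a fixed-length BLOB out of a record and renders it in the xsd:hexBinary lexical form (two hex digits per byte, upper case in the canonical form). The record contents, the offset and length, and the helper names `blob_to_hexbinary` and `hexbinary_to_blob` are assumptions made up for this example, not anything from the MRM model.

```python
# Hypothetical record: two text bytes, a 3-byte BLOB, two more text bytes.
RECORD = b"AB\x00\xff\x10CD"


def blob_to_hexbinary(data, offset, length):
    """Render data[offset:offset+length] as an xsd:hexBinary string."""
    blob = data[offset:offset + length]
    if len(blob) != length:
        raise ValueError("record is too short for the declared BLOB length")
    # The bytes are copied as-is: no code page conversion is applied.
    return blob.hex().upper()


def hexbinary_to_blob(text):
    """Recover the raw bytes from an xsd:hexBinary string."""
    return bytes.fromhex(text)


value = blob_to_hexbinary(RECORD, offset=2, length=3)
print(value)  # 00FF10
```

The round trip through `hexbinary_to_blob` returns the original bytes unchanged, which is the property that matters for data that must not be treated as text.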
Other simple types: yes we could put all of the date types in. We just chose to keep it minimal. I left out the obscure 'date fragment' types because I've never seen data containing things like that, but it's a very minor thing. If you think we need them, then we need them. However I would argue against putting in things for the sake of having more of XSD "covered".
SMH>> Even if data were in 'date fragment' form most users would be happy to treat it as string data. I agree that it could be excluded in 1.0.
Substitution groups - I agree these could be a pseudo choice construct, but I prefer to make an XSD subset and be explicit about it being a subset rather than go for a way to assign meaning to everything in XSD.
SMH>> I am ok with substitution groups being omitted in 1.0.