RE: [dfdl-wg] More documents

Hi Steve,

That is a good suggestion. An important point that I can try to make, in the abstract, is that both cases you outline would have very similar annotations on the logical model; that part would not need to change. The explicit declarations are additional material that would (typically) be presented to the compiler in advance. I don't think it's particularly important for us to get hung up on the declaration of new conversions at this stage. After discussing this with Mike, I think it makes sense to split that section off into a new document that we could discuss separately.

At the heart of what I think is important in this conversions document are the semantics for describing the conversions and how they are chosen. For the standard to be both comprehensible and comprehensive in its description of DFDL, I think we need to nail down these details. It is, if you like, a useful side effect that doing so provides the hooks that allow extensibility, but I think it is important in and of itself to have a clear description of how all of this works. In principle we should be able to hand this standards document to two developers, lock them in separate rooms, and have them build two DFDL parsers that behave identically in all the many complex corner cases. I think we need a framework like the one I have outlined to make that work.

That said, I think there are a lot of issues with what I laid out (Mike identified many of them) that need attention and group discussion.

Talk to you tomorrow,
Martin
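As a rough illustration of the point about annotations and conversion selection, here is a minimal Python sketch of Steve's text example (a record with string, integer and decimal children). Everything in it is hypothetical: the property names `representation`, `type` and `conversion`, the comma-delimited layout, and the precedence rule (an explicitly named conversion wins, otherwise the representation/type pair selects a built-in) are assumptions for illustration, not what the Conversions document specifies. It shows that the per-element annotations barely change between the implicit and explicit cases, and that a fixed selection rule yields the same result either way.

```python
from decimal import Decimal
from typing import Callable, Dict, List, Optional

Conversion = Callable[[str], object]

# Hypothetical built-in text conversions, selected purely from annotation properties.
BUILTIN: Dict[tuple, Conversion] = {
    ("text", "string"): str,
    ("text", "integer"): int,
    ("text", "decimal"): Decimal,
}


def choose_conversion(annotation: Dict[str, str],
                      declared: Dict[str, Conversion]) -> Conversion:
    """Pick exactly one conversion for an element, with no implementation-defined fallback.

    Assumed rule: an explicitly named conversion (declared in advance) wins;
    otherwise the (representation, type) pair selects a built-in.
    """
    name: Optional[str] = annotation.get("conversion")
    if name is not None:
        if name not in declared:
            raise KeyError(f"conversion {name!r} was not declared")
        return declared[name]
    key = (annotation["representation"], annotation["type"])
    if key not in BUILTIN:
        raise ValueError(f"no built-in conversion for {key}")
    return BUILTIN[key]


def parse_record(line: str, annotations: List[Dict[str, str]],
                 declared: Dict[str, Conversion]) -> List[object]:
    fields = line.split(",")
    if len(fields) != len(annotations):
        raise ValueError("field count does not match the annotations")
    return [choose_conversion(a, declared)(f) for a, f in zip(annotations, fields)]


# Implicit case: the properties alone drive the choice.
IMPLICIT = [{"representation": "text", "type": t}
            for t in ("string", "integer", "decimal")]

# Explicit case: the same annotations plus a named conversion declared up front.
DECLARED: Dict[str, Conversion] = {"asString": str, "asInteger": int, "asDecimal": Decimal}
EXPLICIT = [dict(a, conversion=c)
            for a, c in zip(IMPLICIT, ("asString", "asInteger", "asDecimal"))]

# Both print ['widget', 42, Decimal('19.99')]
print(parse_record("widget,42,19.99", IMPLICIT, {}))
print(parse_record("widget,42,19.99", EXPLICIT, DECLARED))
```

Raising an error for any unlisted combination, rather than guessing, is what would let two independently written parsers agree on the corner cases.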
-----Original Message-----
From: owner-dfdl-wg@ggf.org [mailto:owner-dfdl-wg@ggf.org] On Behalf Of Steve Hanson
Sent: Tuesday, January 31, 2006 1:41 AM
To: dfdl-wg@ggf.org
Subject: Re: [dfdl-wg] More documents
I've read the Conversions document and, to be honest, I had a great deal of trouble working out what was going on. I'd like to see an example of a DFDL schema that models a text stream containing (say) a complex type with string, integer and decimal children, with DFDL properties used to handle the implicit conversions from text to the logical data types. Then I'd like to see the same DFDL schema, but with equivalent conversions explicitly declared and used instead of the DFDL properties.
Regards, Steve
Steve Hanson WebSphere Message Brokers, IBM Hursley, England Internet: smh@uk.ibm.com Phone (+44)/(0) 1962-815848
From: Mike Beckerle <beckerle@us.ibm.com>
Sent by: owner-dfdl-wg@ggf.org
Date: 28/01/2006 04:17
To: Mike Beckerle <beckerle@us.ibm.com>
cc: dfdl-wg@ggf.org, owner-dfdl-wg@ggf.org, "Westhead, Martin (Martin)" <westhead@avaya.com>
Subject: Re: [dfdl-wg] More documents

Oh yeah, forgot the attachment.
Mike Beckerle STSM, Architect, Scalable Computing IBM Software Group Information Integration Solutions Westborough, MA 01581 voice and FAX 508-599-7148 home/mobile office 508-915-4767
From: Mike Beckerle/Worcester/IBM@IBMUS
Sent by: owner-dfdl-wg@ggf.org
Date: 01/27/2006 07:50 PM
To: "Westhead, Martin (Martin)" <westhead@avaya.com>
cc: dfdl-wg@ggf.org, owner-dfdl-wg@ggf.org
Subject: Re: [dfdl-wg] More documents
Sorry I missed the call this past week. Here are my comments and some in-line editing on the
I actually think the primary reason to push this document forward is NOT extensibility, but that it is needed to clarify the basic semantics of DFDL. That is, it allows us to make very clear exactly which properties in the annotations guide the selection of conversion functions.
...mikeb
Mike Beckerle STSM, Architect, Scalable Computing IBM Software Group Information Integration Solutions Westborough, MA 01581 voice and FAX 508-599-7148 home/mobile office 508-915-4767
From: "Westhead, Martin (Martin)" <westhead@avaya.com>
Sent by: owner-dfdl-wg@ggf.org
Date: 01/23/2006 04:58 PM
To: <dfdl-wg@ggf.org>
Subject: [dfdl-wg] More documents
These are documents 3 and 4 of a set of 3 :).
The documents are:
- Bundles: a reusable block of DFDL statements.
- Conversions: a proposal for the semantic details of how conversions are chosen during the operation of a DFDL parser. These semantics include a description of how the conversions can be extended.
These documents round off the current set of proposals from the extensibility design team. The team has reasonable consensus on these documents, with the (relatively trivial) exception of bundles, where there may be some outstanding issues. We believe that these two documents provide enough material to support the necessary extensibility. However, we have not yet worked through enough examples of round tripping to know whether that is sufficiently well covered, and we expect that all the ideas in these documents will require refining by the group as a whole.
I would like to discuss these two documents on Wednesday (if possible).
Thanks,
Martin
[attachment "DFDL_Conversions_4.doc" deleted by Mike Beckerle/Worcester/IBM]
[attachment "DFDL_Bundles_2.doc" deleted by Mike Beckerle/Worcester/IBM]
(See attached file: DFDL_Conversions_4.doc)

Martin, before I can review this properly, I would like to see how some concrete instances of black-box extensibility would look under your conversions proposal. That is, I would like to see actual, complete DFDL schemas, not just the odd snippet. The instances I have in mind are:
- A portion of the data is encrypted, with fields in the message prior to the encrypted section providing the decryption keys etc. (the X12 security segment motivates this)
- Data where some XML is embedded in the middle
- Data where decimal fields (say) are in a wacky encoding not supported by stock DFDL properties (the TLOG retail standard motivates this)

Underlying this is a major concern I have about the consumability of the DFDL standard. If we don't provide something that is fairly straightforward for users to understand out of the box, so to speak, then a lot of potential users will simply not adopt DFDL. I would suggest that the way to address this is to keep the 'core' standard as simple as possible and then define 'extensions'. We can discuss what these extensions might be; I would suggest that support for non-essential XML Schema constructs such as attributes be one extension, and multiple input/output streams another. Extensibility is a big topic; my gut feeling is that some extensibility will appear in the core and some will be an extension. I would also suggest that it be up to an implementor whether they support extensions. For some it might not make any sense, but that should not preclude them from providing a certifiable core implementation. The bottom line is that reading the core spec should leave users thinking 'this is great', not 'this is complicated'.

None of what I say in the last paragraph invalidates your observations on semantics. It's more about how we portray DFDL to consumers. Take the W3C XML specifications as an example: whilst they are rigorous to the nth degree, many are totally unreadable. A primer obviously helps, but on its own I don't think it goes far enough.
Regards, Steve

In reply to: "Westhead, Martin (Martin)" <westhead@avaya.com>, 01/02/2006 04:56, to Steve Hanson/UK/IBM@IBMGB, <dfdl-wg@ggf.org>, Subject: RE: [dfdl-wg] More documents
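To make the first of these instances more concrete, here is a hedged Python sketch of the data flow only. The layout (a 1-byte key length, the key bytes, a 2-byte big-endian payload length, then the encrypted payload) is invented and is not the X12 security segment, and `xor_placeholder` is NOT real cryptography; it stands in for whatever black-box decryption routine an implementation would plug in. The point it illustrates is that the extension is handed values of fields parsed earlier in the same message.

```python
import struct
from typing import Callable, Dict

Fields = Dict[str, object]
# Hypothetical extension signature: encrypted bytes in, plaintext out,
# with read access to fields parsed earlier in the same message.
Decryptor = Callable[[bytes, Fields], bytes]


def xor_placeholder(data: bytes, fields: Fields) -> bytes:
    """Stand-in for a real decryption routine. NOT cryptography: XOR with the key field."""
    key = fields["key"]
    if not isinstance(key, bytes) or not key:
        raise ValueError("key field must be non-empty bytes")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def parse_message(data: bytes, decrypt: Decryptor) -> Fields:
    """Parse the invented layout: key length (1 byte), key, payload length (2 bytes), payload."""
    fields: Fields = {}
    key_len = data[0]
    fields["key"] = data[1:1 + key_len]
    offset = 1 + key_len
    (payload_len,) = struct.unpack_from(">H", data, offset)
    offset += 2
    encrypted = data[offset:offset + payload_len]
    if len(encrypted) != payload_len:
        raise ValueError("message ends before the encrypted section does")
    # The fields parsed so far are handed to the black-box extension.
    fields["payload"] = decrypt(encrypted, fields)
    return fields


# Build a sample message (XOR is symmetric, so the same function "encrypts"), then parse it.
key = b"k3y"
ciphertext = xor_placeholder(b"secret payload", {"key": key})
message = bytes([len(key)]) + key + struct.pack(">H", len(ciphertext)) + ciphertext
print(parse_message(message, xor_placeholder)["payload"])  # b'secret payload'
```

A real scheme would likely need more than one earlier field (algorithm identifiers, key references and so on); passing the whole set of parsed fields keeps the extension signature stable as those needs grow.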

On Wednesday 01 February 2006 07:03, Steve Hanson wrote:
- A portion of the data is encrypted, with fields in the message prior to the encrypted section providing the decryption keys etc. (X12 security segment motivates this)
- Data where some XML is embedded in the middle
- Data where decimal fields (say) are in a wacky encoding not supported by stock DFDL properties (TLOG retail standard motivates here)
These are great examples. Can someone give me fully documented data files from which to try to construct such examples? By the way, I'm not sure whether embedded XML is within the scope of DFDL--it gets insanely hairy. But the others are exactly the kinds of things that the core standard must either cover or have an extension mechanism that covers.
--
Robert E. McGrath, Ph.D.
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
1205 West Clark
Urbana, Illinois 61801
(217)-333-6549
mcgrath@ncsa.uiuc.edu

I'll see what I can come up with. As far as the embedded XML goes, I included it because we will be asked this question. Thinking it through, maybe we should simply treat it as a BLOB and leave it to the user to parse it with an XML parser as an independent operation. This is symmetric with an XML document that contains, as CDATA, a non-XML BLOB that needs to be parsed using DFDL.
Regards, Steve

In reply to: "Robert E. McGrath" <mcgrath@ncsa.uiuc.edu>, 01/02/2006 15:03, to dfdl-wg@ggf.org, Subject: Re: [dfdl-wg] More documents

I would second this approach. A payload string of XML data is just a string of value content to us. Note, however, that our proposed set of properties includes one, "isXML", which is intended to facilitate the usage pattern of XML payload strings. This boolean property says that the string's content is a well-formed XML document or a well-formed fragment of XML. It is just a shorthand for what would otherwise be a large set of quoting/escaping conventions, the use of a dynamic character set selected from the encoding attribute in the <?xml version="1.0" encoding="US-ASCII"?> slug line (if present), etc.

(We would need to specify what "well-formed fragment of XML" means. I think people intuitively know what this means, something intelligible to an XML parser, but we need to be explicit. It means a fragment of XML that begins and ends between two elements. Hence it is not a fragment that starts in the middle of any quoting construct, nor in the middle of a tag or attribute, etc.)

Mike Beckerle

In reply to: Steve Hanson <smh@uk.ibm.com>, 02/01/2006 01:27 PM, to dfdl-wg@ggf.org, Subject: Re: [dfdl-wg] More documents
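One possible way to operationalize Mike's two points, sketched in Python with the standard `re` and `xml.etree.ElementTree` modules. Both rules are assumptions, not agreed definitions. First, a fragment counts as well-formed if wrapping it in a synthetic root element (the name `fragment-root` is made up) yields a parseable document, which rejects fragments that start or end inside a tag, attribute or quoted value. Second, the dynamic character set is read from the `encoding` pseudo-attribute of a leading XML declaration, if one is present. The wrapper test is stricter than a full definition would be in at least one respect: it rejects fragments that use entities declared in a DTD.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches a leading XML declaration and captures its encoding, if any.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(raw: bytes) -> Optional[str]:
    """Return the encoding named in a leading <?xml ...?> declaration, else None."""
    match = _XML_DECL_ENCODING.match(raw)
    return match.group(1).decode("ascii") if match else None


def is_well_formed_fragment(text: str) -> bool:
    """Assumed test: the fragment parses once wrapped in a synthetic root element."""
    try:
        ET.fromstring(f"<fragment-root>{text}</fragment-root>")
    except ET.ParseError:
        return False
    return True


print(declared_encoding(b'<?xml version="1.0" encoding="US-ASCII"?><a/>'))  # US-ASCII
print(declared_encoding(b"<a/>"))               # None
print(is_well_formed_fragment("<a>1</a><b/>"))  # True
print(is_well_formed_fragment('<a x="1>'))      # False: ends inside an attribute value
print(is_well_formed_fragment("</a><b>"))       # False: unbalanced tags
```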

I'd forgotten the isXML property; that covers this nicely. I'll stick with the remaining two:
- A portion of the data is encrypted, with fields in the message prior to the encrypted section providing the decryption keys etc. (the X12 security segment motivates this)
- Data where decimal fields (say) are in a wacky encoding not supported by stock DFDL properties (the TLOG retail standard motivates this)

Regards, Steve

In reply to: Mike Beckerle <beckerle@us.ibm.com>, 02/02/2006 13:41, to Steve Hanson/UK/IBM@IBMGB, cc dfdl-wg@ggf.org, owner-dfdl-wg@ggf.org, Subject: Re: [dfdl-wg] More documents
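For the "wacky decimal" case, the sketch below shows the kind of self-contained decoding routine a black-box conversion would need to supply. IBM-style packed decimal is used purely as a familiar stand-in; the thread does not describe the actual TLOG encoding. Assumptions: two BCD digits per byte, the low nibble of the last byte is the sign (0xC or 0xF positive, 0xD negative; other sign codes are rejected here for simplicity), and the number of implied decimal places (`scale`) is supplied from elsewhere, for example by an annotation.

```python
from decimal import Decimal


def decode_packed_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal with `scale` implied decimal places.

    Assumed sign convention: 0xC or 0xF positive, 0xD negative.
    """
    if not raw:
        raise ValueError("empty field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high nibble first
        nibbles.append(byte & 0x0F)  # then low nibble
    sign = nibbles.pop()             # last nibble is the sign, not a digit
    if sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"unsupported sign nibble {sign:#x}")
    if any(n > 9 for n in nibbles):
        raise ValueError("digit nibble outside 0-9")
    magnitude = Decimal(int("".join(map(str, nibbles)))).scaleb(-scale)
    return -magnitude if sign == 0xD else magnitude


print(decode_packed_decimal(bytes([0x12, 0x34, 0x5C]), scale=2))  # 123.45
print(decode_packed_decimal(bytes([0x12, 0x34, 0x5D]), scale=2))  # -123.45
```

Under an explicit-declaration scheme, a routine like this would be registered under a name and referenced from the annotation, leaving the logical model (a plain decimal element) unchanged.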
participants (4)
- Mike Beckerle
- Robert E. McGrath
- Steve Hanson
- Westhead, Martin (Martin)