All,

I have attached my first draft of recommendations for the DFDL Data Model. It is based heavily on XDM and should be mostly compatible. Where there are incompatibilities, they are matters of convention rather than of the actual representation, so I believe a system that is set up to work with XDM should be easily adaptable to work with DFDL. This was important to me: in my project I would be using DFDL as part of a pipeline process that also involves transforming the results of parsing with XSLT and then unparsing the results of the transformation. I imagine many other potential users of DFDL will be engaging in similar XML pipeline scenarios, and in these situations, the closer the DFDL model is to those used by other XML technologies, the better. Hopefully the recommendation meets the requirements Steve mentions below; it turns out most of the existing infoset could be mapped directly to XDM concepts. I introduced a new node type for the unresolvable concept to match the existing infoset, but I discuss how it relates to XDM and other XML technologies.

This exercise brought two major questions to mind:

- Must an instance of the XML Infoset (i.e., an XML document) be valid against a given DFDL Schema (as determined by an XML Schema validation engine) to be available for unparsing? If not, is it up to the DFDL implementation to determine whether the XML Infoset Character Information Items are suitable for their given unparsed data types? The real question is: for the unparsing process, can a DFDL Data Model be constructed directly from an XML Infoset, or only from a PSVI (i.e., where does the data for unparsing really come from)? It would seem to me that if the input XML isn't valid against the DFDL Schema, then it probably can't be unparsed; otherwise, how would the invalid portions (such as strings that should be numeric, or a structure that doesn't match) be handled?

- I am confused by the notion of the "augmented infoset". The regular infoset appears to be based on the logical structure of the data after parsing. In other words, choices are resolved and the result looks much like an XML Infoset, PSVI, or XDM tree might look following an XSLT transformation. The augmented infoset, on the other hand, appears to be based on the logical structure of the DFDL Schema being used for processing, and therefore contains branches for all choice possibilities, etc. It is "filled in" as parsing takes place. This doesn't make a lot of sense to me: what about the branches for which there was no data to fill in (such as choice branches that weren't followed)? Are they dropped after parsing? If not, then there are a lot of information items in the final tree that have no value. It made more sense to me to consider the DFDL Data Model as being constructed during parsing, so that at any given time in the parsing process the portion of the model that has already been parsed is available.
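To make the alternative concrete, here is a minimal sketch, assuming a toy format of comma-separated fields where each field is a choice between a number and a word. The function names (parse_digits, parse_letters, parse_choice, iter_fields) and the tuple-based nodes are hypothetical and are not part of any DFDL API. The point is only that a resolved choice contributes just the branch that matched, and that each node becomes available as soon as it is parsed:

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

# Hypothetical node representation: (node_kind, value).
Node = Tuple[str, object]
# A choice branch takes (data, pos) and returns (node, new_pos), or None if it does not match.
Branch = Callable[[str, int], Optional[Tuple[Node, int]]]

def parse_digits(data: str, pos: int) -> Optional[Tuple[Node, int]]:
    end = pos
    while end < len(data) and data[end].isdigit():
        end += 1
    return (("number", int(data[pos:end])), end) if end > pos else None

def parse_letters(data: str, pos: int) -> Optional[Tuple[Node, int]]:
    end = pos
    while end < len(data) and data[end].isalpha():
        end += 1
    return (("word", data[pos:end]), end) if end > pos else None

def parse_choice(data: str, pos: int, branches: Sequence[Branch]) -> Tuple[Node, int]:
    # Try branches in order; the first match wins and untaken branches never enter the tree.
    for branch in branches:
        result = branch(data, pos)
        if result is not None:
            return result
    raise ValueError(f"no choice branch matched at offset {pos}")

def iter_fields(data: str, sep: str = ",") -> Iterator[Node]:
    # Yield each node as soon as it is parsed, so the already-parsed portion is always available.
    pos = 0
    while pos < len(data):
        node, pos = parse_choice(data, pos, [parse_digits, parse_letters])
        yield node
        if pos < len(data):
            if data[pos] != sep:
                raise ValueError(f"expected {sep!r} at offset {pos}")
            pos += 1
```

For example, list(iter_fields("12,abc,7")) gives [("number", 12), ("word", "abc"), ("number", 7)]; no empty placeholder items are left behind for the unmatched branches.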

Hopefully those questions made sense... I should (finally) be on the call this Wednesday to discuss.

Dave

From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, May 06, 2009 11:15 AM
To: Dave Glick
Cc: Alan Powell; dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Agenda for OGF WG call 6 May 2009

Dave

Two intents of the infoset were that it should be a) simple and b) easily related to the grammar in 11.3, so whatever you come up with needs to take those requirements into account.

"Parts of XDM that have no relevance to DFDL but are also not conflicting should probably be left in for conciseness and compatibility." - a) above would imply the opposite.

The XDM spec defines the rules for how an XDM can be created from an XML Infoset or a PSVI. We can do a similar exercise for DFDL Infoset, for those users who want to use XSL for any post-DFDL transformation.

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

Dave Glick <dglick@dracorp.com>
Sent by: dfdl-wg-bounces@ogf.org

06/05/2009 13:55

To	Alan Powell/UK/IBM@IBMGB, "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
cc
Subject	Re: [DFDL-WG] Agenda for OGF WG call 6 May 2009

All,

My apologies, but I will be unable to make the call again this week. I was hoping to have some suggestions regarding the infoset/data model for discussion today, but it's not quite ready (I still have a little more digging to do through the rest of the spec to make sure what I'm suggesting can adequately capture all the representation cases in DFDL). I'll try to get something out by the end of the day for review and discussion on next week's call.

In general, it appears to me (and I'm admittedly not as versed in the various XML standards as the other members of the group) that we can bring the DFDL Infoset very closely in line with the XDM. Specifically, I've been looking at the way XSLT 2.0 treats XDM as its data model. It states clearly that XDM is the model for XSLT with certain explicit caveats and additions. This follows the XDM guidance of how it should be used by other standards (specifically in XDM Section 7 and Appendix A). The task for DFDL therefore consists of two parts: what parts of the XDM are in conflict with DFDL and should be explicitly excluded, and what parts of DFDL have no corresponding support in XDM and should be appended. Parts of XDM that have no relevance to DFDL but are also not conflicting should probably be left in for conciseness and compatibility.

My biggest concern is over the use of two different types of Element Information Items in the DFDL specification, as this seems so contrary to convention in XDM. My recommendations include treating all element nodes as complex, as XDM does; element nodes that contain only simple content would then have a single child of either the XDM text node type or a new DFDL value node type (I'm not sure which is the better way to go here).
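A minimal sketch of that uniform element model, assuming the "new DFDL value node type" option. The class names ElementNode and ValueNode and the helper simple_element are hypothetical, chosen for illustration; they are not defined by DFDL or XDM. Every element has an ordered child list, and simple content is just the case of exactly one value child:

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical node types, for illustration only.

@dataclass
class ValueNode:
    """Leaf holding the simple-typed value of an element (playing the role of an XDM text node)."""
    value: Union[str, int, float]

@dataclass
class ElementNode:
    """Every element is treated uniformly as a node with an ordered list of children."""
    name: str
    children: List[Union["ElementNode", ValueNode]] = field(default_factory=list)

    def has_simple_content(self) -> bool:
        # Simple content is represented as exactly one ValueNode child.
        return len(self.children) == 1 and isinstance(self.children[0], ValueNode)

    def typed_value(self) -> Union[str, int, float]:
        if not self.has_simple_content():
            raise ValueError(f"element {self.name!r} does not have simple content")
        return self.children[0].value

def simple_element(name: str, value: Union[str, int, float]) -> ElementNode:
    # Convenience constructor for an element whose only child is a value node.
    return ElementNode(name, [ValueNode(value)])

# A record whose two fields carry simple content.
record = ElementNode("record", [simple_element("id", 42), simple_element("name", "ACME")])
```

Here record.children[0].typed_value() returns 42, while record itself is not simple content. The design choice is that there is only one element node kind, so code that walks the tree never has to branch on "simple" versus "complex" element types.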

In any case, I'll pass along a full recommendation soon.

Dave

From: dfdl-wg-bounces@ogf.org [dfdl-wg-bounces@ogf.org] On Behalf Of Alan Powell [alan_powell@uk.ibm.com]
Sent: Wednesday, May 06, 2009 6:01 AM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] Agenda for OGF WG call 6 May 2009

Agenda:

1. Go through actions.

2. LengthKind on Sequences and choices.

LengthKind on sequences and choices and their parent element has proved confusing to new users of DFDL. It is proposed that lengthKind be removed from groups and only be allowed on the parent element. See email from SH.

3. Discuss UnorderedInitiated email from SH

4. Infoset codepage and encoding

The spec does not say what codepage and encoding is used for string fields.

5. AOB
Next version (034)

Current Actions:

No	Action
012	AP/SH: Update decimalCalendarScheme 10/9: Not allocated yet 17/9: No update 24/9: Add calendar binary formats to actions 22/10: No progress 16/1: proposal distributed and discussed. Will be redistributed 21/1: add locale, 04/02: changed from locale to specific properties 18/2: Need more investigation of ICU strict/lax behaviour. 08/04: Not discussed 22/04: AP to complete asap once the ICU strict/lax behaviour is understood. 29/04: No progress
020	SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy 22/10: No progress 10/12: added how to decide to overpunch and sign position 11/02: proposal largely agreed. SH to make minor changes 18/02: AP to document unsigned type behaviour 25/02: no progress 08/04: Not discussed 22/04: SH to complete last remaining issue, which is the behaviour when logical type is signed/unsigned and the physical type is unsigned/signed. 29/04: SH had identified a problem with definition values and types in the infoset and will email proposal. DG to be asked to accelerate action 032 to see if it helps
024	<No owner> String XML type 08/04: Not discussed 22/04: Need to allocate owner. Work is to describe the semantics of using dfdl:representation="xml" to model a well-formed XML fragment in an overall non-XML document described by a DFDL schema. 29/04: As no resource is available to progress this action, agreed to defer from V1. Will close next week if no objections
026	SH: Envelopes and Payloads 08/04: Not discussed explicitly, but recursive use of DFDL is tied up with this 22/04: Two aspects. Firstly compositional - do sufficient mechanisms exist to model an envelope with a payload that varies. Secondly markup syntax - this might be defined in the envelope. The second of these is very much tied up with the variable markup action 028, so will be considered there. SH to verify the composition aspect. 29/04: SH and AP working on proposal. related to Action 028
027	SH: Property precedence tables 08/04: Not discussed 22/04: Two things missing from the existing precedence trees. Firstly, does not show alternates (eg, initiator v initiatorkind). Secondly, need a tree per concrete DFDL object (eg, element). SH to update. 29/04: No progress
028	SH: Variable markup 08/04: Discussed briefly at end of call, IBM to see whether there are any use cases that require recursive use of DFDL. 15/04: Use case was distributed and will be discussed on next call. 22/04: The use case in question is EDI where the terminating markup for the payload segments is defined in the ISA envelope segment. The markup is modelled as an element of simple type where the allowable markup values are defined as enums on the type. But we need to handle two cases - firstly where the envelope is present, so the value used by the payload is taken from the envelope. Secondly where only the payload is present. Here we need a way of scanning for all the enum values, and adopting the one we actually find, when parsing. And using a default when unparsing. SH to explore use of a DFDL variable, where the variable has a default, but also has a type that is the same as the markup element - that way we get to use the enums without defining everything twice. 29/04: SH and AP working on proposal.
029	MB: valueCalc (output length calculation) 08/04: Not discussed 22/04: Action allocated to MB, this is to complete the work started at the Hursley WG F2F meeting. 29/04: No progress
032	DG: Investigate compatibility between DFDL infoset and XDM 08/04: No update 22/04: No update 29/04: No update
033	AP/TK: Assert/Discriminator semantics. AP to document. TK to check uses of discriminator besides choice. 08/04: In progress within IBM 22/04: Waiting for TK to return from leave to complete. 29/04: TK has sent examples showing the need for discriminators beyond choice. Agreed. MB to respond to TK
036	SH: Provide use case for floating component in a sequence 08/04: Raised 15/04: Use case sent and discussed. SH to do further investigation 22/04: IBM feedback from WTX team is that alternate suggested ways of modelling the EDI floating NTE segment have significant usability issues. The DFDL principle is that if a problem can be expressed as two-layered, then two DFDL models are needed. The EDI NTE segment does not fall into this though, as its use is on a per sequence basis. Ongoing. 29/04: Agreed that this needs to be in V1. SH to make a proposal
037	All: Approach for XML Schema 1.0 UPA checks. 22/04: Several non-XML models, when expressed in their most obvious DFDL Schema form, would fail XML Schema 1.0 Unique Particle Attribution checks that police model ambiguity. And even re-jigging the model sometimes fails to fix this. Note this is equally applicable to XML Schema 1.1 and 1.0. While the DFDL parser/unparser can happily resolve the ambiguities, the issue is one of definition. If an XSD editor that implements UPA checks is used to create DFDL Schema, then errors will be flagged. DFDL may have to adopt the position that: a) DFDL parser/unparser will not implement some/all UPA checks (exact checks tbd) b) XML Schema editors that implement UPA checks will not be suitable for all DFDL models c) If DFDL annotations are removed, the resulting pure XSD will not always be valid (ie, the equivalent XML is ambiguous and can't be modelled by XML Schema 1.0) Ongoing in case another solution can be found. 29/04: Will ask DG and S Gao for opinion before closing
038	MB: Submit response to OMG RFI for non-XML standardization 22/04: First step is for MB to mail the OGF Data Area chair to say that we want to submit 29/04: MB has been in contact with OMG and will submit DFDL.
039	SKK: Approach for creating Schema-For-DFDL xsds. 22/04: Resolve issue around multiple declarations needed for DFDL properties, perhaps using MB's meta approach 29/04: Don't like qualified attributes in long form. SKK to check there are no code gen implications, eg EMF.
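One way to picture the variable-markup behaviour described in action 028 is the following minimal sketch. It assumes that "adopting the one we actually find" means taking whichever allowed value occurs earliest in the payload; the enum values, the default, and every function name here (resolve_terminator, split_segments, terminator_for_unparse) are hypothetical illustrations, not DFDL properties or real EDI settings:

```python
from typing import List, Optional, Sequence

# Hypothetical enum values on the markup element's simple type (illustrative only).
ALLOWED_TERMINATORS = ("~", "'", "\n")
# Hypothetical default carried by the DFDL variable, used when unparsing without an envelope.
DEFAULT_TERMINATOR = "~"

def resolve_terminator(data: str, envelope_value: Optional[str] = None,
                       allowed: Sequence[str] = ALLOWED_TERMINATORS) -> str:
    """Parsing: take the value from the envelope if present, else scan for the enum values."""
    if envelope_value is not None:
        if envelope_value not in allowed:
            raise ValueError(f"envelope markup {envelope_value!r} is not an allowed value")
        return envelope_value
    # Payload only: find every allowed value that occurs and adopt the earliest one.
    hits = [(data.find(t), t) for t in allowed if t in data]
    if not hits:
        raise ValueError("none of the allowed markup values occur in the data")
    return min(hits)[1]

def split_segments(data: str, terminator: str) -> List[str]:
    # Split the payload into segments, dropping the empty tail after the final terminator.
    return [seg for seg in data.split(terminator) if seg]

def terminator_for_unparse(envelope_value: Optional[str] = None) -> str:
    """Unparsing: use the envelope's value if there is one, otherwise the variable's default."""
    return envelope_value if envelope_value is not None else DEFAULT_TERMINATOR
```

For instance, resolve_terminator("A*1'B*2'") adopts "'", and split_segments then gives ["A*1", "B*2"]; with no envelope, terminator_for_unparse() falls back to the default "~". Tying the allowed values and the default to a single definition mirrors the note in 028 about not defining everything twice.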

Alan Powell

MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
