All,
I have attached my first draft of recommendations for the DFDL
Data Model. It is based heavily on XDM and should be mostly compatible. Where
there are incompatibilities, they are by convention and not related to the
actual representation so I believe a system that is set up to work with XDM
should be easily adaptable to work with DFDL. This was important to me –
in my project I would be using DFDL as part of a pipeline process that also
involved transforming the results of parsing using XSLT and then unparsing the
results of transformation. I imagine many other potential users of DFDL will be
engaging in similar XML pipeline scenarios. In these situations, the closer the
DFDL model is to those used by other XML technologies, the better. Hopefully the
recommendation meets the requirements Steve mentions below – it turns out
most of the existing infoset could be mapped directly to XDM concepts. I
introduced a new node type for the unresolvable concept to match the existing
infoset, but discuss how it is related to XDM and other XML technologies.
This exercise brought two major questions to mind:
- Must an instance of the XML Infoset (and XML document) be
valid against a given DFDL Schema (as determined by an XML Schema validation
engine) to be available for unparsing? If not, is it up to the DFDL
implementation to determine the suitability of the XML Infoset Character
Information Items for their given unparsed data type? The real question is: for
the unparsing process can a DFDL Data Model be constructed from an XML Infoset
directly or only from a PSVI (where does the data for unparsing really come
from)? It would seem to me that if the input XML isn't valid against the DFDL
Schema, then it probably can't be unparsed - otherwise, how would the invalid
portions be handled (such as strings that should be numeric or a structure that
doesn't match)?
- I am confused by the notion of the "augmented
infoset". The regular infoset appears to be based on the logical structure
of the data post-parsing. In other words, choices are resolved and the result
looks something like what an XML Infoset, PSVI, or XDM tree might look like
following an XSLT transformation. The augmented infoset on the other hand
appears to be based on the logical structure of the DFDL Schema being used for
processing and therefore contains branches for all choice possibilities, etc.
It is "filled in" as parsing takes place. This doesn't make a lot of
sense to me - what about the branches for which there was no data to
"fill-in" (such as choice branches that weren't followed)? Are they
dropped following parsing? If not, then there are a lot of information items in
the final tree that have no value. It made more sense to me to consider the
DFDL Data Model as being constructed during parsing and at any given time in
the parsing process a portion of the model (that which has already been parsed)
is available.
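To make the second question concrete, here is a minimal Python sketch of the
"constructed during parsing" view. Everything in it is an assumption made for
illustration: the Node class, the parse_record function, and the toy format
(a one-character tag followed by a payload) are hypothetical and not taken
from the DFDL spec. The point is only that the tree contains the choice branch
that was actually followed, and nothing for the branches that weren't.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical data model node: a name, an optional value, and children."""
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_record(data: str) -> Node:
    """Build the tree as parsing proceeds; only the taken choice branch is added."""
    root = Node("record")
    tag, payload = data[0], data[1:]
    root.children.append(Node("tag", tag))
    # Choice: tag 'N' selects the numeric branch, anything else the text branch.
    if tag == "N":
        root.children.append(Node("number", payload))
    else:
        root.children.append(Node("text", payload))
    return root


tree = parse_record("N42")
names = [child.name for child in tree.children]
```

In this sketch the finished tree has no empty "text" item for the branch that
was not followed, which is the behavior I would expect from a model built
during parsing rather than one derived from the schema structure up front.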
Hopefully those questions made sense... I should (finally) be on
the call this Wednesday to discuss.
Dave
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, May 06, 2009 11:15 AM
To: Dave Glick
Cc: Alan Powell; dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Agenda for OGF WG call 6 May 2009
Dave
Two intents of
the infoset were that it should be a) simple and b) easily related to the
grammar in 11.3, so whatever you come up with needs to take those requirements
into account.
"Parts of XDM that
have no relevance to DFDL but are also not conflicting should probably be left
in for conciseness and compatibility." - a) above would imply the opposite.
The XDM spec
defines the rules for how an XDM can be created from an XML Infoset or a PSVI.
We can do a similar exercise for the DFDL Infoset, for those users who want
to use XSL for any post-DFDL transformation.
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
Dave Glick <dglick@dracorp.com> 06/05/2009 13:55
All,
My apologies,
but I will be unable to make the call again this week. I was hoping to have
some suggestions regarding the infoset/data model for discussion today, but
it's not quite ready (I still have a little more digging to do through the rest
of the spec to make sure what I'm suggesting can adequately capture all the
representation cases in DFDL). I'll try to get something out by the end of the
day for review and discussion on next week's call.
In general, it
appears to me (and I'm admittedly not as versed in the various XML standards as
the other members of the group) that we can bring the DFDL Infoset very closely
in line with the XDM. Specifically, I've been looking at the way XSLT 2.0
treats XDM as its data model. It states clearly that XDM is the model for XSLT
with certain explicit caveats and additions. This follows the XDM guidance of
how it should be used by other standards (specifically in XDM Section 7 and
Appendix A). The task for DFDL therefore consists of two parts: what parts of
the XDM are in conflict with DFDL and should be explicitly excluded, and what
parts of DFDL have no corresponding support in XDM and should be appended.
Parts of XDM that have no relevance to DFDL but are also not conflicting should
probably be left in for conciseness and compatibility.
My biggest
concern is over the use of two different types of Element Information Items in
the DFDL specification as this seems so contrary to convention in XDM. My
recommendations include treating all element nodes as complex, as XDM does;
element nodes that actually contain only simple content would have a
single child of the XDM text node type or of a new DFDL value node type (I'm
not sure which is the best way to go here).
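As a rough illustration of the uniform element treatment described above, here
is a minimal Python sketch. The TextNode and ElementNode classes and the
string_value helper are hypothetical names invented for this example; they are
not part of XDM or the DFDL spec. The sketch assumes the XDM text node option
and computes an element's string value the way XDM does, by concatenating the
text node descendants in document order.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextNode:
    """Hypothetical stand-in for an XDM text node (or a DFDL value node)."""
    value: str


@dataclass
class ElementNode:
    """Every element is modeled the same way; simple content is one text child."""
    name: str
    children: List[Union["ElementNode", TextNode]] = field(default_factory=list)


def string_value(node: Union[ElementNode, TextNode]) -> str:
    """Concatenate the text node descendants in document order."""
    if isinstance(node, TextNode):
        return node.value
    return "".join(string_value(child) for child in node.children)


# An element with simple content: a single text child.
price = ElementNode("price", [TextNode("9.99")])
# An element with complex content: element children only.
item = ElementNode("item", [ElementNode("name", [TextNode("widget")]), price])
```

With only one kind of element node, code that walks the tree needs no special
case for simple versus complex elements; the distinction is visible only in
what the children are.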
In any case,
I'll pass along a full recommendation soon.
Dave
From: dfdl-wg-bounces@ogf.org [dfdl-wg-bounces@ogf.org] On Behalf Of Alan Powell
[alan_powell@uk.ibm.com]
Sent: Wednesday, May 06, 2009 6:01 AM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] Agenda for OGF WG call 6 May 2009
Agenda:
1. Go through actions.
2. LengthKind on Sequences and choices.
LengthKind on sequences and choices and their parent element has proved
confusing to new users of DFDL. It is proposed that lengthKind be removed from
groups and only allowed on the parent element. See email from SH.
3. Discuss UnorderedInitiated email from SH
4. Infoset codepage and encoding
The spec does not say what codepage and encoding are used for string fields.
5. AOB
Next version (034)
Current Actions:
No  | Action
012 | AP/SH: Update decimalCalendarScheme
020 | SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy
024 | <No owner> String XML type
026 | SH: Envelopes and Payloads
027 | SH: Property precedence tables
028 | SH: Variable markup
029 | MB: valueCalc (output length calculation)
032 | DG: Investigate compatibility between DFDL infoset and XDM
033 | AP/TK: Assert/Discriminator semantics. AP to document. TK to check uses of discriminator besides choice.
036 | SH: Provide use case for floating component in a sequence
037 | All: Approach for XML Schema 1.0 UPA checks.
038 | MB: Submit response to OMG RFI for non-XML standardization
039 | SKK: Approach for creating Schema-For-DFDL xsds.
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
Unless
stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg