Tel: 781-810-2125
All,
I have attached my first draft of recommendations for the DFDL Data Model. It is
based heavily on XDM and should be mostly compatible. Where there are
incompatibilities, they are by convention and not related to the actual
representation so I believe a system that is set up to work with XDM should be
easily adaptable to work with DFDL. This was important to me – in my project I
would be using DFDL as part of a pipeline process that also involved
transforming the results of parsing using XSLT and then unparsing the results of
transformation. I imagine many other potential users of DFDL will be engaging in
similar XML pipeline scenarios. In these situations, the closer the DFDL model
is to those used by other XML technologies, the better. Hopefully the
recommendation meets the requirements Steve mentions below – it turns out most
of the existing infoset could be mapped directly to XDM concepts. I introduced a
new node type for the unresolvable concept to match the existing infoset, but I
discuss how it is related to XDM and other XML technologies.
This exercise brought two major questions to mind:
- Must an instance of the XML Infoset (and XML document) be valid against a given
DFDL Schema (as determined by an XML Schema validation engine) to be available
for unparsing? If not, is it up to the DFDL implementation to determine the
suitability of the XML Infoset Character Information Items for their given
unparsed data type? The real question is: for the unparsing process, can a DFDL
Data Model be constructed from an XML Infoset directly or only from a PSVI
(where does the data for unparsing really come from)? It would seem to me that
if the input XML isn't valid against the DFDL Schema, then it probably can't be
unparsed. Otherwise, how would the invalid portions be handled (such as
strings that should be numeric or a structure that doesn't match)?
- I am confused by the notion of the "augmented infoset". The regular infoset
appears to be based on the logical structure of the data post-parsing. In other
words, choices are resolved and the result looks much like an XML Infoset,
PSVI, or XDM tree might after something like an XSLT transformation. The
augmented infoset on the other hand appears to be based on the logical structure
of the DFDL Schema being used for processing and therefore contains branches for
all choice possibilities, etc. It is "filled in" as parsing takes place. This
doesn't make a lot of sense to me - what about the branches for which there was
no data to "fill-in" (such as choice branches that weren't followed)? Are they
dropped following parsing? If not, then there are a lot of information items in
the final tree that have no value. It made more sense to me to consider the DFDL
Data Model as being constructed during parsing and at any given time in the
parsing process a portion of the model (that which has already been parsed) is
available.
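As a rough illustration of that second point, here is a minimal Python sketch of a data model built up during parsing. Everything in it is hypothetical: `parse_choice`, the two branch parsers, and the dictionary-based nodes are invented for illustration and are not taken from the DFDL spec or from XDM. The only point is that a node is added for the branch that actually matched, so no empty placeholders remain for branches that were not followed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical branch parser: returns (value, bytes consumed), or None on failure.
BranchParser = Callable[[bytes], Optional[Tuple[object, int]]]


def parse_int_field(data: bytes) -> Optional[Tuple[object, int]]:
    # Succeeds only if the first four bytes are ASCII digits.
    head = data[:4]
    if len(head) == 4 and head.isdigit():
        return int(head.decode("ascii")), 4
    return None


def parse_text_field(data: bytes) -> Optional[Tuple[object, int]]:
    # Succeeds on any four bytes of ASCII text.
    head = data[:4]
    if len(head) == 4 and head.isascii():
        return head.decode("ascii"), 4
    return None


def parse_choice(data: bytes, branches: Dict[str, BranchParser], tree: List[dict]) -> int:
    """Try each branch in order; append a node only for the branch that matches."""
    for name, parser in branches.items():
        result = parser(data)
        if result is not None:
            value, consumed = result
            tree.append({"name": name, "value": value})
            return consumed
    raise ValueError("no choice branch matched")


branches = {"number": parse_int_field, "text": parse_text_field}
tree: List[dict] = []
data = b"0042abcd"
offset = 0
while offset < len(data):
    offset += parse_choice(data[offset:], branches, tree)
    # After every step, `tree` holds exactly what has been parsed so far.
```

At any point in the loop, `tree` is the portion of the model that has already been parsed, which is the view of the data model described above.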
Hopefully those questions made sense... I should (finally) be on the call this Wednesday
to discuss.
Dave
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, May 06, 2009 11:15 AM
To: Dave Glick
Cc: Alan Powell; dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Agenda for OGF WG call 6 May 2009
Dave
Two intents of the infoset were that it should be a) simple and b) easily related to
the grammar in 11.3, so whatever you come up with needs to take those
requirements into account.
"Parts of XDM that have no relevance to DFDL but are also not conflicting should
probably be left in for conciseness and compatibility." - a) above would
imply the opposite.
The XDM spec defines the rules for how an XDM can be created from an XML Infoset
or a PSVI. We can do a similar exercise for the DFDL Infoset, for those users who
want to use XSL for any post-DFDL transformation.
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
Dave Glick <dglick@dracorp.com>
06/05/2009 13:55
All,
My apologies, but I
will be unable to make the call again this week. I was hoping to have some
suggestions regarding the infoset/data model for discussion today, but it's not
quite ready (I still have a little more digging to do through the rest of the
spec to make sure what I'm suggesting can adequately capture all the
representation cases in DFDL). I'll try to get something out by the end of the
day for review and discussion on next week's call.
In general, it
appears to me (and I'm admittedly not as versed in the various XML standards as
the other members of the group) that we can bring the DFDL Infoset very closely
in line with the XDM. Specifically, I've been looking at the way XSLT 2.0 treats
XDM as its data model. It states clearly that XDM is the model for XSLT with
certain explicit caveats and additions. This follows the XDM guidance of how it
should be used by other standards (specifically in XDM Section 7 and Appendix
A). The task for DFDL therefore consists of two parts: what parts of the XDM are
in conflict with DFDL and should be explicitly excluded, and what parts of DFDL
have no corresponding support in XDM and should be appended. Parts of XDM that
have no relevance to DFDL but are also not conflicting should probably be left
in for conciseness and compatibility.
My biggest concern
is over the use of two different types of Element Information Items in the DFDL
specification, as this seems so contrary to convention in XDM. My recommendation
is to treat all element nodes as complex, similarly to XDM; element nodes that
contain only simple content would have a single child of the XDM text node type
or of a new DFDL value node type (I'm not sure which is the better way to go
here).
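To make the idea concrete, here is a minimal Python sketch of a single element node type, assuming hypothetical class names (`ElementNode`, `ValueNode`) that appear in neither the DFDL spec nor XDM. An element with simple content holds exactly one value child, so a consumer never has to distinguish two kinds of element.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ValueNode:
    """Hypothetical leaf holding a simple value (stands in for an XDM text node)."""
    value: object


@dataclass
class ElementNode:
    """Hypothetical single element type used for both simple and complex content."""
    name: str
    children: List[Union["ElementNode", ValueNode]] = field(default_factory=list)

    def has_simple_content(self) -> bool:
        # Simple content is modelled as exactly one ValueNode child.
        return len(self.children) == 1 and isinstance(self.children[0], ValueNode)

    def typed_value(self) -> object:
        if not self.has_simple_content():
            raise ValueError(f"element {self.name!r} does not have simple content")
        return self.children[0].value


# A complex element whose children are elements with simple content.
record = ElementNode("record", [
    ElementNode("id", [ValueNode(42)]),
    ElementNode("label", [ValueNode("abc")]),
])
```

Whether the leaf should be the XDM text node or a new DFDL value node is exactly the open question; the sketch only shows that either choice fits behind one element type.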
In any case, I'll
pass along a full recommendation soon.
Dave
From: dfdl-wg-bounces@ogf.org [dfdl-wg-bounces@ogf.org] On Behalf Of Alan Powell [alan_powell@uk.ibm.com]
Sent: Wednesday, May 06, 2009 6:01 AM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] Agenda for OGF WG call 6 May 2009
Agenda:
1. Go through actions.
2. LengthKind on sequences and choices.
LengthKind on sequences and choices and their parent element has proved confusing
to new users of DFDL. It is proposed that lengthKind be removed from groups and
allowed only on the parent element. See email from SH.
3. Discuss UnorderedInitiated email from SH.
4. Infoset codepage and encoding.
The spec does not say what codepage and encoding are used for string fields.
5. AOB
Next version (034)
Current Actions:
No  | Action
012 | AP/SH: Update decimalCalendarScheme
020 | SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy
024 | <No owner> String XML type
026 | SH: Envelopes and Payloads
027 | SH: Property precedence tables
028 | SH: Variable markup
029 | MB: valueCalc (output length calculation)
032 | DG: Investigate compatibility between DFDL infoset and XDM
033 | AP/TK: Assert/Discriminator semantics. AP to document. TK to check uses of discriminator besides choice.
036 | SH: Provide use case for floating component in a sequence
037 | All: Approach for XML Schema 1.0 UPA checks.
038 | MB: Submit response to OMG RFI for non-XML standardization
039 | SKK: Approach for creating Schema-For-DFDL xsds.
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg