Jonathan

Yes, that's the principle, but it goes further than that. The DFDL infoset is typed, whereas the XML infoset isn't. The results of parsing a DFDL-described document and then applying DFDL validation to the resulting DFDL infoset are the same as those of parsing the equivalent XML document and applying XML Schema 1.0 validation, where 'results' means 'the validation errors that are detected'.
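As a toy illustration of the principle, and not a description of how any DFDL or XML Schema processor actually works, the Python sketch below contrasts a typed, DFDL-style infoset with an untyped, XML-style one. The `IntFacets` type and the `validate_typed`/`validate_untyped` helpers are hypothetical stand-ins for XML Schema range facets and validation. Because both paths funnel into the same facet checks, they report the same validation errors.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, much-simplified stand-in for an XML Schema integer type
# with minInclusive/maxInclusive facets.
@dataclass(frozen=True)
class IntFacets:
    min_inclusive: Optional[int] = None
    max_inclusive: Optional[int] = None


def check_value(name: str, value: int, facets: IntFacets) -> list[str]:
    """Facet checks shared by both paths, so their results are comparable."""
    errors = []
    if facets.min_inclusive is not None and value < facets.min_inclusive:
        errors.append(f"{name}: {value} < minInclusive {facets.min_inclusive}")
    if facets.max_inclusive is not None and value > facets.max_inclusive:
        errors.append(f"{name}: {value} > maxInclusive {facets.max_inclusive}")
    return errors


def validate_typed(infoset: dict[str, int], schema: dict[str, IntFacets]) -> list[str]:
    # DFDL-style: the parser has already produced typed values.
    errors = []
    for name, facets in schema.items():
        errors += check_value(name, infoset[name], facets)
    return errors


def validate_untyped(infoset: dict[str, str], schema: dict[str, IntFacets]) -> list[str]:
    # XML-style: values are character data that the validator converts first
    # (surrounding whitespace is stripped, mimicking whiteSpace="collapse").
    errors = []
    for name, facets in schema.items():
        errors += check_value(name, int(infoset[name].strip()), facets)
    return errors


schema = {"age": IntFacets(0, 150), "count": IntFacets(min_inclusive=1)}
typed = {"age": 200, "count": 0}          # what a DFDL parser might produce
untyped = {"age": " 200 ", "count": "0"}  # the equivalent XML character data
assert validate_typed(typed, schema) == validate_untyped(untyped, schema)
```

The sketch deliberately ignores lexical errors (for example "abc" in an integer field), which a DFDL parser would report as a parse error rather than a validation error.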

This principle has guided the WG in the design of several features. Hiding elements from the DFDL infoset requires the use of a 'hidden group'; simply doing the obvious thing and adding a dfdl:hidden property to an element would break the principle. Getting an assert to fail without throwing a processing error meant inventing recoverable errors; re-using validation errors would have broken the principle.

For your inspection and sanitization capability, I would recommend looking at XDM, the data model used by XPath 2.0, XSLT 2.0 and XQuery. I think this is the natural higher-level model to adopt for a common DFDL and XML framework. I created this OGF document to describe how to map the DFDL infoset to and from XDM: http://redmine.ogf.org/dmsf_files/8111?download=.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From:        "Cranford, Jonathan W." <jcranford@mitre.org>
To:        Steve Hanson/UK/IBM@IBMGB,
Cc:        "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date:        23/07/2013 18:23
Subject:        could you clarify statement made on the call today




Steve,

Could you clarify a statement made during today's DFDL WG call?

I didn't quite catch the whole statement, but it sounded like you were saying that a design goal of the WG was that parsing data in a binary format using DFDL would produce a DFDL infoset roughly equivalent to the XML infoset obtained by parsing the same data in an XML format. I'm not sure I captured that correctly, but it sounds like an important point, and I'd like to understand it further.

For context, I've been asked to look at building an inspection and sanitization capability on top of DFDL, so I'm weighing the differences between DFDL Infoset and XML Infoset at the moment, and your comparison caught my attention.

Thanks in advance,

--
Jonathan W. Cranford
Senior Information Systems Engineer
The MITRE Corporation (http://www.mitre.org)




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU