- FYI: The paper below describes XDTM (XML Dataset Typing and Mapping),
which overlaps significantly with DFDL and contains some misconceptions
about what DFDL currently is. In practice, XDTM presently focuses on
describing how files are organized in directories, with the overall
goal of describing 'data collections'. Since DFDL can describe input
from multiple streams and could easily be extended to leave files as
blobs (rather than describing them further), I think it either already
does what XDTM wants or could be extended to do so.
- Some other interesting points:
- XDTM rejects annotating the schema and wants a separate mapping doc
that uses XPath to connect to the schema. In doing so, it also appears
to go back towards describing individual data collections rather than
the reusable format/layout of a data collection.
- The XDTM paper talks about mapping from databases and via
computations, but no detail is given. I don't see any concept of layers,
intermediate data sets, etc.
- Jim
- http://www.ecs.soton.ac.uk/~lavm/papers/egc05.pdf
- Luc Moreau, Yong Zhao, Ian Foster, Jens Voeckler, and Michael Wilde.
XDTM: the XML Dataset Typing and Mapping for Specifying Datasets. In
Proceedings of the 2005 European Grid Conference (EGC'05), Amsterdam,
The Netherlands, February 2005.
- We are concerned with the following problem: How do we
allow a community of users to access and process diverse data stored in
many different formats? Standard data formats and data access APIs can
help but are not general solutions because of their assumption of
homogeneity. We propose a new approach based on a separation of concerns
between logical and physical structure. We use XML Schema as a type
system for expressing the logical structure of datasets and define a
separate notion of a mapping that combines declarative and procedural
elements to describe physical representations. For example, a collection
of environmental data might be mapped variously to a set of files, a
relational database, or a spreadsheet but can look the same in all three
cases to a user or program that accesses the data via its logical
structure. This separation of concerns allows us to specify workflows
that operate over complex datasets with, for example, selector constructs
being used to select and initiate computations on sets of dataset
elements, regardless of whether the sets in question are files in a
directory, tables in a database, or columns in a spreadsheet. We present
the XDTM design and also the results of application experiments with an
XDTM prototype.
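
To make the logical/physical separation in the abstract concrete, here is a minimal Python sketch. It is not XDTM syntax and is not taken from the paper: the class names (DirectoryMapping, SpreadsheetMapping), the select_and_run helper, and the sample data are all hypothetical, chosen only to illustrate the idea that one selector can run unchanged over a set of files or over the columns of a spreadsheet.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, Tuple


class DirectoryMapping:
    """Hypothetical physical mapping: each file in a directory is one element."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def elements(self) -> Iterator[Tuple[str, str]]:
        for path in sorted(self.root.iterdir()):
            if path.is_file():
                yield path.stem, path.read_text().strip()


class SpreadsheetMapping:
    """Hypothetical physical mapping: each CSV column is one element."""

    def __init__(self, csv_path: Path) -> None:
        self.csv_path = csv_path

    def elements(self) -> Iterator[Tuple[str, str]]:
        with self.csv_path.open(newline="") as handle:
            rows = list(csv.reader(handle))
        header, body = rows[0], rows[1:]
        for index, name in enumerate(header):
            yield name, "\n".join(row[index] for row in body)


def select_and_run(
    mapping,
    predicate: Callable[[str], bool],
    compute: Callable[[str], int],
) -> Dict[str, int]:
    """Select elements by name and run a computation on each one.

    The caller sees only (name, content) pairs, never the physical layout.
    """
    return {
        name: compute(content)
        for name, content in mapping.elements()
        if predicate(name)
    }


def total(content: str) -> int:
    """Sum the whitespace-separated integers in one element."""
    return sum(int(value) for value in content.split())


def demo() -> Tuple[Dict[str, int], Dict[str, int]]:
    """Store the same made-up data two ways and process both identically."""
    with tempfile.TemporaryDirectory() as tmp:
        files = Path(tmp) / "files"
        files.mkdir()
        (files / "temp.txt").write_text("1\n2\n3\n")
        (files / "rain.txt").write_text("4\n5\n6\n")

        sheet = Path(tmp) / "sheet.csv"
        sheet.write_text("temp,rain\n1,4\n2,5\n3,6\n")

        from_files = select_and_run(DirectoryMapping(files), lambda n: True, total)
        from_sheet = select_and_run(SpreadsheetMapping(sheet), lambda n: True, total)
    return from_files, from_sheet


if __name__ == "__main__":
    print(demo())
```

In this sketch the mapping objects are the only code that knows about directories or CSV layout, so supporting a third representation (a database table, say) would mean adding another mapping class without touching select_and_run.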