
Hi Bob,

Here is my list of ways in which DFDL could be used - I have probably missed some, but here's enough to kick off with. If you need more details on any of the real-world examples, let me know.

Cheers,
Martin

Description
-----------
In QCD physics, independent research groups from all over the world have data that is always a 4d array of floating point values. However, different groups have different standards for precision, dimension order and byte order. It would be useful for them to have a simple, canonical XML language for describing the format of a file. In the first instance this need only be human readable.

Archiving
---------
Data needs to be stored, but the programs and systems for reading it become obsolete. DFDL offers a way to describe all the details of a particular format so that, even if no program able to read the format survives, the description (and the standard) would provide sufficient information to access the archived data. (There are lots of examples of this type; they might include atmospheric measurements.)

A refinement of this is that archived data may need to be transformed as it is moved to up-to-date physical media (changes of precision, etc.). It would be nice if DFDL could (a) annotate these changes and (b) (perhaps) be used to ensure that the changes did not result in data loss.

Format abstraction
------------------
At the simplest level, the QCD physicists (described above) would like a single API that lets them read any described piece of data and carries out all the transformations required to ensure that they get the correct array in memory. I have examples of potential users who are just interested in describing byte order in a standard way.

At the next level, we would like to supply a high-level DFDL description that captures a standard view of the data, and have generic DFDL logic that can transform an existing DFDL-described format into this generic view.
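
As a rough illustration of the simplest level of format abstraction, here is a minimal Python sketch of a single read routine driven by a per-file description. The description dictionary and its field names (byte_order, precision, axes, extent), the canonical (t, x, y, z) order and the function read_canonical are all made up for illustration; none of this is DFDL syntax or a proposed API.

```python
import struct
from itertools import product

# Assumed canonical in-memory axis order (hypothetical, for illustration only).
CANONICAL_AXES = ("t", "x", "y", "z")


def read_canonical(raw, desc):
    """Decode a 4d array and return it as a flat list of Python floats
    in row-major canonical (t, x, y, z) order.

    desc is a hypothetical description with the keys:
      byte_order: "big" or "little"
      precision:  "single" or "double"
      axes:       axis names in the order the file stores them (slowest first)
      extent:     mapping from axis name to its length
    """
    order = {"big": ">", "little": "<"}[desc["byte_order"]]
    code = {"single": "f", "double": "d"}[desc["precision"]]
    axes = desc["axes"]
    extent = desc["extent"]

    count = 1
    for axis in axes:
        count *= extent[axis]
    # One call handles both byte order and precision.
    values = struct.unpack(f"{order}{count}{code}", raw)

    # Row-major stride of each axis as laid out in the file.
    stride = {}
    step = 1
    for axis in reversed(axes):
        stride[axis] = step
        step *= extent[axis]

    # Walk the canonical index space and pick each value from the file layout.
    result = []
    for index in product(*(range(extent[a]) for a in CANONICAL_AXES)):
        offset = sum(i * stride[a] for i, a in zip(index, CANONICAL_AXES))
        result.append(values[offset])
    return result
```

Two groups with different conventions would each supply their own description, and the caller would get the same in-memory layout from both files without knowing which convention was used.
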
Transforming a described format into a generic view is one of the primary motivations for "layers" in the standard. It is a very powerful feature, but it introduces scoping issues: which transformations can DFDL not describe, and which can it not describe efficiently?

Generic data access
-------------------
A DFDL library should provide the ability to interrogate a data description and read all aspects of the data into memory. An example of a generic tool is a browser that will allow arbitrary DFDL-described data to be displayed in some sensible human-readable form. This case requires the standard to specify an API for reading and interrogating the data. The favoured suggestion for this is to extend DOM/SAX to allow the reading of data fields directly into in-memory types (float, int, char, etc.).

Data queries
------------
The DFDL description implies an associated XML document. This document can be queried using XPath/XQuery to extract pieces of data. [Note: If the data comes back as an XML-XPath result then this process is straightforward. With BinX we tried to return the data in a format similar to the one it is stored in, together with an accompanying description. We found that a number of issues arose in this case that may or may not also arise for DFDL.]

Data annotations
----------------
The same XPath/XQuery expressions that can be used to query a document can provide external (format-independent) annotations. For example, NASA stores photographic images of hurricanes. A scientist can identify a blob of pixels that corresponds to the hurricane in an image. They would like to store this annotation in such a way that it will be preserved through future transformations (e.g. a new image format, a different pixel depth, or a different compression level). Note that the point here is that a byte offset into the image data cannot do this.

XML without the tags
--------------------
There are groups who would like to use DFDL as a sort of cheap data compression technique. An example here is particle physics collision data.
This is stored as a set of sparse (hence variable-sized) trees of results. The data are richly structured trees, and the physicists would like to access them and talk about them as if they were in XML, but they don't want to (cannot afford to) represent them using XML markup or use conventional XML tools to parse them. The idea is that such a group would design a new binary format that could be described in XML, and then they would work with the implied XML data.

Note: naturally these folks do not want to access their floating point values as strings, so they would want the sort of DOM extensions that we alluded to earlier. For this same reason, things like Binary XML do not solve their problem.

Another example comes from the astronomy community, which has recently moved from a long-standing binary data format (FITS) to an XML version (VOTable). FITS was very rich in metadata but also included binary images and large tables of observational data representations. VOTable is great for capturing the metadata in a standard way but leads to excessive bloat for images and large tables. The community has ended up with a complicated compromise in which they allow raw binary data at the bottom of the XML file. A DFDL-described format could provide a cleaner solution.
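
To make the "XML without the tags" idea a little more concrete, here is a minimal Python sketch of a variable-sized tree stored in a compact binary form and then walked as a tree of typed values. The node layout (a float64 value followed by a uint16 child count) and the function names encode and decode are invented for illustration; this is not DFDL, and it is not the format of any real collision data set.

```python
import struct

# Hypothetical node layout: little-endian float64 value, then uint16 child count.
# The "<" prefix disables padding, so each node header is exactly 10 bytes.
NODE = struct.Struct("<dH")


def encode(node):
    """Serialise a (value, children) tree depth-first with no markup."""
    value, children = node
    return NODE.pack(value, len(children)) + b"".join(encode(c) for c in children)


def decode(buf, offset=0):
    """Rebuild the tree; values come back as floats, never as strings.

    Returns the decoded node and the offset just past it.
    """
    value, n_children = NODE.unpack_from(buf, offset)
    offset += NODE.size
    children = []
    for _ in range(n_children):
        child, offset = decode(buf, offset)
        children.append(child)
    return (value, children), offset
```

The tree structure is implied entirely by the layout, so a description of that layout is enough to present the data as if it were an XML document while the values stay in their binary form.
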