re: Space - a space penalty occurs only if your DFDL implementation actually converts the data into XML. My personal plans for DFDL would do none of that, so you would incur zero space penalty. I want to reemphasize that the "index attributes" x and y in my example would take up exactly zero space. They have no representation; their values are inferred from the position of the elements in the array.
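
To make the "zero space" point concrete, here is a minimal Java sketch, not anything DFDL defines. The class name PositionalArray2D, its constructor parameters, and the row-major layout are all assumptions made for illustration. Only the element values are stored; the x and y indices are computed from each element's position.

```java
// Hypothetical sketch: the class, its parameters, and the row-major layout
// are illustrative assumptions, not part of DFDL.
final class PositionalArray2D {
    private final int[] data;   // element values only; no indices are stored
    private final int xLow;     // lowest logical x index
    private final int yLow;     // lowest logical y index
    private final int xSize;    // number of distinct x values
    private final int ySize;    // number of distinct y values

    PositionalArray2D(int[] data, int xLow, int xSize, int yLow, int ySize) {
        if (data.length != xSize * ySize) {
            throw new IllegalArgumentException("data length does not match dimensions");
        }
        this.data = data;
        this.xLow = xLow;
        this.xSize = xSize;
        this.yLow = yLow;
        this.ySize = ySize;
    }

    // The logical indices (x, y) are mapped to a storage position by
    // arithmetic alone, so they cost nothing in the stored data.
    int get(int x, int y) {
        int i = x - xLow;
        int j = y - yLow;
        if (i < 0 || i >= xSize || j < 0 || j >= ySize) {
            throw new IndexOutOfBoundsException("(" + x + ", " + y + ")");
        }
        return data[i * ySize + j];
    }

    public static void main(String[] args) {
        // Six stored values describe x in 5..6 and y in -2..0.
        int[] values = {10, 11, 12, 20, 21, 22};
        PositionalArray2D a = new PositionalArray2D(values, 5, 2, -2, 3);
        System.out.println(a.get(5, -2)); // first stored element
    }
}
```

The six values occupy six ints no matter which index ranges are declared for them, which is the sense in which the index attributes have no representation.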

re: algorithms - DFDL doesn't address APIs for access to data at all. There's nothing stopping someone from making array access appear in a programming language exactly the way it appears in C, Fortran, or Java or any other language today. E.g.,

      // Establish a correspondence between the Java array 'a' and the DFDL-described array reachable via path '.../a'.
      Array a = ...getArrayFromDFDL(".../a");
      int value = a.get(5, -2); // retrieve the element at these index locations

If you really want to express transformations "in this markup", i.e., as if the data had been converted to XML, then it is unclear to me why XPath/XQuery would make the algorithms particularly ugly. Using XPath/XQuery to address elements would be very similar to basic index-oriented access in a programming language.
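
As a rough illustration of that similarity, the sketch below uses the standard javax.xml.xpath API to fetch one element by its index attributes. The markup (an a element holding e elements with x and y attributes) and the class name XPathIndexDemo are assumptions made up for this example; they are not the actual markup under discussion.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: element and attribute names are illustrative only.
class XPathIndexDemo {
    // A tiny array as it might look if the data were converted to XML.
    static final String XML =
        "<a>"
        + "<e x='5' y='-2'>42</e>"
        + "<e x='5' y='-1'>43</e>"
        + "<e x='6' y='-2'>44</e>"
        + "<e x='6' y='-1'>45</e>"
        + "</a>";

    // Addresses one element by its index attributes, much like a[x][y].
    static String elementAt(int x, int y) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "/a/e[@x=" + x + " and @y=" + y + "]";
        try {
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException ex) {
            throw new IllegalStateException("bad XPath: " + expr, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementAt(5, -2)); // prints 42
    }
}
```

The path expression plays the same role as the subscript pair in the Java fragment above; only the syntax differs.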

...mike

Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA



"Robert E. McGrath" <mcgrath@ncsa.uiuc.edu>
Sent by: owner-dfdl-wg@ggf.org

09/06/2005 12:31 PM

To
Mike Beckerle/Worcester/IBM@IBMUS
cc
dfdl-wg@gridforum.org
Subject
Re: Arrays issue - Re: [dfdl-wg] Issues: additional data types





Yes, this is one way to do arrays.

This approach emphasizes the use case where it is important to
access individual elements via XML.

There are two obvious downsides:

  1. space:  this will be >10 times the storage of the actual numbers.
     A big problem for many cases.
  2. array algorithms (e.g., scatter-gather, transpose) do
     block operations which are totally ugly in this markup.

A variant of this might mark up parts of the array, e.g., each row.


Two other general approaches can be considered:

Array as blob:  markup says 'this is an array, laid out like so',
data is a big blob. (Probably this is what Jim is talking about)

Array as external blob:  same as above, except payload is a URL,
e.g., to OpenDAP server where the data is. (Ideal for "virtual datasets")


The memo I was working on tries to lay these options out with the
advantages and disadvantages.

---
Robert E. McGrath
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
Champaign, Illinois 61820
(217)-333-6549

mcgrath@ncsa.uiuc.edu