re: Space - a space penalty only occurs
if your DFDL implementation actually converts the data into XML. My personal
plans for DFDL would do none of that. You would incur zero space penalty.
I want to reemphasize here that the "index attributes" x and
y in my example would take up exactly zero space. They have no representation.
Their values are inferred from the position of the elements in the array.
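To illustrate the point about inferred indices, here is a minimal Java sketch. It assumes a two-dimensional array stored as a flat, row-major run of elements; the class name ImplicitIndex and the row-major layout are assumptions made for illustration, not anything DFDL defines. The x and y values are computed from an element's position and are never stored.

```java
// Hypothetical sketch: x and y are derived from position, not stored in the data.
final class ImplicitIndex {
    // Number of elements per row; assumed to come from the data description.
    final int width;

    ImplicitIndex(int width) {
        if (width <= 0) throw new IllegalArgumentException("width must be positive");
        this.width = width;
    }

    // Column index of the element at a given flat position (row-major assumed).
    int x(int position) { return position % width; }

    // Row index of the element at a given flat position.
    int y(int position) { return position / width; }

    // Inverse mapping: flat position of the element at (x, y).
    int position(int x, int y) { return y * width + x; }

    public static void main(String[] args) {
        ImplicitIndex idx = new ImplicitIndex(4);
        System.out.println(idx.x(6) + "," + idx.y(6)); // prints 2,1
    }
}
```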
re: algorithms - DFDL doesn't address
APIs for access to data at all. There's nothing stopping someone from making
array access appear in a programming language exactly the way it appears
in C, Fortran, or Java or any other language today. E.g.,
    Array a = ...getArrayFromDFDL(".../a");
    // establish correspondence between Java array 'a', and DFDL-described
    // array reachable via path '..../a'.
    int value = a(5, -2); // retrieve the element at these index locations
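For concreteness, here is one way such an accessor might look in real Java. This is only a sketch under stated assumptions: the class RawIntArray2D is hypothetical, the elements are assumed to be 32-bit big-endian integers in row-major order, and each dimension is assumed to have an explicit lower bound (which is what makes an index like -2 legal). Java has no call syntax on objects, so the access is written a.get(5, -2).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical accessor: indexes directly into the raw bytes, no XML is built.
final class RawIntArray2D {
    private final ByteBuffer data;
    private final int lo0, n0; // lower bound and extent of the first dimension
    private final int lo1, n1; // lower bound and extent of the second dimension

    RawIntArray2D(ByteBuffer data, int lo0, int n0, int lo1, int n1) {
        if (data.remaining() < n0 * n1 * Integer.BYTES) {
            throw new IllegalArgumentException("buffer too small for declared shape");
        }
        // Assumption: big-endian 32-bit integers, laid out row-major.
        this.data = data.slice().order(ByteOrder.BIG_ENDIAN);
        this.lo0 = lo0; this.n0 = n0;
        this.lo1 = lo1; this.n1 = n1;
    }

    // Retrieve the element at index (i, j), honoring the lower bounds.
    int get(int i, int j) {
        int r = i - lo0, c = j - lo1;
        if (r < 0 || r >= n0 || c < 0 || c >= n1) {
            throw new IndexOutOfBoundsException("(" + i + "," + j + ")");
        }
        return data.getInt((r * n1 + c) * Integer.BYTES);
    }

    public static void main(String[] args) {
        // Nine sample values 0, 10, ..., 80 standing in for the described data.
        ByteBuffer buf = ByteBuffer.allocate(9 * Integer.BYTES);
        for (int k = 0; k < 9; k++) buf.putInt(k * 10);
        buf.flip();
        // First index runs 4..6, second index runs -3..-1.
        RawIntArray2D a = new RawIntArray2D(buf, 4, 3, -3, 3);
        System.out.println(a.get(5, -2)); // prints 40
    }
}
```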
If you really want to express transformations
"in this markup", i.e., as if the data had been converted to
XML, then I'm unclear why XPath/XQuery would make the algorithms particularly
ugly. Use of XPath/XQuery to address elements would be very similar to
basic index-oriented access in a programming language.
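As a rough illustration of that similarity, the sketch below addresses one element of a small array that has been rendered as XML, using the standard javax.xml.xpath API. The element names a, row and v are made up for this example and are not from any DFDL schema. Note that XPath positions are 1-based.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathIndexDemo {
    // A 2x3 array rendered as XML; element names are hypothetical.
    static final String XML =
        "<a><row><v>1</v><v>2</v><v>3</v></row>"
      + "<row><v>4</v><v>5</v><v>6</v></row></a>";

    // Return the text of the element at (row, col), both 1-based as in XPath.
    static String elementAt(String xml, int row, int col) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            String path = "/a/row[" + row + "]/v[" + col + "]";
            return XPathFactory.newInstance().newXPath().evaluate(path, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate path", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementAt(XML, 2, 3)); // prints 6
    }
}
```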
...mike
Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA
"Robert E. McGrath" <mcgrath@ncsa.uiuc.edu>
Sent by: owner-dfdl-wg@ggf.org
09/06/2005 12:31 PM
To: Mike Beckerle/Worcester/IBM@IBMUS
cc: dfdl-wg@gridforum.org
Subject: Re: Arrays issue - Re: [dfdl-wg] Issues: additional data types
Yes, this is one way to do arrays.
This approach emphasizes the use case where it is important to
access individual elements via XML.
There are two obvious downsides:
1. space: this will be >10 times the storage of the actual numbers.
   A big problem for many cases.
2. array algorithms (e.g., scatter-gather, transpose) do block
   operations which are totally ugly in this markup.
A variant of this might mark up parts of the array, e.g., each row.
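The space concern can be estimated with a small Java sketch like the one below, which compares the size of per-element markup with raw 32-bit storage. The element name e and the attribute names x and y are assumptions chosen for illustration; the actual ratio depends on tag names, value widths and whitespace, so this gives only a rough lower-end figure and not a measurement.

```java
import java.nio.charset.StandardCharsets;

final class MarkupOverhead {
    // Bytes needed if every element is written as <e x=".." y="..">value</e>.
    static long xmlBytes(int rows, int cols) {
        long total = 0;
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                int value = y * cols + x; // stand-in data
                String e = "<e x=\"" + x + "\" y=\"" + y + "\">" + value + "</e>";
                total += e.getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return total;
    }

    // Bytes needed for the same values as raw 32-bit integers.
    static long rawBytes(int rows, int cols) {
        return (long) rows * cols * Integer.BYTES;
    }

    public static void main(String[] args) {
        int rows = 1000, cols = 1000;
        double ratio = (double) xmlBytes(rows, cols) / rawBytes(rows, cols);
        System.out.println("markup/raw size ratio: " + ratio);
    }
}
```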
Two other general approaches can be considered:
Array as blob: markup says 'this is an array, laid out like so',
data is a big blob. (Probably this is what Jim is talking about)
Array as external blob: same as above, except payload is a URL,
e.g., to an OPeNDAP server where the data is. (Ideal for "virtual datasets")
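A minimal Java sketch of how these two blob approaches might share one descriptor is shown below. Everything here is hypothetical: the class ArrayDescriptor, its field names, the "int32" type label and the example.org URL are placeholders, not part of any DFDL proposal. The markup carries only the layout; the payload is either inline bytes or a location.

```java
import java.net.URI;

// Hypothetical descriptor: layout in the markup, payload inline or external.
final class ArrayDescriptor {
    final String elementType;   // e.g. "int32" (label is an assumption)
    final int[] shape;          // extent of each dimension
    final byte[] inlineBlob;    // non-null for "array as blob"
    final URI externalLocation; // non-null for "array as external blob"

    private ArrayDescriptor(String elementType, int[] shape, byte[] blob, URI location) {
        this.elementType = elementType;
        this.shape = shape.clone();
        this.inlineBlob = blob;
        this.externalLocation = location;
    }

    static ArrayDescriptor inline(String elementType, int[] shape, byte[] blob) {
        return new ArrayDescriptor(elementType, shape, blob, null);
    }

    static ArrayDescriptor external(String elementType, int[] shape, URI location) {
        return new ArrayDescriptor(elementType, shape, null, location);
    }

    boolean isExternal() { return externalLocation != null; }

    // Total number of elements implied by the shape.
    long elementCount() {
        long n = 1;
        for (int extent : shape) n *= extent;
        return n;
    }

    public static void main(String[] args) {
        // Placeholder URL; no real data server is implied.
        ArrayDescriptor d = ArrayDescriptor.external(
            "int32", new int[] {3, 4}, URI.create("http://example.org/data"));
        System.out.println(d.elementCount() + " elements, external=" + d.isExternal());
    }
}
```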
The memo I was working on tries to lay these options out with the
advantages and disadvantages.
---
Robert E. McGrath
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
Champaign, Illinois 61820
(217)-333-6549