I think I've argued before that, while this is a reasonable capability to
have (and I think we should have it), it doesn't really define an array in
the sense of a single atomic type. Rather, it is analogous to saying
(though this is much more useful) that we want a string type and will
represent "hi" as <char pos="1">h</char><char pos="2">i</char>.
With strings, we simply say they are atomic and DFDL will not attempt to
parse/represent their parts. I think we'll want an analogous capability
to talk about an array as an atomic entity described by dimensionality,
element type (e.g. float), etc. I'd like to use that, for example, to
efficiently memory-map a single block holding the whole array.
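As a rough illustration of the atomic view, the Python sketch below describes an array only by its shape and element type and exposes the stored block as one typed, multi-dimensional view, without building any per-element nodes. The class AtomicArraySpec is hypothetical; the sketch assumes dense, row-major storage in native byte order, and a plain bytes object stands in for a memory-mapped region (an mmap.mmap object also supports the buffer protocol and could be passed instead).

```python
import array
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicArraySpec:
    """Hypothetical description of an array as a single atomic value:
    only its shape and element type are declared, not its parts."""
    shape: tuple     # extent of each dimension, e.g. (2, 3)
    typecode: str    # struct/array type code, e.g. "f" for 32-bit float

    def view(self, buffer):
        """Return an N-d memoryview over the whole block without copying.

        Assumes dense, row-major (C-order) storage in native byte order.
        """
        flat = memoryview(buffer).cast("B")   # reinterpret as raw bytes
        count = 1
        for extent in self.shape:
            count *= extent
        expected = count * array.array(self.typecode).itemsize
        if flat.nbytes != expected:
            raise ValueError(f"expected {expected} bytes, got {flat.nbytes}")
        return flat.cast(self.typecode, shape=list(self.shape))


# A bytes object stands in for a memory-mapped file region.
data = array.array("f", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).tobytes()
spec = AtomicArraySpec(shape=(2, 3), typecode="f")
grid = spec.view(data)
print(grid[1, 2])      # 6.0
print(grid.tolist())   # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

The point of the design is that the parser only validates the block's size against the declared shape; individual values are read on demand by indexing.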
XDTM does this at the file level as well: it describes a dataset
composed of multiple files in terms of 'atomic' file sub-units that the
parser can retrieve. However, it doesn't describe the contents of the
files in terms of further types.
Jim
At 11:48 AM 9/6/2005 -0400, Mike Beckerle wrote:
The need for an approach to arrays is clear, and it is acute for many DFDL
constituencies.
The first step in any approach to arrays for DFDL is an XML model for
array data and an XSD for describing it. DFDL can then put its properties
on that XSD.
I suggest the following model. Consider a 2-d case. This
will generalize to N dimensions.
Each axis is named. The array itself is represented as
elements, with attributes used to identify the position of the value on
each axis conceptually like so:
<a x="5" y="-2">51</a>
That is, you think of each array element as having
attributes identifying its position in the array. Of course DFDL allows
data to be processed without ever creating elements like that, so this is
a conceptual model only, particularly for a dense array.
That element belongs to an array named 'a', sits at position x=5,
y=-2, and has the value 51.
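To make the conceptual model concrete, the sketch below expands a small dense 2-d array into one 'a' element per cell, with x and y attributes giving its position. It uses Python's standard xml.etree.ElementTree; the wrapper element name "array" and the function name are hypothetical, rows are assumed to run along y and columns along x, and the index origins are parameters because the model allows arbitrary (including negative) starting indices. As noted above, a DFDL processor need never build these elements; this only shows what the model describes.

```python
import xml.etree.ElementTree as ET


def to_conceptual_elements(rows, x_origin, y_origin, name="a"):
    """Expand a dense 2-d array (a list of rows) into <a x=".." y="..">v</a>
    elements. Hypothetical helper: rows are indexed by y, columns by x."""
    root = ET.Element("array")  # hypothetical wrapper element
    for j, row in enumerate(rows):
        for i, value in enumerate(row):
            cell = ET.SubElement(root, name,
                                 x=str(x_origin + i), y=str(y_origin + j))
            cell.text = str(value)
    return root


root = to_conceptual_elements([[50, 51], [60, 61]], x_origin=4, y_origin=-2)
print(ET.tostring(root, encoding="unicode"))
# <array><a x="4" y="-2">50</a><a x="5" y="-2">51</a><a x="4" y="-1">60</a><a x="5" y="-1">61</a></array>
```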
The declaration in XSD would be like this:
<element name="a" maxOccurs="unbounded">
  <complexType>
    <simpleContent>
      <extension base="int">
        <attribute name="x">
          <simpleType>
            <restriction base="int">
              <maxInclusive value="5"/>
              <minInclusive value="-5"/>
            </restriction>
          </simpleType>
        </attribute>
        <attribute name="y">
          <simpleType>
            <restriction base="int">
              <maxInclusive value="10"/>
              <minInclusive value="-10"/>
            </restriction>
          </simpleType>
        </attribute>
      </extension>
    </simpleContent>
  </complexType>
</element>
Notice how the ranges of the index values are captured in
XSD by use of the simple type restriction, and can cover arbitrary
sections of the integer space, including negative indices.
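The bounds in these restrictions are enough to derive the shape of the array. A minimal sketch, assuming each index type is a contiguous integer range as in the declaration above: each axis has max - min + 1 positions, and subtracting minInclusive turns a possibly negative index into a zero-based offset. The Axis type and its member names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """Hypothetical index axis built from an XSD minInclusive/maxInclusive pair."""
    name: str
    min_inclusive: int
    max_inclusive: int

    @property
    def extent(self):
        # Number of valid index values, e.g. -5..5 gives 11.
        return self.max_inclusive - self.min_inclusive + 1

    def to_offset(self, index):
        """Map a declared index (possibly negative) to a zero-based offset."""
        if not self.min_inclusive <= index <= self.max_inclusive:
            raise IndexError(f"{self.name}={index} outside "
                             f"[{self.min_inclusive}, {self.max_inclusive}]")
        return index - self.min_inclusive


x = Axis("x", -5, 5)
y = Axis("y", -10, 10)
print(x.extent, y.extent, x.extent * y.extent)  # 11 21 231
print(x.to_offset(5), y.to_offset(-2))          # 10 8
```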
DFDL would then provide properties for
1) declaring that 'a' is an array and that 'x' and 'y' are
array indices (and therefore do not have values stored anywhere in the
data).
2) declaring the storage order of the array. This can be an
ordered list of the dimension names, e.g. "x y" or "y x", depending on
which index changes fastest in the storage ordering.
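Given the index bounds and a storage order, the linear position of an element follows directly. The list above does not say which end of the ordered list is the fastest-changing index, so this sketch assumes, hypothetically, that the last name listed changes fastest (so "x y" means y varies fastest). The function name and the dictionary layout of the bounds are illustrative only.

```python
def linear_offset(position, bounds, storage_order):
    """Return the zero-based element offset of `position` in the stored data.

    position:      dict axis name -> index, e.g. {"x": 5, "y": -2}
    bounds:        dict axis name -> (minInclusive, maxInclusive)
    storage_order: axis names, e.g. "x y"; ASSUMPTION: the last name
                   listed is the index that changes fastest in storage.
    """
    offset = 0
    for name in storage_order.split():
        lo, hi = bounds[name]
        index = position[name]
        if not lo <= index <= hi:
            raise IndexError(f"{name}={index} outside [{lo}, {hi}]")
        # Horner-style accumulation: scale by this axis's extent, then add.
        offset = offset * (hi - lo + 1) + (index - lo)
    return offset


bounds = {"x": (-5, 5), "y": (-10, 10)}
print(linear_offset({"x": 5, "y": -2}, bounds, "x y"))  # 10 * 21 + 8 = 218
print(linear_offset({"x": 5, "y": -2}, bounds, "y x"))  # 8 * 11 + 10 = 98
```

Multiplying the result by the element size (4 bytes for a 32-bit int) gives a byte offset into a dense block.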
Access to elements would be by XPath expressions like this:
..../a[@x='5' and @y='-2']. Processors would recognize from the DFDL
annotations that x and y are array indices, and would therefore recognize
predicates involving the indices and treat them specially. For example,
we could disallow slices such as ..../a[@x='0'], that is, expressions
where the y axis is unconstrained.
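One way a processor might recognize index predicates is sketched below: it accepts only a conjunction of equality tests on the declared index axes and rejects the expression if any axis is left unconstrained, which is the slicing restriction suggested above. The regular expression handles only this narrow form (with or without the @ prefix), not general XPath; the function name and its error handling are hypothetical.

```python
import re

# One equality test such as  @x='5'  or  y = '-2'  (quotes optional).
TERM = re.compile(r"""@?(\w+)\s*=\s*['"]?(-?\d+)['"]?""")


def resolve_index_predicate(predicate, axis_names):
    """Turn "@x='5' and @y='-2'" into {"x": 5, "y": -2}.

    Raises ValueError for anything other than an 'and' of equality tests
    on the declared axes, or when an axis is left unconstrained (a slice).
    """
    position = {}
    for term in re.split(r"\s+and\s+", predicate.strip()):
        match = TERM.fullmatch(term.strip())
        if match is None or match.group(1) not in axis_names:
            raise ValueError(f"not an index test: {term!r}")
        name, value = match.group(1), int(match.group(2))
        if name in position:
            raise ValueError(f"axis {name} constrained twice")
        position[name] = value
    missing = [a for a in axis_names if a not in position]
    if missing:
        raise ValueError(f"slicing not supported; unconstrained: {missing}")
    return position


print(resolve_index_predicate("@x='5' and @y='-2'", ["x", "y"]))  # {'x': 5, 'y': -2}
```

The resulting dictionary can then be turned into a direct offset instead of evaluating the predicate against every element.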
James D. Myers
NCSA, M/C 476-152 CAB
605 East Springfield
Champaign, IL 61820
217-244-1934
jimmyers@ncsa.uiuc.edu