RE: [dfdl-wg] How to handle multi-dimensional arrays - version 2

Here's a slightly different formulation of the multi-dimension stuff.
1) no longer dictates the XSD for representing the array. This cuts both ways, since you no longer really have an XSD model for multi-dimensional arrays. That is, it is up to the author of the DFDL Schema to ensure that the needed information about the array (the coordinates of each element) makes it into the logical model in a useful way.
I didn't realize we were proposing to extend the XML schema to have a multidimensional array type, versus providing a way for DFDL to read and internally represent a multidimensional array. The latter seems descriptive and the former prescriptive.
2) I added in the complexity of calculating the array size, actually the lower and upper bounds of each dimension, dynamically based on data. This makes the example more real.
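To make the "bounds computed from data" idea concrete, here is a rough Python sketch (not DFDL). The binary layout is purely an assumption invented for illustration: four big-endian 32-bit integers giving the lower and upper bounds of x and y, followed by the elements with x changing fastest. The function name read_bounded_array is likewise hypothetical.

```python
import struct

def read_bounded_array(buf):
    """Read a 2-D array whose bounds are stored in the data itself.

    Assumed (hypothetical) layout: int32 x_lo, x_hi, y_lo, y_hi,
    then (x_hi - x_lo + 1) * (y_hi - y_lo + 1) int32 elements,
    with the x coordinate changing fastest.
    """
    x_lo, x_hi, y_lo, y_hi = struct.unpack_from(">4i", buf, 0)
    nx = x_hi - x_lo + 1          # extent of the x dimension
    ny = y_hi - y_lo + 1          # extent of the y dimension
    values = struct.unpack_from(f">{nx * ny}i", buf, 16)

    # Keep the coordinates of every element in the logical model.
    cells = {}
    for i, value in enumerate(values):
        x = x_lo + i % nx
        y = y_lo + i // nx
        cells[(x, y)] = value
    return (x_lo, x_hi), (y_lo, y_hi), cells

# A 2 x 3 array with bounds x = 1..2 and y = 1..3.
sample = struct.pack(">4i", 1, 2, 1, 3) + struct.pack(">6i", 10, 20, 30, 40, 50, 60)
x_bounds, y_bounds, cells = read_bounded_array(sample)
```

The point of the sketch is only that the element count and each element's coordinates fall out of values read earlier in the same stream.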
This still works out pretty well. I'm still pondering whether I like this better or not. I'm thinking about perhaps some sort of pseudo attributes which are guaranteed to be put into XML if you actually render to XML, but where a DFDL API-based implementation can choose not to realize them.
I think this example removes some of the prescriptive nature of the first one, but I'd like to be able to format my array however I want, e.g. as <row><elem>3</elem><elem>2</elem></row> <row><elem>5</elem><elem>6</elem></row> ... or even <states><state>Alabama</state><state>Alaska</state></states> <population><pop>34.2</pop><pop>10.6</pop></population> ... (an array containing state names, population and other data, perhaps serialized in the file with all the info for each state together). If DFDL could separate the reading of such an array from how it is output in the schema, I could do any of this.
Having multiple layers is a start - DFDL reads the array into something that is addressable along the lines Mike proposes, and then the contents of that layer are referenced via XPath to provide values in some structure I define in XSD. The only piece missing (I think) is that we haven't yet defined how to access iterators, i.e. if I have an element <elem minOccurs="1" maxOccurs="5">, how can I say that element n (n = 1...5) has dfdl:runtimevalue <a x="n" y="1">, which would put just the first column of a into the element sequence? If, in Mike's example, I could define the x and y dimensions independent of an array-reading context, just so I can use them in value references for dfdl:runtimevalue elements, I think we'd be all set.
This type of capability would allow all sorts of useful things, including the array-to-set-of-vectors conversions outlined here as well as subsampling, expansion/contraction of sparse arrays (where the array is stored as a sequence of x, y, value triples for only the nonzero elements), etc.
One other minor point - if the order of x and y in the DFDL file is important (as it is in the example), do we need a <dfdl:array storageOrder="firstDimensionChangesFirst"> option? Or can we just list y first and then x?
Jim
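For illustration only, the three operations discussed above (choosing a storage order, pulling one column out by iterating n, and expanding a sparse list of triples) can be sketched in Python. None of this is DFDL syntax; the function names and the 1-based coordinates are assumptions made for the example.

```python
def unflatten(values, nx, ny, first_dimension_changes_first=True):
    """Map a flat sequence onto {(x, y): value} with 1-based coordinates."""
    cells = {}
    for i, value in enumerate(values):
        if first_dimension_changes_first:
            x, y = i % nx + 1, i // nx + 1   # x varies fastest
        else:
            x, y = i // ny + 1, i % ny + 1   # y varies fastest
        cells[(x, y)] = value
    return cells

def column(cells, y, nx):
    """Element n (n = 1..nx) takes the value at x = n for a fixed y."""
    return [cells[(n, y)] for n in range(1, nx + 1)]

def expand_sparse(triples, nx, ny, fill=0):
    """Turn (x, y, value) triples for nonzero elements into a dense array."""
    cells = {(x, y): fill for x in range(1, nx + 1) for y in range(1, ny + 1)}
    for x, y, value in triples:
        cells[(x, y)] = value
    return cells

flat = [3, 2, 5, 6, 7, 8]
a_x_first = unflatten(flat, nx=3, ny=2)
a_y_first = unflatten(flat, nx=3, ny=2, first_dimension_changes_first=False)

col_x_first = column(a_x_first, y=1, nx=3)
col_y_first = column(a_y_first, y=1, nx=3)

dense = expand_sparse([(1, 1, 9), (3, 2, 4)], nx=3, ny=2)
```

The same flat data gives a different "first column" under the two storage orders, which is why the order of x and y has to be stated somewhere, whether by an explicit option or by the order in which the dimensions are listed.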
participants (1)
-
Myers, James D