RE: [dfdl-wg] How to handle multi-dimensional arrays - version 2

I sort-of agree :-) I think the distinctions I'm making are subtle but important with respect to composability/layers, but they don't shift what you can do in the multidimensional array case from where you're trying to go. And if that doesn't make your eyes cross and cause fits, read on...
Why haven't you felt it necessary to define an XSD for vectors beyond putting dfdl:runtimeoccurs limits on how many to pull from a stream? In this case, the runtimeoccurs param is a param of the reader that populates a 'normal' XSD sequence with 'normal' XSD elements. For multidimensional arrays, the runtimeoccurs parameters for each dimension are now becoming part of the model rather than parameters of the reader. I don't know if I like that, but, if we do it, why not do it everywhere and make, for example, dfdl:byteorder an attribute on all ints and floats that are read? Of course, a byteorder attribute would only be available if you actually came from a binary stream, which may be defined elsewhere (some enclosing node, another layer). To me, any of this starts to mix the model and the method used to read the model, which gets back to the issue of how independent readers are, whether creating a new reader implies creating a new subtype, etc.
I guess I'd rather see the concept of multidimensional arrays as follows: there is not, in fact, a multidimensional array on disk/stream, just a serialized sequence of ints/floats/whatever. But, to assist the user in interpreting this flat data as a multidimensional array, we want DFDL to make index info available. Rather than just making a single cursor count available and requiring users to do math to get indexes that don't start at 0 (or 1, whatever) or to support multiple dimensions, we provide some convenience mechanisms that can report an index or indexes that cycle between user-defined mins and maxes as elements are read. These can be used to decorate elements with attributes, be used in conditional logic, etc.
This would preserve the separation of reader from model, at the expense of saying that indexes like this are different from all the other dfdl reader parameters that might be in the current context.
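To make the index idea concrete, here is a rough sketch in Python rather than DFDL. Everything in it is hypothetical: the names index_cursor and decorate, the inclusive (min, max) bounds, and the choice that the last dimension varies fastest are my assumptions for illustration, not anything DFDL defines. The point is only that the data stays a flat sequence and the indexes are reported alongside it by the reader.

```python
from itertools import cycle, product


def index_cursor(bounds):
    """Yield index tuples that cycle through inclusive (min, max) bounds.

    Assumption for this sketch: the last dimension varies fastest
    (row-major order). Once every combination has been reported,
    the cursor wraps around and starts again from the minimums.
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return cycle(product(*ranges))


def decorate(flat_values, bounds):
    """Pair each value of a flat sequence with the index tuple reported for it.

    The values themselves are untouched; the indexes are annotations
    supplied by the reader, not part of the data model.
    """
    return [
        {"index": idx, "value": value}
        for idx, value in zip(index_cursor(bounds), flat_values)
    ]


# A flat stream of six values viewed as 2 x 3, with the first index
# running 1..2 and the second running 0..2.
flat = [10, 20, 30, 40, 50, 60]
decorated = decorate(flat, [(1, 2), (0, 2)])
```

Changing the bounds changes only the annotations, never the sequence that was read, which is the separation I'm arguing for.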
What makes all this confusing for DFDL is that we have some representations that are complex enough to need layered multi-step descriptions, and once you have that, there's no stopping you from using it to do all sorts of transformation from one format to another. So it feels like you can have your cake and eat it too, which is to say you can pick your XML Schema and populate it from quite differently structured data. And that is probably true, but at the bottom level of the stack of layers you have to have a vocabulary and model for directly describing the structure of the data so as to get the whole ball rolling. And at this bottom layer, the needs of describing the data format completely dictate what the schema is like.
I would solve this by just saying that, at the bottom layer, there are no single or multidimensional arrays, just sequences of base types, and that any concept of dimensions is a fabrication created by the user (a very common and convenient one we might want special support for...). The only reason I think we would need a multidimensional array type in DFDL is if we wanted to directly read m*n bytes and create a single XML element representing the entire array that would then have some accessor methods to get a value for a particular x,y offset pair.
I'm not sure what kind of analogy will make sense to people here, but I see a similar argument for floats from strings: if you want to create a float from a sequence of characters, you need a float type. If you just want to prescribe a standard way to model a sequence of characters representing a float so that we can consistently label the mantissa and exponent chars regardless of storage order, you're not really defining a new float type in XSD. Instead, you're exposing the semantics created/inferred by the reader as standardized annotations of the existing char (or string) type (with the annotations being optional or required depending on whether you let me shut them off or not).
Jim
Myers, James D