RE: [dfdl-wg] CSV string worked example

EOS is up for grabs. I was thinking of it as a returned value (e.g. -1), but an exception might (or might not) be easier to make sense of.
That only works assuming -1 is a valid value of the returned type and is never used as a real value, etc. - I think we need a separate mechanism to indicate the end.
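To make the EOS question concrete, here is a minimal Python sketch (not DFDL). It contrasts an in-band sentinel with an out-of-band end signal. The names `EOS`, `read_ints_sentinel` and `read_ints` are hypothetical and chosen only for illustration; nothing here is part of the proposal itself.

```python
EOS = -1  # assumed in-band sentinel, for illustration only


def read_ints_sentinel(values):
    """Collect values until the in-band EOS sentinel is seen."""
    out = []
    for v in values:
        if v == EOS:
            break  # a real -1 in the data is indistinguishable from the end
        out.append(v)
    return out


def read_ints(values):
    """Collect values using an out-of-band end signal (StopIteration)."""
    it = iter(values)
    out = []
    while True:
        try:
            out.append(next(it))
        except StopIteration:  # the end is signalled outside the value space
            break
    return out


data = [3, -1, 7]
print(read_ints_sentinel(data))  # [3]
print(read_ints(data))           # [3, -1, 7]
```

With the sentinel, a legitimate -1 in the data truncates the stream; with the separate mechanism, every int remains usable as data.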
Regarding the new model: I don't think this is a problem at the level of your example. We could simply use a single sequence and a more complex "split" conversion. I imagine that the "split" conversion we would want to settle on should accept a regular expression (or at least a list of separators). In your example you just have to allow the separator to be a newline OR a comma and you are done.
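A rough Python analogue of a single "split" conversion that accepts a regular expression might look like the sketch below. The function name `split_flat` and the sample data are assumptions made for illustration; only `re.split` is a real library call.

```python
import re


def split_flat(text, separators=r"[,\n]"):
    """Split on any separator matching the pattern and drop empty tokens."""
    return [tok for tok in re.split(separators, text) if tok != ""]


rows = "1,2,3\n4,5,6\n"
flat = [int(tok) for tok in split_flat(rows)]
print(flat)  # [1, 2, 3, 4, 5, 6]
```

Note that dropping empty tokens is a simplification: in this form a missing element and a missing row can no longer be told apart.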
Yes, but I want to reuse the simple conversions and therefore keep the ability to deal with all the variations you note below without special constructs. If I consider this a sequence of two steps, I can deal with missing separators and terminators differently; if I put the two together, I have to account for all the possible variations. And I want to handle a data cube the same way without having to wait for someone to build a new converter... (We talked this through generally before - if you have a way to create new converters from existing ones, you could support both ways.)
A note here: this is intended as a rough sketch, not a finished design. I expect the details will need to be worked out. In particular, I think Mike/IBM have some fairly complex ideas for separator/terminator/initiator/escape that we will have to try to seat in this framework.
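The idea of building new converters from existing ones can be sketched in Python as follows. This is only an illustration under stated assumptions: `split`, `each`, `pipeline` and `flatten` are hypothetical helper names, not DFDL constructs.

```python
def split(sep):
    """Return a converter that splits a string on one fixed separator."""
    return lambda text: text.split(sep)


def each(conv):
    """Lift a converter so that it applies to every item of a list."""
    return lambda items: [conv(item) for item in items]


def pipeline(*steps):
    """Compose converters left to right into a new converter."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def flatten(rows):
    """Concatenate a list of rows into a single sequence."""
    return [x for row in rows for x in row]


# Two simple steps keep the row structure...
to_rows = pipeline(split("\n"), each(split(",")), each(each(int)))
# ...and one more step derives the flat logical model from the same parts.
to_flat = pipeline(to_rows, flatten)

text = "1,2,3\n4,5,6"
print(to_rows(text))  # [[1, 2, 3], [4, 5, 6]]
print(to_flat(text))  # [1, 2, 3, 4, 5, 6]
```

Because each step stays separate, a third level (for a data cube) would be one more `split` and `each` rather than a new purpose-built converter.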
Thanks,
Martin
----------
From: Jim Myers [mailto:jimmyers@ncsa.uiuc.edu]
Sent: Wednesday, March 01, 2006 3:49 AM
To: Westhead, Martin (Martin); dfdl-wg@ggf.org
Subject: Re: [dfdl-wg] CSV string worked example
Martin - two types of comments - things I think are typos/inconsistencies and an alternate logic:
Clarifications: Are the initial definitions on the top element defining an order to use subsequently, or are they just there for us to see what you've defined? Of the four there, you only explicitly (in a comment?) invoke one - are the others implicit because of the order? You use dfdl:tokenizer as a conversion later - is that supposed to be split as well? Is bytetochar used implicitly before the first split? Is chartostring used implicitly before stringtoint, which is implicitly used to get the int element? Is EOS a returned value (and therefore of the type being returned) or is it an exception?
Logical: What happens if the rows are not in the logical model - physically there are 10 rows with 5 elements, but the logical model is 50 ints in a single sequence? To support this, you'd need to have both tokenization steps in one sequence annotation with two separate split separators - does the use of setLocal for split separator work in this case? (Is this how byteorder is now used?) Thinking about missing values - is it clear how a missing row versus a missing element is now handled? I think so: the split conversion using comma can define a default input to use if the stream it receives is empty (from a \n\n pair), and the stringtoint conversion can do likewise to cover a ,, pair.
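The missing-row versus missing-element handling described above can be sketched in Python like this. The helper names and the particular default values ("9,9,9" for an empty row, 0 for an empty element) are assumptions picked purely for illustration.

```python
def split_with_default(sep, default):
    """Split on sep; an empty input stream is replaced by default first."""
    def conv(text):
        if text == "":
            text = default  # covers an empty row, e.g. from a \n\n pair
        return text.split(sep)
    return conv


def string_to_int(default):
    """Convert a token to int; an empty token yields default."""
    def conv(tok):
        return int(tok) if tok != "" else default  # covers a ,, pair
    return conv


split_cols = split_with_default(",", "9,9,9")  # assumed row default
to_int = string_to_int(0)                      # assumed element default

text = "1,,3\n\n7,8,9"
table = [[to_int(tok) for tok in split_cols(row)] for row in text.split("\n")]
print(table)  # [[1, 0, 3], [9, 9, 9], [7, 8, 9]]
```

Each default lives on the conversion that sees the gap, so the two cases are handled independently.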
Jim
At 09:25 PM 2/28/2006, Westhead, Martin (Martin) wrote:
Hi Folks,
I have tried to work through the CSV example that Mike suggested a couple of weeks ago. It has turned up some interesting issues which I have tried to address. These are less about making the underlying semantics work and more about providing a seamless default setup that makes the easy things work just as you would like.
I was pushed for time on this, so I apologize if this is unclear in places, but I wanted to put it out before tomorrow's meeting.
Thanks,
Martin
James D. Myers Associate Director, Cyberenvironments and Technologies, NCSA 1205 W. Clark St, MC-257 Urbana, IL 61801 217-244-1934 jimmyers@ncsa.uiuc.edu