RE: [dfdl-wg] CSV string worked example

1 Mar 2006


      
...
EOS is up for grabs. I was thinking of it as a returned value (e.g. 
-1), but an exception might (or might not) be easier to make sense of.
That would mean assuming -1 is a valid value of the type and is never 
used as a real value, etc. I think we need a separate mechanism to 
indicate the end.
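
To make the two options concrete, here is a minimal Python sketch (not DFDL syntax). The names EndOfStream, IntStream, read_ints_sentinel and read_ints_exception are hypothetical and exist only for illustration. It shows how a reserved return value such as -1 cuts the data short when -1 is also real data, while a separate signal does not.

```python
class EndOfStream(Exception):
    """Hypothetical out-of-band signal that the input is exhausted."""


def read_ints_sentinel(values, eos=-1):
    """Treat `eos` as the end marker; it can then never appear as data."""
    out = []
    for v in values:
        if v == eos:
            break
        out.append(v)
    return out


class IntStream:
    """Hypothetical reader that raises EndOfStream instead of returning a reserved value."""

    def __init__(self, values):
        self._it = iter(values)

    def next_int(self):
        try:
            return next(self._it)
        except StopIteration:
            raise EndOfStream() from None


def read_ints_exception(values):
    """Read until the stream signals its end out of band."""
    stream = IntStream(values)
    out = []
    while True:
        try:
            out.append(stream.next_int())
        except EndOfStream:
            return out


data = [3, -1, 7]
truncated = read_ints_sentinel(data)   # stops at the -1 that was real data
complete = read_ints_exception(data)   # keeps every value
```

With the sentinel approach the reader cannot tell a real -1 from the end of the stream, which is the concern raised above.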
...
Regarding the new model: I don't think this is a problem at the 
level of your example. We could simply use a single sequence and a 
more complex "split" conversion. I imagine that the "split" 
conversion we would want to settle on should accept a regular 
expression (or at least a list of separators). In your example you 
just have to allow the separator to be a newline OR a comma and you are done.
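
As a rough illustration of that single-sequence approach, the sketch below uses Python's re.split as a stand-in for a "split" conversion that accepts a regular expression. The function name split and the sample data are assumptions for the example, not part of any DFDL proposal.

```python
import re


def split(text, separator_pattern):
    """Stand-in for a 'split' conversion that takes a regular expression."""
    return re.split(separator_pattern, text)


# Two physical rows of three values each (illustrative data only).
data = "1,2,3\n4,5,6"

# Allow the separator to be a comma OR a newline, giving one flat sequence.
tokens = split(data, r"[,\n]")
ints = [int(t) for t in tokens]
```

Here the row structure disappears and the result is a single sequence of six ints.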
Yes, but I want to reuse the simple conversions and therefore keep 
the ability to deal with all the variations you note below without 
special constructs. If I consider this a sequence of two steps, I can 
deal with missing separators and terminators differently; if I put 
the two together, I have to account for all the possible variations. 
And I want to handle a data cube the same way without having to wait 
for someone to build a new converter... (We talked this through 
generally before: if you have a way to create new converters from 
existing ones, you could support both ways.)
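
A minimal sketch of building new converters from existing ones, in Python rather than DFDL. The helpers make_split, map_over and compose are hypothetical, and the blank-line separator between planes of the cube is an assumption made only for this example.

```python
def make_split(separator):
    """Build a simple split conversion for one literal separator."""
    def split(text):
        return text.split(separator)
    return split


def map_over(conversion):
    """Lift a conversion so it applies to every item of a sequence."""
    def apply(items):
        return [conversion(item) for item in items]
    return apply


def compose(*steps):
    """Chain conversions left to right into a new conversion."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


split_rows = make_split("\n")
split_cells = make_split(",")

# Two separate steps: rows first, then cells within each row.
csv_to_table = compose(split_rows, map_over(compose(split_cells, map_over(int))))
table = csv_to_table("1,2\n3,4")

# A data cube reuses the same pieces with one more level
# (blank line between planes is an assumed convention).
split_planes = make_split("\n\n")
csv_to_cube = compose(split_planes, map_over(csv_to_table))
cube = csv_to_cube("1,2\n3,4\n\n5,6\n7,8")
```

Because each level is its own step, each can later be given its own handling for missing separators or terminators without touching the others.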
...
A note here: this is intended as a rough sketch, not a finished 
design. I expect the details will need to be worked out. In 
particular, I think Mike/IBM have some fairly complex ideas for 
separator/terminator/initiator/escape that we will have to try to 
fit into this framework.
Thanks,
Martin
----------
From: Jim Myers [mailto:jimmyers@ncsa.uiuc.edu]
Sent: Wednesday, March 01, 2006 3:49 AM
To: Westhead, Martin (Martin); dfdl-wg@ggf.org
Subject: Re: [dfdl-wg] CSV string worked example
Martin - two types of comments - things I think are 
typos/inconsistencies and an alternate logic:
Clarifications:
Are the initial definitions on the top element defining an order to 
use subsequently, or are they just there for us to see what you've defined?
Of the four there, you only explicitly (in a comment?) invoke one - 
are the others implicit because of the order?
You use dfdl:tokenizer as a conversion later - is that supposed to 
be split as well?
Is bytetochar used implicitly before the first split?
Is chartostring used implicitly before stringtoint, which is 
implicitly used to get the int element?
Is EOS a returned value (and therefore of the type being returned), 
or is it an exception?
Logical - what happens if the rows are not in the logical model? 
Physically there are 10 rows of 5 elements each, but the logical model 
is 50 ints in a single sequence. To support this, you'd need to have 
both tokenization steps in one sequence annotation with two separate 
split separators - does the use of setLocal for split separator work 
in this case? (Is this how byteorder is now used?)
Thinking about missing values - is it clear how a missing row versus 
a missing element is now handled? I think so: the split conversion 
using comma can define a default input to use if the stream it 
receives is empty (from a \n\n pair), and the stringtoint conversion 
can do likewise to cover a ,, pair.
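
The missing-value handling described above might be sketched as follows, in Python. The functions split_with_default and string_to_int, and the default values chosen, are illustrative assumptions rather than anything defined in the worked example.

```python
def split_with_default(text, separator, default):
    """Split on `separator`; an empty input (a missing row) uses `default` instead."""
    if text == "":
        text = default
    return text.split(separator)


def string_to_int(token, default=0):
    """Convert a token to int; an empty token (a missing element) uses `default`."""
    return default if token == "" else int(token)


# Row 1 has a missing element (",,"); row 2 is missing entirely ("\n\n").
data = "1,,3\n\n7,8,9"

rows = data.split("\n")
table = [
    [string_to_int(cell) for cell in split_with_default(row, ",", default="0,0,0")]
    for row in rows
]
```

The missing row and the missing element are caught at different steps, so each can carry its own default.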
Jim
At 09:25 PM 2/28/2006, Westhead, Martin (Martin) wrote:
Hi Folks,
I have tried to work through the CSV example that Mike suggested a 
couple of weeks ago. It has turned up some interesting issues which 
I have tried to address. These are less about making the underlying 
semantics work and more about providing a seamless default set up 
that makes the easy things work just as you would like.
I was pushed for time on this, so I apologize if this is unclear in 
places, but I wanted to put it out before tomorrow's meeting.
Thanks,
Martin
James D. Myers
Associate Director, Cyberenvironments and Technologies, NCSA
1205 W. Clark St, MC-257
Urbana, IL 61801
217-244-1934
jimmyers@ncsa.uiuc.edu
