[DFDL-WG] Action 020 - completed

8 Jun 2009

Firstly, there had been a debate about which simple types should appear in 
the infoset.  Various proposals are below.  We've decided not to go with any 
of those, but to stick with the existing infoset behaviour (reproduced 
below). It means that the DFDL parser will do enough to convert input data 
to the (nearest) schema built-in type. This gives better interop with XML 
Schema (eg, ecore) based trees. There is an important implication when 
speculatively parsing: the parser will use the schema built-in type to 
distinguish data, but will not use user-defined restrictions.

        [datatype] String. The name of the XML Schema 1.0 built-in simple 
type to which the value corresponds. DFDL supports a subset of these types 
listed in the specification at section 4.1.
        [dataValue] The value in the value space of the [datatype] member 
or special value nil.

Secondly, given the above decision, we can complete action 020. On parse, 
if the physical data cannot be handled by the logical type, it is a 
processing error. On unparsing, data must conform to the infoset type, by 
definition. 

Logical type: Signed (decimal, integer, and user restrictions thereof)
  textNumberRepresentation=text (4)
    Parse: OK
    Unparse: OK
  textNumberRepresentation=zoned (2) (6)
    Parse: Unpunched data => +ve
    Unparse: Data always punched with sign
  binaryNumberRepresentation=packed (5)
    Parse: Unsigned nibble => +ve
    Unparse: Data signed as per +ve/-ve nibble specifiers, unsigned nibble
    specifier never used
  binaryNumberRepresentation=bcd (1)
    Parse: Data always +ve
    Unparse: -ve data is processing error
  binaryNumberRepresentation=binary
    N/A

Logical type: Signed (long, int, short, byte, and user restrictions thereof)
  textNumberRepresentation=text (4)
    Ditto
  textNumberRepresentation=zoned (2) (6)
    Ditto
  binaryNumberRepresentation=packed (5)
    Ditto
  binaryNumberRepresentation=bcd (1)
    Ditto
  binaryNumberRepresentation=binary
    Parse: Data assumed 2's complement binary
    Unparse: Data output as 2's complement binary

Logical type: Unsigned (unsigned long, unsigned int, unsigned short,
unsigned byte, and user restrictions thereof) (3)
  textNumberRepresentation=text (4)
    Parse: +ve data => OK, -ve data is processing error
    Unparse: Data output according to pattern
  textNumberRepresentation=zoned (2) (6)
    Parse: +ve punched data => OK, -ve data is processing error
    Unparse: Data never punched with sign
  binaryNumberRepresentation=packed (5)
    Parse: +ve nibble & unsigned nibble => OK, -ve nibble is processing error
    Unparse: Unsigned nibble specifier always used
  binaryNumberRepresentation=bcd (1)
    Parse: OK
    Unparse: OK
  binaryNumberRepresentation=binary
    Parse: Data assumed unsigned binary
    Unparse: Data output as unsigned binary

Notes
(1)  Can not physically carry a sign
(2) Some systems omit to punch for +ve, but accept punched on input (eg, 
IBM iSeries)
(3) Assumes that on unparsing, the infoset can not present a -ve value
(4) The -ve sign is indicated by numberPattern property
(5) The exact sign nibbles are given by the packedDecimalSignCodes 
property
(6) The punching style to use is given by the numberZonedSignStyle 
property
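
As an illustration of the packed rows above, here is a minimal Python 
sketch of the parse-side rule. It is not taken from the spec: the function 
name, the ProcessingError class and the fixed sign nibbles (C, D, F) are 
assumptions made for the example. In DFDL the nibbles would come from the 
packedDecimalSignCodes property.

```python
# Hypothetical sketch. The sign nibble values below are assumed; in DFDL
# they would be supplied by the packedDecimalSignCodes property.
POSITIVE, NEGATIVE, UNSIGNED = 0xC, 0xD, 0xF


class ProcessingError(Exception):
    """The physical data cannot be handled by the logical type."""


def parse_packed(data: bytes, logical_signed: bool) -> int:
    """Decode packed decimal: digit nibbles followed by one sign nibble."""
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ProcessingError("invalid digit nibble")
    magnitude = int("".join(str(d) for d in digits) or "0")
    if sign == NEGATIVE:
        if not logical_signed:
            # Unsigned logical type: a -ve nibble is a processing error.
            raise ProcessingError("-ve nibble for unsigned logical type")
        return -magnitude
    if sign in (POSITIVE, UNSIGNED):
        # An unsigned nibble is treated as +ve for either kind of type.
        return magnitude
    raise ProcessingError("unrecognised sign nibble")
```

For example, the bytes 0x12 0x3F parse to 123 for both signed and unsigned 
logical types, while 0x12 0x3D parses to -123 for a signed type and is 
rejected for an unsigned one.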

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 08/06/2009 11:27 -----

Steve Hanson/UK/IBM 
13/05/2009 13:25

To
Dave Glick
cc
dfdl-wg@ogf.org
Subject
Fw: Action 020 completion

Hi Dave

We should also bear the problem below in mind when thinking about DFDL 
Infoset & XDM.  XDM assumes that an element with a concrete type-name has 
a typed-value conforming to the type-name, ie, it has been 'validated'. If 
this is not the case then the type-name is set to xs:untyped or 
xs:untypedAtomic (extra types added to XDM for this purpose).  In DFDL 
Infoset we had been assuming that the [dataType] would be set to that 
implied by the DFDL xsd, regardless of whether validation succeeded or not 
- though there are issues with this as explained below. 

Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 13/05/2009 12:51 -----

Steve Hanson/UK/IBM 
29/04/2009 15:54

To
Alan Powell, Dave Glick, Mike Beckerle (Work)
cc
Suman Kalia/Toronto/IBM@IBMCA, Tim Kimber/UK/IBM
Subject
Fw: Action 020 completion

Hi Dave

We discussed this on the call and agreed that the unsigned types are just 
range restrictions so treating negative numbers in the data as parse 
errors instead of validation errors seemed inconsistent.

Various options discussed:

1) Remove the unsigned types altogether.
- Means we'd need an extra property to describe binary integer 
representation, as we could no longer infer the rep from the logical type
- Loses type information for applications where the fact that data is 
unsigned is important.
- Means DFDL modelers would have to create their own duplicate 
restrictions for common C etc data types.

2) Change [dataType] to point to the XML Schema primitive type instead of 
the XML Schema built-in type.
- Means that the value and the type would be xs:decimal which is too 
general

3) Change [dataValue] to say "The value in the value space of the 
underlying XML Schema primitive type for the [datatype] member or special 
value nil"
- Allows the infoset to carry integer data that is invalid due to range 
regardless of value.
- Means that the value would be a decimal even though the data type was 
(say) xs:unsignedLong, ie, the datatype and datavalue are no longer in 
step unless validated

4) Option 2) with the modification that the primitive type for all integer 
types was xs:integer and not xs:decimal.

5) Option 3) with the modification that the primitive type for all 
integer types was xs:integer and not xs:decimal.

We agreed not to close on this until you had reported back on your action 
032.

Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 29/04/2009 15:32 -----

Steve Hanson/UK/IBM
27/04/2009 13:42

To
Alan Powell/UK/IBM
cc
dglick@dracorp.com, mbeckerle@oco-inc.com
Subject
Re: Action 020 completion

Thanks Alan

I've added the correct property names, see below. I've omitted floats 
deliberately for clarity: the logical type is always signed and the 
physical type is always signed, so there's no issue.

However, there is a problem with what I have stated, as you pointed out on 
Friday.  On parsing I am effectively validating the input data, but on 
unparsing I am not assuming the data has been validated.  This is not 
consistent and needs correcting.

But as I looked into this, I realised we have a problem with how we have 
described the DFDL infoset. The spec says "There is no requirement for 
DFDL-described data to be valid in order to have a DFDL information set.", 
which is in accordance with our agreed position on validation being 
optional. But further on it also says:

        [datatype] String. The name of the XML Schema 1.0 built-in simple 
type to which the value corresponds. DFDL supports a subset of these types 
listed in the specification at section 4.1.
        [dataValue] The value in the value space of the [datatype] member 
or special value nil.
This says to me that the DFDL parser must have done enough validation to 
ascertain that the value matched the underlying built-in type. For 
example, I have a user-defined simple type that adds a max/min range of 
+100-+200 to an xs:unsignedInt. If the input data has value 99, the value 
will be accepted into the infoset, but will not validate if validation is 
switched on. If the input data is a packed decimal with value -1, the 
value will not be accepted into the infoset. Given that xs:unsignedInt is 
itself just a range restriction of xs:integer (via xs:nonNegativeInteger), 
this seems a bit arbitrary. 
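
To make the 99 versus -1 example concrete, here is a small Python sketch 
of the two separate checks. It is only an illustration: the names 
to_infoset_value, validate and RangeRestriction are invented for the 
example and are not DFDL terms.

```python
from dataclasses import dataclass

# Value space of xs:unsignedInt is 0..4294967295.
UNSIGNED_INT_MAX = 2**32 - 1


class ProcessingError(Exception):
    """The value is outside the built-in type, so it cannot enter the infoset."""


@dataclass
class RangeRestriction:
    """Hypothetical stand-in for a user-defined restriction of xs:unsignedInt."""
    min_inclusive: int
    max_inclusive: int


def to_infoset_value(value: int) -> int:
    """Parse step: accept the value only if the built-in type can hold it."""
    if not 0 <= value <= UNSIGNED_INT_MAX:
        raise ProcessingError(f"{value} is not an xs:unsignedInt")
    return value


def validate(value: int, facet: RangeRestriction) -> bool:
    """Optional validation step: apply the user-defined range."""
    return facet.min_inclusive <= value <= facet.max_inclusive
```

With RangeRestriction(100, 200), the value 99 passes to_infoset_value and 
only fails validate, whereas -1 is rejected by to_infoset_value before 
validation is ever reached.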

Dave - given your action item looking at DFDL Infoset versus XDM, I'd be 
interested in your opinion here.

Regards

Steve Hanson

Alan Powell/UK/IBM
24/04/2009 15:13

To
Steve Hanson/UK/IBM@IBMGB
cc
dglick@dracorp.com, mbeckerle@oco-inc.com
Subject
Re: Action 020 completion

Steve

Looks OK

But can you use the correct property name eg binaryNumberRepresentation 
and for completeness add binaryFloatRepresentation (even though it may be 
obvious)

Alan Powell

 MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
 Notes Id: Alan Powell/UK/IBM     email: alan_powell@uk.ibm.com 
 Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898

From:
Steve Hanson/UK/IBM
To:
mbeckerle@oco-inc.com, Alan Powell/UK/IBM, dglick@dracorp.com
Date:
23/04/2009 12:50
Subject:
Action 020 completion

Here's my proposal for the behaviour when a logical type is signed and the 
physical data has no sign (either because it is not capable of carrying a 
sign, or it carries an unsigned indicator), and when the logical type is 
unsigned and the physical data has a sign.  The principle is to be 
flexible and only to give errors when things are clearly mis-matched

Logical type: Signed (decimal, integer)
  textNumberRepresentation=text (4)
    Parse: OK
    Unparse: OK
  textNumberRepresentation=zoned (2) (6)
    Parse: Unsigned data => +ve
    Unparse: Data always punched with sign
  binaryNumberRepresentation=packed (5)
    Parse: Unsigned data => +ve
    Unparse: Data signed as per +ve/-ve nibble specifiers, unsigned nibble
    specifier never used
  binaryNumberRepresentation=bcd (1)
    Parse: Data always +ve
    Unparse: -ve data is error
  binaryNumberRepresentation=binary
    N/A

Logical type: Signed (long, int, short, byte)
  textNumberRepresentation=text (4)
    Ditto
  textNumberRepresentation=zoned (2) (6)
    Ditto
  binaryNumberRepresentation=packed (5)
    Ditto
  binaryNumberRepresentation=bcd (1)
    Ditto
  binaryNumberRepresentation=binary
    Parse: Data assumed 2's complement binary
    Unparse: Data output as 2's complement binary

Logical type: Unsigned (unsigned long, unsigned int, unsigned short,
unsigned byte) (3)
  textNumberRepresentation=text (4)
    Parse: -ve data is error
    Unparse: -ve data is error
  textNumberRepresentation=zoned (2) (6)
    Parse: +ve data => OK, -ve data is error
    Unparse: Sign never punched, -ve data is error
  binaryNumberRepresentation=packed (5)
    Parse: +ve data => OK, -ve data is error
    Unparse: Unsigned nibble specifier always used, -ve data is error
  binaryNumberRepresentation=bcd (1)
    Parse: OK
    Unparse: -ve data is error
  binaryNumberRepresentation=binary
    Parse: Data assumed unsigned binary
    Unparse: Data output as unsigned binary

(1)  Can not physically carry a sign
(2) Some systems output unsigned for +ve, but accept +ve on input (eg, IBM 
iSeries)
(3) Assumes that on unparsing, the infoset could still present a -ve value
(4) The -ve sign is indicated by numberPattern property
(5) The exact sign nibbles are given by the packedDecimalSignCodes 
property
(6) The punching style to use is given by the numberZonedSignStyle 
property
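
The binary rows can be pictured with Python's built-in integer 
conversions. This is only a sketch: the function names are invented, and 
big-endian byte order is an assumption, since byte order is not discussed 
in this proposal.

```python
def parse_binary(data: bytes, logical_signed: bool) -> int:
    """Signed logical types read 2's complement, unsigned read unsigned binary."""
    # Big-endian is assumed purely for illustration.
    return int.from_bytes(data, byteorder="big", signed=logical_signed)


def unparse_binary(value: int, length: int, logical_signed: bool) -> bytes:
    """Write the value in the representation implied by the logical type."""
    # int.to_bytes raises OverflowError when the value does not fit, which
    # includes a negative value being written with signed=False.
    return value.to_bytes(length, byteorder="big", signed=logical_signed)
```

The same two bytes 0xFF 0xFF therefore give -1 for a signed short and 
65535 for an unsigned short, and asking to unparse -1 as unsigned raises 
an error rather than producing output.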

Mail back any comments before next week's call.

Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 23/04/2009 12:04 -----

Steve Hanson/UK/IBM 
18/02/2009 16:55

To
Mike Beckerle (Work)
cc
Alan Powell/UK/IBM, dglick@dracorp.com
Subject
DFDL: Packed & zoned decimals - more thoughts (was Action 020)

Hi Mike

While we are on the subject of how to handle signs, the spec does not 
fully define what happens for a number if the logical type is unsigned. We 
need to say what is expected in the physical data and what happens if the 
data contains a sign. For example, we say that for an unsigned integer, if 
the rep is binary then we treat the data as 'unsigned binary' and not twos 
complement. And we say that BCD is only allowed for unsigned logical 
types. That is good. But we don't do the same for packed, text, zoned. I 
think we need to say that no explicit sign is expected in the data (eg, 
packed should have only F or 0, no A,B,C,D) and if it does:

Alternatives:
i) Error 
ii) Positive sign discarded, negative sign gives error
iii) Sign discarded
iv) As per i) if 'strict' set, as per ii)  if 'lax' set
v) As per i) if 'strict' set, as per iii)  if 'lax' set

Personally I vote for i)

Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 18/02/2009 16:41 -----

Steve Hanson/UK/IBM
12/02/2009 13:18

To
<mbeckerle.dfdl@gmail.com>
cc
dfdl-wg@ogf.org
Subject
RE: [DFDL-WG] Fw: DFDL OGF WG call - Action 020

Hi Mike

I think it's a simplification too far. Many people, especially those with 
a mainframe or COBOL background, know what a zoned decimal is. The 
Wikipedia entry for binary coded decimal explicitly covers the BCD, packed 
& zoned 'variants'. MRM and WTX both explicitly support zoned too. And 
it's easier to say that the 'decimalSignStyle' property applies to zoned 
decimals than to say it applies to any patterns that happen to have a P in 
them. On balance I would keep zoned as a representation.

So we need to decide whether zoned is only allowed for a signed decimal. 
There's no harm in allowing it for unsigned, just some redundancy, and it 
makes validation of the pattern against the rep easier (if something is 
zoned it can only have a subset of pattern chars).

Btw we don't need leading overpunched sign, only trailing - see my case 
for this below.

Regards

Steve Hanson

"Mike Beckerle" <mbeckerle.dfdl@gmail.com> 
12/02/2009 12:51
Please respond to
<mbeckerle.dfdl@gmail.com>

To
Steve Hanson/UK/IBM@IBMGB, <dfdl-wg@ogf.org>
cc

Subject
RE: [DFDL-WG] Fw: DFDL OGF WG call - Action 020

This does suggest another simplification.

Zoned is so close to text....Suppose we scrap the concept of "zoned" 
altogether, and just add a character to our number pattern language to 
allow one to specify an overpunched sign digit. E.g., 

"+00000" is text
"P0000" same with overpunched leading sign. 
"00000+" text
"0000P" same with overpunched trailing sign

The decimal point would normally be implied in these. (I still like 
having a COBOL-style "V" to position this instead of separate properties 
stating the position. One of the few good features of COBOL is the 
number patterns. I still think we could quite easily pre-process the "V" 
out of these strings and then hand the rest through to an ICU library as 
an implementation; however, the "P" probably does need to be a change in 
that library.) 
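
A rough Python sketch of the "V" pre-processing idea follows. The function 
names are invented for illustration, and the sketch assumes that only "0" 
and "#" count as digit positions after the "V". The remaining pattern 
would then be handed to the number formatting library unchanged.

```python
from decimal import Decimal
from typing import Tuple


def split_virtual_point(pattern: str) -> Tuple[str, int]:
    """Strip a single 'V' and return (pattern without V, implied fraction digits)."""
    if pattern.count("V") > 1:
        raise ValueError("at most one 'V' is allowed in a pattern")
    head, sep, tail = pattern.partition("V")
    if not sep:
        return pattern, 0
    # Assumption: only '0' and '#' are digit positions after the 'V'.
    fraction_digits = sum(1 for ch in tail if ch in "0#")
    return head + tail, fraction_digits


def apply_virtual_point(digits: str, fraction_digits: int) -> Decimal:
    """Scale the parsed integer digits by the implied decimal position."""
    return Decimal(int(digits)).scaleb(-fraction_digits)
```

For instance, the pattern "000V00" becomes "00000" with two implied 
fraction digits, so the data "12345" is read as 123.45.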

Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel:  781-810-2100  | 504 Totten Pond Road, Waltham MA 02451 | 
mbeckerle.dfdl@gmail.com 

From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf 
Of Steve Hanson
Sent: Wednesday, February 11, 2009 1:34 PM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] Fw: DFDL OGF WG call - Action 020

It was noted on the call this week that there is an alternative to my 
zoned decimal overpunching proposal i) below. 

I said: 

- If it is an unsigned type then DFDL expects the rightmost byte to have a 
zone nibble when parsing, and outputs a zone nibble when unparsing. 
- If it is a signed type then DFDL expects it to have a sign nibble when 
parsing, and outputs a sign nibble when unparsing. 

But my unsigned type behaviour could be achieved by specifying a rep of 
text instead of zoned.  If that is the case, the alternative is to only 
allow zoned rep for signed decimal logical types. 

Thoughts? 

Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 05/02/2009 10:17 ----- 
Steve Hanson/UK/IBM 
28/01/2009 13:54 

To
DFDL Working Group 
cc

Subject
DFDL OGF WG call - Action 020

Action 020: 
020  SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy 
     22/10: No progress 
     10/12: added how to decide to overpunch and sign position 
a) Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy 

Add new property to section 15.4 Properties Specific to Number with Binary 
representation. 

Property: binaryNumberCheckPolicy 
Type: Enum 
Description: 
Values are 'strict' and 'lax'. 
Indicates how lenient to be when parsing binary numbers. 
If 'lax' then the parser tolerates all valid alternatives where such 
alternatives exist. Specifically, for binaryNumberRepresentation = 
'packed' the sign nibble for positive, negative, unsigned and zero is 
allowed to be any of the valid respective values. 
On unparsing, the specified value is always used. 
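
A small Python sketch of how 'strict' and 'lax' might differ for a packed 
sign nibble. This is an assumption-laden illustration: the function name 
is invented, the sets of acceptable nibbles follow the usual IBM packed 
decimal convention (A, C, E, F positive; B, D negative), and the separate 
unsigned and zero codes mentioned above are left out for brevity.

```python
# Assumed sets of valid alternatives, per the usual IBM packed decimal
# convention. The "specified" values would come from packedDecimalSignCodes.
LAX_POSITIVE = {0xA, 0xC, 0xE, 0xF}
LAX_NEGATIVE = {0xB, 0xD}


def sign_of(nibble: int, policy: str,
            positive: int = 0xC, negative: int = 0xD) -> int:
    """Return +1 or -1 for a sign nibble, or raise ValueError if not accepted."""
    if policy == "strict":
        # Only the specified values are accepted.
        if nibble == positive:
            return 1
        if nibble == negative:
            return -1
    elif policy == "lax":
        # Any valid alternative for the respective sign is accepted.
        if nibble in LAX_POSITIVE:
            return 1
        if nibble in LAX_NEGATIVE:
            return -1
    else:
        raise ValueError(f"unknown policy {policy!r}")
    raise ValueError(f"sign nibble {nibble:#x} not accepted under {policy}")
```

So a nibble of A is accepted as positive under 'lax' but rejected under 
'strict' when the specified positive code is C.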

Also suggest changing some of the other property names in 15.4: 
"decimalVirtualPoint" -> "binaryDecimalVirtualPoint" 
"packedDecimalSignCodes" -> "binaryPackedSignCodes" 

And changing binaryNumberRepresentation enumeration: 
"BCD" -> "bcd" 

b) Zoned decimals: How to decide to overpunch and sign position 

Spec assumes that overpunching of the rightmost character always takes 
place. IBM architecture allows no overpunching (ie, Fx instead of Cx/Dx) - 
this is supported by IBM MRM & WTX parsers. Additionally IBM MRM parser 
allows separate sign byte, and sign byte on left. Let's deal with these 
separately: 
i) No overpunching. 
The IBM architecture allows the rightmost byte to have a zone (Fx) or a 
sign (Cx/Dx) as the left nibble. I don't see why we can't base what to 
expect when parsing, and output when unparsing, on the logical xsd type. 
- If it is an unsigned type then DFDL expects the rightmost byte to have a 
zone nibble when parsing, and outputs a zone nibble when unparsing. 
- If it is a signed type then DFDL expects it to have a sign nibble when 
parsing, and outputs a sign nibble when unparsing. 
For analogy with DFDL packed decimals, it seems at first glance that we 
should also extend the numberCheckPolicy 'lax' setting to treat a zone 
nibble as a +ve sign nibble for a signed type. However, IBM iSeries always 
outputs Fx to mean +ve but accepts both Fx & Cx on input. It is perhaps 
better therefore that DFDL always tolerates Fx when parsing a signed zoned 
decimal, otherwise iSeries users would always have to set 
numberCheckPolicy to 'lax', which might have other implications in the 
future. 
ii) Separate sign byte. 
I don't believe the IBM architecture allows this. I don't think DFDL needs 
to support it. MRM has this, but I think it's because early on MRM did not 
explicitly support text decimals as such, just COBOL variations, and it 
was easier just to call them all zoned. 
iii) Sign byte on left. 
I don't believe the IBM architecture allows this. I don't think DFDL needs 
to support it.  MRM has this, but for the same reason as ii) 

Conclusion: No new DFDL properties needed, but words need adding to 
explain zoned parse/unparse behaviour better. 
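
The behaviour proposed in i) could be sketched in Python as below. This is 
illustrative only: the function names and the ProcessingError class are 
invented, EBCDIC-style zoned bytes are assumed (zone F, sign C for +ve and 
D for -ve), and treating a sign nibble on an unsigned type as an error is 
one reading of "expects", not something stated above.

```python
class ProcessingError(Exception):
    """The physical data does not match what the logical type expects."""


def parse_zoned(data: bytes, logical_signed: bool) -> int:
    """Parse zoned decimal where only the rightmost byte may carry a sign."""
    if not data:
        raise ProcessingError("no data")
    digits = []
    for byte in data[:-1]:
        if byte >> 4 != 0xF or (byte & 0x0F) > 9:
            raise ProcessingError("invalid zoned digit")
        digits.append(byte & 0x0F)
    zone, last_digit = data[-1] >> 4, data[-1] & 0x0F
    if last_digit > 9:
        raise ProcessingError("invalid zoned digit")
    digits.append(last_digit)
    magnitude = int("".join(str(d) for d in digits))
    if zone == 0xF:
        # A zone nibble (Fx) is always tolerated, as iSeries outputs it for +ve.
        return magnitude
    if not logical_signed:
        raise ProcessingError("sign nibble found for unsigned logical type")
    if zone == 0xC:
        return magnitude
    if zone == 0xD:
        return -magnitude
    raise ProcessingError("unrecognised sign nibble")


def unparse_zoned(value: int, length: int, logical_signed: bool) -> bytes:
    """Output a zone nibble for unsigned types and a sign nibble for signed."""
    if value < 0 and not logical_signed:
        raise ProcessingError("-ve value for unsigned logical type")
    text = str(abs(value)).rjust(length, "0")
    if len(text) > length:
        raise ProcessingError("value does not fit in the given length")
    out = bytearray(0xF0 | int(ch) for ch in text)
    if logical_signed:
        sign = 0xD if value < 0 else 0xC
        out[-1] = (sign << 4) | int(text[-1])
    return bytes(out)
```

With this sketch, F1 F2 F3 parses to 123 for either kind of type, F1 F2 D3 
parses to -123 for a signed type, and unparsing 123 gives F1 F2 C3 for a 
signed type but F1 F2 F3 for an unsigned one.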

Also suggest changing property names: 
"zonedDecimalSignStyle" -> "numberZonedSignStyle" 
"zeroNumberRep" -> "numberZeroRep" 

Should also make clear that any explicit negative pattern in numberPattern 
will be ignored if the xsd type is unsigned. (We could make this an error 
but it precludes creation of a textNumberFormat that works with both 
signed and unsigned types, plus pattern "##0.0" implicitly is equivalent 
to "##0.0;(##0.0)" ). 

Regards

Steve Hanson

Alan Powell/UK/IBM@IBMGB 
Sent by: dfdl-wg-bounces@ogf.org 
23/01/2009 13:36 

To
dfdl-wg@ogf.org 
cc

Subject
[DFDL-WG] DFDL: Minutes from OGF WG call, 21 January 2009

Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
14:00 GMT, 21 January 2009

Attendees 
Alan Powell (IBM) 
Mike Beckerle(Oco) 

Apologies 
Steve Hanson (IBM) 

1. XSD 1.1 
Deferred to next call 

2. Calendar formats 
Discussed updated (v4) supplement emailed by AP 

Agreed millisec/secSinceEpoc cannot be implied by the length of the logical 
data, so separate enumerations are needed. Observed that these options were 
really a combination of 3 properties: binary, length and sec/millisec. 
Suggested renaming to binarySeconds and binaryMilliseconds 

Packed calendars: decided that need to be able to specify at least the 
packedDecimalSignCodes property rather than assuming a default so 
reference will be added to calendar description 

Locale needs to be specified  for numberformats and calendarFormats 
(didn't identify any other areas) as it modifies the behaviour of ICU. 
Decided to add  locale to numberFormat and CalendarFormat 

3. Escape Schemes 

Agreed need for multiple escape delimiter pairs but not nested. 
Need an escape for the escape character even though in most cases this will 
be the same character, eg /n //. There are some formats that have a 
different escape, eg /n &/. Only need single escape characters and one 
level of escape characters. 
Discussed how to deal with comments of the form /* comment */ where the 
escape delimiters are also the initiator and terminator of the field. The 
semantic needed is 'only look for field terminator not any parent 
terminator or any other syntax elements'. May fall out naturally from the 
speculative parsing rules. Need further discussion. 
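
One way to picture the single escape character plus escape-for-the-escape 
rule is the Python sketch below. It is a guess at the intended semantics, 
not agreed behaviour: the function name and parameters are invented, and 
it assumes the escape-escape character is only special when it immediately 
precedes the escape character.

```python
from typing import Tuple


def read_field(data: str, terminator: str,
               escape: str, escape_escape: str) -> Tuple[str, int]:
    """Scan for an unescaped terminator; return (value, index after it)."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == escape_escape and data[i + 1:i + 2] == escape:
            # Escaped escape character: emit one literal escape character.
            out.append(escape)
            i += 2
        elif ch == escape and i + 1 < len(data):
            # Escape character: take the next character literally.
            out.append(data[i + 1])
            i += 2
        elif data.startswith(terminator, i):
            return "".join(out), i + len(terminator)
        else:
            out.append(ch)
            i += 1
    # No terminator found: the field runs to the end of the data.
    return "".join(out), i
```

With "/" as both the escape and its own escape, the data "a/,b,c" yields 
the field "a,b" and "a//b,c" yields "a/b". With "&" as the escape for the 
escape, "a&/b,c" also yields "a/b".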

4. AOB 
Next call 28 January 14:00 

Meeting closed, 15:00 GMT 

Actions raised at this meeting 
No
Action 
031

Current Actions: 
No   Action 

012  AP/SH: Update decimalCalendarScheme 
     10/9: Not allocated yet 
     17/9: No update 
     24/9: Add calendar binary formats to actions 
     22/10: No progress 
     16/1: proposal distributed and discussed. Will be redistributed 
     21/1: add locale 
020  SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy 
     22/10: No progress 
     10/12: added how to decide to overpunch and sign position 
023  MB: Review Schema 1.1 
024  String XML type 
025  Escape schemes 
     21/1: discussed requirements 
026  SH: Envelopes and Payloads 
027  Property precedence tables 
028  Variable markup 
029  valueCalc (output length calculation) 
030  AP: confirm with WTX that can drop duration 
     21/6: WTX confirm that they do not have a duration type so do not need 
     it in dfdl. Will drop from spec. Closed 

Closed actions: 
030
AP: confirm with WTX that can drop duration 
21/6: WTX confirm that they do not have a duration type so do not need it 
in dfdl. Will drop from spec. Closed 

034 Work items: 
No   Item 

001  String XML type (Ian P) - Apr 30, 2008 
002  Escape schemes (Ian P) - Apr 30, 2008 
003  Variables - ??, 2008 (Mike) 
005  Improvements on property descriptions - ??, 2008 (All - split TBD) 
006  Envelopes and Payloads (Steve) - Apr 30, 2008 
007  (from draft 32) valueCalc (Mike) - ??, 2008 
     mostly complete 
008  (from draft 32) Property precedence for writing (Steve) - 
     under review 
009  (from draft 32) Variable markup (Steve) - Mar 31, 2008 
     proposal needs writing up 
010  (from draft 32) Assertions, discriminators and choice, including 
     discussion of timing option (Suman) - Mar 31, 2008 
     in progress 
011  (from draft 32) How speculative parsing works (combining choice and 
     variable-occurrence - currently these are separate) ??, 2008 (IBM) 
     in progress 
012  (from draft 32) Reordering the properties discussion: move 
     representation earlier, improve flow of topics ??, 2008 (Alan) 
     not started 
025  Augmented infoset and unparsing (Alan) 
     added but needs work 
     complete - specification updated 

Alan Powell


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 http://www.ogf.org/mailman/listinfo/dfdl-wg 
