Hi Dave
We should also bear the problem below
in mind when thinking about DFDL Infoset & XDM. XDM assumes that
an element with a concrete type-name has a typed-value conforming to the
type-name, ie, it has been 'validated'. If this is not the case then
the type-name is set to xs:untyped or xs:untypedAtomic (extra types added
to XDM for this purpose). In DFDL Infoset we had been assuming that
the [dataType] would be set to that implied by the DFDL xsd, regardless
of whether validation succeeded or not - though there are issues with this
as explained below.
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 13/05/2009 12:51 -----
Steve Hanson/UK/IBM
29/04/2009 15:54
To: Alan Powell, Dave Glick, Mike Beckerle (Work)
cc: Suman Kalia/Toronto/IBM@IBMCA, Tim Kimber/UK/IBM
Subject: Fw: Action 020 completion
Hi Dave
We discussed this on the call and agreed
that the unsigned types are just range restrictions so treating negative
numbers in the data as parse errors instead of validation errors seemed
inconsistent.
Various options discussed:
1) Remove the unsigned types altogether.
- Means we'd need an extra property
to describe binary integer representation, as we could no longer infer
the rep from the logical type
- Loses type information for applications
where the fact that data is unsigned is important.
- Means DFDL modelers would have to
create their own duplicate restrictions for common C etc data types.
2) Change [dataType] to point to the
XML Schema primitive type instead of the XML Schema built-in type.
- Means that the value and the type
would be xs:decimal which is too general
3) Change [dataValue] to say "The
value in the value space of the underlying XML Schema primitive type
for the [datatype] member or special value nil"
- Allows the infoset to carry integer
data that is invalid due to range regardless of value.
- Means that the value would be a decimal
even though the data type was (say) xs:unsignedLong, ie, the datatype and
datavalue are no longer in step unless validated
4) Option 2) with the modification that
the primitive type for all integer types was xs:integer and not xs:decimal.
5) Option 3) with the modification
that the primitive type for all integer types was xs:integer and not xs:decimal.
We agreed not to close on this until
you had reported back on your action 032.
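To make option 5) concrete, here is a minimal Python sketch of an infoset item whose value is held in the xs:integer value space while [dataType] still names the built-in type. The class name InfosetItem and the method is_valid are hypothetical, not anything in the spec. The ranges are those of the XML Schema built-in unsigned types.

```python
from dataclasses import dataclass

# Value ranges of the XML Schema built-in unsigned integer types.
UNSIGNED_RANGES = {
    "xs:unsignedByte": (0, 2**8 - 1),
    "xs:unsignedShort": (0, 2**16 - 1),
    "xs:unsignedInt": (0, 2**32 - 1),
    "xs:unsignedLong": (0, 2**64 - 1),
}


@dataclass
class InfosetItem:
    """Hypothetical infoset item under option 5)."""

    data_type: str   # name of the built-in type, e.g. "xs:unsignedLong"
    data_value: int  # held in the xs:integer value space, so any integer fits

    def is_valid(self) -> bool:
        # Validation is a separate, optional step: check the range facet
        # implied by the built-in type named in data_type.
        low, high = UNSIGNED_RANGES[self.data_type]
        return low <= self.data_value <= high


# A negative value can be carried in the infoset; only validation rejects it.
item = InfosetItem("xs:unsignedLong", -1)
print(item.data_value, item.is_valid())
```

The point of the sketch is that type and value are only guaranteed to be in step once is_valid() has been called, which is the drawback noted for option 3).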
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 29/04/2009 15:32 -----
Steve Hanson/UK/IBM
27/04/2009 13:42
To: Alan Powell/UK/IBM
cc: dglick@dracorp.com, mbeckerle@oco-inc.com
Subject: Re: Action 020 completion
Thanks Alan
I've added correct property names, see
below. I've omitted floats deliberately for clarity: the logical type
is always signed and the physical type is always signed, so there's no issue.
However, there is a problem with what
I have stated, as you pointed out on Friday. On parsing I am effectively
validating the input data, but on unparsing I am not assuming the data
has been validated. This is not consistent and needs correcting.
But as I looked into this, I realised
we have a problem with how we have described the DFDL infoset. The spec
says "There is no requirement for
DFDL-described data to be valid in order to have a DFDL information set.",
which is in accordance with our agreed position on validation being optional.
But further on it also says:
[datatype]
String. The name of the XML Schema 1.0 built-in simple type to which the
value corresponds. DFDL supports a subset of these types listed in the
specification at section 4.1.
[dataValue]
The value in the value space of the [datatype] member or special value
nil.
This says to me that the DFDL parser
must have done enough validation to ascertain that the value matched the
underlying built-in type. For example, I have a user-defined simple type
that adds a max/min range of +100-+200 to an xs:unsignedInt. If the input
data has value 99, the value will be accepted into the infoset, but will
not validate if validation is switched on. If the input data is a packed
decimal with value -1, the value will not be accepted into the infoset.
Given that xs:unsignedInt is itself just a range restriction of xs:integer
(via xs:nonNegativeInteger), this seems a bit arbitrary.
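The asymmetry in the example above can be sketched in a few lines of Python. This is only an illustration of the behaviour the current spec wording implies, not an implementation; the function name classify and the result strings are hypothetical.

```python
# Upper bound of xs:unsignedInt, which XML Schema derives from xs:integer
# by range restriction (via xs:nonNegativeInteger and xs:unsignedLong).
UNSIGNED_INT_MAX = 2**32 - 1


def classify(value: int, min_inclusive: int = 100, max_inclusive: int = 200) -> str:
    """Classify a value against a user-defined restriction of xs:unsignedInt."""
    # Outside the built-in type's range: the value never reaches the infoset.
    if not 0 <= value <= UNSIGNED_INT_MAX:
        return "parse error"
    # Outside the user-defined facets: accepted into the infoset, but
    # rejected if validation is switched on.
    if not min_inclusive <= value <= max_inclusive:
        return "validation error"
    return "valid"


for sample in (150, 99, -1):
    print(sample, classify(sample))
```

Both checks are range checks of the same kind; the only difference is which level of the type derivation they come from.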
Dave - given your action item looking
at DFDL Infoset versus XDM, I'd be interested in your opinion here.
Regards
Steve Hanson
Alan Powell/UK/IBM
24/04/2009 15:13
To: Steve Hanson/UK/IBM@IBMGB
cc: dglick@dracorp.com, mbeckerle@oco-inc.com
Subject: Re: Action 020 completion
Steve
Looks OK
But can you use the correct property
name eg binaryNumberRepresentation
and for completeness add binaryFloatRepresentation
(even though it may be obvious)
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
From: Steve Hanson/UK/IBM
To: mbeckerle@oco-inc.com, Alan Powell/UK/IBM, dglick@dracorp.com
Date: 23/04/2009 12:50
Subject: Action 020 completion
Here's my proposal for the behaviour
when a logical type is signed and the physical data has no sign (either
because it is not capable of carrying a sign, or it carries an unsigned
indicator), and when the logical type is unsigned and the physical data
has a sign. The principle is to be flexible and only to give errors
when things are clearly mismatched.
Logical type: Signed (decimal, integer)
  textNumberRepresentation=text (4)      | Parse: OK | Unparse: OK
  textNumberRepresentation=zoned (2) (6) | Parse: Unsigned data => +ve | Unparse: Data always punched with sign
  binaryNumberRepresentation=packed (5)  | Parse: Unsigned data => +ve | Unparse: Data signed as per +ve/-ve nibble specifiers, unsigned nibble specifier never used
  binaryNumberRepresentation=bcd (1)     | Parse: Data always +ve | Unparse: -ve data is error
  binaryNumberRepresentation=binary      | N/A

Logical type: Signed (long, int, short, byte)
  textNumberRepresentation=text (4)      | Ditto
  textNumberRepresentation=zoned (2) (6) | Ditto
  binaryNumberRepresentation=packed (5)  | Ditto
  binaryNumberRepresentation=bcd (1)     | Ditto
  binaryNumberRepresentation=binary      | Parse: Data assumed 2's complement binary | Unparse: Data output as 2's complement binary

Logical type: Unsigned (unsigned long, unsigned int, unsigned short, unsigned byte) (3)
  textNumberRepresentation=text (4)      | Parse: -ve data is error | Unparse: -ve data is error
  textNumberRepresentation=zoned (2) (6) | Parse: +ve data => OK, -ve data is error | Unparse: Sign never punched, -ve data is error
  binaryNumberRepresentation=packed (5)  | Parse: +ve data => OK, -ve data is error | Unparse: Unsigned nibble specifier always used, -ve data is error
  binaryNumberRepresentation=bcd (1)     | Parse: OK | Unparse: -ve data is error
  binaryNumberRepresentation=binary      | Parse: Data assumed unsigned binary | Unparse: Data output as unsigned binary

(1) Cannot physically carry a sign
(2) Some systems output unsigned for +ve, but accept +ve on input (eg, IBM iSeries)
(3) Assumes that on unparsing, the infoset could still present a -ve value
(4) The -ve sign is indicated by numberPattern property
(5) The exact sign nibbles are given by the packedDecimalSignCodes property
(6) The punching style to use is given by the numberZonedSignStyle property
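For the binaryNumberRepresentation=binary column, the difference between the signed and unsigned rows can be shown with Python's built-in integer conversions. This is a sketch only; the function names are hypothetical and big-endian byte order is an assumption made for the example.

```python
def parse_binary(data: bytes, signed: bool) -> int:
    """Interpret bytes as 2's complement (signed) or unsigned binary."""
    # Assumption: big-endian byte order, chosen only for this example.
    return int.from_bytes(data, byteorder="big", signed=signed)


def unparse_binary(value: int, length: int, signed: bool) -> bytes:
    """Write an integer back out; a -ve value for an unsigned type is an error."""
    try:
        return value.to_bytes(length, byteorder="big", signed=signed)
    except OverflowError as exc:
        raise ValueError(f"{value} cannot be represented in {length} bytes") from exc


raw = b"\xff\xff"
print(parse_binary(raw, signed=True))   # same bytes, signed logical type
print(parse_binary(raw, signed=False))  # same bytes, unsigned logical type
```

The same two bytes give different logical values depending only on the logical type, which is why the representation can be inferred from the type as long as the unsigned types exist.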
Mail back any comments before next week's
call.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 23/04/2009 12:04 -----
Steve Hanson/UK/IBM
18/02/2009 16:55
To: Mike Beckerle (Work)
cc: Alan Powell/UK/IBM, dglick@dracorp.com
Subject: DFDL: Packed & zoned decimals - more thoughts (was Action 020)
Hi Mike
While we are on the subject of how to
handle signs, the spec does not fully define what happens for a number
if the logical type is unsigned. We need to say what is expected in the
physical data and what happens if the data contains a sign. For example,
we say that for an unsigned integer, if the rep is binary then we treat
the data as 'unsigned binary' and not twos complement. And we say that
BCD is only allowed for unsigned logical types. That is good. But we don't
do the same for packed, text, zoned. I think we need to say that no explicit
sign is expected in the data (eg, packed should have only F or 0, no A,B,C,D),
and to say what happens if the data does contain one:
Alternatives:
i) Error
ii) Positive sign discarded, negative
sign gives error
iii) Sign discarded
iv) As per i) if 'strict' set, as per
ii) if 'lax' set
v) As per i) if 'strict' set, as per
iii) if 'lax' set
Personally I vote for i)
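The five alternatives differ only in how a sign found in data for an unsigned logical type is treated. A minimal Python sketch follows; the function name, the SignError class and the policy argument are hypothetical and exist only to compare the alternatives side by side.

```python
from typing import Optional


class SignError(ValueError):
    """Raised when a sign in the data is rejected for an unsigned type."""


def apply_alternative(magnitude: int, sign: Optional[str],
                      alternative: str, policy: str = "strict") -> int:
    """Return the logical value, or raise, for sign in {None, '+', '-'}."""
    # Alternatives iv) and v) just select one of i) to iii) based on policy.
    if alternative == "iv":
        alternative = "i" if policy == "strict" else "ii"
    elif alternative == "v":
        alternative = "i" if policy == "strict" else "iii"

    if sign is None:
        return magnitude                      # no explicit sign: always fine
    if alternative == "i":
        raise SignError("explicit sign not allowed for unsigned type")
    if alternative == "ii":
        if sign == "-":
            raise SignError("negative sign not allowed for unsigned type")
        return magnitude                      # positive sign discarded
    if alternative == "iii":
        return magnitude                      # any sign discarded
    raise ValueError(f"unknown alternative: {alternative}")


print(apply_alternative(42, "+", "ii"))
```

Note that alternative iii) silently turns negative data into a positive value, which is one reason to prefer the stricter choices.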
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 18/02/2009 16:41 -----
Steve Hanson/UK/IBM
12/02/2009 13:18
To: <mbeckerle.dfdl@gmail.com>
cc: dfdl-wg@ogf.org
Subject: RE: [DFDL-WG] Fw: DFDL OGF WG call - Action 020
Hi Mike
I think it's a simplification too far.
Many people especially those with a mainframe or COBOL background know
what a zoned decimal is. The Wikipedia entry for binary coded decimal explicitly
covers the BCD, packed & zoned 'variants'. MRM and WTX both explicitly
support zoned too. And it's easier to say that the 'decimalSignStyle' property
applies to zoned decimals than to say it applies to any patterns that happen
to have a P in them. On balance I would keep zoned as a representation.
So we need to decide whether zoned is
only allowed for a signed decimal. There's no harm in allowing it for unsigned,
just some redundancy, and it makes validation of the pattern against the
rep easier (if something is zoned it can only have a subset of pattern
chars).
Btw we don't need leading overpunched
sign, only trailing - see my case for this below.
Regards
Steve Hanson
"Mike Beckerle"
<mbeckerle.dfdl@gmail.com>
12/02/2009 12:51
Please respond to <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, <dfdl-wg@ogf.org>
Subject: RE: [DFDL-WG] Fw: DFDL OGF WG call - Action 020
This does suggest another simplification.
Zoned is so close to text... Suppose
we scrap the concept of "zoned" altogether, and just add a character
to our number pattern language to allow one to specify an overpunched sign
digit. E.g.,
"+00000" is text
"P0000" same with overpunched leading sign
"00000+" text
"0000P" same with overpunched trailing sign
The decimal point would normally be implied in these. (I
still like having a COBOL-style "V" to position this instead
of separate properties stating the position - one of the few good features
of COBOL is the number patterns. I still think we could quite easily
pre-process the "V" out of these strings and then hand the rest
through to an ICU library as an implementation - however the "P"
probably does need to be a change in that library.)
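As a rough illustration of pre-processing the "V" out of a pattern, here is a Python sketch. The function names are hypothetical, and it assumes "0" and "#" are the only digit placeholders in the pattern; it does not call ICU, it only shows the split into a V-free pattern plus an implied scale.

```python
from decimal import Decimal
from typing import Tuple


def split_virtual_point(pattern: str) -> Tuple[str, int]:
    """Remove a COBOL-style 'V' and return (pattern without V, implied scale)."""
    if pattern.count("V") > 1:
        raise ValueError("pattern may contain at most one 'V'")
    if "V" not in pattern:
        return pattern, 0
    integer_part, fraction_part = pattern.split("V")
    # Assumption: only '0' and '#' stand for digit positions.
    scale = sum(1 for ch in fraction_part if ch in "0#")
    return integer_part + fraction_part, scale


def apply_scale(digits: str, scale: int) -> Decimal:
    """Apply the implied decimal point to a string of parsed digits."""
    return Decimal(int(digits)).scaleb(-scale)


stripped, scale = split_virtual_point("000V00")
print(stripped, scale, apply_scale("12345", scale))
```

The stripped pattern is what would be handed on to the number formatting library, with the scale applied to the result afterwards.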
Mike Beckerle | OGF DFDL
WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2100 | 504 Totten Pond Road, Waltham MA 02451
| mbeckerle.dfdl@gmail.com
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org]
On Behalf Of Steve Hanson
Sent: Wednesday, February 11, 2009 1:34 PM
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] Fw: DFDL OGF WG call - Action 020
It was noted on the call this week that there is an alternative to my zoned
decimal overpunching proposal i) below.
I said:
- If it is an unsigned type then DFDL expects the rightmost byte to have
a zone nibble when parsing, and outputs a zone nibble when unparsing.
- If it is a signed type then DFDL expects it to have a sign nibble when
parsing, and outputs a sign nibble when unparsing.
But my unsigned type behaviour could be achieved by specifying a rep of
text instead of zoned. If that is the case, the alternative is to
only allow zoned rep for signed decimal logical types.
Thoughts?
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 05/02/2009 10:17 -----
Steve Hanson/UK/IBM
28/01/2009 13:54
To: DFDL Working Group
Subject: DFDL OGF WG call - Action 020
Action 020:
020 | SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy
    | 22/10: No progress
    | 10/12: added how to decide to overpunch and sign position
a) Resolve packedDecimalSignCodes
behaviour depends on NumberCheckPolicy
Add new property to section 15.4 Properties
Specific to Number with Binary representation.
binaryNumberCheckPolicy | Enum
  Values are “strict” and “lax”.
  Indicates how lenient to be when parsing binary numbers.
  If “lax” then the parser tolerates all valid alternatives where such
  alternatives exist. Specifically, for binaryNumberRepresentation = 'packed'
  the sign nibble for positive, negative, unsigned and zero is allowed to be
  any of the valid respective values.
  On unparsing, the specified value is always used.
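A sketch of how 'strict' and 'lax' might differ when parsing a packed decimal, in Python. The function name and its parameters are hypothetical. The sets of tolerated nibbles are an assumption based on the common IBM convention (A, C, E, F positive; B, D negative), and the sketch simplifies by not treating the unsigned and zero sign codes separately.

```python
# Assumption: commonly accepted packed decimal sign nibbles.
LAX_POSITIVE = {0xA, 0xC, 0xE, 0xF}
LAX_NEGATIVE = {0xB, 0xD}


def parse_packed(data: bytes, positive: int = 0xC, negative: int = 0xD,
                 policy: str = "strict") -> int:
    """Parse packed decimal: two digits per byte, sign in the last nibble."""
    if not data:
        raise ValueError("no data")
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    magnitude = int("".join(str(d) for d in digits))

    if policy == "strict":
        # Only the configured sign codes are accepted.
        if sign == positive:
            return magnitude
        if sign == negative:
            return -magnitude
    elif policy == "lax":
        # Any of the valid alternatives for each sign is accepted.
        if sign in LAX_POSITIVE:
            return magnitude
        if sign in LAX_NEGATIVE:
            return -magnitude
    else:
        raise ValueError(f"unknown policy: {policy}")
    raise ValueError(f"unexpected sign nibble {sign:X}")


print(parse_packed(bytes([0x12, 0x3F]), policy="lax"))
```

On unparsing there is no equivalent choice to make, since the specified sign code is always written.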
Also suggest changing some of the other property names in 15.4:
"decimalVirtualPoint" -> "binaryDecimalVirtualPoint"
"packedDecimalSignCodes" -> "binaryPackedSignCodes"
And changing binaryNumberRepresentation enumeration:
"BCD" -> "bcd"
b) Zoned decimals: How to decide to overpunch and sign position
Spec assumes that overpunching of the rightmost character always takes
place. IBM architecture allows no overpunching (ie, Fx instead of Cx/Dx)
- this is supported by IBM MRM & WTX parsers. Additionally IBM MRM
parser allows separate sign byte, and sign byte on left. Let's deal with
these separately:
i) No overpunching.
The IBM architecture allows the rightmost byte to have a zone (Fx) or a
sign (Cx/Dx) as the left nibble. I don't see why we can't base what to
expect when parsing, and output when unparsing, on the logical xsd type.
- If it is an unsigned type then DFDL expects the rightmost byte to have
a zone nibble when parsing, and outputs a zone nibble when unparsing.
- If it is a signed type then DFDL expects it to have a sign nibble when
parsing, and outputs a sign nibble when unparsing.
For analogy with DFDL packed decimals, it seems at first glance that we
should also extend the numberCheckPolicy 'lax' setting to treat a zone
nibble as a +ve sign nibble for a signed type. However, IBM iSeries always
outputs Fx to mean +ve but accepts both Fx & Cx on input. It is perhaps
better therefore that DFDL always tolerates Fx when parsing a signed zoned
decimal, otherwise iSeries users would always have to set numberCheckPolicy
to 'lax', which might have other implications in the future.
ii) Separate sign byte.
I don't believe the IBM architecture allows this. I don't think DFDL needs
to support it. MRM has this, but I think it's because early on MRM did
not explicitly support text decimals as such, just COBOL variations, and
it was easier just to call them all zoned.
iii) Sign byte on left.
I don't believe the IBM architecture allows this. I don't think DFDL needs
to support it. MRM
has this, but for the same reason as ii)
Conclusion: No new DFDL properties needed, but words need adding to explain
zoned parse/unparse behaviour better.
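The parse behaviour proposed in i) could look like the following Python sketch for EBCDIC zoned decimals. The function name is hypothetical, and only the nibble values mentioned above are handled (Fx zone, Cx positive, Dx negative); other sign conventions are out of scope.

```python
def parse_zoned(data: bytes, signed: bool) -> int:
    """Parse an EBCDIC zoned decimal, choosing behaviour from the logical type."""
    if not data:
        raise ValueError("no data")
    digits = []
    for index, byte in enumerate(data):
        zone, digit = byte >> 4, byte & 0x0F
        if digit > 9:
            raise ValueError("invalid digit nibble")
        # Every byte except the rightmost must carry a zone nibble.
        if index < len(data) - 1 and zone != 0xF:
            raise ValueError("invalid zone nibble")
        digits.append(digit)
    magnitude = int("".join(str(d) for d in digits))

    last_zone = data[-1] >> 4
    if last_zone == 0xF:
        # Zone nibble: expected for unsigned types, and always tolerated
        # as +ve for signed types (the iSeries case).
        return magnitude
    if signed and last_zone == 0xC:
        return magnitude
    if signed and last_zone == 0xD:
        return -magnitude
    raise ValueError("unexpected nibble in rightmost byte")


print(parse_zoned(bytes([0xF1, 0xF2, 0xD3]), signed=True))
```

No check policy argument is needed here, because Fx is accepted for signed types regardless of strict or lax.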
Also suggest changing property names:
"zonedDecimalSignStyle" -> "numberZonedSignStyle"
"zeroNumberRep" -> "numberZeroRep"
Should also make clear that any explicit negative pattern in numberPattern
will be ignored if the xsd type is unsigned. (We could make this an error
but it precludes creation of a textNumberFormat that works with both signed
and unsigned types, plus pattern "##0.0" implicitly is equivalent
to "##0.0;(##0.0)".)
Regards
Steve Hanson
Alan Powell/UK/IBM@IBMGB
Sent by: dfdl-wg-bounces@ogf.org
23/01/2009 13:36
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] DFDL: Minutes from OGF WG call, 21 January 2009
Open Grid Forum: Data Format Description Language Working Group
Weekly Working Group Conference Call
14:00 GMT, 21 January 2009
Attendees
Alan Powell (IBM)
Mike Beckerle (Oco)
Apologies
Steve Hanson (IBM)
1. XSD 1.1
Deferred to next call
2. Calendar formats
Discussed updated (v4) supplement emailed by AP
Agreed millisec/secSinceEpoc cannot be implied by the length of the logical
data, so separate enumerations are needed. Observed that these options are
really a combination of 3 properties: binary, length and sec/millisec.
Suggested renaming to binarySeconds and binaryMilliseconds.
Packed calendars: decided that we need to be able to specify at least the
packedDecimalSignCodes property rather than assuming a default, so a reference
will be added to the calendar description.
Locale needs to be specified for numberformats and calendarFormats
(didn't identify any other areas) as it modifies the behaviour of ICU.
Decided to add locale to numberFormat and CalendarFormat
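On the binarySeconds / binaryMilliseconds point above: the same binary integer gives very different calendar values depending on the unit, which is why the unit cannot be inferred from the length. A Python sketch follows; the function name is hypothetical, and the Unix epoch and big-endian signed encoding are assumptions made only for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the epoch is 1970-01-01 UTC; the minutes do not specify one.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_binary_calendar(data: bytes, representation: str) -> datetime:
    """Interpret a binary integer as seconds or milliseconds since EPOCH."""
    ticks = int.from_bytes(data, byteorder="big", signed=True)
    if representation == "binarySeconds":
        return EPOCH + timedelta(seconds=ticks)
    if representation == "binaryMilliseconds":
        return EPOCH + timedelta(milliseconds=ticks)
    raise ValueError(f"unknown representation: {representation}")


raw = (1232546400).to_bytes(8, byteorder="big", signed=True)
print(parse_binary_calendar(raw, "binarySeconds"))
print(parse_binary_calendar(raw, "binaryMilliseconds"))
```

Both calls read the same eight bytes; only the enumeration value tells the parser which result is intended.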
3. Escape Schemes
Agreed need for multiple escape delimiter pairs but not nested.
Need an escape for the escape character, even though in most cases this will
be the same character, eg /n //. There are some formats that have a different
escape, eg /n &/. Only need single escape characters and one level
of escape characters.
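One reading of the escape character and escape-escape character requirement is sketched below in Python. The function name and parameters are hypothetical, all three markers are assumed to be single characters, and the treatment of "&/" as a literal "/" is an interpretation of the examples above rather than agreed behaviour.

```python
from typing import List


def split_escaped(text: str, delimiter: str, escape: str,
                  escape_escape: str) -> List[str]:
    """Split text on delimiter, honouring one level of escaping."""
    fields, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else None
        if ch == escape_escape and nxt == escape:
            current.append(escape)   # escaped escape character: literal
            i += 2
        elif ch == escape and nxt is not None:
            current.append(nxt)      # escaped character: taken literally
            i += 2
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


print(split_escaped("a/,b,c", ",", "/", "/"))
print(split_escaped("a&/,b", ",", "/", "&"))
```

The first call uses the same character for both roles, the second uses a different escape-escape character; no nesting is attempted in either case.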
Discussed how to deal with comments of the form /* comment
*/ where the escape delimiters are also the initiator and terminator
of the field. Semantic needed is 'only look for field terminator not any
parent terminator or any other syntax elements'. May fall out naturally
from the speculative parsing rules. Need further discussion.
4. AOB
Next call 28 January 14:00
Meeting closed, 15:00 GMT
Actions raised at this meeting
Current Actions:
No  | Action
012 | AP/SH: Update decimalCalendarScheme
    | 10/9: Not allocated yet
    | 17/9: No update
    | 24/9: Add calendar binary formats to actions
    | 22/10: No progress
    | 16/1: proposal distributed and discussed. Will be redistributed
    | 21/1: add locale
020 | SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy
    | 22/10: No progress
    | 10/12: added how to decide to overpunch and sign position
023 | MB: Review Schema 1.1
024 | String XML type
025 | Escape schemes
    | 21/1: discussed requirements
026 | SH: Envelopes and Payloads
027 | Property precedence tables
028 | Variable markup
029 | valueCalc (output length calculation)
030 | AP: confirm with WTX that can drop duration
    | 21/6: WTX confirm that they do not have a duration type so do not need
    | it in dfdl. Will drop from spec. Closed
Closed actions:
030 | AP: confirm with WTX that can drop duration
    | 21/6: WTX confirm that they do not have a duration type so do not need
    | it in dfdl. Will drop from spec. Closed
034 Work items:
No  | Item | Status
001 | String XML type (Ian P) - Apr 30, 2008 |
002 | Escape schemes (Ian P) - Apr 30, 2008 |
003 | Variables - ??, 2008 (Mike) |
005 | Improvements on property descriptions - ??, 2008 (All - split TBD) |
006 | Envelopes and Payloads (Steve) - Apr 30, 2008 |
007 | (from draft 32) valueCalc (Mike) - ??, 2008 | mostly complete
008 | (from draft 32) Property precedence for writing (Steve) - | under review
009 | (from draft 32) Variable markup (Steve) - Mar 31, 2008 | proposal needs writing up
010 | (from draft 32) Assertions, discriminators and choice, including discussion of timing option (Suman) - Mar 31, 2008 * in progress * |
011 | (from draft 32) How speculative parsing works (combining choice and variable-occurrence - currently these are separate) ??, 2008 (IBM) | in progress
012 | (from draft 32) Reordering the properties discussion: move representation earlier, improve flow of topics ??, 2008 (Alan) * not started * |
025 | Augmented infoset and unparsing (Alan) | added but needs work | complete - specification updated
Alan Powell
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg