"Mike Beckerle"
<mbeckerle.dfdl@gmail.com>
09/04/2008 15:05
"Mike Beckerle"
<mbeckerle.dfdl@gmail.com>
09/04/2008 01:43
Alan Powell/UK/IBM
28/03/2008 16:45
Steve,
Technically it seems OK. It needs quite a bit of editorial work before it
can be included in the spec, which I have started.
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
From: Steve Hanson/UK/IBM
To: mbeckerle@oco-inc.com
Cc: Alan Powell/UK/IBM, Ian W Parkinson/UK/IBM
Date: 28/03/2008 13:59
Subject: Fw: DFDL Decimal - proposal - correcting wrong attachment
Steve Hanson/UK/IBM
27/03/2008 15:29
Hi Mike,
I've finally got round to looking at the decimal supplement, and I'd like
to get your opinion on something. The WTX team have been reviewing draft
031 and made the following observation (actually they had quite a few good
ones, and when they've finished we need to discuss them all on an OGF WG
call).
"13.3. Is a zoned decimal textual or non-textual? If all overpunched
variants result in well-known characters then the data is scannable and
therefore more like a textual field."
It turns out that the type hierarchy in TX for decimal looks like the one
below. They consider Zoned to be text, as it always consists of well-known
characters and is subject to encoding conversion, padding, justification,
etc. There's a lot of appeal in that. It has always bothered me a bit that
MRM viewed it as a binary type.
Number -> Character -> Decimal (meaning text decimal)
                       Integer (meaning text integer)
                       Zoned
       -> Binary    -> Integer (meaning binary integer)
                       Float
                       Packed
                       BCD
Also, their Zoned does not have a separate sign option. They point out that
a Zoned with a separate sign is just a text decimal, and they are correct.
We got the separate sign option from MRM, which, after some digging, turns
out to have got it from the CAM Type Descriptor model, which had no other
way of representing a text decimal number with a separate sign.
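To illustrate the WTX point, here is a minimal Python sketch, assuming EBCDIC code page 037, "-" (0x60) and "+" (0x4E) as the separate sign characters, and zone nibble F on the digits; zoned_separate_sign is a hypothetical helper, not a DFDL or MRM API. A zoned decimal with a separate leading sign byte comes out byte-for-byte identical to the text decimal encoded in that code page.

```python
def zoned_separate_sign(value: int, ndigits: int) -> bytes:
    """Hypothetical: EBCDIC zoned decimal with a separate leading sign byte."""
    sign = b"\x60" if value < 0 else b"\x4e"  # EBCDIC (cp037) '-' and '+'
    digits = str(abs(value)).rjust(ndigits, "0")
    if len(digits) > ndigits:
        raise ValueError("value does not fit in ndigits")
    return sign + bytes(0xF0 | int(d) for d in digits)  # zone F on every digit


# The same bytes are simply the text "-1234" in an EBCDIC code page.
assert zoned_separate_sign(-1234, 4) == "-1234".encode("cp037")
print(zoned_separate_sign(-1234, 4).hex().upper())  # 60F1F2F3F4
```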
As part of my rework of the decimal supplement, I'd like to take both these
points into account. The implications are:
- Zoned => overpunched only
- Zoned decimal can pick up the textNumberxxx properties, including
  textNumberFormat
  => use the numberPattern (i.e., ICU pattern) property to say which end
     the (overpunched) sign goes at
  => we can get away without a separate pattern language for binary
     decimals, which, as you point out, has endianness issues
- Binary decimals are packed and BCD
- There are far fewer properties for decimals
- dfdl:representation="text" can have subdivisions; that hasn't occurred
  until now (we could think about making dfdl:representation="xml" a
  subdivision of "text"?)
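The "Zoned => overpunched only" implication can be sketched in Python as below, assuming IBM EBCDIC conventions (code page 037, zone F on plain digits, C/D overpunched for plus/minus). zoned_overpunch and its sign_at parameter are hypothetical stand-ins for the numberPattern-driven choice of which end carries the sign; they are not DFDL properties.

```python
def zoned_overpunch(value: int, ndigits: int, sign_at: str = "trailing") -> bytes:
    """Hypothetical: EBCDIC zoned decimal with the sign overpunched at one end."""
    if sign_at not in ("leading", "trailing"):
        raise ValueError("sign_at must be 'leading' or 'trailing'")
    digits = str(abs(value)).rjust(ndigits, "0")
    if len(digits) > ndigits:
        raise ValueError("value does not fit in ndigits")
    out = bytearray(0xF0 | int(d) for d in digits)  # every digit gets zone F
    idx = 0 if sign_at == "leading" else len(out) - 1
    sign_zone = 0xD0 if value < 0 else 0xC0  # assumed: D = minus, C = plus
    out[idx] = sign_zone | int(digits[idx])  # overpunch the sign into one zone
    return bytes(out)


# Every byte decodes to an ordinary EBCDIC (cp037) character, so the field
# can be scanned, converted and padded like any other text.
for sign_at in ("trailing", "leading"):
    data = zoned_overpunch(-1234, 4, sign_at)
    print(sign_at, data.hex().upper(), data.decode("cp037"))
# trailing F1F2F3D4 123M
# leading D1F2F3F4 J234
```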
If you think there is merit in this approach then let me know by return
and I'll see if I can write something up tomorrow.
I'm WAH on +44-1794-340899 if you want to discuss.
Your "crazy idea" below is interesting, but I think it is a tooling
thought rather than a core spec thing.
(Sorry about the call yesterday. I thought I mailed something out a couple
of calls ago about the DST mismatch, but perhaps I didn't.)
Regards, Steve
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 27/03/2008 15:04 -----
Mike Beckerle/Worcester/IBM@IBMUS
21/11/2007 15:26
I think decimal has signed and unsigned variants based on the
dfdl:decimalSigned boolean. If this is false, the decimal is unsigned and
packedUnsignedRep specifies the sign nibble used for unsigned values. The
doc doesn't say that one can specify "" for this to indicate no sign
nibble at all.
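A minimal Python sketch of that signed/unsigned packed behaviour, assuming IBM-style packed decimal with the sign nibble trailing (C plus, D minus) and F as the default unsigned sign nibble. pack_decimal and its parameter names are hypothetical; signed=False mirrors dfdl:decimalSigned="false", and unsigned_rep=None stands for the packedUnsignedRep="" (no sign nibble) case. The use cases further down this thread show a leading-sign variant, which this sketch does not cover.

```python
from typing import Optional


def pack_decimal(value: int, signed: bool = True,
                 unsigned_rep: Optional[int] = 0xF,
                 plus: int = 0xC, minus: int = 0xD) -> bytes:
    """Hypothetical packed-decimal encoder with a trailing sign nibble."""
    if not signed and value < 0:
        raise ValueError("negative value for an unsigned decimal")
    nibbles = [int(d) for d in str(abs(value))]
    if signed:
        nibbles.append(minus if value < 0 else plus)
    elif unsigned_rep is not None:
        nibbles.append(unsigned_rep)  # e.g. F for unsigned
    # unsigned_rep=None: no sign nibble at all (the packedUnsignedRep="" case)
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad with a leading zero nibble to whole bytes
    return bytes(hi << 4 | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


print(pack_decimal(-1234).hex().upper())                                 # 01234D
print(pack_decimal(1234, signed=False).hex().upper())                    # 01234F
print(pack_decimal(1234, signed=False, unsigned_rep=None).hex().upper())  # 1234
```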
I've been rereading the decimal properties supplement and have started v002
of it, based on the changes to dfdl:representation in the core spec. It
needs a general clean-up. There are errors here, in that there is a
decimalType="zoned", "packed" or "BCD", and also a bcdIsPacked and a
bcdUnpackedRep="ebcdic", which I think is the same as zoned.
We need there to be one way to express these things. Right now the design
leans towards a set of orthogonal flags:
- signed or unsigned
- the sign nibble for unsigned
- the sign nibbles for signed
- packed or unpacked
- what's in the zones, i.e., the unused nibbles: EBCDIC ("F"), ASCII ("3"),
  or zero. That is not enough, though, as I've seen data with "2" in the
  zones; some non-IBM COBOL compiler does this.
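To show why a fixed EBCDIC/ASCII/zero choice for the zones is not enough, here is a hedged Python sketch that takes the zone nibble as a plain value rather than an enum. unpacked_digits is a hypothetical helper and handles unsigned values only.

```python
def unpacked_digits(value: int, ndigits: int, zone: int) -> bytes:
    """Hypothetical: unsigned unpacked decimal, `zone` in each high nibble."""
    if value < 0 or not 0 <= zone <= 0xF:
        raise ValueError("need a non-negative value and a zone in 0..15")
    digits = str(value).rjust(ndigits, "0")
    if len(digits) > ndigits:
        raise ValueError("value does not fit in ndigits")
    return bytes(zone << 4 | int(d) for d in digits)


# The zone values mentioned above: EBCDIC "F", ASCII "3", zero, and the odd "2".
for label, zone in [("ebcdic", 0xF), ("ascii", 0x3), ("zero", 0x0), ("other", 0x2)]:
    print(label, unpacked_digits(1234, 4, zone).hex().upper())
# ebcdic F1F2F3F4
# ascii 31323334
# zero 01020304
# other 21222324
```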
A better choice may be to make decimalType a larger enum that subsumes
most of these properties, so that we don't end up able to express
variants that have simply never existed.
A list of use cases also needs to be added to the doc.
Here are a few, each showing -1234 as bytes in hex, in increasing address
order (lowest address first):
- packed IBM, signed: D01234
- zoned IBM, overpunched leading sign: D1F2F3F4 (are signs usually
  leading or trailing? I think trailing, actually)
- big-endian zoned ASCII, ASCII-translated overpunched leading sign:
  4A323334 (yuck, so much for treating decimal as "binary" data)
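The three -1234 use cases can be reproduced with a short Python sketch, assuming "C"/"D" as the plus/minus nibbles, EBCDIC code page 037 for the zoned example, and that the ASCII variant is obtained by translating the EBCDIC characters to ASCII, as the description of that case implies. The helper names are hypothetical.

```python
def packed_leading_sign(value: int, ndigits: int = 0) -> bytes:
    """Hypothetical: packed decimal with the sign nibble first, padded after it."""
    digits = str(abs(value)).rjust(ndigits, "0")
    nibbles = [0xD if value < 0 else 0xC] + [int(d) for d in digits]
    if len(nibbles) % 2:
        nibbles.insert(1, 0)  # pad to whole bytes, as in D01234
    return bytes(hi << 4 | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


def zoned_leading_overpunch(value: int) -> bytes:
    """Hypothetical: EBCDIC zoned decimal, sign overpunched into the first zone."""
    digits = [int(d) for d in str(abs(value))]
    out = bytearray(0xF0 | d for d in digits)
    out[0] = (0xD0 if value < 0 else 0xC0) | digits[0]
    return bytes(out)


ebcdic_zoned = zoned_leading_overpunch(-1234)
# Translate character by character: EBCDIC "J234" becomes ASCII "J234".
ascii_zoned = ebcdic_zoned.decode("cp037").encode("ascii")

for label, data in [("packed ibm", packed_leading_sign(-1234)),
                    ("zoned ibm", ebcdic_zoned),
                    ("zoned ascii", ascii_zoned)]:
    print(label, data.hex().upper())
# packed ibm D01234
# zoned ibm D1F2F3F4
# zoned ascii 4A323334
```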
Here's a crazy idea: I believe there is a set of magic numbers such that,
if you give me their translations into bytes, I can determine exactly what
the encoding properties are.
E.g., if you give me the bytes for +0000, -1234 and +789, I believe I
can determine all of the properties.
This might be a better way to specify decimal formats, i.e., give me those
byte patterns expressed as hex, and I reverse-engineer all the property
settings.
e.g., decimalFormat="+0000=C00000 -1234=D01234 +789=C789" (signed,
packed, leading sign, padded to an even number of nibbles, big endian,
zero carries a sign, "C" is plus, "D" is minus)
or decimalFormat="+0000=00000000 -1234=D1F2F3F4 +789=C7F8F9"
(EBCDIC zoned, leading overpunched sign, big endian, zero is allowed to
have zero as its sign and be all zero bytes, "C" is plus, "D" is minus)
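The reverse-engineering idea can be prototyped roughly as follows. This is a Python sketch, not a spec proposal: it only recognizes packed and EBCDIC-style zoned layouts, assumes one negative, one positive non-zero and one all-zero sample as in the two examples above, and the parser, function names and result keys are all hypothetical.

```python
DIGITS = "0123456789"


def parse_decimal_format(spec: str) -> dict[str, bytes]:
    """Split e.g. '+789=C789 -1234=D01234' into {'+789': b'\\xc7\\x89', ...}."""
    samples = {}
    for item in spec.split():
        number, hexbytes = item.split("=")
        samples[number] = bytes.fromhex(hexbytes)
    return samples


def infer_properties(samples: dict[str, bytes]) -> dict[str, object]:
    """Heuristically guess the layout from the sample values."""
    neg_key = next(k for k in samples if k.startswith("-"))
    pos_key = next(k for k in samples if k.startswith("+") and k.strip("+0"))
    zero_key = next((k for k in samples if not k.strip("+-0")), None)
    zero = samples[zero_key] if zero_key is not None else None
    ndigits = len(neg_key) - 1
    neg, pos = samples[neg_key], samples[pos_key]
    props: dict[str, object] = {}

    if len(neg) < ndigits:  # fewer bytes than digits: two digits per byte
        nibbles = neg.hex().upper()
        leading = nibbles[0] not in DIGITS  # a non-digit nibble is the sign
        idx = 0 if leading else -1
        props["encoding"] = "packed"
        props["pad_nibbles"] = len(nibbles) - (ndigits + 1)
        props["minus"] = nibbles[idx]
        props["plus"] = pos.hex().upper()[idx]
        if zero is not None:
            props["zero_sign"] = zero.hex().upper()[idx]
    else:  # one digit per byte: the sign is overpunched into one zone
        zones = [f"{b >> 4:X}" for b in neg]
        leading = zones[0] != zones[1]
        idx = 0 if leading else -1
        props["encoding"] = "zoned"
        props["digit_zone"] = zones[-1] if leading else zones[0]
        props["minus"] = zones[idx]
        props["plus"] = f"{pos[idx] >> 4:X}"
        if zero is not None:
            props["zero_sign"] = f"{zero[idx] >> 4:X}"

    props["sign_position"] = "leading" if leading else "trailing"
    return props


for spec in ("+0000=C00000 -1234=D01234 +789=C789",
             "+0000=00000000 -1234=D1F2F3F4 +789=C7F8F9"):
    print(infer_properties(parse_decimal_format(spec)))
```

As the thread suggests, something like this probably belongs in tooling (point it at sample data and let it guess) rather than in the DFDL language itself.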
This may make more sense for the tooling than for the DFDL language, though.
I.e., point the tool at some data and it tries to guess these properties.
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU