For discussion at next DFDL WG call. See summary below but key points are:

- DFDL uses a calendar pattern to convert binary calendar 'packed', 'bcd' and 'ibm4690Packed' reps to a schema calendar type
- No trimming/padding of binary reps takes place; the parser uses exactly what was extracted from the data
- A 'packed' rep will always present an odd number of digits (because of the sign nibble)
- A 'bcd' rep will always present an even number of digits
- ICU gives an error if the number of digits presented to it exceeds the length of the calendar pattern
- Therefore the onus is on the user to ensure that the calendar pattern matches the number of digits, e.g. by adding leading zeros to the pattern

Are we happy with that behaviour?

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 23/10/2012 10:49 -----

From:        Andrew Edwards/UK/IBM
To:        Steve Hanson/UK/IBM@IBMGB,
Date:        22/10/2012 15:36
Subject:        DFDL binaryCalendarRep pattern limitations



Hi Steve - a summary of the problem is below

While adding support for IBM4690 packed representation for binaryNumberRep and binaryCalendarRep, it has become apparent that we may need to place a restriction on the calendarPattern property, depending on the choice of binaryCalendarRep.  A problem surfaces in being able to reliably and reversibly distinguish a calendar value when the pattern length is incompatible with the packing type.  If we use 'ibm4690Packed' and a pattern of odd length, we end up matching an even-length string against an odd-length pattern, and there isn't necessarily a well-defined answer.

This is best understood with an example:
 - Consider a pattern with an odd number of characters, such as calendarPattern=yyyyMMddDDD
 - A value will require 6 bytes, which would be serialised as 0x0{y}{y}{y}{y}{M}{M}{d}{d}{D}{D}{D}
 - For example, 2012-10-22 (day 295) would be represented as 0x020121022295.
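
As a rough illustration of how those 6 bytes become a 12-digit string, here is a minimal Python sketch. It assumes a non-negative value, so every nibble is a decimal digit; the helper name nibbles_to_digits is hypothetical and this is not the parser's actual code.

```python
def nibbles_to_digits(data: bytes) -> str:
    """Return one decimal character per nibble (assumes digit nibbles only)."""
    digits = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble > 9:
                raise ValueError(f"non-digit nibble 0x{nibble:X}")
            digits.append(str(nibble))
    return "".join(digits)


raw = bytes.fromhex("020121022295")   # the 6-byte value from the example
digits = nibbles_to_digits(raw)
print(digits, len(digits))            # 12 digits for an 11-character pattern
```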

If a parser is presented with this value and pattern, then it will try to match the string "020121022295" against "yyyyMMddDDD".  ICU returns an error because the string is longer than the pattern and I can't say I blame it.  Should it ignore the zero at the start, or the 5 at the end?  Without understanding the pattern, a DFDL parser cannot know.  ICU can't resolve the value because the pattern has more than one group, so there is no single solution.  This becomes more problematic when we consider behaviour for a pattern of "DDD" and of "SSS", as they expect padding at different ends by default (a value of "0100" resolves as "DDD"=100 days and "SSS"=0.010 seconds).
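
To make the ambiguity concrete, the sketch below slices the two candidate 11-digit strings by the widths of the pattern groups. It is illustrative Python only, not ICU behaviour; split_by_pattern is a hypothetical helper, and it assumes each pattern letter consumes exactly one digit and that no letter run repeats within the pattern.

```python
from itertools import groupby


def split_by_pattern(digits: str, pattern: str) -> dict:
    """Slice digits into fields, one per run of identical pattern letters."""
    if len(digits) != len(pattern):
        raise ValueError("digit count does not match pattern length")
    fields, pos = {}, 0
    for letter, run in groupby(pattern):
        width = len(list(run))
        fields[letter * width] = digits[pos:pos + width]
        pos += width
    return fields


digits, pattern = "020121022295", "yyyyMMddDDD"
print(split_by_pattern(digits[1:], pattern))   # drop the leading zero
print(split_by_pattern(digits[:-1], pattern))  # drop the trailing 5
```

Dropping the leading zero gives yyyy=2012, MM=10, dd=22, DDD=295; dropping the trailing 5 gives yyyy=0201, MM=21, dd=02, DDD=229. The second reading happens to be invalid here (month 21), but a parser cannot rely on that in general.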

For packed representation, the opposite problem occurs: there is always a sign nibble in the packed form, so we will always have a value made up of an odd number of digits, and that cannot match any pattern of even length.
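
A small Python sketch of the packed case, assuming the usual packed-decimal layout where the last nibble is the sign (0xC for positive is an assumption here, as is the helper name packed_digits). An 8-character pattern such as yyyyMMdd needs 5 bytes, which yields 9 digits.

```python
def packed_digits(data: bytes) -> tuple[str, int]:
    """Split a packed-decimal value into its digit string and sign nibble."""
    nibbles = [n for b in data for n in (b >> 4, b & 0x0F)]
    *digit_nibbles, sign = nibbles
    if any(n > 9 for n in digit_nibbles):
        raise ValueError("non-digit nibble in digit positions")
    return "".join(map(str, digit_nibbles)), sign


digits, sign = packed_digits(bytes.fromhex("020121022C"))  # 5 bytes, sign nibble 0xC
print(digits, len(digits))  # 9 digits, one more than the 8 characters of yyyyMMdd
```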

The solution that we discussed was to require the calendar pattern to have a certain digit count depending on the choice of binaryCalendarRep, and allow the pattern to include number characters as a form of "padding".  So for the pattern in the example above, this would have to be changed so that calendarPattern='0'yyyyMMddDDD to force it to have an even digit count.
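
The kind of check this would imply can be sketched as follows. This is hypothetical Python, not a proposed implementation: it assumes quoted literals contain only digits, that patterns contain no escaped quotes, and that every pattern letter consumes exactly one digit. Matching parity is only a necessary condition; the pattern still has to match the actual digit count.

```python
REQUIRED_PARITY = {"packed": 1, "bcd": 0, "ibm4690Packed": 0}  # digit count mod 2


def pattern_digit_count(pattern: str) -> int:
    """Count pattern characters, treating quoted literals as one digit each."""
    return len(pattern.replace("'", ""))


def pattern_fits(rep: str, pattern: str) -> bool:
    """True if the pattern's digit count has the parity the rep always produces."""
    return pattern_digit_count(pattern) % 2 == REQUIRED_PARITY[rep]


print(pattern_fits("ibm4690Packed", "yyyyMMddDDD"))     # False: 11 digits
print(pattern_fits("ibm4690Packed", "'0'yyyyMMddDDD"))  # True: 12 digits
print(pattern_fits("packed", "'0'yyyyMMdd"))            # True: 9 digits
```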

If that fully explains the problem, do you want to take it to the DFDL workgroup and check what the consensus opinion is?

Cheers,
Andy
Andy Edwards - WebSphere Message Broker - DFDL

Email: andy.edwards@uk.ibm.com
Snail Mail:   MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int: 247222
Tel ext: +44 (0)1962 817222
Desk: DE2 U20

The Feynman problem solving Algorithm
 1) Write down the problem
 2) Think real hard
 3) Write down the answer
-- Murray Gell-Mann in the NY Times

