For discussion at the next DFDL WG call. See the summary below, but the key points are:
- DFDL uses a calendar pattern to convert the binary calendar reps 'packed', 'bcd' and 'ibm4690Packed' to a schema calendar type
- No trimming or padding of binary reps takes place; the parser uses what was extracted from the data
- A 'packed' rep will always present an odd number of digits (because of the sign nibble)
- A 'bcd' rep will always present an even number of digits
- ICU gives an error if the number of digits presented to it exceeds the length of the calendar pattern
- Therefore the onus is on the user to ensure that the calendar pattern matches the number of digits, e.g. by adding leading zeros to the pattern
Are we happy with that behaviour?
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 23/10/2012 10:49 -----
From: Andrew Edwards/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Date: 22/10/2012 15:36
Subject: DFDL binaryCalendarRep pattern limitations
Hi Steve - a summary of the problem
is below
While adding support for the IBM 4690 packed representation for binaryNumberRep and binaryCalendarRep, it has become apparent that we may need to place a restriction on the calendarPattern property, depending on the choice of binaryCalendarRep. The problem is that we cannot reliably and reversibly distinguish a calendar value when the pattern length is incompatible with the packing type. If we use 'ibm4690Packed' and a pattern of odd length, we end up matching an even-length string against an odd-length pattern, and there isn't necessarily a well-defined answer.
This is best understood with an example:
- Consider a pattern with an odd number of characters, such as calendarPattern=yyyyMMddDDD
- A value will require a bytestream of length 6 bytes, which would be serialised as 0x0{y}{y}{y}{y}{M}{M}{d}{d}{D}{D}{D}
- For example, 2012-10-22 (day 296) would be represented as 0x020121022296.
If a parser is presented with this value and pattern, then it will try to match the string "020121022296" against "yyyyMMddDDD". ICU returns an error because the string is longer than the pattern, and I can't say I blame it. Should it ignore the zero at the start, or the 6 at the end? Without understanding the pattern, a DFDL parser cannot know. ICU can't resolve the value either: the pattern contains more than one field, so there is no single way to decide which digit is surplus. This becomes more problematic when we consider the behaviour for patterns of "DDD" and "SSS", as they expect padding at opposite ends by default (a value of "0100" resolves as "DDD" = 100 days and "SSS" = 0.010 seconds).
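To make the ambiguity concrete, here is a minimal Python sketch. It is not the parser's code and does not call ICU; the helper name trim_candidates is invented for illustration. It only shows that a string with one surplus digit can be cut down in two ways, and that "DDD" and "SSS" would each prefer a different cut.

```python
def trim_candidates(digits: str, pattern_len: int) -> dict:
    """Return the two ways of dropping one surplus digit from `digits`.

    Hypothetical helper: assumes the digit string is exactly one
    character longer than the pattern.
    """
    if len(digits) != pattern_len + 1:
        raise ValueError("expected exactly one surplus digit")
    return {
        "drop_leading": digits[1:],    # treat the first digit as padding
        "drop_trailing": digits[:-1],  # treat the last digit as padding
    }


candidates = trim_candidates("0100", len("DDD"))

# Read as day-of-year (DDD), the leading zero is the padding: day 100.
day_of_year = int(candidates["drop_leading"])

# Read as fractional seconds (SSS), the trailing zero is the padding: 010 ms.
millis = int(candidates["drop_trailing"])

print(candidates, day_of_year, millis / 1000)
```

Neither cut is wrong in itself; which one is right depends on the field the pattern describes, which is exactly the knowledge a generic trimming step does not have.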
For the packed representation, the opposite problem occurs: there is always a sign nibble in the packed form, so we will always have a value made up of an odd number of digits. This can't match any pattern of even length.
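The digit counts can be sketched as follows. This is an illustration only, under the assumptions stated in this message: 'packed' carries one trailing sign nibble, while 'bcd' and 'ibm4690Packed' (for a non-negative value) present two digits per byte. The function name presented_digits and the sample bytes are made up for the example.

```python
def presented_digits(data: bytes, rep: str) -> str:
    """Return the digit string a parser would hand to the calendar pattern.

    Assumption: no trimming or padding is applied to what was extracted.
    """
    nibbles = data.hex()  # one hex character per nibble
    if rep == "packed":
        return nibbles[:-1]  # the last nibble is the sign, not a digit
    if rep in ("bcd", "ibm4690Packed"):
        return nibbles       # every nibble is presented as a digit
    raise ValueError(f"unknown binaryCalendarRep: {rep}")


# Six bytes of packed data ending in a positive sign nibble (0xC).
packed_digits = presented_digits(bytes.fromhex("01234567890c"), "packed")

# Six bytes of BCD data: all twelve nibbles are digits.
bcd_digits = presented_digits(bytes.fromhex("012345678901"), "bcd")

print(len(packed_digits), len(bcd_digits))
```

For the same number of bytes, the two reps differ by exactly one digit, which is why a pattern that suits one cannot suit the other.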
The solution that we discussed was to require the calendar pattern to have a certain digit count depending on the choice of binaryCalendarRep, and to allow the pattern to include digit characters as a form of "padding". So the pattern in the example above would have to be changed to calendarPattern='0'yyyyMMddDDD to force it to have an even digit count.
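A rough sketch of how such a restriction might be checked at schema validation time is below. It is not an agreed rule or an existing implementation: the function names are hypothetical, and it assumes that every pattern letter consumes exactly one digit (true for fixed-width numeric fields such as yyyy, MM, dd, DDD) and that text inside single quotes is literal.

```python
def pattern_digit_count(pattern: str) -> int:
    """Count the characters a pattern consumes, honouring quoted literals."""
    count = 0
    i = 0
    while i < len(pattern):
        if pattern[i] == "'":
            if i + 1 < len(pattern) and pattern[i + 1] == "'":
                count += 1  # '' stands for one literal single quote
                i += 2
                continue
            # A lone quote only opens or closes a literal run.
        else:
            count += 1      # a pattern letter or a quoted literal character
        i += 1
    return count


def pattern_fits(pattern: str, rep: str) -> bool:
    """Check that the pattern's digit count has the parity the rep presents."""
    wants_odd = rep == "packed"  # assumed: bcd and ibm4690Packed are even
    return (pattern_digit_count(pattern) % 2 == 1) == wants_odd


print(pattern_fits("yyyyMMddDDD", "ibm4690Packed"))     # odd pattern, even rep
print(pattern_fits("'0'yyyyMMddDDD", "ibm4690Packed"))  # padded to even
```

A parity check is the weakest form of the rule; a stricter variant could compare the count against twice the field's byte length (minus one for 'packed').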
If that fully explains the problem,
do you want to take it to the DFDL workgroup and check what the consensus
opinion is?
MP211, Hursley Park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int: 247222
Tel ext: +44 (0)1962 817222
Desk: DE2 U20