In that case, which 2 digits of 1997 you get should perhaps depend on the number's justification. If it is left-justified, truncate from the right so you get 19; if it is right-justified, truncate from the left to get 97.
Since numbers are usually right-justified, this is the typical behavior, and I think it is what the ICU library assumes, since patterns are most commonly used with fixed-length data.
For variable-length data I agree with your analysis: never truncate the most significant digits.
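The justification-dependent rule proposed above can be sketched in Python. This is a minimal sketch under stated assumptions: the helper name `fit_integer_digits`, the `Justification` enum and the `fixed_length` flag are hypothetical, not part of ICU or the DFDL specification, and the value is passed as an unsigned digit string.

```python
from enum import Enum


class Justification(Enum):
    # Hypothetical enum standing in for the data's justification.
    LEFT = "left"
    RIGHT = "right"


def fit_integer_digits(digits: str, max_digits: int,
                       justification: Justification,
                       fixed_length: bool) -> str:
    """Hypothetical helper: fit an unsigned integer digit string into max_digits."""
    if not fixed_length or len(digits) <= max_digits:
        # Variable-length data, or a value that already fits:
        # never drop the most significant digits.
        return digits
    if justification is Justification.LEFT:
        # Left-justified: keep the leading digits, truncate from the right.
        return digits[:max_digits]
    # Right-justified: keep the trailing digits, truncate from the left.
    # Slicing from len - max_digits also behaves correctly when max_digits is 0.
    return digits[len(digits) - max_digits:]


print(fit_integer_digits("1997", 2, Justification.LEFT, fixed_length=True))    # 19
print(fit_integer_digits("1997", 2, Justification.RIGHT, fixed_length=True))   # 97
print(fit_integer_digits("1997", 2, Justification.RIGHT, fixed_length=False))  # 1997
```

The right-justified branch reproduces the ICU "1997 becomes 97" behavior, while the `fixed_length=False` path never loses digits.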
On Jul 9, 2012 2:00 PM, "Steve Hanson" <smh@uk.ibm.com> wrote:
The ICU web documentation says the following about formatting (unparsing) numbers, which has been copied into the DFDL specification:
· The term maximum fraction digits is the total number of ‘0’ and ‘#’ characters in the fraction sub-pattern above.
· The term minimum fraction digits is the total number of ‘0’ characters (only) in the fraction sub-pattern above.
· The term maximum integer digits is the total number of ‘0’ and ‘#’ characters in the integer sub-pattern above.
· The term minimum integer digits is the total number of ‘0’ characters (only) in the integer sub-pattern above.
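For a simple pattern, the four definitions above amount to counting characters on either side of the decimal point. A minimal Python sketch, assuming a single positive sub-pattern with no prefix, suffix, quoted literals, exponent or ';' negative sub-pattern; the names `DigitLimits` and `digit_limits` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitLimits:
    min_integer: int
    max_integer: int
    min_fraction: int
    max_fraction: int


def digit_limits(pattern: str) -> DigitLimits:
    """Derive the four digit counts from a simple pattern such as "#,##0.00#".

    Assumes a single positive sub-pattern: no prefix/suffix, quotes,
    exponent or ';'. Grouping separators (',') are not counted.
    """
    integer_part, _, fraction_part = pattern.partition(".")
    return DigitLimits(
        # Minimum: '0' characters only.
        min_integer=integer_part.count("0"),
        # Maximum: '0' and '#' characters.
        max_integer=integer_part.count("0") + integer_part.count("#"),
        min_fraction=fraction_part.count("0"),
        max_fraction=fraction_part.count("0") + fraction_part.count("#"),
    )


print(digit_limits("#,##0.00#"))
# DigitLimits(min_integer=1, max_integer=4, min_fraction=2, max_fraction=3)
```

Under these definitions "#0" has a maximum of 2 integer digits, so if that value were passed to ICU, 1997 would be formatted as "97".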
That all seems to make sense, but on close reading, ICU's handling of maximum integer digits appears undesirable, in that it silently truncates oversized integer portions. The ICU documentation gives the example: "For example, 1997 is formatted as "97" if the maximum integer digits is set to 2."
Interestingly, while ICU derives minimum integer digits, minimum fraction digits and maximum fraction digits from the pattern, it does not derive maximum integer digits from the pattern; instead it uses a default of 309. Setting it requires an explicit ICU API call.
Because of this inconsistent ICU behaviour, the IBM DFDL implementation currently does not call this ICU API, and so allows up to 309 integer digits to be formatted regardless of the pattern. For example, "#0" works for infoset values "1", "12" and "123456789" without any loss. As well as avoiding silent truncation, this is convenient for variable-length text numbers, because a single textNumberPattern value such as "#0" can be set in scope and used widely. However, it means that variable-length text numbers do not have their integer digit length policed (fixed-length text numbers are policed by the length of the element).
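The behaviour described above can be sketched as follows: the maximum integer digits stay at the 309 default whatever the pattern says, while the minimum integer digits are still taken from the pattern. This is a hypothetical Python illustration, not IBM's code; the name `format_integer` and the restriction to fraction-free patterns such as "#0" or "000" are assumptions.

```python
ICU_DEFAULT_MAX_INTEGER_DIGITS = 309  # default reported for ICU in this message


def format_integer(value: int, pattern: str) -> str:
    """Hypothetical sketch: format an integer with a fraction-free pattern.

    Only the minimum integer digits (the count of '0') come from the pattern;
    the maximum is left at the 309 default, so the pattern never truncates.
    """
    min_integer = pattern.count("0")
    # Pad with leading zeros up to the pattern's minimum integer digits.
    digits = str(abs(value)).zfill(min_integer)
    if len(digits) > ICU_DEFAULT_MAX_INTEGER_DIGITS:
        # Behaviour beyond the default is not modelled; this sketch rejects it.
        raise ValueError("integer portion exceeds 309 digits")
    return ("-" if value < 0 else "") + digits


print([format_integer(v, "#0") for v in (1, 12, 123456789)])  # ['1', '12', '123456789']
print(format_integer(7, "000"))  # 007
```

Because the pattern never limits the length, one "#0" can serve every variable-length element in scope, which is exactly why nothing polices their integer digit count.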
I think it is worth ratifying that the spec wording above is the true intended behaviour and, if so, noting that an implementation should (a) not call the ICU API to set maximum integer digits, (b) let the maximum integer digits default to 309, and (c) implement maximum integer digits processing itself.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg