Clarification from CLDR via ICU on the single 'V' re-use issue:
"V" was originally designed for a variant
of pattern "z" in CLDR. It had a valid reason to define a pattern
different from "z", but we found that CLDR cannot collect such
information differentiating "V" from "z". Also, there
were no users who requested us (including ICU) to maintain proper data
for differentiating "V" from "z". As a result, the
data making the difference between "V" and "z" was
deprecated, and we no longer have reason to keep pattern "V"
except for backward compatibility.
So, strictly speaking, it's a backward compatibility
problem. But -
- It still produces a text representation of the same
calendar field.
- "V" was not adopted by any other known implementations
other than ICU.
- The distinction between "V" and "z"
had never worked as users would expect.
- "z" is a Java compatible pattern for short
abbreviated zone format, and it is available from the very beginning. On
the other hand, "V" was a recent addition.
CLDR technical committee including myself understand the risk, but all of us
preferred cleaner pattern definition over the potential backward compatibility
problem.
Sounds like DFDL should withdraw the single 'V' symbol.
I doubt that any IBM customer is using it. 'VVVV' is ok.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 21/01/2013 18:17 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 17/01/2013 10:04
Subject: Fw: [DFDL-WG] date & time and latest ICU possible issues/conflicts
The ICU ticket has been answered, with
reference to the following document:
http://cldr.unicode.org/development/development-process/design-proposals/time-zone-offset-patterns
'x'/'X' symbols
In a not-too-distant ICU release there will
be new symbols 'x' and 'X' to handle Z as a time zone for ISO8601 date/times,
effectively replacing 'ZZZZZ'. Both 'x' and 'X' will tolerate Z or +00:00
(or variants) on parsing. On formatting, 'X' will result in Z and 'x' will
result in +00:00 (or variants).
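To make the 'x'/'X' behaviour described above concrete, here is a minimal Python sketch. It is not ICU code and does not call any ICU API: the function names format_offset and parse_offset are hypothetical, and only the colon-separated "+hh:mm" variant is modelled, as an assumption for illustration.

```python
import re

# Accepts either a literal "Z" or a signed hh:mm offset such as "+00:00".
_OFFSET_RE = re.compile(r"Z|([+-])(\d{2}):(\d{2})")


def format_offset(offset_minutes: int, symbol: str) -> str:
    """Format a UTC offset in the style the thread describes for 'X' and 'x'.

    Hypothetical helper: 'X' writes a zero offset as "Z", 'x' writes it
    as "+00:00". Non-zero offsets are written the same way by both.
    """
    if symbol not in ("X", "x"):
        raise ValueError("symbol must be 'X' or 'x'")
    if offset_minutes == 0 and symbol == "X":
        return "Z"
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


def parse_offset(text: str) -> int:
    """Parse an offset leniently: both "Z" and "+00:00" mean UTC.

    Returns the offset in minutes east of UTC.
    """
    match = _OFFSET_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised offset: {text!r}")
    if text == "Z":
        return 0
    sign = -1 if match.group(1) == "-" else 1
    return sign * (int(match.group(2)) * 60 + int(match.group(3)))
```

With this sketch, format_offset(0, "X") gives "Z" and format_offset(0, "x") gives "+00:00", while parse_offset returns 0 for either spelling. That asymmetry (lenient on input, symbol-specific on output) is the point of the proposal as summarised here.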
DFDL's use of 'U' is wider than this,
as we allow 'U' to appear with any number of 'Z's, meaning that Z is accepted
with non-ISO8601 date/times. DFDL also adds the use of 'I' symbol on its
own to mean any ISO8601 compliant date/time, and again we allow 'U' to
appear with 'I'.
However the motivating use case for
adding 'U' was IBM MRM which supports this today. But it does so primarily
for XML use cases, in particular ISO8601. I am not personally aware of
an actual non-XML use case.
I suggest we drop the DFDL-specific
use of the 'U' symbol in conjunction with 'Z' and 'I' symbols from the
DFDL specification via errata, and allow the use of 'ZZZZZ' instead, which
at least will accept Z when parsing. When 'x'/'X' support appears in ICU,
we can take a future errata to support it or leave until DFDL 2.0.
IBM DFDL already supports 'U' but I
am ok with deprecating it as I don't believe it is being used for
real.
'V' symbol
In a not-too-distant ICU release there will be new symbols 'VV' and 'VVV' to handle
time zones expressed as Time Zone Ids and localized locations, respectively.
We can add that via errata in the future, or leave until DFDL 2.0.
However, at the same time the meaning
of V is changed slightly. DFDL supports 'V'. I have asked ICU for a clarification.
'O' symbol
In a not-too-distant ICU release there will be a new symbol 'O' to handle localized
GMT format variants. We can add that via errata in the future, or leave
until DFDL 2.0.
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 17/01/2013 08:58 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org
Date: 16/01/2013 18:20
Subject: Re: [DFDL-WG] date & time and latest ICU possible issues/conflicts
ICU ticket raised as the help does not
give an example.
https://icu.sanjose.ibm.com/gcoctrac/ticket/469#ticket
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 16/01/2013 15:11
Subject: [DFDL-WG] date & time and latest ICU possible issues/conflicts
Sent by: dfdl-wg-bounces@ogf.org
Steve Lawrence is on the Daffodil Open Source DFDL team (on CC), and he
has dug into date/time types.
He raised some concerns to me that I really haven't been tracking at all,
so I wanted to put in front of the rest of the group.
The date time format syntax for the latest version of
icu4j contains a 'U'
character, which means "cyclic year name". However, the daffodil
spec says the 'U' character, following a Z makes it so a timezone of UTC
is represented as Z instead of +00:00.
This seems to be a conflict, and would prevent us from ever upgrading to
the newest version of ICU (which might be a good idea).
I will point out that the latest version of ICU supports ZZZZZ (5 Z's),
which is the ISO8601 timezone format. This doesn't add all the
functionality that the DFDL 'U' gives. My question is, is this enough?
Are there cases where the ZU, ZZU, etc. are necessary? I'm just
concerned that the U is going to add quite a bit more complexity, and want
to make sure the updates to latest ICU don't already address the DFDL-WG concerns.
And if we still need the 'U', maybe it should change to a different
letter to prevent conflicts with the latest ICU4J?
I would point out that the ICU pattern language cannot deal with dual-purpose
letters very well, i.e., ambiguities are introduced if the same letter
both introduces a format, and if following another format string, modifies
its behavior. E.g., does ZU mean Z modified by U, or Z first, and then
U. So it seems pretty unfortunate if the ICU libraries added a conflicting
use of letter U.
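The ambiguity can be illustrated with a small Python sketch. It assumes, as a simplification, that a pattern is read as runs of one repeated letter, each run being its own field; the function name tokenize is hypothetical, and quoted literals and everything else a real pattern parser handles are left out.

```python
from itertools import groupby


def tokenize(pattern: str) -> list[tuple[str, int]]:
    """Split a pattern into (character, run_length) pairs.

    Simplified illustration only: each run of a repeated character is
    treated as one field, and quoted literal text is not handled.
    """
    return [(char, len(list(run))) for char, run in groupby(pattern)]
```

Under that assumption, tokenize("ZZZZZ") yields one field, [("Z", 5)], but tokenize("ZU") yields two, [("Z", 1), ("U", 1)]. Nothing in the token stream says whether the second is a modifier of Z or a separate "cyclic year name" field, so a parser would need an extra, position-dependent rule to decide.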
I believe the point of DFDL's use of the U modifier for letters I and Z
was to be absolutely clear on the GMT timezone 'Z' issue, i.e., to indicate
that 'Z' is to be used, and -00:00 is not to be output, nor accepted when
parsing. The ICU specification ZZZZZ says ISO format, but that allows either
'Z' or -00:00 to be used for GMT timezone, and it's not clear what it means
on output.
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU