
Given that ICU do plan to offer this support, the DFDL spec will remain as it is, with textNumberExponentRep being sensitive to ignoreCase. DFDL implementations that use ICU should include a release note saying that all matches today are case insensitive but that this is liable to change.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 28/08/2013 16:15 -----

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org,
Date: 21/08/2013 12:21
Subject: Fw: [DFDL-WG] DFDL ICU Challenges for Implementation

Regarding Mike's second point:

Here's a link to an ICU ticket on this subject: http://bugs.icu-project.org/trac/ticket/9659 - currently targeted for ICU 52.1

Here are some more details from IBM's calls with ICU on this subject:

- Exponent character / ignoreCase: Exponent char is not case sensitive. Is this intentional?
  * Priority: Medium

ICU see two options for this:

Option 1: Provide an API call to set a flag on the DecimalFormat object.

Option 2: Make it a global policy settable via a config switch. This would allow other 'site policies' to be made settable using the same mechanism. There would be one set of policy flags, including this flag, per address space. There are differences in date/time processing between C and Java that could be dealt with using this mechanism. DFDL needs some of these flags to be configurable at runtime.

2012/10/19
Hit an issue where case handling was inconsistent. The fix needs care to avoid changing default behaviour and thus breaking existing users of the API. Currency and prefix/suffix may need a separate switch, so a global switch for the DecimalFormat is not appropriate. Could provide a patch to ICU for setting the exponent char for now.

2013/1/17 - ICU external ticket #9659
ICU had an issue with the new API being specific to just case sensitivity of exponent (and not other regions).
DFDL clarified that the requirement is for an API to change global case sensitivity (not just exponent). This is targeted at ICU 51.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 14/08/2013 16:55 -----

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>,
Date: 14/08/2013 14:14
Subject: [DFDL-WG] DFDL ICU Challenges for Implementation
Sent by: dfdl-wg-bounces@ogf.org

There are a couple of features in DFDL that ICU doesn't support, yet where all or nearly all the related functionality is supported by ICU. Perhaps these aspects of the spec can be revisited?

1) List of Decimal Separators

The textStandardDecimalSeparator property is a list of characters. However, ICU only supports a single character. I see lots of potential for error here, confusing diagnostics, etc. It is not consistent with textStandardGroupingSeparator, which allows only a single character.

Is there a use case where we know we need more than one decimal separator? The only thing I can think of is a blend of, say, classic European-style decimal numbers like "1 234 567,89" and USA style "1,234,567.89", but ICU won't deal with different grouping separators either.

In any case, if there are multiple decimal and grouping separators, we really don't have these properties right in DFDL. We should require them to be specified not as two separate lists, but as a list of pairs, because grouping separators match up with specific decimal separator values in a format.

2) Case Insensitivity

Some properties that we use to configure ICU are affected by ignoreCase="yes", but ICU does not support case insensitivity.
The properties are:

textStandardExponentRepCharacter
textStandardInfinityRep
textStandardNaNRep

I can certainly imagine a need for case insensitivity here, and even for multiple values for these (though we allow only one for Infinity and NaN). For the infinity and NaN reps that isn't so problematic, as one can easily do a pre-check before calling ICU, but the exponent rep is needed down in the detailed number format parsing. I can see no certain algorithm other than creating separate number format parsers for each exponent rep character in the provided case and in the opposite case, and then trying them one by one until a parse succeeds. Is this ok or do we consider this a mistake?

3) We are not very consistent in these properties.

We allow multiple textStandardZeroRep values, but only a single textStandardInfinityRep, and only a single textStandardNaNRep. We allow multiple textStandardExponentRepCharacter, and multiple textStandardDecimalSeparator, but only a single textStandardGroupingSeparator. This kind of inconsistency is always problematic for users.

Comments?

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
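
For illustration, the workaround described in point 2 of Mike's message (a pre-check for the infinity and NaN reps, then one parser per exponent rep in its provided case and its opposite case, tried in turn until one succeeds) might look roughly like the sketch below. It does not call ICU. `make_strict_parser` is a hypothetical, regex-based stand-in for a case-sensitive number parser, and all function names here are assumptions made for the example. "Opposite case" is taken to mean flipping the case of every character, which is a simplification for multi-character reps.

```python
import math
import re
from typing import Callable, List, Optional


def precheck_special(text: str, infinity_rep: str, nan_rep: str) -> Optional[float]:
    """Case-insensitive pre-check for the infinity and NaN reps, done before number parsing."""
    folded = text.casefold()
    if folded == infinity_rep.casefold():
        return math.inf
    if folded == nan_rep.casefold():
        return math.nan
    return None


def make_strict_parser(exponent_rep: str) -> Callable[[str], Optional[float]]:
    """Hypothetical stand-in for one case-sensitive number parser (not ICU).

    Accepts a signed decimal mantissa and an optional exponent introduced by
    exactly `exponent_rep`, matched case-sensitively.
    """
    pattern = re.compile(
        r"([+-]?\d+(?:\.\d+)?)(?:" + re.escape(exponent_rep) + r"([+-]?\d+))?"
    )

    def parse(text: str) -> Optional[float]:
        match = pattern.fullmatch(text)
        if match is None:
            return None  # the whole input must be consumed for a parse to count
        mantissa, exponent = match.group(1), match.group(2)
        return float(mantissa if exponent is None else mantissa + "e" + exponent)

    return parse


def case_variants(exponent_reps: List[str]) -> List[str]:
    """Each configured rep in its provided case, then its opposite case, without duplicates."""
    variants: List[str] = []
    for rep in exponent_reps:
        for candidate in (rep, rep.swapcase()):
            if candidate not in variants:
                variants.append(candidate)
    return variants


def parse_ignoring_exponent_case(text: str, exponent_reps: List[str]) -> Optional[float]:
    """Try one strict parser per case variant until one accepts the whole input."""
    for rep in case_variants(exponent_reps):
        value = make_strict_parser(rep)(text)
        if value is not None:
            return value
    return None
```

In this sketch each configured exponent rep costs up to two parse attempts, and a parse only counts as successful if it consumes the entire input.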