[DFDL-WG] Fw: DFDL ICU Challenges for Implementation - textNumberExponentRep

28 Aug 2013

Given that ICU do plan to offer this support, the DFDL spec will remain as
it is, with textNumberExponentRep being sensitive to ignoreCase. DFDL
implementations that use ICU should include a release note saying that,
for now, all exponent matches are case insensitive, but that this is
liable to change.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 28/08/2013 16:15 -----

From:   Steve Hanson/UK/IBM
To:     dfdl-wg@ogf.org, 
Date:   21/08/2013 12:21
Subject:        Fw: [DFDL-WG] DFDL ICU Challenges for Implementation

Regarding Mike's second point:

Here's a link to an ICU ticket on this subject: 
http://bugs.icu-project.org/trac/ticket/9659 - currently targeted for ICU
52.1

Here are some more details from IBM's calls with ICU on this subject:

- Exponent character / ignoreCase : Exponent char is not case sensitive.
  Is this intentional?
    * Priority : Medium

    ICU see two options for this:
    Option 1: Provide an API call to set a flag on the DecimalFormat object.
    Option 2: Make it a global policy settable via a config switch. This
              would allow other 'site policies' to be made settable using
              the same mechanism.
          There would be one set of policy flags, including this flag, per
          address space.
              There are differences in date/time processing between C and
              Java that could be dealt with using this mechanism.
              DFDL needs some of these flags to be configurable at
              runtime.

    2012/10/19 Hit an issue where case handling was inconsistent. The fix
               needs care to avoid changing default behaviour and thus
               breaking existing users of the API.
               Currency and prefix/suffix may need a separate switch, so a
               global switch for the DecimalFormat is not appropriate.
               Could provide a patch to ICU for setting the exponent char
               for now.

   2013/1/17 - ICU external ticket #9659

        ICU had an issue with the new API being specific to just the case
        sensitivity of the exponent (and not other regions).
        DFDL clarified that the requirement is for an API to change global
        case sensitivity (not just the exponent).
        This is targeted at ICU51.
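To make the difference between the two ICU options concrete, here is a hypothetical Python sketch. It is not the ICU API: `NumberFormatSketch`, `SITE_POLICY` and the flag name are invented for illustration. It assumes a per-object flag (Option 1) that, when left unset, falls back to a process-wide policy table (Option 2). Combining the two is an illustrative choice, not what ICU decided. The policy defaults to case-insensitive matching so that existing behaviour is unchanged, in line with the 2012/10/19 concern about breaking existing users.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Option 2 (hypothetical): one set of policy flags per address space,
# modelled as module-level state shared by every formatter and changeable
# at runtime. Defaults to today's case-insensitive behaviour.
SITE_POLICY: Dict[str, bool] = {"exponent_case_insensitive": True}


@dataclass
class NumberFormatSketch:
    """Hypothetical stand-in for a DecimalFormat-like object; not the ICU API."""
    exponent_rep: str = "E"
    # Option 1 (hypothetical): per-object flag; None means "use the site policy".
    exponent_case_insensitive: Optional[bool] = None

    def _ignore_case(self) -> bool:
        # A per-object setting, if present, overrides the site-wide policy.
        if self.exponent_case_insensitive is not None:
            return self.exponent_case_insensitive
        return SITE_POLICY["exponent_case_insensitive"]

    def exponent_matches(self, candidate: str) -> bool:
        """Does `candidate` match this format's exponent rep?"""
        if self._ignore_case():
            return candidate.casefold() == self.exponent_rep.casefold()
        return candidate == self.exponent_rep
```

A DFDL runtime honouring ignoreCase="no" could then set the flag per object, for example `NumberFormatSketch(exponent_case_insensitive=False)`, without affecting other users of the same address space.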

Regards

Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 14/08/2013 16:55 -----

From:   Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:     "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, 
Date:   14/08/2013 14:14
Subject:        [DFDL-WG] DFDL ICU Challenges for Implementation
Sent by:        dfdl-wg-bounces@ogf.org

There are a couple of features in DFDL that ICU doesn't support, even
though ICU supports all or nearly all of the related functionality. Perhaps
these aspects of the spec can be revisited?

1) List of Decimal Separators

The textStandardDecimalSeparator property is a list of characters. 
However, ICU only supports a single character.

I see lots of potential for error here, confusing diagnostics, etc. It is
also not consistent with textStandardGroupingSeparator, which allows only
a single character.

Is there a use case where we know we need more than one decimal separator? 

The only thing I can think of is a blend of, say, classic European-style
decimal numbers like "1 234 567,89" and USA-style "1,234,567.89", but ICU
won't deal with different grouping separators either.

In any case, if there are multiple decimal and grouping separators, then
we really don't have these properties right in DFDL. We should require
them to be specified not as two separate lists, but as a list of pairs,
because grouping separators match up with specific decimal separator
values in a format.
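A minimal Python sketch of the list-of-pairs idea, assuming a hypothetical `parse_with_separator_pairs` helper that is neither ICU nor anything the DFDL spec defines. Each entry pairs a decimal separator with the grouping separator that accompanies it in one number style, and the pairs are tried in order until one fits the text. Group sizes are deliberately not validated, which a real number-format parser would do.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (decimal separator, grouping separator).
SeparatorPair = Tuple[str, str]


def parse_with_separator_pairs(text: str,
                               pairs: Sequence[SeparatorPair]) -> Optional[Decimal]:
    """Try each (decimal, grouping) pair in order; return the first value that parses."""
    for decimal_sep, grouping_sep in pairs:
        # Remove this style's grouping separators.
        stripped = text.replace(grouping_sep, "")
        # A '.' left over is not this style's decimal separator, so reject it.
        if decimal_sep != "." and "." in stripped:
            continue
        # Normalise the decimal separator to '.' for Decimal.
        candidate = stripped.replace(decimal_sep, ".")
        try:
            return Decimal(candidate)
        except InvalidOperation:
            continue  # this pair does not fit the text; try the next one
    return None
```

With `pairs = [(",", " "), (".", ",")]`, both "1 234 567,89" and "1,234,567.89" parse to 1234567.89. Note that the order of the pairs decides ambiguous input: "1,234" is read as 1.234 because the European-style pair comes first.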

2) Case Insensitivity

Some properties that we use to configure ICU are affected by 
ignoreCase="yes", but ICU does not support case insensitivity. The 
properties are:

   textStandardExponentRepCharacter
   textStandardInfinityRep
   textStandardNaNRep

I can certainly imagine a need for case insensitivity here, and even for
multiple values for these (though we allow only one for Infinity and NaN).
For the Infinity and NaN reps that isn't so problematic, as one can easily
do a pre-check before calling ICU. The exponent rep, however, is needed
down in the detailed number format parsing. I can see no reliable
algorithm other than creating a separate number format parser for each
exponent rep character, in both the provided case and the opposite case,
and then trying them one by one until one parses successfully.
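The approach described above (a pre-check for the Infinity and NaN reps, then one parser per exponent rep in the provided and opposite case) can be sketched in Python. `make_parser` is a hypothetical, deliberately case-sensitive stand-in for a number format parser configured with a single exponent rep; it is not ICU, and it only accepts plain digits with an optional fraction and exponent.

```python
import re
from decimal import Decimal
from typing import Callable, List, Optional

Parser = Callable[[str], Optional[Decimal]]


def make_parser(exponent_rep: str) -> Parser:
    """Hypothetical case-sensitive parser configured with one exponent rep."""
    pattern = re.compile(r"[+-]?\d+(?:\.\d+)?(?:"
                         + re.escape(exponent_rep) + r"[+-]?\d+)?")

    def parse(text: str) -> Optional[Decimal]:
        if pattern.fullmatch(text) is None:
            return None
        # Rewrite the exponent rep as 'E' so Decimal can read the value.
        return Decimal(text.replace(exponent_rep, "E"))
    return parse


def candidate_reps(exponent_reps: List[str]) -> List[str]:
    """Each rep in the case provided, then in the opposite case, without duplicates."""
    out: List[str] = []
    for rep in exponent_reps:
        for variant in (rep, rep.swapcase()):
            if variant not in out:
                out.append(variant)
    return out


def parse_number(text: str, exponent_reps: List[str], infinity_rep: str,
                 nan_rep: str, ignore_case: bool) -> Optional[Decimal]:
    def norm(s: str) -> str:
        return s.casefold() if ignore_case else s

    # Pre-check the Infinity and NaN reps before any number parser runs.
    if norm(text) == norm(infinity_rep):
        return Decimal("Infinity")
    if norm(text) == norm(nan_rep):
        return Decimal("NaN")
    reps = candidate_reps(exponent_reps) if ignore_case else list(exponent_reps)
    # Try one parser per exponent rep variant until one parses successfully.
    for rep in reps:
        value = make_parser(rep)(text)
        if value is not None:
            return value
    return None
```

Building the parsers on every call keeps the sketch short; an implementation would more likely build them once per format. Signed Infinity reps are not handled here.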

Is this ok or do we consider this a mistake?

3) Single vs. Multiple Values

We are not very consistent across these properties.

We allow multiple textStandardZeroRep values, but only a single 
textStandardInfinityRep, and only a single textStandardNaNRep. 

We allow multiple textStandardExponentRepCharacter values and multiple
textStandardDecimalSeparator values, but only a single
textStandardGroupingSeparator.

This kind of inconsistency is always problematic for users.

Comments?

--
  dfdl-wg mailing list
  dfdl-wg@ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg
