DFDL ICU Challenges for Implementation
 
There are a couple of features in DFDL that ICU doesn't support, even though ICU supports all or nearly all of the related functionality. Perhaps these aspects of the spec can be revisited?

1) List of Decimal Separators

The textStandardDecimalSeparator property is a list of characters. However, ICU only supports a single character. I see lots of potential for error here, confusing diagnostics, etc. It is also not consistent with textStandardGroupingSeparator, which allows only a single character.

Is there a use case where we know we need more than one decimal separator? The only thing I can think of is a blend of, say, classic European-style decimal numbers like "1 234 567,89" and USA-style "1,234,567.89", but ICU won't deal with different grouping separators either.

In any case, if there are multiple decimal and grouping separators, we really don't have these properties right in DFDL. We should require them to be specified not as two separate lists but as a list of pairs, because each grouping separator matches up with a specific decimal separator in a format.

2) Case Insensitivity

Some properties that we use to configure ICU are affected by ignoreCase="yes", but ICU does not support case insensitivity. The properties are:

    textStandardExponentRepCharacter
    textStandardInfinityRep
    textStandardNaNRep

I can certainly imagine a need for case insensitivity here, and even for multiple values for these (though we allow only one each for Infinity and NaN).

For the infinity and NaN reps this isn't so problematic, since one can easily do a pre-check before calling ICU. The exponent rep, however, is needed down in the detailed number format parsing. The only reliable algorithm I can see is to create a separate number format parser for each exponent rep character, in the provided case and in the opposite case, and then try them one by one until a parse succeeds. Is this OK, or do we consider this a mistake?

3) We are not very consistent in these properties.

We allow multiple textStandardZeroRep values, but only a single textStandardInfinityRep and only a single textStandardNaNRep. We allow multiple textStandardExponentRepCharacter values and multiple textStandardDecimalSeparator values, but only a single textStandardGroupingSeparator. This kind of inconsistency is always problematic for users.

Comments?
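To make the fallback from item 2 concrete, here is a rough Python sketch of the pre-check for the infinity and NaN reps followed by the "one parser per exponent rep, per case" loop. It does not call ICU. The helper make_parser is a hypothetical stand-in for a single ICU number format that, like ICU, takes exactly one decimal separator, one grouping separator, and one case-sensitive exponent string. All function and parameter names are illustrative assumptions, and signed infinity is left out for brevity.

```python
import math
import re


def make_parser(decimal_sep, grouping_sep, exponent_rep):
    """Hypothetical stand-in for one ICU number format (not the ICU API).

    Accepts exactly one decimal separator, one (non-empty) grouping
    separator, and one case-sensitive exponent string.
    """
    pattern = re.compile(
        r"([+-]?)"
        r"(\d{1,3}(?:" + re.escape(grouping_sep) + r"\d{3})*|\d+)"
        r"(?:" + re.escape(decimal_sep) + r"(\d+))?"
        r"(?:" + re.escape(exponent_rep) + r"([+-]?\d+))?"
    )

    def parse(text):
        match = pattern.fullmatch(text)
        if match is None:
            return None  # this particular format does not accept the text
        sign, integer, fraction, exponent = match.groups()
        digits = integer.replace(grouping_sep, "")
        return float(f"{sign}{digits}.{fraction or '0'}e{exponent or '0'}")

    return parse


def case_variants(rep):
    """The rep as provided, then in the opposite case, without duplicates."""
    return list(dict.fromkeys([rep, rep.swapcase()]))


def parse_number(text, decimal_sep, grouping_sep, exponent_reps,
                 infinity_rep, nan_rep, ignore_case):
    """Pre-check infinity/NaN, then try one parser per exponent variant."""
    fold = str.casefold if ignore_case else (lambda s: s)

    # Pre-check: handled before any number format parser is consulted.
    if fold(text) == fold(nan_rep):
        return math.nan
    if fold(text) == fold(infinity_rep):
        return math.inf

    # Fallback loop: one parser per exponent rep and per case variant,
    # tried in order until one of them accepts the whole text.
    for rep in exponent_reps:
        variants = case_variants(rep) if ignore_case else [rep]
        for variant in variants:
            value = make_parser(decimal_sep, grouping_sep, variant)(text)
            if value is not None:
                return value
    return None  # no candidate parser succeeded
```

With these assumptions, parse_number("1,234.5e3", ".", ",", ["E"], "INF", "NaN", True) succeeds on the second parser (the one built with "e"), while the same call with ignore_case set to False returns None. Note that the number of parsers grows with the number of exponent reps, and that swapping the whole string's case does not cover mixed-case spellings of a multi-character rep, which is part of why I ask whether this is really what we intend.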
Mike Beckerle