
For the subset of ICU symbols that DFDL supports, here is what ICU claims:

1) Lenient parsing behaviour when in 'strict' mode:
   a) Case-insensitive matching for text fields.
   b) MMM, MMMM and MMMMM all accept either the short or the long form of the month.
   c) E, EE, EEE, EEEE, EEEEE ** and EEEEEE *** all accept any of the abbreviated, full, narrow and short forms of the day of week.
   d) A truncated leftmost numeric field is accepted (e.g., pattern "HHmmss" allows "123456" (12:34:56) and "23456" (2:34:56), but not "3456").

2) Additional lenient parsing behaviour when in 'lax' mode:
   a) Values outside valid ranges are normalized (e.g., "March 32 1996" is treated as "April 1 1996").
   b) A trailing dot after a non-numeric field is ignored.
   c) Leading and trailing whitespace that is in the data but not in the pattern is accepted. ****
   d) Whitespace in the pattern can be missing in the data.
   e) Partial matching on literal strings is accepted (e.g., data "20130621d" is allowed for pattern "yyyyMMdd'date'"). ****

** Bug found when testing this: the EEEEE 'narrow' form is completely broken. An ICU ticket has been raised.
*** EEEEEE and eeeeee are new and support a 2-character version of the 'short' form, e.g. Tu or Mo. This is not currently allowed by DFDL; we should consider allowing it.
**** Only currently in ICU4C. ICU4J will be changed to match ICU4C.

Note: IBM is in discussion with ICU to provide a 'really strict' mode (name TBD) that has no leniency at all. We need to decide whether to reflect all three variants in the dfdl:calendarCheckPolicy property, or whether to remap our 'strict' to the new 'really strict' mode when it appears. Given where we are, I think this is a DFDL 2.0 item.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
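To make two of the listed behaviours concrete, here is a minimal Python sketch of rule 1d (truncated leftmost numeric field, for the pattern "HHmmss" only) and rule 2a (out-of-range day-of-month normalization). It does not use ICU and is not a claim about how ICU implements these rules; the function names parse_hhmmss_truncated and normalize_lax are hypothetical, and range checking of the parsed hour, minute and second values is deliberately omitted.

```python
from datetime import date, timedelta


def parse_hhmmss_truncated(text: str) -> tuple[int, int, int]:
    """Hypothetical sketch of rule 1d for the pattern "HHmmss".

    The trailing fields (mm, ss) are fixed-width; only the leftmost
    field (HH) may be shorter than its pattern width, but it must
    still contain at least one digit.
    """
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"non-digit data: {text!r}")
    trailing_width = 4                      # "mmss" is always 4 digits
    head_len = len(text) - trailing_width   # digits left over for "HH"
    if not 1 <= head_len <= 2:
        # e.g. "3456" leaves no digits for HH, so it is rejected
        raise ValueError(f"cannot match 'HHmmss' against {text!r}")
    hours = int(text[:head_len])
    minutes = int(text[head_len:head_len + 2])
    seconds = int(text[head_len + 2:])
    return hours, minutes, seconds


def normalize_lax(year: int, month: int, day: int) -> date:
    """Hypothetical sketch of rule 2a: an out-of-range day rolls over.

    The day is counted forward from the first of the given month, so
    day 32 of March becomes April 1. Month overflow is not handled.
    """
    return date(year, month, 1) + timedelta(days=day - 1)


if __name__ == "__main__":
    print(parse_hhmmss_truncated("123456"))  # 12:34:56
    print(parse_hhmmss_truncated("23456"))   # 2:34:56
    print(normalize_lax(1996, 3, 32))        # "March 32 1996"
```

Counting the fixed-width trailing fields from the right is just one simple way to reproduce the accept/reject results quoted for "123456", "23456" and "3456"; a real parser handling arbitrary abutting fields would need a more general strategy.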