
Here's a draft erratum for action 193, for review on the next WG call.

Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

============================================================

Section 24 to read as follows:

A DFDL regular expression may be specified for the dfdl:lengthPattern format property and the dfdl:testPattern attribute of the dfdl:assert and dfdl:discriminator annotations. DFDL regular expressions do not interpret DFDL entities.

A DFDL regular expression is defined by a set of valid pattern characters. For portability, it is recommended that the regular expression pattern is restricted to the common subset of ICU regular expressions [ICURE] and Java(R) 7 regular expressions [JAVARE] with the Unicode character classes flag (UNICODE_CHARACTER_CLASS) turned on.

The following regular expression constructs are not common to both ICU and Java(R) 7 and are not recommended in a DFDL regular expression:

*Construct*                 *Meaning*                                       *Notes*
\N{UNICODE CHARACTER NAME}  Match the named character.                      ICU only
\X                          Match a Grapheme Cluster.                       ICU only
\Uhhhhhhhh                  Match the character with the hex value          ICU only
                            hhhhhhhh.
(?# ... )                   Free-format comment.                            ICU only
(?w-w)                      UREGEX_UWORD - Controls the behaviour of \b     ICU only
                            in a pattern.
(?d-d)                      UNIX_LINES - Enables Unix lines mode.           Java 7 only
(?u-u)                      UNICODE_CASE - Enables Unicode-aware case       Java 7 only;
                            folding.                                        always on for DFDL
(?U-U)                      UNICODE_CHARACTER_CLASS - Enables the Unicode   Java 7 only;
                            version of Predefined character classes and     always on for DFDL
                            POSIX character classes.
(?imsx-imsx:X)              X, as a non-capturing group with the given      Java 7 only
                            flags. Note that the flags i,s,m,x are valid,
                            but appending :X to the flag is not.

Additionally, the behaviour of the word character construct (\w) is not consistent between ICU and Java(R) 7, so its use is not recommended.
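For anyone who wants to try the constructs listed above against real patterns before the call, here is a rough Java sketch. It is not proposed spec text. The class and method names (DfdlRegexPortability, findNonPortable, compileForDfdl) are made up for illustration, and the scan is a simple heuristic: it skips escaped characters but does not understand character classes or \Q...\E quoting, so it can report false positives. The compile helper only assumes that a Java-based processor would set the two flags the table marks as "always on for DFDL".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: class and method names are hypothetical, not part of DFDL.
final class DfdlRegexPortability {

    // A backslash followed by any one character, e.g. \X, \w or an escaped backslash.
    private static final Pattern ESCAPE = Pattern.compile("\\\\(.)", Pattern.DOTALL);

    // An inline flag group such as (?d), (?u-u) or (?i:...).
    private static final Pattern FLAG_GROUP =
            Pattern.compile("\\(\\?([A-Za-z]*)(?:-([A-Za-z]*))?([:)])");

    /** Returns one note per non-recommended construct spotted in the regex. */
    static List<String> findNonPortable(String regex) {
        Set<String> notes = new LinkedHashSet<>();

        // Pass 1: escape sequences, read left to right so "\\X" is not seen as \X.
        Matcher esc = ESCAPE.matcher(regex);
        while (esc.find()) {
            switch (esc.group(1).charAt(0)) {
                case 'N': notes.add("\\N{...} is ICU only"); break;
                case 'X': notes.add("\\X is ICU only"); break;
                case 'U': notes.add("\\Uhhhhhhhh is ICU only"); break;
                case 'w': notes.add("\\w differs between ICU and Java 7"); break;
                default: break;
            }
        }

        // Pass 2: drop escape sequences so an escaped "(" is not taken for a group.
        String bare = ESCAPE.matcher(regex).replaceAll("");
        if (bare.contains("(?#")) {
            notes.add("(?# ... ) is ICU only");
        }
        Matcher fg = FLAG_GROUP.matcher(bare);
        while (fg.find()) {
            String flags = fg.group(1) + (fg.group(2) == null ? "" : fg.group(2));
            if (flags.indexOf('w') >= 0) notes.add("flag w is ICU only");
            if (flags.indexOf('d') >= 0) notes.add("flag d is Java 7 only");
            if (flags.indexOf('u') >= 0) notes.add("flag u is Java 7 only");
            if (flags.indexOf('U') >= 0) notes.add("flag U is Java 7 only");
            if (":".equals(fg.group(3)) && !flags.isEmpty()) {
                notes.add("(?flags:X) is Java 7 only");
            }
        }
        return new ArrayList<>(notes);
    }

    /** Compiles with the two Java flags the table marks as always on for DFDL. */
    static Pattern compileForDfdl(String regex) {
        return Pattern.compile(regex,
                Pattern.UNICODE_CASE | Pattern.UNICODE_CHARACTER_CLASS);
    }

    public static void main(String[] args) {
        // Sample pattern using three of the constructs from the table.
        String regex = "(?d)\\w+\\X";
        for (String note : findNonPortable(regex)) {
            System.out.println(note);
        }
        System.out.println(compileForDfdl("\\d+").matcher("123").matches());
    }
}
```

A schema author could run findNonPortable over each dfdl:lengthPattern and dfdl:testPattern value and treat any returned note as a portability warning rather than an error.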
In Java(R) 7, \w is [\p{Alpha}\p{gc=Mn}\p{gc=Me}\p{gc=Mc}\p{Digit}\p{gc=Pc}], which is a larger set than in ICU, where \w is [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]. Character properties are detailed in Unicode Regular Expressions [UNICODERE].

Section 30 to add:

[ICURE] - http://userguide.icu-project.org/strings/regexp
[JAVARE] - http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
[UNICODERE] - http://www.unicode.org/reports/tr18/

Section 30 to remove:

[PERLRE] - http://perldoc.perl.org/perlre.html#Extended-Patterns
[JAVARE] - http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.htm...

Andy

Andy Edwards - WebSphere Message Broker - DFDL
Email: andy.edwards@uk.ibm.com
Snail Mail: MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int: 247222
Tel ext: +44 (0)1962 817222
Desk: DE2 U20