Here's a draft errata for action 193, for
review on the next WG call.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK smh@uk.ibm.com
tel:+44-1962-815848
A DFDL regular expression may be specified
for the dfdl:lengthPattern format property and the
dfdl:testPattern attribute of the dfdl:assert
and dfdl:discriminator annotations. DFDL regular
expressions do not interpret DFDL entities.
A DFDL regular expression is defined by
a set of valid pattern characters. For portability,
it is recommended that a regular expression
pattern be restricted to the constructs common
to both ICU regular expressions [ICURE] and
Java(R) 7 regular expressions [JAVARE] with the
Unicode character classes flag (UNICODE_CHARACTER_CLASS)
turned on. The following regular expression
constructs are not common to both ICU and
Java(R) 7 and are not recommended in a DFDL regular
expression:
*Construct*                 *Meaning*                                 *Notes*
\N{UNICODE CHARACTER NAME}  Match the named character                 ICU only
\X                          Match a Grapheme Cluster                  ICU only
\Uhhhhhhhh                  Match the character with the hex value    ICU only
                            hhhhhhhh.
(?# ... )                   Free-format comment                       ICU only
(?w-w)                      UREGEX_UWORD - Controls the behaviour     ICU only
                            of \b in a pattern.
(?d-d)                      UNIX_LINES - Enables Unix lines mode.     Java 7 only
(?u-u)                      UNICODE_CASE - Enables Unicode-aware      Java 7 only -
                            case folding.                             always on for DFDL
(?U-U)                      UNICODE_CHARACTER_CLASS - Enables the     Java 7 only -
                            Unicode version of Predefined character   always on for DFDL
                            classes and POSIX character classes.
(?imsx-imsx:X)              X, as a non-capturing group with the      Java 7 only.
                            given flags.                              Note that the flags
                                                                      i,s,m,x are valid, but
                                                                      appending :X to
                                                                      the flag is not.
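
As an illustration only, the table above could be turned into a simple lint check along the following lines. This is a rough sketch, not part of the erratum: the class name DfdlRegexPortability and the method findNonPortable are hypothetical, and the scan is deliberately naive (it does not understand character classes or \Q...\E quoting, so it may report false positives there).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags the constructs listed in the table above.
class DfdlRegexPortability {

    /** Returns one note per non-recommended construct found in the pattern. */
    static List<String> findNonPortable(String pattern) {
        List<String> notes = new ArrayList<>();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                char next = pattern.charAt(i + 1);
                if (next == 'N' && pattern.startsWith("{", i + 2)) {
                    notes.add("\\N{...} at index " + i + ": ICU only");
                } else if (next == 'X') {
                    notes.add("\\X at index " + i + ": ICU only");
                } else if (next == 'U') {
                    notes.add("\\Uhhhhhhhh at index " + i + ": ICU only");
                }
                i += 2; // skip the backslash and the character it escapes
                continue;
            }
            if (pattern.startsWith("(?#", i)) {
                notes.add("(?# ... ) at index " + i + ": ICU only");
            } else if (pattern.startsWith("(?", i)) {
                // Collect candidate inline flag letters after "(?".
                int j = i + 2;
                while (j < pattern.length()
                        && "imsxwduU-".indexOf(pattern.charAt(j)) >= 0) {
                    j++;
                }
                String flags = pattern.substring(i + 2, j);
                char end = j < pattern.length() ? pattern.charAt(j) : '\0';
                // Only treat it as a flag group if it ends with ')' or ':'.
                if (!flags.isEmpty() && (end == ')' || end == ':')) {
                    if (flags.indexOf('w') >= 0) {
                        notes.add("flag w at index " + i + ": ICU only");
                    }
                    if (flags.indexOf('d') >= 0) {
                        notes.add("flag d at index " + i + ": Java 7 only");
                    }
                    if (flags.indexOf('u') >= 0) {
                        notes.add("flag u at index " + i + ": Java 7 only");
                    }
                    if (flags.indexOf('U') >= 0) {
                        notes.add("flag U at index " + i + ": Java 7 only");
                    }
                    if (end == ':') {
                        notes.add("(?flags:X) at index " + i + ": Java 7 only");
                    }
                }
            }
            i++;
        }
        return notes;
    }

    public static void main(String[] args) {
        // Example pattern mixing three of the constructs from the table.
        for (String note : findNonPortable("(?d)\\X+(?i:abc)")) {
            System.out.println(note);
        }
    }
}
```

A plain non-capturing group such as (?:abc) carries no flags and is not reported, and an escaped backslash followed by a literal X is not mistaken for \X.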
Additionally, the behaviour of the word
character construct (\w) is not consistent between ICU and Java(R) 7,
so its use is not recommended. In Java(R) 7, \w
is [\p{Alpha}\p{gc=Mn}\p{gc=Me}\p{gc=Mc}\p{Digit}\p{gc=Pc}],
which is a larger set than in ICU, where \w
is [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].
Character properties are detailed in
Unicode Regular Expressions [UNICODERE].
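
To make the difference concrete, here is a small Java sketch comparing the two sets. It does not run ICU: the second pattern simply spells out, in Java syntax, the set quoted above for ICU's \w, so it only illustrates the sets as stated in this text. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two \w definitions quoted above.
class WordClassComparison {

    // \w as Java 7 defines it when UNICODE_CHARACTER_CLASS is turned on.
    static final Pattern JAVA_WORD =
            Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    // The set quoted in the text for ICU's \w, written as an explicit class.
    // Assumption: this emulates the quoted set in Java; it is not ICU itself.
    static final Pattern ICU_STYLE_WORD =
            Pattern.compile("[\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}]");

    static boolean javaWord(String s) {
        return JAVA_WORD.matcher(s).matches();
    }

    static boolean icuStyleWord(String s) {
        return ICU_STYLE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        // A letter, a digit, a connector punctuation (Pc) and a
        // combining acute accent (Mn).
        String[] samples = { "a", "7", "_", "\u0301" };
        for (String s : samples) {
            System.out.println(String.format("U+%04X", s.codePointAt(0))
                    + " java=" + javaWord(s)
                    + " icuStyle=" + icuStyleWord(s));
        }
    }
}
```

The letter and the digit fall in both sets, while the underscore (gc=Pc) and the combining accent (gc=Mn) fall only in the Java set. A pattern that needs identical behaviour on both engines can avoid \w and write out the intended character class explicitly.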