
Update: I just found errata 3.29, which answers this question, I think.
From the description in the errata, and from the documentation for Java 7 regular expressions, it looks like DFDL regular expressions conform to Level 1 of Unicode Regular Expressions (UTS #18).
I still think there would be value in stating that conformance in the DFDL spec, but I suppose it would take some legwork for someone to actually confirm that ICU and Java 7 conform to Level 1.

Very respectfully,
--
Jonathan Cranford

-----Original Message-----
From: Cranford, Jonathan W.
Sent: Friday, July 05, 2013 1:36 PM
To: dfdl-wg@ogf.org
Subject: DFDL regular expressions and Unicode

I've been going through the spec recently, and I have a few questions about DFDL regular expressions.
Rather than put them into one long email, I'll break them up into separate emails.
First question: What level of conformance to Unicode Technical Standard #18 UNICODE REGULAR EXPRESSIONS do DFDL regular expressions claim?
For example:
* XML Schema regular expressions are "targeted at support of 'Level 1' features" (http://www.w3.org/TR/xmlschema-2/#dt-ccesN)
* Java 1.4 regular expressions "implement its second level of support" (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html)
* Perl 5.18 seems to implement most of Level 1 (http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level)
I think the conformance level should be specified in the DFDL spec so that it is clear to schema designers what a regular expression would really match against. Details like case conversion and canonical equivalence make a difference when matching against a Unicode string.
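
To illustrate why those details matter, here is a minimal sketch using java.util.regex, which is only one possible engine and is not something the DFDL spec mandates. The class name UnicodeRegexDemo and the helper matches() are made up for this example; Pattern.CANON_EQ, Pattern.CASE_INSENSITIVE and Pattern.UNICODE_CASE are standard Java flags. Whether a DFDL processor exposes equivalent behavior is exactly the open question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of DFDL or any DFDL implementation.
class UnicodeRegexDemo {
    // "a with ring above" written as 'a' plus COMBINING RING ABOVE (U+030A).
    static final String DECOMPOSED = "a\u030A";
    // The same character as a single precomposed code point (U+00E5).
    static final String PRECOMPOSED = "\u00E5";

    // Returns true if the whole input matches the regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Canonical equivalence is ignored by default and honored with CANON_EQ.
        System.out.println(matches(DECOMPOSED, 0, PRECOMPOSED));
        System.out.println(matches(DECOMPOSED, Pattern.CANON_EQ, PRECOMPOSED));

        // CASE_INSENSITIVE alone only folds US-ASCII letters;
        // adding UNICODE_CASE enables Unicode-aware case folding.
        int asciiOnly = Pattern.CASE_INSENSITIVE;
        int unicodeAware = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(matches("\u00E9", asciiOnly, "\u00C9"));
        System.out.println(matches("\u00E9", unicodeAware, "\u00C9"));
    }
}
```

With these flags the four lines printed are false, true, false, true: the same pattern and the same input give different answers depending on how the engine is configured, which is why a stated conformance level would help schema designers.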
Thanks in advance,
--
Jonathan W. Cranford <jcranford@mitre.org>
Senior Information Systems Engineer
The MITRE Corporation (http://www.mitre.org)