That looks good to me. Let's close it on Tuesday's WG call.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Cranford, Jonathan W." <jcranford@mitre.org>
To: Steve Hanson/UK/IBM@IBMGB, Andrew Edwards/UK/IBM@IBMGB
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 19/07/2013 20:12
Subject: RE: [DFDL-WG] DFDL regular expressions and Unicode - conformance
How does this sound? I just added a sentence at the end.
> A DFDL regular expression is defined by a set of valid pattern characters. For
> portability, a DFDL regular expression pattern is restricted to the inclusive subset
> of the ICU regular expression [ICURE] and the Java(R) 7 regular expression
> [JAVARE] with the Unicode flags UNICODE_CASE and
> UNICODE_CHARACTER_CLASS turned on. DFDL regular expressions thereby conform to
> Unicode Technical Standard #18, Unicode Regular Expressions, level 1 [UNICODERE].
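The wording above ties DFDL regular expressions to Java 7's java.util.regex with UNICODE_CASE and UNICODE_CHARACTER_CLASS turned on. As a minimal sketch of what those flags change (the class and helper names are hypothetical, not part of any DFDL implementation), compare how \w treats non-ASCII letters with and without them:

```java
import java.util.regex.Pattern;

public class DfdlRegexSketch {
    // Hypothetical helper: compile a pattern with the two flags named in the
    // proposed errata wording. UNICODE_CHARACTER_CLASS (Java 7+) already
    // implies UNICODE_CASE; both are listed here to mirror the wording.
    static Pattern compileDfdl(String regex) {
        return Pattern.compile(regex,
                Pattern.UNICODE_CASE | Pattern.UNICODE_CHARACTER_CLASS);
    }

    public static void main(String[] args) {
        // German "Groesse" spelled with o-umlaut and sharp s (written as escapes).
        String input = "Gr\u00f6\u00dfe";

        // Default flags: \w means [a-zA-Z_0-9], so the non-ASCII letters fail.
        System.out.println(Pattern.compile("\\w+").matcher(input).matches()); // false

        // DFDL-style flags: \w follows the Unicode compatibility properties.
        System.out.println(compileDfdl("\\w+").matcher(input).matches());     // true
    }
}
```

The same pattern text therefore matches different data depending on the flags, which is why the wording pins them down explicitly.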
>-----Original Message-----
>From: Steve Hanson [mailto:smh@uk.ibm.com]
>Sent: Tuesday, July 16, 2013 9:13 AM
>To: Andrew Edwards
>Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org; Cranford, Jonathan W.
>Subject: Re: [DFDL-WG] DFDL regular expressions and Unicode - conformance
>
>Jonathan
>
>No need for us to contact ICU; as Andy indicates below, ICU and Java both claim
>conformance.
>
>Here are the words from errata 3.29. Could you please rephrase them to combine the
>conformance requirement and the restrictions, so that we end up with a form you
>are happy with? Then we can update the errata.
>
>A DFDL regular expression is defined by a set of valid pattern characters. For
>portability, a DFDL regular expression pattern is restricted to the inclusive subset
>of the ICU regular expression [ICURE] and the Java(R) 7 regular expression
>[JAVARE] with the Unicode flags UNICODE_CASE and
>UNICODE_CHARACTER_CLASS turned on. DFDL regular expressions thereby conform
>to Unicode Technical Standard #18, Unicode Regular Expressions, level 1,
>
>Regards
>
>Steve Hanson
>
>
>
>From: Andrew Edwards/UK/IBM
>To: Steve Hanson/UK/IBM@IBMGB
>Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, dfdl-wg-bounces@ogf.org,
>"Cranford, Jonathan W." <jcranford@mitre.org>
>Date: 11/07/2013 14:19
>Subject: Re: [DFDL-WG] DFDL regular expressions and Unicode
>
>________________________________
>
>
>
>Hi Jonathan,
>
>Sorry for the delay; first week back in the office...
>
>As you've noted, errata 3.29 describes which DFDL regexes are supported.
>Specifically, they are a subset of Java 7's java.util.regex
>(http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) and
>ICU's regular expression support (http://userguide.icu-project.org/strings/regexp),
>both of which conform to level 1 of Unicode Technical Standard #18.
>
>It looks like there are two stages to checking conformance:
>
>* Logical: do the available regex constructs provide conformance to the
>  technical standard? This is probably just a couple of hours of reading the
>  Unicode standard rules and cross-checking the constructs in each matching engine.
>* Actual: do Java 7 and ICU really match correctly for each of the
>  conformance statements? This can take an ever-increasing amount of time
>  testing various sets of data and regex patterns, and it risks the only reward
>  being that we find bugs in Java 7 or ICU. The minimum would be 3 or 4 days of
>  test generation.
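The "Actual" stage amounts to running data and patterns through each engine and comparing the results with what UTS #18 Level 1 requires. Below is a minimal sketch of such a spot check against java.util.regex only, assuming the flag set from errata 3.29. The four cases are hypothetical illustrations of individual Level 1 requirements (RL numbers as used in UTS #18), not a conformance suite, and ICU would need an equivalent run of its own.

```java
import java.util.regex.Pattern;

public class Uts18SpotCheck {
    // One hypothetical check: the UTS #18 requirement it probes, the pattern,
    // the input text, and whether a full match is expected.
    static final class Check {
        final String requirement, regex, input;
        final boolean expected;

        Check(String requirement, String regex, String input, boolean expected) {
            this.requirement = requirement;
            this.regex = regex;
            this.input = input;
            this.expected = expected;
        }
    }

    static final Check[] CHECKS = {
        // General category property: e-acute, t, e-acute are all letters.
        new Check("RL1.2 properties", "\\p{L}+", "\u00e9t\u00e9", true),
        // ARABIC-INDIC DIGIT THREE is a decimal digit (gc=Nd).
        new Check("RL1.2a compatibility \\d", "\\d", "\u0663", true),
        // Case-insensitive match of small e-acute against capital E-acute.
        new Check("RL1.5 simple loose matches", "(?i)\u00e9", "\u00c9", true),
        // "." must consume one code point, i.e. the whole surrogate pair.
        new Check("RL1.7 supplementary code points", ".", "\uD83D\uDE00", true),
    };

    // Runs every check with the DFDL flag set; returns the number of failures.
    static int runChecks() {
        int failures = 0;
        for (Check c : CHECKS) {
            Pattern p = Pattern.compile(c.regex,
                    Pattern.UNICODE_CASE | Pattern.UNICODE_CHARACTER_CLASS);
            boolean actual = p.matcher(c.input).matches();
            if (actual != c.expected) {
                failures++;
                System.out.println("FAIL " + c.requirement + ": /" + c.regex + "/");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runChecks() + " failure(s)");
    }
}
```

Keeping the cases as data makes it cheap to grow the table one requirement at a time, which suits the open-ended nature of this stage.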
>
>
>Does that answer the issue?
>Andy
>Andy Edwards - IBM Integration Bus <http://www-03.ibm.com/software/products/us/en/integration-bus>
>- DFDL <https://w3-connections.ibm.com/wikis/home?lang=en-gb#!/wiki/IBM%20Data%20Format%20Description%20Language>
>
>
>Email: andy.edwards@uk.ibm.com
>Snail Mail: MP211, Hursley Park, Hursley, WINCHESTER, Hants, SO21 2JN
>Tel int: 247222
>Tel ext: +44 (0)1962 817222
>Desk: DE3 V17
>
>The Feynman problem solving Algorithm
> 1) Write down the problem
> 2) Think real hard
> 3) Write down the answer
>-- Murray Gell-Mann in the NY Times
>
>
>
>
>
>Unless stated otherwise above:
>IBM United Kingdom Limited - Registered in England and Wales with number 741598.
>Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>
>Steve Hanson/UK/IBM
>
>08/07/2013 11:08
>To: "Cranford, Jonathan W." <jcranford@mitre.org>
>cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, dfdl-wg-bounces@ogf.org,
>Andrew Edwards/UK/IBM@IBMGB
>Subject: Re: [DFDL-WG] DFDL regular expressions and Unicode
>
>Jonathan
>
>I've copied Andy, who recently added regex support to IBM DFDL. He might
>have an idea of the effort involved in stating conformance.
>
>We will discuss your other two emails on the next DFDL-WG call or so.
>
>Regards
>
>Steve Hanson
>
>
>
>From: "Cranford, Jonathan W." <jcranford@mitre.org>
>To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
>Date: 06/07/2013 00:56
>Subject: Re: [DFDL-WG] DFDL regular expressions and Unicode
>Sent by: dfdl-wg-bounces@ogf.org
>
>________________________________
>
>
>
>
>Update: I just found errata 3.29, which answers this question, I think.
>
>From the description in the errata, and from the documentation for Java 7
>regular expressions, it looks like DFDL regular expressions conform to level 1 of
>Unicode Regular Expressions (UTS #18).
>
>I still think there would be value in stating such conformance in the DFDL spec,
>but I suppose it would take some legwork for someone to actually confirm that
>ICU and Java 7 conform to level 1.
>
>Very respectfully,
>
>-- Jonathan Cranford
>
>
>>-----Original Message-----
>>From: Cranford, Jonathan W.
>>Sent: Friday, July 05, 2013 1:36 PM
>>To: dfdl-wg@ogf.org
>>Subject: DFDL regular expressions and Unicode
>>
>>I've been going through the spec recently, and I have a few questions about
>>DFDL regular expressions.
>>
>>Rather than put them into one long email, I'll break them up into separate
>>emails.
>>
>>First question: What level of conformance to Unicode Technical Standard #18,
>>Unicode Regular Expressions, do DFDL regular expressions claim?
>>
>> For example,
>> * XML Schema regular expressions are "targeted at support of 'Level 1'
>>   features" (http://www.w3.org/TR/xmlschema-2/#dt-ccesN)
>> * Java 1.4 regular expressions "implement its second level of support"
>>   (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html)
>> * Perl 5.18 seems to implement most of Level 1
>>   (http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level)
>>
>> I think the conformance level should be specified in the DFDL spec so that it is
>> clear to schema designers what a regular expression would really match against.
>> Details like case conversion and canonical equivalence make a difference when
>> matching against a Unicode string.
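To make the point about case conversion and canonical equivalence concrete, here is a minimal sketch using java.util.regex (the class and method names are hypothetical). Case-insensitive matching of non-ASCII letters only works when UNICODE_CASE is on, and a decomposed "a" plus COMBINING RING ABOVE only matches the precomposed letter when CANON_EQ is requested. CANON_EQ is not among the flags named in the errata 3.29 wording quoted in this thread.

```java
import java.util.regex.Pattern;

public class CaseAndCanonDemo {
    // Case-insensitive match of small e-acute (U+00E9) against capital
    // E-acute (U+00C9). Without UNICODE_CASE, CASE_INSENSITIVE is ASCII-only.
    static boolean caseInsensitive(int extraFlags) {
        return Pattern.compile("\u00e9", Pattern.CASE_INSENSITIVE | extraFlags)
                .matcher("\u00c9").matches();
    }

    // Decomposed "a" + U+030A (COMBINING RING ABOVE) against precomposed
    // U+00E5; canonically equivalent, but only equal under CANON_EQ.
    static boolean canonical(int extraFlags) {
        return Pattern.compile("a\u030A", extraFlags).matcher("\u00E5").matches();
    }

    public static void main(String[] args) {
        System.out.println(caseInsensitive(0));                    // false
        System.out.println(caseInsensitive(Pattern.UNICODE_CASE)); // true
        System.out.println(canonical(0));                          // false
        System.out.println(canonical(Pattern.CANON_EQ));           // true
    }
}
```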
>>
>>Thanks in advance,
>>
>>--
>>Jonathan W. Cranford <jcranford@mitre.org>
>>Senior Information Systems Engineer
>>The MITRE Corporation (http://www.mitre.org)
>
>--
> dfdl-wg mailing list
> dfdl-wg@ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
>
>