I came across this issue a couple of weeks
ago. The regular expression syntax used in XML Schema is stricter
than what Java regular expressions support. The DFDL regular expression
syntax and restrictions should match the XML Schema specification.
Here is an example for which an APAR has
been opened; we will be supplying a fix in the WMB toolkit to make regular
expressions comply with the XML Schema spec.
The following line causes the XML Schema
compiler to fail:
<xsd:pattern value="([a-zA-Z0-9 ]|\-|\.|_|\(|\)|\\|\/|.&|\')*"/>
Here the customer has escaped the forward
slash and single quote characters. XML Schema does not allow these escapes:
\/ should be written as / and \' should be written as '
The following is accepted by the XML Schema
compiler:
<xsd:pattern value="([a-zA-Z0-9 ]|\-|\.|_|\(|\)|\\|/|.&|')*"/>
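To illustrate why this kind of pattern slips through when it is only tested against Java, here is a minimal sketch showing that java.util.regex compiles both forms and treats them identically. The class and method names (EscapeCheck, accepts) are made up for this example, and it assumes the line break inside the character class stands for a single space. It says nothing about how any particular XML Schema compiler reports the error.

```java
import java.util.regex.Pattern;

public class EscapeCheck {
    // Pattern from the failing schema: '/' and the single quote are backslash-escaped.
    public static final String ESCAPED =
        "([a-zA-Z0-9 ]|\\-|\\.|_|\\(|\\)|\\\\|\\/|.&|\\')*";
    // Pattern accepted by the XML Schema compiler: '/' and the single quote are literal.
    public static final String UNESCAPED =
        "([a-zA-Z0-9 ]|\\-|\\.|_|\\(|\\)|\\\\|/|.&|')*";

    // XML Schema patterns are implicitly anchored, so matches() is used rather than find().
    public static boolean accepts(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Java accepts a backslash before any non-alphabetic character,
        // so both patterns compile and give the same answers here.
        for (String s : new String[] {"a/b", "it's", "a#b"}) {
            System.out.println(s + ": escaped=" + accepts(ESCAPED, s)
                + " unescaped=" + accepts(UNESCAPED, s));
        }
    }
}
```

Because Java is more permissive, a successful Pattern.compile() does not show that a pattern is legal in XML Schema.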
Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development
Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia@ca.ibm.com
For info on Message broker
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Tim Kimber <KIMBERT@uk.ibm.com>
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 11/14/2012 12:46 PM
Subject: Re: [DFDL-WG] Clarification needed: regular expressions - does '.' match newlines by default?
Sent by: dfdl-wg-bounces@ogf.org
I agree with Tim's opinion, but add that this is *NOT*
the default behavior of the Java regex library we're currently using in
Daffodil. I believe one must prefix every regex with (?s) to get the
non-default behavior, in which '.' also matches line endings.
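A small sketch of the behavior described above, using only java.util.regex. The class and method names (DotAllDemo, matchesDefault, matchesDotAll) are invented for illustration and are not Daffodil code.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // True if the whole input matches the regex under Java's default flags.
    public static boolean matchesDefault(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Same check with the embedded (?s) flag prepended,
    // so that '.' also matches line terminators.
    public static boolean matchesDotAll(String regex, String input) {
        return Pattern.compile("(?s)" + regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println(matchesDefault("a.b", input)); // false
        System.out.println(matchesDotAll("a.b", input));  // true
        // Pattern.DOTALL is the compile-time flag equivalent of (?s).
        System.out.println(
            Pattern.compile("a.b", Pattern.DOTALL).matcher(input).matches()); // true
    }
}
```

A processor that wants '.' to match everything by default could pass Pattern.DOTALL at compile time instead of rewriting each expression; which approach fits best is an implementation choice.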
On Wed, Nov 14, 2012 at 11:15 AM, Tim Kimber <KIMBERT@uk.ibm.com>
wrote:
I would vote for this feature to be
switched off by default in DFDL processors. It is mainly useful when dealing
with lines of text, but DFDL formats are not always lines of text.
So to be 100% clear, I think the '.' wildcard should match all characters,
including line endings.
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 14/11/2012 12:53
Subject: [DFDL-WG] Clarification needed: regular expressions - does '.' match newlines by default?
Sent by: dfdl-wg-bounces@ogf.org
A key behavior distinction in regular expressions is whether the '.' wildcard
matches line endings or not.
Regular expression libraries can be configured, usually by some sort of
expression modifier, either way so that the '.' will not match a line ending
or so that it will.
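As one concrete instance of such modifiers, the sketch below probes which characters Java's '.' rejects under three configurations: the default, (?d) (UNIX_LINES), and (?s) (DOTALL). This covers java.util.regex only; other libraries define their line terminators differently. The names LineTerminatorDemo and dotMatches are hypothetical.

```java
import java.util.regex.Pattern;

public class LineTerminatorDemo {
    // True if a lone '.' (with the given embedded flags) matches the single character cp.
    public static boolean dotMatches(String flags, int cp) {
        String input = String.valueOf((char) cp);
        return Pattern.compile(flags + ".").matcher(input).matches();
    }

    public static void main(String[] args) {
        // LF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR
        int[] candidates = {0x0A, 0x0D, 0x85, 0x2028, 0x2029};
        for (int cp : candidates) {
            System.out.println(String.format(
                "U+%04X default=%b unixLines=%b dotAll=%b",
                cp,
                dotMatches("", cp),      // default: '.' rejects all five
                dotMatches("(?d)", cp),  // UNIX_LINES: '.' rejects only LF
                dotMatches("(?s)", cp))); // DOTALL: '.' accepts all five
        }
    }
}
```

So the question is not only whether '.' matches line endings, but also which characters count as line endings in the chosen dialect.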
Question is, how is it configured by default in DFDL regular expressions?
This is part of the overall issue of tightening up regular expressions
as part of DFDL. I.e., what exactly is the regex dialect, and how is it
configured by default.
...mike
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU