Never mind. I figured this out. Just slow today, I guess.

You can't use &#xA;, but you can use \n in the regex. The same goes for \r, \t, etc.

The pattern in question is just:

<xs:pattern value="[A-Za-z0-9 \n\r]*"/>
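A small Python sketch (not from the thread) shows why this works. It assumes the standard library's `xml.etree.ElementTree` as a stand-in for whatever XML reader a DFDL implementation uses, and Python's `re` as a stand-in for the XSD regex engine; for a simple character class using the `\n` and `\r` escapes, the XSD, Java, and Python regex dialects agree. XSD patterns must match the whole value, so the sketch uses `re.fullmatch`. The helper `facet_accepts` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The facet as written in the schema. Inside the Python literal, "\\n" is a
# backslash followed by "n", so the XML text contains the regex escape \n,
# not an actual line-ending character.
schema_fragment = (
    '<xs:pattern xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'value="[A-Za-z0-9 \\n\\r]*"/>'
)

# Attribute-value normalization only rewrites real whitespace characters,
# so the backslash escapes reach the regex engine unchanged.
pattern_text = ET.fromstring(schema_fragment).get("value")
print(repr(pattern_text))  # '[A-Za-z0-9 \\n\\r]*'

# XSD patterns must match the whole value, hence fullmatch rather than search.
facet_regex = re.compile(pattern_text)


def facet_accepts(value: str) -> bool:
    """Hypothetical helper: does the whole string satisfy the pattern facet?"""
    return facet_regex.fullmatch(value) is not None


print(facet_accepts("line one\r\nline two"))  # True
print(facet_accepts("tab\tseparated"))  # False
```

The key point is that the XML reader never sees a line-ending character in the attribute, only a backslash and a letter, so there is nothing for it to normalize.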

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy



On Wed, Mar 3, 2021 at 6:32 PM Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
Since data can contain any character, the DFDL infoset allows any character.

However, XML does not allow every character.

Furthermore, XML Schema pattern facets are expressed using this XML Schema fragment:
 
<xs:pattern value="...some regex pattern here ..."/>

But XML readers/parsers normalize attribute values: line endings in them are converted to single spaces.

So

<xs:pattern value="abc
def"/>

is equivalent to:

<xs:pattern value="abc def"/>
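This normalization of a literal line break is easy to observe. The Python sketch below (not from the thread) assumes the standard library's `xml.etree.ElementTree` as a representative XML reader; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A literal line break typed inside the attribute value (the Python "\n"
# here is a real newline character in the XML text).
fragment = (
    '<xs:pattern xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'value="abc\ndef"/>'
)

value = ET.fromstring(fragment).get("value")
print(repr(value))  # 'abc def'
```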

Furthermore,

<xs:pattern value="abc&#xA;def"/>

is also normalized to

<xs:pattern value="abc def"/>

As far as I can tell, there is no alternative notation that avoids this.

This means that if you want to use a pattern facet to specify that a DFDL infoset string can contain A-Za-z0-9, spaces, and line endings, there is no way to express it.

This is the pattern I was dealing with:

<xs:pattern value="[A-Za-z0-9 &#xD;&#xA;]*"/>

If you look at the string for the value attribute of this pattern element, the line-ending characters have already been converted into spaces. The attribute value is "[A-Za-z0-9   ]*", which has 3 spaces before the "]".
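Taking the attribute value reported above as given, a short Python sketch (not from the thread) shows the consequence: the normalized class admits only letters, digits, and spaces, so any value containing a real line ending fails. Python's `re.fullmatch` is assumed here as a stand-in for XSD's whole-value pattern matching; the variable names are illustrative.

```python
import re

# The value attribute as reported above: the space, CR and LF inside the
# character class all arrive as spaces, so the class holds 3 spaces.
normalized_pattern = "[A-Za-z0-9   ]*"

multi_line = "first line\r\nsecond line"
single_line = "first line second line"

print(re.fullmatch(normalized_pattern, multi_line))  # None
print(re.fullmatch(normalized_pattern, single_line) is not None)  # True
```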

I think there is no workaround for this in XML, XSD, or DFDL.

I dug into the Daffodil implementation: in the code that accesses this attribute, you don't even get a NodeSeq containing a mixture of Text and Entity nodes; you just get a single Text node. So it is pretty well hopeless without reaching into the guts of the XML parser/reader.

Hence, in DFDL, if you want to use a regex to "validate" a DFDL string whose content may include line endings, you have to use dfdl:assert with failureType="recoverableError", testKind="pattern", and a testPattern containing the regex of interest. That is a DFDL regex, which is a Java regex, so you can be explicit about which line endings are allowed.
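To illustrate what "explicit about line endings" can mean, the Python sketch below (not from the thread) compares two hypothetical testPattern candidates: one allowing any mix of CR and LF, one allowing only CRLF pairs. Python's `re` accepts the same syntax as Java regex for these patterns. It checks standalone strings with `fullmatch`, which is a simplification: it does not model how a DFDL processor applies a testPattern to the data stream.

```python
import re

# Hypothetical testPattern candidates; the same syntax is valid Java regex.
# Any mix of CR and LF characters is allowed:
any_line_ending = re.compile(r"[A-Za-z0-9 \r\n]*")
# Only CRLF pairs are allowed; a lone CR or a lone LF is rejected:
crlf_only = re.compile(r"(?:[A-Za-z0-9 ]|\r\n)*")

samples = {"crlf": "abc\r\ndef", "lf": "abc\ndef", "cr": "abc\rdef"}
for name, text in samples.items():
    print(name, bool(any_line_ending.fullmatch(text)), bool(crlf_only.fullmatch(text)))
# crlf True True
# lf True False
# cr True False
```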

You can't do it with a pattern facet.

Comments?
