Since data can contain any character, the DFDL infoset allows any character.
XML, however, does not.
Furthermore, XML Schema Pattern facets are expressed using this XML Schema fragment:
<xs:pattern value="...some regex pattern here ..."/>
But XML attribute values are normalized by XML readers/parsers: each line ending in them is converted to a single space.
So
<xs:pattern value="abc
def"/>
is equivalent to:
<xs:pattern value="abc def"/>
Furthermore, writing the line ending as a character reference does not help:
<xs:pattern value="abc&#xA;def"/>
is also normalized to
<xs:pattern value="abc def"/>
As far as I can tell, there is no alternate notation that avoids this.
This means that if you want to use a pattern facet to specify that a DFDL
infoset string can contain A-Za-z0-9, spaces, and line endings, there is
no way to express this.
This pattern was the example I was dealing with:
<xs:pattern value="[A-Za-z0-9 &#xD;&#xA;]*"/>
If you look at the string for the value attribute of this pattern element,
that string already has the line-ending characters converted into
spaces. The attribute value is
"[A-Za-z0-9   ]*", which has 3 spaces before the "]".
I think there is no workaround for this in XML, XSD, or DFDL.
I dug into the Daffodil implementation, and in the code that accesses this attribute you
don't even get a NodeSeq containing a mixture of Text and Entity nodes.
You just get a single Text node. So it is pretty well hopeless without
reaching into the guts of the XML parser/reader.
Hence, in DFDL, if you want to "validate" with a regex that a DFDL string
contains content that includes line endings, you have to use dfdl:assert
with failureType="recoverableError", testKind="pattern", and testPattern
set to the regex of interest. This is then a DFDL regex, which is a Java
regex, and you can be explicit about the line endings allowed.
You can't do it with a pattern facet.
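For illustration, such an assert might look roughly like the sketch below. The element name "note" and the message text are placeholders I made up, and the representation properties a real schema would need (lengthKind, encoding, and so on) are left out. Since testPattern is itself an XML attribute, the line endings are written with the Java regex escapes \r and \n rather than as literal characters or character references, so attribute normalization has nothing to convert. As I understand it, a pattern assert is matched against the data at the element's position rather than against the finished infoset string, so treat this as an approximation of what the facet would have said, not an exact equivalent.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- "note" is a hypothetical element; other DFDL properties omitted -->
  <xs:element name="note" type="xs:string">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- \r and \n are regex escapes, so no literal line ending
             appears inside the attribute value -->
        <dfdl:assert failureType="recoverableError"
                     testKind="pattern"
                     testPattern="[A-Za-z0-9 \r\n]*"
                     message="note contains characters outside A-Za-z0-9, space, CR, LF"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

</xs:schema>
```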
Comments?