We've been worrying about a supposed hard case that I think is not very hard.

Consider a DFDL schema for a file like this:

field1;field2;field3

field1;fie./ld2;fi./eld3

field1/fi.;eld2/fie;.ld3

field1;fi/.eld2;field./3

Each record contains three fields, all strings. The delimiter is one of ".", ";", or "/", depending on what is in the data.

The first field can be parsed unambiguously. It ends in one of ".", ";", or "/" and cannot contain any of those three. The second and third fields are separated by whatever character terminated the first field.

The subsequent fields need to use the actual delimiter that was found after field one because they are allowed to contain the other two delimiters as content, as illustrated in the example above where field2 and field3 are broken up with those characters.

To handle this I suggest a schema something like this:

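<element name="delim" type="string" dfdl:lengthKind="pattern"
         dfdl:lengthPattern="[.;/]"/>

<element name="record">
  <complexType>
    <sequence>
      <element name="f1" type="string" dfdl:lengthKind="pattern"
               dfdl:lengthPattern="[^.;/]*"/>
      <sequence>
        <annotation><appinfo>
          <dfdl:hidden ref="delim"/>
        </appinfo></annotation>
      </sequence>
      <!-- Reconstructed from the description below; the f2/f3 declarations and the separator expression are illustrative. -->
      <sequence dfdl:separator="{ ../delim }">
        <element name="f2" type="string"/>
        <element name="f3" type="string"/>
      </sequence>
    </sequence>
  </complexType>
</element>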

The above record uses a regexp to pick off the first field excluding all possible delimiters.

Then a hidden field picks off the actual delimiter that is found.

Subsequently there is a sequence, whose separator is specified by referencing the hidden field. This works exactly the way any computed delimiter works. The "delim" field is, in essence, a header field specifying the delimiter.
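
To make the three steps concrete, here is a rough sketch of the same logic outside DFDL, in Python. It is only an illustration of the idea, not of how a DFDL processor would implement it; the name parse_record is made up for this example.

```python
import re

# First field: a run of characters that are not ".", ";" or "/",
# followed by exactly one of those characters (the actual delimiter).
HEAD = re.compile(r"([^.;/]*)([.;/])")


def parse_record(line):
    """Return (delimiter, fields) for one record.

    Hypothetical helper: mirrors the f1 / hidden delim / separated
    sequence steps described above.
    """
    m = HEAD.match(line)
    if m is None:
        raise ValueError("no delimiter found after the first field")
    f1, delim = m.group(1), m.group(2)
    rest = line[m.end():]
    # The remaining fields are split only on the delimiter that was found,
    # so the other two characters stay in the data as ordinary content.
    return delim, [f1] + rest.split(delim)


if __name__ == "__main__":
    for sample in ("field1;fie./ld2;fi./eld3", "field1/fi.;eld2/fie;.ld3"):
        print(parse_record(sample))
```

Note that the candidate delimiter set appears twice in the pattern, just as it does in the two lengthPattern properties.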

The cost of this in complexity is that we have to specify the potential set of delimiters in two regular expression patterns. For a case like this I have no problem with this minor complexity.

I think this can be made to work for parsing. Some details (properties) are missing of course, but the concept should be clear. For an obscure case like this, I think this is far preferable to yet another keyword in DFDL.

For output, I think an output value calc would be needed to figure out the value for the delim field. We would need functions in the expression library to examine the strings in the infoset of field2 and field3 for the possible delimiter characters so that on output we could figure out whether to use ".", ";", or "/" as the delim element's value. I don't know if our proposed function library includes the necessary functions.
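
As a rough illustration of what such an output calculation would have to do, here is a Python sketch that picks a delimiter not occurring in any field. The function names and the "first unused candidate wins" rule are my assumptions for the example; the chosen delimiter may differ from the one in the original input, and nothing here corresponds to an existing DFDL function.

```python
DELIMS = ".;/"  # candidate delimiters, in an arbitrary preference order


def choose_delimiter(fields, candidates=DELIMS):
    """Return the first candidate that appears in none of the fields.

    Hypothetical helper standing in for the output value calculation.
    """
    for d in candidates:
        if all(d not in f for f in fields):
            return d
    raise ValueError("every candidate delimiter occurs in the data")


def unparse_record(fields):
    """Write the fields out using a delimiter that is safe for this record."""
    return choose_delimiter(fields).join(fields)


if __name__ == "__main__":
    print(unparse_record(["field1", "fie./ld2", "fi./eld3"]))
```

If all three characters occur somewhere in the data, there is no valid delimiter and the record cannot be written in this format, so the sketch raises an error in that case.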

Do we need to concern ourselves with unparsing/writing out this kind of format for DFDL v1.0, or is parsing enough?

...mike