Microsoft's RTF example:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22\par
Line 1: xxxx\par
\b Line 2:\b0 yyyy\par
}

Elements are delimited by the \ (either initiator or prefix separator) of simple fields, or by the { (initiator) of complex fields.
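
To illustrate the point about RTF, here is a minimal Python sketch that cuts a line just before each \ or {, so every piece starts with its own initiator and ends where the next one begins. The function name split_at_initiators is hypothetical, and the sketch deliberately ignores real RTF details such as the escapes \\ and \{ and the closing }; it is not a real RTF parser.

```python
import re

# Simplified assumption: every '\' or '{' starts a new element.
# Real RTF also has escapes (\\, \{, \}) and closing braces, ignored here.
_PIECE = re.compile(r'[\\{][^\\{]*|[^\\{]+')

def split_at_initiators(rtf):
    """Return the pieces of rtf, each ending where the next '\\' or '{' begins."""
    return _PIECE.findall(rtf)

# The text "Line 1: xxxx" has no terminator of its own;
# it simply ends at the '\' that starts \par.
print(split_at_initiators(r'Line 1: xxxx\par'))
print(split_at_initiators(r'\b Line 2:\b0 yyyy\par'))
```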

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 04/06/2013 14:47
Subject: [DFDL-WG] terminate by next field's initiator aka lengthKind="endAtStartOfNext" or something like that
Sent by: dfdl-wg-bounces@ogf.org

I know we omitted this from DFDL v1.0 (I am quite sure I advocated that position), and it is too late to add it back now. Although it was theoretically possible, I had never seen it in real data before. Now I have, and I'm wondering whether it is more common than I originally thought.

The situation is this. I have an element. It wants to be delimited, in that it has an escape scheme and it is delimited by something in the common sense of the word, but the terminator is actually what one thinks of as the initiator of the next element.

It comes up in Internet Message Format headers as one example:

Reply-To: joe@foo.com
Reply-To: <joe@foo.com>
Reply-To: joe smith<joe@foo.com>
Reply-To: "joe <Mr. XML> smith"<joe@foo.com>
Reply-To: <>

In the third and fourth cases, there is no terminator, just the required < which begins the next field.

Modeling this whole Reply-To construct requires a choice of several different elements which model the different formats. For example, I see no way to model a format which accepts either line 1 or line 2 of the above without using a choice. That said, my real concern is with lines 3 and 4.

The natural model for lines 3 and 4 (and perhaps 5) seems like it should be a display-name field followed by an email address field. The "<" really does not want to be used in some situations as the terminator of the prior field and in others as the initiator of the next field. That affects reuse of the validation regexes, etc.

Right now the only way to model this is for the display name field to use a regex which re-invents the escape-scheme-like behavior of the optional quotation mark surround, and uses regex lookahead to sense the "<" when it appears unescaped, without consuming it.
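
As a rough illustration of that workaround, here is a minimal Python sketch. The pattern, the function name split_reply_to, and the assumption that a backslash is the escape character inside the quotes are all hypothetical choices made for this example; this is not the actual DFDL schema regex and not a full RFC 5322 grammar.

```python
import re

# Hypothetical pattern: the display name is either a quoted string
# (backslash escapes assumed) or a run of characters with no '<' or '"'.
# The lookahead senses the '<' of the next field without consuming it.
DISPLAY_NAME = re.compile(r'''
    (?P<name>
        "(?:[^"\\]|\\.)*"    # quoted form, so a '<' inside the quotes is data
      | [^<"]*               # unquoted form, stops before the first '<'
    )
    (?=<)                    # initiator of the next field, left in place
''', re.VERBOSE)

def split_reply_to(value):
    """Split a Reply-To value into (display_name, rest), or None if no '<' follows."""
    m = DISPLAY_NAME.match(value)
    if m is None:
        return None
    return m.group("name"), value[m.end():]

print(split_reply_to('joe smith<joe@foo.com>'))
print(split_reply_to('"joe <Mr. XML> smith"<joe@foo.com>'))
```

The "<" stays at the start of the second part, so the email address field can still treat it as its own initiator. Lines without a "<" (the first example above) return None and would need the other branch of the choice.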

That's not too bad really, but I am curious what others have seen out there in the world of data that also has this idiom where a string is delimited by a unique structure at the beginning of the next element.

Do we have collective knowledge of several more such formats, or have we all just seen this same IMF header example as the motivation?

--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
-- dfdl-wg mailing list dfdl-wg@ogf.org https://www.ogf.org/mailman/listinfo/dfdl-wg
