Microsoft's RTF example:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22\par
Line 1: xxxx\par
\b Line 2:\b0 yyyy\par
}
Elements are delimited by the \ (either initiator or prefix separator) of
simple fields, or by the { (initiator) of complex fields.
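As a rough illustration of that delimiting rule, here is a minimal Python sketch (not DFDL) that cuts RTF text wherever a \ or a brace begins the next element. The names TOKEN and rtf_tokens are made up for this example, and RTF's escaped \\, \{ and \} sequences are deliberately ignored.

```python
import re

# Each token either starts at a '\' (a control word plus any text that
# follows it, up to the next '\', '{' or '}') or is a lone brace that
# opens or closes a group.
# Simplification (assumption): escaped \\, \{ and \} are not handled.
TOKEN = re.compile(r'\\[^\\{}]*|[{}]')

def rtf_tokens(text):
    """Split RTF text at the initiators of simple and complex fields."""
    return TOKEN.findall(text)

if __name__ == "__main__":
    for tok in rtf_tokens(r'{\fonttbl{\f0 Calibri;}}\b Line 2:\b0 yyyy\par'):
        print(repr(tok))
```

No token carries a terminator of its own here; each one simply ends where the next \ or brace begins.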
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 04/06/2013 14:47
Subject: [DFDL-WG] terminate by next field's initiator aka lengthKind="endAtStartOfNext" or something like that
Sent by: dfdl-wg-bounces@ogf.org
I know we omitted this from DFDL v1.0 (I am quite sure I advocated that
position), and we're too late to add it back now. While it was theoretically
possible, I had never seen it in practice. Now I have, and I'm wondering
if it is more common than I originally thought.
The situation is this. I have an element. It wants to be delimited, in that
it has an escape scheme and it is delimited by something in the common sense
of the word, but the terminator is actually what one thinks of as the initiator
of the next element.
It comes up in Internet Message Format headers as one example:
Reply-To: joe@foo.com
Reply-To: <joe@foo.com>
Reply-To: joe smith<joe@foo.com>
Reply-To: "joe <Mr. XML> smith"<joe@foo.com>
Reply-To: <>
In the third and fourth cases, there is no terminator, just the required <
which begins the next field.
Modeling this whole Reply-To construct requires a choice of several different
elements which model the different formats. For example, I see no way to
model a format which accepts either line 1 or line 2 of the above without
using a choice. That said, my real concern is with lines 3 and 4.
The natural model for lines 3 and 4 (and perhaps 5) seems like it should
be a display-name field followed by an email address field. The "<"
really does not want to be used in some situations as the terminator of
the prior field and in others as the initiator of the next field. That
affects reuse of the validation regexes, etc.
Right now the only way to model this is for the display name field to use
a regex which re-invents the escape-scheme-like behavior of the optional
quotation mark surround, and uses regex lookahead to sense the "<"
when it appears unescaped, without consuming it.
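A minimal sketch of that regex approach, written in Python rather than as a DFDL schema. The patterns and the names DISPLAY_NAME, ADDRESS and split_reply_to are illustrative assumptions, not taken from any real schema, and the quoting rules are simplified compared with the full Internet Message Format grammar.

```python
import re

# Hypothetical pattern: a display name is either a double-quoted string
# (backslash escapes allowed inside, so '<' may appear between the quotes)
# or a run of characters other than '<', '"' and line breaks.
# The lookahead (?=<) requires an unescaped '<' to follow but does not
# consume it, so the '<' remains the initiator of the address field.
DISPLAY_NAME = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^<"\r\n]+)(?=<)')
ADDRESS = re.compile(r'<([^<>]*)>')

def split_reply_to(value):
    """Return (display_name, address) for the body of a Reply-To header."""
    value = value.strip()
    name = None
    pos = 0
    m = DISPLAY_NAME.match(value)
    if m:
        name = m.group(0)
        pos = m.end()  # points at '<', which was left unconsumed
    a = ADDRESS.match(value, pos)
    if a:
        return name, a.group(1)
    # No angle brackets at all: treat the whole value as a bare address.
    return None, value

if __name__ == "__main__":
    for body in ['joe@foo.com', '<joe@foo.com>', 'joe smith<joe@foo.com>',
                 '"joe <Mr. XML> smith"<joe@foo.com>', '<>']:
        print(split_reply_to(body))
```

The quoted alternative is what re-invents the escape scheme: inside the quotes a "<" is ordinary data, and only a "<" outside them ends the display name.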
That's not too bad really, but I am curious what others have seen out there
in the world of data that also has this idiom, where a string is delimited
by a unique structure at the beginning of the next element.
Do we have collective knowledge of several more such formats, or have we
all just seen this same IMF header example as the motivation?
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com