What the lengthPattern property consumes
is taken to be the content of the element. So your approach B) is correct.
The regex you need uses 'lookahead' syntax, so that the next initiator is matched but not consumed:
.+(?=LastName)
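As a rough illustration of why the lookahead matters, here is a small sketch that uses Python's re module as a stand-in for the regex engine of a DFDL processor (an assumption; the processor applies lengthPattern itself). The string is the sample data that follows the FirstName initiator. Note that in most regex dialects [.] is a character class matching only a literal dot, so the option-B idea is written here as .+(LastName).

```python
import re

# Data remaining after the "FirstName" initiator has been consumed
# (taken from the sample record in this thread).
data = " James LastName Garriss Hometown Raleigh"

# Option-B idea without lookahead: the match swallows the next initiator too.
consuming = re.match(r".+(LastName)", data)

# Lookahead form: the match stops just before the next initiator.
lookahead = re.match(r".+(?=LastName)", data)

print(repr(consuming.group(0)))
print(repr(lookahead.group(0)))
```

Because the lookahead is zero-width, "LastName" stays in the data and is still available to be parsed as the initiator of the next element.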
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 06/03/2013 13:37
Subject: Re: [DFDL-WG] can DFDL model this? (initiators, but no separators or terminators, plus optional elements)
Sent by: dfdl-wg-bounces@ogf.org
So obviously, Mike, you gave me exactly the right answer previously, but I
just didn’t get it. With the extra info that you and Steve supplied, I think
I’m getting it. Thank you both!
Question: What goes in the regex in the lengthPattern property?
A) Is it just the next initiator, something like this?
<element name="FirstName" lengthKind="pattern" lengthPattern="(LastName)" initiator="FirstName"/>
B) Is it the entire contents of the element along with the next initiator,
something like this?
<element name="FirstName" lengthKind="pattern" lengthPattern="[.]+(LastName)" initiator="FirstName"/>
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, March 06, 2013 4:16 AM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] can DFDL model this? (initiators, but no separators
or terminators, plus optional elements)
To find the next initiator, you must know
it, so you should be able to express this in a regex.
A good example of a format like this is RTF. The start of an embedded
sequence is indicated by '{'. The field prior to that has no terminator,
so you use lengthKind 'pattern' and a regex that consumes everything up
to but not including a '{'.
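A minimal sketch of the RTF-style case, again using Python's re module as a stand-in for a DFDL regex engine (an assumption). The sample fragment is hypothetical and only loosely RTF-like. It shows two equivalent ways to consume everything up to but not including '{': a negated character class and a lazy match with lookahead.

```python
import re

# Hypothetical RTF-like fragment: free text followed by an embedded group.
data = r"plain text here{\b bold}"

# Negated character class: consume every character that is not '{'.
by_class = re.match(r"[^{]*", data).group(0)

# Lazy match with lookahead: stop at the first position followed by '{'.
by_lookahead = re.match(r".*?(?=\{)", data).group(0)

# The '{' is left in the remaining data for the embedded sequence to consume.
rest = data[len(by_class):]

print(repr(by_class), repr(rest))
```

Either form leaves the '{' unconsumed, so it can still serve as the start of the embedded sequence.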
Adding initiators to the list of in-scope terminating delimiters has been
discussed in the DFDL WG, but was rejected on complexity grounds. Knowing
the full list of all possible initiators gets hairy when you have lots
of optionality or unordered behaviour.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 05/03/2013 20:28
Subject: Re: [DFDL-WG] can DFDL model this? (initiators, but no separators or terminators, plus optional elements)
Sent by: dfdl-wg-bounces@ogf.org
Good point, thank you.
This is a good solution if your data follows nice, easily discerned patterns
that can be captured with a regex.
But what do you do if there’s no pattern? What do you do if the
only way to know you’re at the next element is to find the next initiator?
From: Mike Beckerle [mailto:mbeckerle.dfdl@gmail.com]
Sent: Tuesday, March 05, 2013 3:06 PM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] can DFDL model this? (initiators, but no separators
or terminators, plus optional elements)
This is what lengthKind='pattern' is for: it gives you the ability to use
a regex with non-capturing lookahead.
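To make the idea concrete for the sample record in this thread, here is a hedged sketch in Python (a stand-in for a DFDL processor, not DFDL itself). Each value is matched with a pattern whose lookahead stops at any initiator or at the end of the record. The function name parse_record and the choice to list all four initiators in every lookahead are assumptions made for illustration; a real schema would give each element its own lengthPattern.

```python
import re

# Initiators in schema order (from the sample data in this thread).
INITIATORS = ["FirstName", "LastName", "Hometown", "Company"]

# Zero-width lookahead: a value ends where optional whitespace is followed
# by any initiator, or by the end of the record. Nothing here is consumed.
STOP = r"(?=\s*(?:" + "|".join(INITIATORS) + r")\b|\s*$)"
VALUE = re.compile(r".+?" + STOP)


def parse_record(record):
    """Return {element name: value}; elements whose initiator is absent are skipped."""
    fields = {}
    pos = 0
    for name in INITIATORS:
        init = re.compile(r"\s*" + name + r"\s+").match(record, pos)
        if init is None:
            # Treated as an absent optional element. This sketch does not
            # enforce that FirstName and LastName are required.
            continue
        value = VALUE.match(record, init.end())
        if value is None:
            raise ValueError("no content after initiator " + name)
        fields[name] = value.group(0)
        pos = value.end()
    return fields


print(parse_record("FirstName Bob LastName Brown Company IBM"))
```

With the Hometown initiator missing, the Hometown element is simply skipped and "IBM" lands in Company. The cost is that each pattern must know which initiators can follow it.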
On Tue, Mar 5, 2013 at 2:52 PM, Garriss Jr., James P. <jgarriss@mitre.org>
wrote:
Suppose I have this input data:
FirstName James LastName Garriss Hometown Raleigh Company The MITRE Corporation CRLF
To the human eye, this is simple. We have four elements, each of
which has an initiator. But to make things more interesting:
1. The elements are all strings, and they do not have fixed lengths, set
values, or any other terminator. The only way you know them apart is by the
initiator. (And this implies that the initiators cannot be part of the
elements.)
2. There are no separators (spaces can be in the data).
3. The third and fourth elements are optional.
So these are both valid data:
FirstName John Mark LastName Smith
FirstName Bob LastName Brown Company IBM
How do we model this?
Attempt #1:
I have four elements each with a unique initiator (FirstName, LastName,
Hometown, Company). The problem is that there’s no way to know when
the first element terminates, so everything after the “FirstName” initiator
ends up in the FirstName element. Oops.
Attempt #2:
I got funky with the terminators. The first element has LastName
as a terminator. The second element has Hometown or Company as a
terminator. The third element has Company or %NL; as a terminator. And
the fourth one uses %NL;. Works great, unless the optional third
element isn’t there. IOW, if I have this input:
FirstName Bob LastName Brown Company IBM
Then “IBM” winds up in Hometown element. Oops.
So, what to do? I don’t know. I don’t know how to solve this.
Hopefully you’re going to teach me about some feature I don’t yet
know.
If not, then I have a potential solution, an addition to the spec. Add
this option as a terminator: “This element terminates when you find
the initiator to the next element.” That’s probably easier said
than done, but it seems to make sense in this context.
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU