James,
The purpose of a discriminator is to check that the data matches the
model at a 'point of uncertainty' such as a choice branch, an optional
element, or a variable-occurrence array. If there is no discriminator,
then any processing error causes the parser to backtrack and try the next
thing in the model. If there is a discriminator and it evaluates to false,
the parser again backtracks and tries the next thing in the model. If it
evaluates to true, the parser knows for sure it is in the right place in
the model, and crucially any subsequent processing error is a hard error
that does not cause backtracking at that point of uncertainty.
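As a rough illustration of that behaviour (not taken from any schema in this thread: the names recordA, tagA, bodyA, recordB, tagB and bodyB are hypothetical, and all other DFDL properties are assumed to come from an in-scope dfdl:format), a discriminator on the first element of each choice branch might look like this:

```xml
<!-- Hypothetical fragment: element names are invented for illustration and
     remaining properties are assumed to be supplied by a default format. -->
<xsd:choice>
  <xsd:element name="recordA">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="tagA" type="xsd:string"
                     dfdl:lengthKind="explicit" dfdl:length="1">
          <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/">
              <!-- false: backtrack and try recordB;
                   true: recordA is known to be the right branch -->
              <dfdl:discriminator>{ . eq 'A' }</dfdl:discriminator>
            </xsd:appinfo>
          </xsd:annotation>
        </xsd:element>
        <!-- An error here, after the discriminator evaluated to true,
             is a hard error: recordB is not tried. -->
        <xsd:element name="bodyA" type="xsd:string"
                     dfdl:lengthKind="explicit" dfdl:length="4"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="recordB">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="tagB" type="xsd:string"
                     dfdl:lengthKind="explicit" dfdl:length="1">
          <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:discriminator>{ . eq 'B' }</dfdl:discriminator>
            </xsd:appinfo>
          </xsd:annotation>
        </xsd:element>
        <xsd:element name="bodyB" type="xsd:string"
                     dfdl:lengthKind="explicit" dfdl:length="4"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:choice>
```

With data such as 'A12', tagA matches and the discriminator is true, so the failure to read four characters for bodyA would be reported as an error rather than silently falling through to recordB.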
In the 'abc' example you gave, there are no initiators, so a
discriminator must be used that looks at the data content. But if you
have an initiator, as IMF headers do, you can use the initiator to do the
discrimination. Set the property dfdl:initiatedContent to 'yes' on the
choice itself. When an initiator matches, this acts just like a
discriminator and stops backtracking from happening. You no longer need
the discriminators.
<xsd:element dfdl:occursCountKind="implicit"
             dfdl:terminator="%NL;%WSP*;"
             maxOccurs="unbounded" name="HeaderArray">
  <xsd:complexType>
    <xsd:choice dfdl:choiceLengthKind="implicit" dfdl:initiatedContent="yes">
      <xsd:element name="From" type="xsd:string"
                   dfdl:initiator="From:%WSP*;" dfdl:ignoreCase="yes"/>
      <xsd:element name="To" type="xsd:string"
                   dfdl:initiator="To:%WSP*;" dfdl:ignoreCase="yes"/>
      <xsd:element name="ReturnPath" type="xsd:string"
                   dfdl:initiator="Return-Path:%WSP*;" dfdl:ignoreCase="yes"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
If the %NL; is not always going to be present, and the data content of
one element can be terminated by the initiator of the next element, then
you need to use lengthKind 'pattern' as Mike showed in his mail. (It wasn't
clear to me whether that was the same or different IMF data).
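A sketch of the general idea only, not the schema from Mike's mail: the dfdl:lengthPattern value below is an assumption, written to stop the element content just before the next known header initiator or at the end of the data.

```xml
<!-- Sketch only: the regular expression is an assumed example, not the
     pattern from Mike's mail. It consumes as little as possible up to, but
     not including, the next header initiator or the end of the data. -->
<xsd:element name="From" type="xsd:string"
             dfdl:initiator="From:%WSP*;"
             dfdl:ignoreCase="yes"
             dfdl:lengthKind="pattern"
             dfdl:lengthPattern=".*?(?=From:|To:|Return-Path:|\z)"/>
```

One caveat with a pattern of this kind is that a header value which itself contains text such as 'To:' would be cut short, so a real schema would likely need a more careful expression.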
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 06/03/2013 19:05
Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
Sent by: dfdl-wg-bounces@ogf.org
Suppose I’m modeling IMF
headers, many of which can have the exact same form, stuff like:
From: john@doe.com
To: jane@gmail.com
Return-Path: bob@yahoo.com
Etc. Remember that
these can be in any order, so they are an unordered sequence.
The way that we’ve modeled
these headers so far, the “From:” and “To:” and so on have been initiators;
they aren’t elements. But when I use our workaround for an unordered
sequence, which requires discriminators, I am in trouble. Because
the thing that discriminates all of these headers is an initiator, not
an element.
So, it seems to me that I
need to change all my headers so that the “From:” and “To:” and such
are no longer initiators but elements.
Does that sound right?
The more I work with this
workaround, the more hackish it feels, and the more I think that unordered
sequences should be part of DFDL 1.0. Maybe?
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, March 06, 2013 4:16 AM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
James,
The checkConstraints function is just a convenience that saves you having
to duplicate constraints in an assert or discriminator. For now, just duplicate
the constraint as a discriminator. This works fine as long as you can express
the constraint as a DFDL expression, which with your example you can.
I've tested your xsd exactly as you supplied below (without the terminator)
on my latest MBTK and it parses 'abc' fine. I don't see the infinite loop
error. We did have some bugs in that area where the check was being applied
too strictly which we fixed.

I then tried with 'cba' which parsed without error, except of course that
the values ended up in the wrong elements. So I added discriminators to
check that the elements matched their fixed value, and 'cba' then parsed
into the correct elements.
<xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
             dfdl:occursCountKind="implicit" fixed="b" minOccurs="0"
             name="b" type="xsd:string">
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator>{. eq 'b'}</dfdl:discriminator>
    </xsd:appinfo>
  </xsd:annotation>
</xsd:element>

I then tried with more complex strings, such as 'cbabaccba',
and they all parsed ok.

To make the infoset more symmetric, with one child per array occurrence,
you can use a choice instead of a sequence.

Making that change then results in:

Here's the xsd with discriminators and choices. See if it works with your
MBTK.
If you are still hitting the infinite loop error then add the %NL; terminator
to the array element. This will parse data of the form:
c
b
a
b
a
c
c
b
a
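A minimal sketch of such a choice-based array, assuming the element definitions quoted elsewhere in this thread and the %NL; terminator on the array element (this is a reconstruction for illustration, not the attached xsd itself):

```xml
<!-- Illustrative reconstruction, not the attached xsd. Each occurrence of
     ABCarray holds exactly one child (a, b or c), chosen by discriminator.
     Choice branches are written as required elements (no minOccurs="0"). -->
<xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="%NL;"
             maxOccurs="unbounded" minOccurs="1" name="ABCarray">
  <xsd:complexType>
    <xsd:choice dfdl:choiceLengthKind="implicit">
      <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                   fixed="a" name="a" type="xsd:string">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:discriminator>{ . eq 'a' }</dfdl:discriminator>
          </xsd:appinfo>
        </xsd:annotation>
      </xsd:element>
      <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                   fixed="b" name="b" type="xsd:string">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:discriminator>{ . eq 'b' }</dfdl:discriminator>
          </xsd:appinfo>
        </xsd:annotation>
      </xsd:element>
      <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                   fixed="c" name="c" type="xsd:string">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:discriminator>{ . eq 'c' }</dfdl:discriminator>
          </xsd:appinfo>
        </xsd:annotation>
      </xsd:element>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
```

Because every occurrence now consumes one character plus a line terminator, the parser makes forward progress on each pass through the array.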
Regards
Steve Hanson
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 05/03/2013 19:15
Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
Sent by: dfdl-wg-bounces@ogf.org
> The error message is because you don't make forward progress through the
> data with potentially unbounded occurrences.
I think you just said, “MBTK prevents an infinite loop.” That makes
sense.
> If there are delimiters then model those and you might not get the error.
I think you just said, “To let MBTK know when it should stop checking,
you need a terminator of some sort.” That also makes sense. So
I added a terminator (%NL;) here:

Good news: That fixed the problem, so long as my input is “abc”.
Bad news: This breaks if the input is any other legal value, such
as “abbc” or “cba” or “b”.
The problem for all of these is that my dear friend, checkConstraints,
is not implemented yet, thus I can’t prevent the parser from slurping
up the wrong character. I don’t know how anyone can build a non-trivial
DFDL schema that involves any sort of choice without this method; I swear,
it must be the single most important thing you guys have created for DFDL.
Until checkConstraints is implemented, I’m not really able to test this
schema with MBTK.
Thanks so much for your help answering my questions, Steve!
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Tuesday, March 05, 2013 1:46 PM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
James,
The error message is because you don't make forward progress through the
data with potentially unbounded occurrences. Is this because you are using
a cut-down schema? If there are delimiters then model those and you
might not get the error.
Once you have processed the array you can use asserts to check the count.
However IBM DFDL does not implement the count functions yet.
Give me a couple of days to look at this more closely. I have a customer
visit tomorrow hence the delay.
Regards
Steve Hanson
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 05/03/2013 16:19
Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
Sent by: dfdl-wg-bounces@ogf.org
Hmmm, maybe not. I said:
> The unordered sequence can be modeled with a data array
Yet when implemented in MBTK, it throws a fatal error:
fatal: CTDP3148E: Infinite loop at offset 3: The DFDL parser cannot process
array element 'ABCarray' because maxOccurs is unbounded and the length
of the previous occurrence was zero.
I think what happens is that on the last pass through the array, it doesn’t
find a, b, or c, so it throws a fatal error.
So is this a bug in MBTK? Or can DFDL not model an unordered sequence?
Or am I just doing it wrong?
Here’s a sample DFDL schema that illustrates the point:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:fmt="http://www.ibm.com/dfdl/GeneralPurposeFormat"
            xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://www.ibm.com/dfdl/GeneralPurposeFormat"
              schemaLocation="IBMdefined/GeneralPurposeFormat.xsd"/>
  <xsd:element ibmSchExtn:docRoot="true" name="ABC">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:sequence/>
          </xsd:appinfo>
        </xsd:annotation>
        <xsd:element dfdl:occursCountKind="implicit" maxOccurs="unbounded"
                     minOccurs="1" name="ABCarray">
          <xsd:complexType>
            <xsd:sequence dfdl:separator="">
              <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                           dfdl:occursCountKind="implicit" fixed="a"
                           minOccurs="0" name="a" type="xsd:string"/>
              <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                           dfdl:occursCountKind="implicit" fixed="b"
                           minOccurs="0" name="b" type="xsd:string"/>
              <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                           dfdl:occursCountKind="implicit" fixed="c"
                           minOccurs="0" name="c" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="fmt:GeneralPurposeFormat"/>
    </xsd:appinfo>
  </xsd:annotation>
</xsd:schema>
Test with “abc” as sample input.
From: Garriss Jr., James P.
Sent: Tuesday, March 05, 2013 8:43 AM
To: dfdl-wg@ogf.org
Subject: unordered sequence with constrained occurrences
Suppose text data has 3 constructs: a, b, and c.
· a must occur 1 time
· b can occur 0 or 1 time
· c can occur any number of times, 0 or more
These 3 constructs can appear in any order.
So these are valid inputs:
abc
a
bcccca
But these are not:
ccbcc
abbc
abcabc
Can data like this be modeled with DFDL?
The unordered sequence can be modeled with a data array, like this:
Array (0 to unbounded)
  Sequence
    a (0 to 1)
    b (0 to 1)
    c (0 to 1)
  /Sequence
/Array
But I don’t know how to constrain the total number of occurrences.
Appreciate any ideas!
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU