033 | 04/03: Assert/Discriminator semantics. AP to document. TK to check uses of discriminator besides choice.
I believe the rules should be:
1. A point of uncertainty is any of:
   * an element which has minOccurs != maxOccurs
   * a choice
   * a sequence with sequenceKind="unordered"
2. Nested within the scope of a point of uncertainty, there might be other points of uncertainty.
3. A discriminator which evaluates to true resolves the nearest in-scope point of uncertainty.
4. An assertion which evaluates to false causes a processing error.
5. Any processing error (from an assertion failure or otherwise) will cause the parser to backtrack to the nearest unresolved point of uncertainty and try the next available branch, if any. If there are no more branches available, the parser will backtrack to the next nearest unresolved point of uncertainty.
6. A processing error which reaches the root tag is reported to the host application.
7. Assertions and discriminators are allowed on any point of uncertainty (not only on the branches of a choice).
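The rules above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not an implementation of any DFDL processor: the names ProcessingError, PointOfUncertainty, check, discriminate, try_branches and parse_value, and the "I:" tagged-integer format, are all hypothetical and chosen just for this example.

```python
class ProcessingError(Exception):
    """A failed assertion or any other parse failure (rules 4 and 5)."""


class PointOfUncertainty:
    def __init__(self, name):
        self.name = name
        self.resolved = False  # set by a discriminator that evaluates to true


def check(condition, message):
    """Rule 4: an assertion that evaluates to false is a processing error."""
    if not condition:
        raise ProcessingError(message)


def discriminate(condition, stack):
    """Rule 3: a true discriminator resolves the nearest in-scope point.

    The rules do not say what a false discriminator does, so this sketch
    (as an assumption) leaves the point unresolved in that case.
    """
    if condition and stack:
        stack[-1].resolved = True


def try_branches(name, branches, stack):
    """Rule 5: try each branch in turn until one parses without error."""
    point = PointOfUncertainty(name)
    stack.append(point)
    try:
        error = ProcessingError(f"no branches available for '{name}'")
        for branch in branches:
            try:
                return branch(stack)
            except ProcessingError as err:
                if point.resolved:
                    raise  # resolved: do not try the remaining branches
                error = err  # unresolved: backtrack and try the next branch
        # No branches left: the error travels to the next nearest point,
        # or to the caller if this was the outermost one (rule 6).
        raise error
    finally:
        stack.pop()


def parse_value(text):
    """Hypothetical choice: a tagged integer ("I:42") or any string."""
    def tagged_int(stack):
        check(text.startswith("I:"), "missing 'I:' tag")
        discriminate(True, stack)  # the tag proves this is the int branch
        digits = text[2:]
        check(digits.isdigit(), f"invalid integer {digits!r}")
        return int(digits)

    def any_string(stack):
        return text

    return try_branches("value", [tagged_int, any_string], [])
```

With these definitions, parse_value("I:42") returns 42 and parse_value("hello") falls back to the string branch, because the assertion fails before the discriminator runs. parse_value("I:4x") raises ProcessingError about the invalid integer instead of silently falling back to the string branch, because the discriminator has already resolved the choice.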
Rationale:
If we only allow a discriminator on a choice branch, then it will be difficult to model this common style of message:
* Tagged header, minOccurs="1", maxOccurs="1"
* Untagged body, maxOccurs="unbounded"
* Tagged trailer, minOccurs="1", maxOccurs="1"
An example with 3 occurrences of the body would be:
HE,headerfield1,headerfield2,headerfield3
John Smith, 100, bodyfield3
John Brown, 200, bodyfield3
Elton John, 30Z, bodyfield3
TR,trailerfield1,trailerfield2,trailerfield3
And the DFDL schema would look something like this (excuse the almost inevitable errors, this is just for completeness):
...
<xs:element name="message">
  <xs:complexType dfdl:lengthKind="implicit">
    <xs:sequence dfdl:separator="\r\n">
      <xs:element name="header" dfdl:initiator="HE,">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <xs:element name="header1" type="xs:string"/>
            <xs:element name="header2" type="xs:string"/>
            <xs:element name="header3" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="body" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <xs:element name="body1" type="xs:string"/>
            <xs:element name="body2" type="xs:int"/>
            <xs:element name="body3" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="trailer" dfdl:initiator="TR,">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <xs:element name="trailer1" type="xs:string"/>
            <xs:element name="trailer2" type="xs:string"/>
            <xs:element name="trailer3" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
The '30Z' value for body2 in the final occurrence of the body element is incorrect. It is not a valid integer, and will trigger a processing error.
Without a discriminator, this failure will cause the parser to backtrack to the nearest point of uncertainty (the repeating body element) and try the next element (the trailer element). The initiator will not be found, and the reported error will be "Initiator 'TR,' not found for element 'trailer'". The user would almost certainly prefer "Invalid value '30Z' for element 'body2'. Value could not be converted to simple type 'xs:int'".
For this example, the discriminator would need to detect unambiguously that it really was dealing with a body element and not a trailer element. Due to the message style (which is quite common), the only way to do this is to detect that it is *not* a trailer. I cannot think of an elegant way to do that using the facilities in v0.33 of the specification. I have raised this with Alan and Steve.
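To make the difference concrete, here is a rough Python simulation of the example message, parsed once without and once with a "this line is not a trailer" discriminator. It is a sketch under assumptions, not DFDL: the function names, the use_discriminator flag, the stripping of whitespace around body fields, and the check on the "TR," prefix are all hypothetical stand-ins for whatever the specification ends up providing. It also assumes the message has at least one line.

```python
class ProcessingError(Exception):
    """Any parse failure, including a failed type conversion."""


MESSAGE = [
    "HE,headerfield1,headerfield2,headerfield3",
    "John Smith, 100, bodyfield3",
    "John Brown, 200, bodyfield3",
    "Elton John, 30Z, bodyfield3",
    "TR,trailerfield1,trailerfield2,trailerfield3",
]


def parse_tagged(line, name, initiator):
    """Parse a tagged element (header or trailer) with a required initiator."""
    if not line.startswith(initiator):
        raise ProcessingError(
            f"Initiator '{initiator}' not found for element '{name}'")
    return line[len(initiator):].split(",")


def parse_body(line):
    """Parse one untagged body occurrence: string, int, string."""
    fields = [field.strip() for field in line.split(",")]  # assumption: trim
    if len(fields) != 3:
        raise ProcessingError(
            f"Expected 3 fields for element 'body', found {len(fields)}")
    try:
        body2 = int(fields[1])
    except ValueError:
        raise ProcessingError(
            f"Invalid value '{fields[1]}' for element 'body2'. "
            "Value could not be converted to simple type 'xs:int'") from None
    return fields[0], body2, fields[2]


def parse_message(lines, use_discriminator=False):
    header = parse_tagged(lines[0], "header", "HE,")
    bodies, pos = [], 1
    while pos < len(lines):
        # Hypothetical discriminator: "this line is not a trailer".
        resolved = use_discriminator and not lines[pos].startswith("TR,")
        try:
            bodies.append(parse_body(lines[pos]))
        except ProcessingError:
            if resolved:
                raise  # point of uncertainty resolved: report the real error
            break      # unresolved: backtrack and try the trailer instead
        pos += 1
    if pos >= len(lines):
        raise ProcessingError("Initiator 'TR,' not found for element 'trailer'")
    trailer = parse_tagged(lines[pos], "trailer", "TR,")
    return header, bodies, trailer
```

Calling parse_message(MESSAGE) fails with the misleading "Initiator 'TR,' not found for element 'trailer'" message, while parse_message(MESSAGE, use_discriminator=True) fails with the "Invalid value '30Z' for element 'body2'" message that the user would want to see. If '30Z' is corrected to a valid integer, both calls parse the header, three body occurrences and the trailer.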
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742