Tim,

I believe that 'all occurrences up to minOccurs' being required was
intended to mean 'the first minOccurs occurrences are required', not
'keep looking until you find minOccurs occurrences in the data stream'.
Your row 3 would be an error if the element didn't have a default.
A zero-length element is always treated as missing, which means that if
you did want the empty string in the infoset you would need to define
the empty string as the default.
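
To make that reading concrete, here is a minimal Python sketch of it. It
is only an illustration, not DFDL parser code: the names parse_array and
ParseError are hypothetical, and the sketch assumes a plain
comma-separated stream like the one in the quoted schema.

```python
class ParseError(Exception):
    """Raised when a required occurrence is missing and has no default."""


def parse_array(data, min_occurs, default=None, separator=","):
    """Hypothetical sketch: only the first `min_occurs` positions are required."""
    infoset = []
    for index, content in enumerate(data.split(separator)):
        if content != "":
            # Non-empty content always goes into the infoset.
            infoset.append(content)
        elif index < min_occurs:
            # Zero length means missing; a missing required occurrence
            # needs a default, otherwise it is an error.
            if default is None:
                raise ParseError(f"occurrence {index + 1} is required but missing")
            infoset.append(default)
        # Zero length beyond min_occurs: optional, so it is omitted.
    return infoset


# Row 3 with the empty string defined as the default.
print(parse_array(",item_value", min_occurs=2, default=""))
```

Without a default, parse_array(",item_value", min_occurs=2) raises
ParseError in this sketch, which matches the row 3 error described above.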

Regards

Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
----------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com

From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 14/02/2011 13:01
Subject: [DFDL-WG] Arrays with empty elements
Sent by: dfdl-wg-bounces@ogf.org

Consider the following schema:
<xs:element name="array" minOccurs="1" maxOccurs="1">
  <xs:complexType>
    <xs:sequence dfdl:sequenceKind="ordered"
                 dfdl:separatorPosition="infix"
                 dfdl:separatorPolicy="required"
                 dfdl:separator=",">
      <xs:element name="array_item" type="xs:string"
                  minOccurs="2" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Allowed data streams and the resulting info sets (rendered as XML) are:

Row 1
  Data stream: item_value,item_value
  Info set:
    <array>
      <array_item>item_value</array_item>
      <array_item>item_value</array_item>
    </array>

Row 2
  Data stream: item_value,
  Info set:
    <array>
      <array_item>item_value</array_item>
    </array>

Row 3
  Data stream: ,item_value
  Info set:
    <array>
      <array_item>item_value</array_item>
    </array>

Row 4
  Data stream: ,
  Info set:
    <array>
    </array>

Notice rows 2 and 3. The parser has applied the rules in the DFDL specification,
and has treated the zero-length elements as 'missing'. Furthermore, these
missing elements are not required, so they are omitted from the info set.
This is not good - the receiver of the info set has no way to reliably
determine whether the array_item was the first or second item in the array.
If presented to the DFDL serializer, both info sets will produce the data
stream for row 2.
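
The ambiguity can be reproduced with a small Python sketch. The function
names are hypothetical, and the serializer's behaviour of padding up to
minOccurs with zero-length occurrences is an assumption made here only to
arrive at the row 2 data stream; it is not taken from the specification.

```python
def parse_omitting_empty(data, separator=","):
    """Sketch of the behaviour described above: zero-length occurrences
    are treated as missing and left out of the info set."""
    return [content for content in data.split(separator) if content != ""]


def serialize(items, min_occurs=2, separator=","):
    """Assumed serializer: pad with zero-length occurrences up to
    min_occurs, then join with the infix separator."""
    padded = list(items) + [""] * max(0, min_occurs - len(items))
    return separator.join(padded)


row2 = parse_omitting_empty("item_value,")
row3 = parse_omitting_empty(",item_value")

print(row2 == row3)      # both info sets hold a single array_item
print(serialize(row2))   # the row 2 data stream
print(serialize(row3))   # the same stream again, so row 3 is not recoverable
```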
Note that this is a problem only for arrays. A sequence of differently-named
optional elements will not be ambiguous because the element names in the
info set can be used to determine which elements were present in the data.
Possible fixes:
a) Change the definition of 'required' from 'all occurrences up to minOccurs'
to 'all occurrences before the final non-missing occurrence'.
In scenarios like the one above, non-required occurrences would be put
into the infoset with a default value ( assuming that a default was defined
in the model ).
b) provide a dfdl property that controls whether elements with zero-length
content are treated as missing.
The presence of one or more delimiters (a separator, initiator or
terminator) implies that an element is present in the data. Currently,
DFDL unconditionally treats an element as 'missing' if its content region
is zero-length, regardless of whether there were any delimiters for that
element.
In this scenario, if the parser acted on that information then the info
sets would be distinguishable. Suggested name for the property would be
'dfdl:emptyValueMissingPolicy' with values 'missing' and 'included'.
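
A rough Python sketch of how option b) might behave, using the two values
suggested for the proposed property. The function name parse_with_policy
is hypothetical, and the sketch assumes that every occurrence in the
stream is delimited by the infix separator, as in the rows above.

```python
def parse_with_policy(data, policy="missing", separator=","):
    """Sketch of the proposed dfdl:emptyValueMissingPolicy values."""
    if policy not in ("missing", "included"):
        raise ValueError(f"unknown policy: {policy!r}")
    occurrences = data.split(separator)
    if policy == "included":
        # Zero-length occurrences stay in the info set as empty strings.
        return occurrences
    # 'missing': zero-length occurrences are dropped, as happens today.
    return [content for content in occurrences if content != ""]


print(parse_with_policy("item_value,", "included"))   # row 2
print(parse_with_policy(",item_value", "included"))   # row 3
```

With 'included', rows 2 and 3 yield different info sets in this sketch;
with 'missing', they collapse to the same one.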
Option a) would require the parser to keep track of the last-reported
occurrence of an array element. When a non-missing occurrence was
encountered it would have to put any previously-skipped non-required
occurrences into the infoset first.
An example might help: one,,,four
Occurrences 2 and 3 would be omitted from the infoset because they are
zero-length. Upon encountering occurrence 4, the parser would have to put
occurrences 2 and 3 into the infoset with the xs:default value before
putting 4 into the infoset.
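
The bookkeeping for option a) could look roughly like the following
Python sketch. The function name parse_with_backfill and the default
value "N/A" are made up for illustration, and the sketch assumes a
default is always defined in the model.

```python
def parse_with_backfill(data, default, separator=","):
    """Sketch of option a): skipped zero-length occurrences are added with
    the default value once a later non-missing occurrence is found."""
    infoset = []
    skipped = 0  # zero-length occurrences seen since the last non-missing one
    for content in data.split(separator):
        if content == "":
            skipped += 1
            continue
        # A non-missing occurrence: first put the skipped ones into the infoset.
        infoset.extend([default] * skipped)
        skipped = 0
        infoset.append(content)
    # Trailing zero-length occurrences are never backfilled, so they are omitted.
    return infoset


print(parse_with_backfill("one,,,four", default="N/A"))
```

Applied to rows 2 and 3, this sketch gives one item for "item_value," and
two items (the default followed by item_value) for ",item_value", so the
two info sets are no longer identical.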
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg