For discussion on today's call.

Regards

Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh@uk.ibm.com,
Phone (+44)/(0) 1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 13/01/2010 12:46 -----



** Complex element design added. Please review **

------------------------------------------------------------------------------------------------------

This proposal extends the earlier work in this area, described in spec sections 15.13 and 5.7. To paraphrase those sections:

5.7
The 'default' attribute is used to provide the logical value of a required element while parsing when the representation is empty (content length is zero).

15.13
When we get 'empty content' from an element, and the element is optional, then it is not present and is not added to the infoset.

When we get empty content from an element, and the element is required, then we start to look at nil handling and default handling properties.
- If the properties are such that the empty string is a nil value, then the infoset value is the special value nil.
- If the properties are such that a default value is specified, then the infoset value is the default value.
- Otherwise, if the empty string is valid for the type (i.e., the type is derived from xs:string), then the infoset value is a zero-length string.

So we know what empty content is and how it applies to simple elements. We now need to define when it is possible to get empty content, and what it means for elements of complex type or of non-string simple type.

Proposal:

1. Parsing

Simple elements

1) It is neither a schema definition error nor a processing error if a length is being used to extract data and that length is zero. This covers dfdl:lengthKind implicit, explicit, prefixed and endOfParent (when the parent length is known). The result is 'empty content'. (Note that for implicit, XSDL allows the maxLength/length facet to be 0, so disallowing a zero length for the other kinds would be inconsistent.)

2) It is not a processing error if the parser scans for data and the length of the returned bytes is zero. This applies to dfdl:lengthKind delimited, pattern and endOfParent (when the parent length is not known). The result is 'empty content'. (This is just stating the obvious.)

(The above two rules ensure that it is possible to apply empty content to trigger optional, nil value or default value processing regardless of data type and dfdl:lengthKind).

3) Optional, nil and default processing are applied as per spec.

4) If the element is required, and nil value or default value is not used, and empty string is not in the lexical space of the element's type, then it is a processing error.  
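Rules 1) and 2) can be sketched roughly as follows. This is a minimal illustration, not taken from any DFDL implementation: the helper `extract_content` and its parameters are hypothetical, only a single-terminator scan is modelled (pattern scanning is omitted), and for prefixed the `length` argument is assumed to be the already-decoded prefix value. The point is that a zero-length result is returned as empty content rather than raised as an error.

```python
from typing import Optional

# lengthKind values where the length is known before the content is read (rule 1).
LENGTH_KNOWN = {"implicit", "explicit", "prefixed"}


def extract_content(data: bytes, pos: int, length_kind: str,
                    length: Optional[int] = None,
                    terminator: Optional[bytes] = None,
                    parent_length_known: bool = False) -> bytes:
    """Hypothetical helper: return the content bytes of an element at pos.

    A zero-length result is not an error: it is 'empty content'.
    For prefixed, 'length' is assumed to be the already-decoded prefix value.
    """
    if length_kind == "endOfParent":
        # With a known parent length, 'length' is the remaining parent length
        # (rule 1); otherwise the parser has to scan (rule 2).
        length_kind = "explicit" if parent_length_known else "delimited"
    if length_kind in LENGTH_KNOWN:
        if length is None or length < 0 or pos + length > len(data):
            raise ValueError("length missing, negative or past end of data")
        return data[pos:pos + length]  # length == 0 gives b"" (rule 1)
    if length_kind == "delimited":
        # Simplified scan for a single terminator; pattern scanning not modelled.
        end = data.find(terminator, pos) if terminator else -1
        return data[pos:end if end >= 0 else len(data)]  # may be b"" (rule 2)
    raise ValueError(f"lengthKind not modelled in this sketch: {length_kind}")
```

For example, `extract_content(b",x", 0, "delimited", terminator=b",")` returns `b""`, which the caller then treats as empty content.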

The two initiator-related properties, dfdl:nilValueInitiatorPolicy and dfdl:defaultValueInitiatorPolicy, define whether nils and defaults are applied when initiated empty content is found; they do not affect the definition of empty content or what it means for the type.

[Note: If you recall, this discussion was triggered by a customer who was using an expression to calculate the length of a standard text decimal. He wanted a length of 0 to result in the value 0 in the infoset. He can achieve this by making the element required with a default value of 0.]
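Once empty content has been found, rules 3) and 4) together with the spec behaviour paraphrased from 15.13 amount to a small decision procedure. The sketch below assumes a hypothetical, simplified `SimpleElementDecl` that reduces the relevant properties to flags (whether the nil properties make the empty string a nil value, the 'default' attribute, whether "" is in the lexical space of the type); the names are illustrative and do not come from the spec.

```python
from dataclasses import dataclass
from typing import Optional


class ProcessingError(Exception):
    """Raised when empty content cannot be resolved to an infoset value."""


# Sentinels for the two outcomes that are not ordinary values.
ABSENT = object()  # optional element: not added to the infoset
NIL = object()     # required element: infoset value is nil


@dataclass
class SimpleElementDecl:
    # Hypothetical, simplified view of the properties that matter here.
    name: str
    required: bool
    empty_is_nil: bool = False         # nil properties make "" a nil value
    default: Optional[str] = None      # logical value of the 'default' attribute
    empty_string_valid: bool = False   # "" is in the lexical space of the type


def resolve_empty_content(decl: SimpleElementDecl) -> object:
    """Apply optional, nil and default processing to empty content."""
    if not decl.required:
        return ABSENT                  # optional: not present
    if decl.empty_is_nil:
        return NIL                     # nil takes precedence over default
    if decl.default is not None:
        return decl.default
    if decl.empty_string_valid:
        return ""                      # e.g. a type derived from xs:string
    # Rule 4: required, no nil or default, and "" is not valid for the type.
    raise ProcessingError(f"{decl.name}: empty content is not valid")
```

The customer case in the note above maps to `resolve_empty_content(SimpleElementDecl("amount", required=True, default="0"))`, which yields `"0"`; without the default, the same required decimal would raise a processing error.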

Complex elements

Empty content can also be returned for a complex element, in cases 1) and 2) above.

1) If the complex element is optional then it is not added to the infoset.  

2) If the complex element is required and does not have an initiator specified, then it is added to the infoset.

3) If the complex element has an initiator specified, then dfdl:defaultValueInitiatorPolicy applies:
        - required => element is added to infoset only if initiator is present (processing error if no initiator & empty content)
        - prohibited => element is added to infoset only if initiator is not present (initiator implies real content follows so processing error if initiator & empty content)

4) If the complex element is added to the infoset, then the parser processes the child content of the complex type. This may or may not cause a processing error. If it does not, then default value processing applies to required child elements. If we did not do this, we would not create default values for all missing required simple elements, and that would be wrong.

5) If the contained sequence or choice has an initiator or terminator then it is a processing error.
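Rules 1) to 5) for a complex element with empty content can be sketched as below. This is an illustration under several assumptions: the `ComplexElementDecl` and `Child` types are hypothetical, children are kept flat (one level of simple elements), and a required child is resolved only through its default, so the nil and empty-string paths shown for simple elements are omitted here. The initiator policy values "required" and "prohibited" follow the reuse of dfdl:defaultValueInitiatorPolicy proposed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class ProcessingError(Exception):
    """Raised when empty content is not acceptable for the element."""


@dataclass
class Child:
    name: str
    required: bool
    default: Optional[str] = None


@dataclass
class ComplexElementDecl:
    # Hypothetical, simplified view of a complex element declaration.
    name: str
    required: bool
    has_initiator: bool = False
    initiator_policy: str = "required"          # "required" | "prohibited"
    group_has_initiator_or_terminator: bool = False
    children: List[Child] = field(default_factory=list)


def parse_empty_complex(decl: ComplexElementDecl,
                        initiator_found: bool) -> Optional[Dict[str, str]]:
    """Return the infoset for the element, or None if it is not added."""
    if not decl.required:
        return None                              # rule 1
    if decl.has_initiator:                       # rule 3
        if decl.initiator_policy == "required" and not initiator_found:
            raise ProcessingError(f"{decl.name}: no initiator and empty content")
        if decl.initiator_policy == "prohibited" and initiator_found:
            raise ProcessingError(f"{decl.name}: initiator implies real content")
    # Rule 2 (or rule 3 satisfied): the element is added to the infoset.
    if decl.group_has_initiator_or_terminator:   # rule 5
        raise ProcessingError(f"{decl.name}: group delimiters in empty content")
    infoset: Dict[str, str] = {}
    for child in decl.children:                  # rule 4: children are empty too
        if not child.required:
            continue
        if child.default is None:
            raise ProcessingError(f"{child.name}: required, no default")
        infoset[child.name] = child.default
    return infoset
```

A required header with children `a` (required, default "1") and `b` (optional) parses empty content as `{"a": "1"}`.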


2. Unparsing

Simple elements

Data in the infoset can result in empty content being added to the bit stream (i.e., nothing), with an accompanying 0 value in any length prefix or length expression field, if appropriate to the dfdl:lengthKind.
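A minimal sketch of the two length-bearing cases, assuming a hypothetical 2-byte unsigned big-endian length prefix for prefixed, and, for a length given by an expression, a separate field whose value the unparser must write. Neither layout comes from the spec; they only show that empty content produces no content bytes and a length of 0.

```python
import struct
from typing import Tuple


def unparse_prefixed(value: str, encoding: str = "ascii") -> bytes:
    """Write a simple value preceded by a 2-byte big-endian length prefix.

    The prefix layout is an assumption for this sketch.
    """
    content = value.encode(encoding)
    return struct.pack(">H", len(content)) + content


def unparse_with_length_field(value: str,
                              encoding: str = "ascii") -> Tuple[int, bytes]:
    """Return the value for a separate length field plus the content bytes."""
    content = value.encode(encoding)
    return len(content), content
```

With an empty string in the infoset, `unparse_prefixed("")` produces only the prefix `b"\x00\x00"`, and `unparse_with_length_field("")` returns `(0, b"")`.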

Complex elements

The absence from the infoset of a required complex element will cause any specified initiator to be output and, if there are required children, default values to be output for those children. If we did not do this, we would not create default values for nested missing required simple elements, and that would be wrong. This enables creation of a sparse infoset containing just the elements with explicit values, with the rest defaulting regardless of nesting.
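The sparse-infoset behaviour can be sketched as a recursive unparser. Everything here is a hypothetical simplification: `Decl` treats an element with children as complex, the output is a list of text tokens rather than a bit stream, separators and terminators are ignored, and the infoset is a nested dict. It shows an absent required complex element still emitting its initiator and the defaults of its required children, at any depth.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decl:
    # Hypothetical declaration: an element with children is treated as complex.
    name: str
    required: bool = True
    default: Optional[str] = None      # simple elements only
    initiator: str = ""                # complex elements only
    children: List["Decl"] = field(default_factory=list)


def unparse(decl: Decl, value: object, out: List[str]) -> None:
    """Append the tokens for decl to out; value is None if absent."""
    if decl.children:                      # complex element
        if value is None:
            if not decl.required:
                return                     # absent optional: nothing output
            value = {}                     # absent required: children default
        out.append(decl.initiator)         # initiator is always output
        for child in decl.children:
            unparse(child, value.get(child.name), out)
        return
    if value is None:                      # simple element
        if not decl.required:
            return
        if decl.default is None:
            raise ValueError(f"{decl.name}: absent, required, no default")
        value = decl.default
    out.append(value)
```

For example, given a schema `msg` (initiator "MSG:") containing `id` (default "0") and a nested `hdr` (initiator "HDR:") containing `ver` (default "1") and an optional `note`, unparsing the sparse infoset `{"id": "42"}` yields `["MSG:", "42", "HDR:", "1"]`.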


3. Choices

It is worth noting that the concept of 'required' does not apply to the elements of a choice, even if minOccurs > 0.


4. Outstanding Issues

Is it ok to reuse dfdl:defaultValueInitiatorPolicy for complex elements? Should it be renamed? Should we add a separate property for complex elements?









Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU