I don’t understand your two notions
of infoset wins vs target field wins. It would be useful to discuss these
cases.
There are plenty of data formats that
cannot be processed without buffering. I think we simply have to say that
DFDL implementations may have to buffer data when formats require it.
Note that DFDL implementations might
buffer data unnecessarily, i.e., when other more clever implementations can
figure out how NOT to buffer, but the converse is not true. There are formats
where every DFDL implementation must do some buffering.
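A minimal sketch of the point about unavoidable buffering, in Python rather than DFDL. The function names and the 4-byte big-endian length field are assumptions made up for this illustration; they are not taken from the DFDL specification. When the length field precedes the data it describes, the writer cannot emit anything until the whole payload is known. When the same field follows the data, each chunk can be written as it arrives.

```python
import io
import struct


def unparse_length_prefixed(payload_chunks, out):
    """Write a 4-byte big-endian length, then the payload.

    The length comes first, so the payload must be buffered in full
    before the first byte can be written.
    """
    buffered = b"".join(payload_chunks)  # buffering forced by the format
    out.write(struct.pack(">I", len(buffered)))
    out.write(buffered)


def unparse_length_suffixed(payload_chunks, out):
    """Write the payload, then a 4-byte big-endian length.

    The length comes last, so each chunk is written immediately and
    only a running total is kept.
    """
    total = 0
    for chunk in payload_chunks:
        out.write(chunk)
        total += len(chunk)
    out.write(struct.pack(">I", total))


if __name__ == "__main__":
    prefixed = io.BytesIO()
    unparse_length_prefixed([b"ab", b"cde"], prefixed)
    print(prefixed.getvalue())

    suffixed = io.BytesIO()
    unparse_length_suffixed(iter([b"ab", b"cde"]), suffixed)
    print(suffixed.getvalue())
```

No implementation, however clever, can stream the first layout; the second layout can be streamed or buffered at the implementation's choice.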
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf Of Steve Hanson
Sent: Monday, April 07, 2008 11:41 AM
To: Alan Powell
Cc: dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Fw: Nulls and Defaults
Hi Alan
That's just one example of unparsing behaviour that impacts streaming. There's the
target of length XPaths and occurs XPaths as well.
I think this is something we need to discuss and ratify. We either have the
principle that the content of the infoset wins and sets the target fields, or
the target field wins and determines how the infoset is interpreted. IBM's MRM
unparser follows the second of these, DFDL follows the first.
Regards, Steve
Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
Alan Powell/UK/IBM@IBMGB 03/04/2008 12:59
I have made the changes discussed yesterday plus the following
I am concerned about one of the changes
nilIndicatorPath | Path Expression
Path to a logical Boolean field which indicates if this element is null.
For nullKind='nullIndicator', a path expression referencing another element,
which must be of type Boolean, that indicates if this element is null.
On input, the element value is null if the provided value is true.
When null, on input the element is parsed as normal. If the element length is
known then the value is skipped; otherwise the value must be scannable.
When null, on output the value is set based on the fillByte or padCharacter
properties and the referenced value is set to true.
If non-null, the element is parsed or output normally and the referenced
value is set to false.
Annotation: dfdl:element (all simple types)
By setting the referenced nil indicator we have made it impossible/difficult to
implement a streaming unparser. I'm not sure that is a good idea.
Also unless we relax the expression rules the indicator bit must be before the
element.
Please review sections 13.8-13.10
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
----- Forwarded by Alan Powell/UK/IBM on 03/04/2008 12:20 -----
From: Alan Powell/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB, "Mike Beckerle" <mbeckerle@OCO-INC.COM>
Date: 01/04/2008 18:35
Subject: Re: Fw: Nulls and Defaults (was [DFDL-WG] OGF DFDL WG minutes 2007-12-05 call)
Steve, Mike
I have finally got around to finishing this off and it turned out to be a lot
more work than I expected, as the defaults and nulls information was all in the
wrong places.
Changes
13.8 Properties for Nullable Elements
Updated as requested.
nullKind=xpath changed to nullIndicator, as xpath is also used in
nullValue so it was confusing.
13.9 Properties for Default Value Control
Moved from most of 17.1.1.1 and 17.2 so is now the main description of
defaults.
13.10 Nulls, Defaults, and Initiators
Moved from 14.2.1
Updated as requested
17.1.1.1 Repeating and Variable-Occurrence Items
and Default Values
Remainder of discussion of variable occurrences.
Outstanding issues
5) Is the list style syntax for dfdl:nullValues acceptable?
Yes because you can use <dfdl:property name="nullValues">"" "null" "NULL"</dfdl:property>
So what is the syntax? It has to include expressions.
7) Consistent use of nil versus null.
=> I'm wondering whether we should standardise on nil to match xsd?
(standardize on nil, not null).
Does everyone agree to this, as it is a significant change to the document?
9) nullIndicatorPath | Expression
Used when nullKind='nullIndicator'. A path expression referencing another
element that provides the logical value to compare with nullValues. On input,
the element value is null if the provided value matches one of the nullValues.
When null, if the element is fixed length then it will be skipped on input,
filled with (TBD: fillbyte?) on output. Is this correct??? Should it set
element to Null?
When null, if the element is variable length with minimum length > 0, then a
minimum length item will be skipped over, or on output filled (TBD with
fillbyte?).
When null, if the element is variable length with minimum length 0, then a
length zero object is expected on input, and a length 0 object will be
generated on output.
If non-null then the element is parsed or output normally.
Annotation: dfdl:element (all simple types)
10) useNullValueForDefault | Boolean
Ignored on input. Is this correct? Shouldn't it set null if element is
required?
On output, if an element is not in the logical model, but it is required, the
element is nillable, and has dfdl:useNullValueForDefault="true",
then the logical value is defaulted to null.
Annotation: dfdl:element (all simple types)
Can you make sure you are happy with the changes.
[attachment "ogf-dfdl-v1.0-Core-032.doc" deleted by Alan
Powell/UK/IBM]
Alan Powell
From: Alan Powell/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: Alan Powell/UK/IBM
Date: 07/02/2008 17:13
Subject: Re: Fw: Nulls and Defaults (was [DFDL-WG] OGF DFDL WG minutes 2007-12-05 call)
Steve
I have done most of this update. See below
Will complete in next rev.
Alan Powell
Steve Hanson/UK/IBM 06/02/2008 09:26
Hi Alan - nulls and defaults changes below.
Regards, Steve
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 06/02/2008 09:25 -----
"Mike Beckerle" <mbeckerle@OCO-INC.COM> 05/02/2008 21:17
Looks good.
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Tuesday, February 05, 2008 12:22 PM
To: Mike Beckerle
Subject: RE: Nulls and Defaults (was [DFDL-WG] OGF DFDL WG minutes 2007-12-05 call)
Hi Mike
Looks good, small corrections in blue.
With those made we can send to Alan I think.
Regards, Steve
Steve Hanson
"Mike Beckerle" <mbeckerle@OCO-INC.COM> 05/02/2008 14:57
Issues this raises
1) How can you represent empty string as
a) a null value?
b) a default value (not sure you can)?
1 Proposal: Input Defaulting for Empty Strings
This is a corner case for strings. If an element is of type string, and has a
default value specified, it is not clear whether the empty string should be an
allowed value or if the empty string, when found in the representation, should
trigger use of the default value instead.
The following makes this corner case unambiguous:
Not yet added. Wasn't sure where this should go as the Nulls, Default info is
scattered around.
This eliminates complexities around the issue of “empty” content.
Empty content always triggers use of the default value. If the type is string
and empty string is a legal value then there cannot be a default value.
We also need the same for null values
Not yet added. Wasn't sure where this should go as the Nulls, Default info is
scattered around.
2) Why are nullIndicatorPath and nullIndicatorIndex separate properties?
Convenience. So you can scope the nullIndicatorPath, and
have local indices.
3) What does 'missing' mean when initiators are involved?
=> Covered by extra properties dfdl:nullValueInitiatorPolicy
& dfdl:defaultValueInitiatorPolicy, as given by tables in 14.2.1.1 and
14.2.1.2
=> I think the bottom row of the table in 14.2.1.2 is
incorrect - in the infoset, empty string and missing element are two distinct
cases - how do/did we resolve this?
Changes to this definition:
defaultValueInitiatorPolicy | Enum
Added
1.1.1.1 Initiators and Output
This table describes the output direction logic for an initiated element that
is a required element. We assume here that dfdl:initiator is specified and not
equal to the empty string.
Logical Value | nullValueInitiatorPolicy | (unlabeled) | initiator region contains | content region contains
nil | prohibited | don't care | nothing | representation of nil based on nullKind, nullValues, etc.
(as above) | required | (as above) | initiator string | (as above)
"" (empty string) | don't care | | initiator string | empty string
a non-nil non-empty-string value | don't care | | initiator string | The representation of the logical value
Not supplied | Don’t care | Don’t care | Initiator string | The representation of the default value.
Not supplied | Prohibited | True | Nothing | Representation of nil based on nullKind, nullValues, etc.
(as above) | Required | (as above) | Initiator string | (as above)
(as above) | Don’t care | False | Initiator string | The representation of the default value.
Added, but had trouble with the table format as I couldn't
copy/paste.
4) What controls null versus default for a missing element on output?
=> Extra property dfdl:useNullValueForDefault
See above.
5) Is the list style syntax for dfdl:nullValues acceptable?
Yes because you can use <dfdl:property name="nullValues">"" "null" "NULL"</dfdl:property>
Which avoids quoting hell.
(there’s still some issue of list-valued expressions.)
6) Error cases - need to enumerate these
=> Input. Required element missing and no default value. (processing error)
=> Output. Required element missing and no default value or null value. (processing error)
=> Output. Element is null and is not nillable. (processing error at least. It may be
possible for some implementations to detect this error sooner.)
=> ?
7) Consistent use of nil versus null.
=> I'm wondering whether we should standardise on nil to match xsd?
(standardize on nil, not null).
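The output-side behaviour discussed under issues 4 and 6 can be sketched as follows, in Python rather than DFDL. This is one reading of the thread, not spec text. `resolve_output_value`, the `NIL` and `MISSING` sentinels, and `ProcessingError` are hypothetical names invented for the illustration. The sketch assumes that null is chosen for a missing required element only when the element is nillable and useNullValueForDefault is true, and that a default of `None` means no default is declared.

```python
NIL = object()      # stand-in for the special infoset value nil/null
MISSING = object()  # stand-in for "element not present in the infoset"


class ProcessingError(Exception):
    """Raised for the output error cases listed under issue 6."""


def resolve_output_value(value, *, required, nillable, default,
                         use_null_value_for_default):
    """Decide which logical value the unparser should write."""
    if value is NIL and not nillable:
        raise ProcessingError("element is null and is not nillable")
    if value is not MISSING:
        return value  # a supplied value (or a permitted nil) is kept
    if not required:
        return MISSING  # optional and absent: nothing to write
    if nillable and use_null_value_for_default:
        return NIL  # issue 4: null is preferred over the default
    if default is not None:
        return default
    raise ProcessingError(
        "required element missing and no default value or null value")


if __name__ == "__main__":
    chosen = resolve_output_value(
        MISSING, required=True, nillable=True, default="0",
        use_null_value_for_default=False)
    print(chosen)
```

With useNullValueForDefault set to false the declared default is used; setting it to true for the same element yields nil instead.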
Regards, Steve
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 04/02/2008 16:06 -----
Hi Mike
In preparation for our discussion on nulls and defaults tomorrow.....
First of all I'd like to restate what I see as the requirements:
Uncontentious core properties
xs:default
xs:fixed
dfdl:nullKind
dfdl:nullValues
dfdl:nullIndicatorPath
dfdl:nullIndicatorIndex
Assumptions
- 'Required' below is as defined in section 17.1.1.1.
- The term 'default value' below actually means 'xs:default or xs:fixed'
- Both default values and null values only apply to simple elements
Input
- If a required element is missing from the data stream and it has a default
value, that will be used as the infoset value of the element
- If an element is nillable and has a value in the data stream which matches
one of a list of null values, the infoset value of the element will be the
special value null
Output
- If a required element is missing from the infoset and it has a default value,
optionally that will be used as the infoset value of the element
- If a required element is missing from the infoset, optionally the special
value null will be used as the infoset value of the element
- If an element is nillable and has an infoset value null, the value in the
data stream will be the first of the list of null values
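The input rules above, plus the rule for writing a null value, can be sketched in Python. This is an illustration under stated assumptions, not DFDL syntax. `ElementDecl`, `parse_value`, `unparse_nil`, `NIL` and `ProcessingError` are hypothetical names, and the data-stream value is modelled as a plain string, or `None` when the element is missing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

NIL = object()  # stand-in for the special infoset value null


class ProcessingError(Exception):
    pass


@dataclass
class ElementDecl:
    name: str
    required: bool = True
    nillable: bool = False
    default: Optional[str] = None    # xs:default or xs:fixed
    null_values: Sequence[str] = ()  # dfdl:nullValues


def parse_value(decl, raw):
    """Return the infoset value for one simple element on input.

    raw is the text found in the data stream, or None if the element
    is missing from the data stream.
    """
    if raw is None:
        if not decl.required:
            return None  # optional and absent: no infoset item
        if decl.default is not None:
            return decl.default
        raise ProcessingError(
            f"{decl.name}: required element missing and no default value")
    if decl.nillable and raw in decl.null_values:
        return NIL
    return raw


def unparse_nil(decl):
    """Data-stream text for a null infoset value: the first null value."""
    if not decl.nillable or not decl.null_values:
        raise ProcessingError(
            f"{decl.name}: element is null and is not nillable")
    return decl.null_values[0]


if __name__ == "__main__":
    decl = ElementDecl("qty", nillable=True, default="0",
                       null_values=("", "NULL"))
    print(parse_value(decl, None), parse_value(decl, "NULL") is NIL)
```

Treating a nillable element with an empty null value list as an error is a choice made for this sketch; the thread does not say what should happen in that case.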
Issues this raises
1) How can you represent empty string as
a) a null value?
b) a default value (not sure you can)?
2) Why are nullIndicatorPath and nullIndicatorIndex separate properties?
3) What does 'missing' mean when initiators are involved?
=> Covered by extra properties
dfdl:nullValueInitiatorPolicy & dfdl:defaultValueInitiatorPolicy, as given
by tables in 14.2.1.1 and 14.2.1.2
=> I think the bottom row of the table in 14.2.1.2 is
incorrect - in the infoset, empty string and missing element are two distinct
cases - how do/did we resolve this?
4) What controls null versus default for a missing element on output?
=> Extra property dfdl:useNullValueForDefault
5) Is the list style syntax for dfdl:nullValues acceptable?
6) Error cases - need to enumerate these
=> Input. Required element missing and no default value.
=> Output. Required element missing and no default value or null value.
=> Output. Element is null and is not nillable.
=> ?
7) Consistent use of nil versus null.
=> I'm wondering whether we should standardise on nil to match xsd?
Regards, Steve
Steve Hanson
Mike Beckerle <beckerle@us.ibm.com> 06/12/2007 13:50
I tend to trust your instincts about things, Steve.
I would summarize it as this: regardless of how people think nulls *should*
work, in XSD nillables are orthogonal to value, and whether or not this matches
people's past experience, we should support it if we're going to overload
nillable at all.
To me this reasoning is pretty compelling, so I withdraw my suggestion (the
"either nillable or default value but not both" idea).
...mikeb
Steve Hanson <smh@uk.ibm.com> 12/06/2007 04:59 AM
Unfortunately I have been roped into something else which will likely occupy me
full time until middle of next week, so I can't look at the defaults/nulls
issue in detail right now. But my first reaction to the proposal below is that
elements should be allowed to have both null and default values. They are
separate concepts in XML Schema, so why are we making the DFDL logical model
different? IMHO subtle differences like this cause more issues with
customers than the odd extra DFDL property. The DFDL subset of XML Schema
should be just that - a subset. For those features of XML Schema that we do
support, the rules should be the same.
Regards, Steve
Steve Hanson
Mike Beckerle <beckerle@us.ibm.com> 05/12/2007 23:21
OGF DFDL WG minutes 2007-12-05 call
Suman Kalia, Simon Parker, Alan Powell, Mike Beckerle
(who else? - was someone else on also)
We discussed
Output issues in the DFDL expression language:
E.g., an outputValueCalc for a field in the header of a data stream may
contain information that requires you to know the rep, or length of the rep, of
the whole data item.
We concluded that this kind of thing can't be ruled out. Some formats just
require buffering and are not streamable; however, implementations can vary on
just how large a data item they're able to cope with here.
Expression language section will include a subsection highlighting this issue
and that implementations can vary here.
Alan will update his expression language proposal and include this.
Also suggested was a path length-from-to function that takes 2 path expressions
and gives you the size of the representation between them (start of first, to
last bit before start of 2nd).
(I don't think we discussed a clear use case motivating this, but there may be
one. We did discuss applications trying to fit data into limited size boxes,
but the use case is not clear.
Also note that all representation lengths are subject to change due to
different starting alignments.)
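A rough sketch of what the suggested length-from-to function might compute, in Python. It assumes the processor already has the starting bit offset of each element; the function name `length_from_to`, the `start_bits` mapping, and the example paths are all hypothetical and were not agreed on the call.

```python
def length_from_to(start_bits, first_path, second_path):
    """Size in bits from the start of the element at first_path up to,
    but not including, the first bit of the element at second_path.
    """
    first = start_bits[first_path]
    second = start_bits[second_path]
    if second < first:
        raise ValueError("second path starts before the first path")
    return second - first


if __name__ == "__main__":
    # Offsets are illustrative only. In practice they depend on the
    # starting alignment of the data, as noted above.
    offsets = {"/msg/header": 0, "/msg/body": 64, "/msg/trailer": 200}
    print(length_from_to(offsets, "/msg/header", "/msg/body"))
    print(length_from_to(offsets, "/msg/body", "/msg/trailer"))
```

On output the offsets are only known once the intervening elements have been unparsed, so an expression using such a function in a header field would be another case that forces buffering.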
Nillable and Default:
We also discussed the interaction of nillable and having a default.
The sense of the group on the call is that we can restrict these so that if
something is nillable it cannot also have a default value, and that the
behavior of DFDL on output for a required element that is nillable but not in
the logical data, is to create a null value. Everyone agreed that there is no
need for a property useNullValueForDefault because this should always be
the behavior.
Mike will forward a proposal.
...mikeb
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
[attachment "ogf-dfdl-v1.0-Core-032.2.doc" deleted by Steve Hanson/UK/IBM]