I prefer
dfdl:emptyElementParsePolicy = ( "treatAsMissing" | "treatAsEmpty" )
You have to understand the difference between empty and missing in DFDL.
It has an effect on all types: for example, if you set "treatAsMissing" for a required number, an empty occurrence always causes a processing error instead of potentially applying a default.
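A minimal Python sketch of the distinction for a required element whose representation is empty, assuming the two proposed policy values. `parse_required_empty`, `ProcessingError` and the `EMPTY_VALUES` table are hypothetical names for illustration only, not part of the DFDL spec, IBM DFDL or Daffodil:

```python
from typing import Optional


class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL Processing Error."""


# Hypothetical table: only these types have an "empty" value of their own.
EMPTY_VALUES = {"xs:string": "", "xs:hexBinary": b""}


def parse_required_empty(policy: str, xsd_type: str, default: Optional[object] = None):
    """Return the infoset value for a required element with an empty representation."""
    if policy == "treatAsMissing":
        # Empty is treated as missing, so a default is never applied.
        raise ProcessingError(f"required {xsd_type} element is missing")
    if policy != "treatAsEmpty":
        raise ValueError(f"unknown policy {policy!r}")
    if default is not None:
        # Defaults are only used when the representation is empty.
        return default
    if xsd_type in EMPTY_VALUES:
        return EMPTY_VALUES[xsd_type]
    # e.g. a number with no default: empty is not a valid value.
    raise ProcessingError(f"required {xsd_type} element is empty and has no default")
```

For example, `parse_required_empty("treatAsEmpty", "xs:int", default=0)` returns `0`, whereas the same call with `"treatAsMissing"` raises `ProcessingError`.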
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 08/05/2019 19:26
Subject: Re: [DFDL-WG] Action 306 - IBM DFDL behaviour when parsing empty strings
I suggest we stick with the "...Policy" naming convention for new things that control modes of behavior.
I'd prefer to avoid the terms empty and missing in the property values and go with something that is more explanatory of what difference it makes.
E.g., emptyElementParsePolicy with values "excludeEmptyStringAndHexBinaryValues" and "allowEmptyStringAndHexBinaryValues".
The doc for these values will of course have to be in terms of Absent/Missing/Empty, etc., but at least the names give some intuition as to what they control without having to understand all of DFDL's nuances about the difference between Absent and Missing.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
On Wed, May 8, 2019 at 12:52 PM Steve Hanson <smh@uk.ibm.com> wrote:
Maybe this is better:
dfdl:parseEmptyAsMissing = yes | no
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 08/05/2019 16:48
Subject: Re: [DFDL-WG] Action 306 - IBM DFDL behaviour when parsing empty strings
Interesting. Many DFDL schemas I've created have a simpleType definition named "nzString" which is string, plus an assertion that it is non-empty.
That's to achieve exactly the behavior you have in IBM DFDL, because, as you say, many formats want this.
We could rename the suggested property to emptyElementParsePolicy to make it clear it is only about parsing.
I like treatAsMissing. Easy to say what it means.
treatAsEmpty begs the question of what empty elements do, but that's already complicated in the spec due to optionals and EVDP (dfdl:emptyValueDelimiterPolicy), so I'm happy with this also.
...mikeb
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
On Tue, May 7, 2019 at 3:48 AM Steve Hanson <smh@uk.ibm.com> wrote:
Hi Mike
I think what you have highlighted is that there are formats which require
that empty elements should not be treated as empty but as missing, which
is effectively what IBM DFDL is doing (our code was written prior to action
140 when there was no distinction between empty & missing). That could
be achieved with assertions. So maybe we should view the new property as
a convenience property for such formats, as well as handling IBM DFDL's
behaviour?
If so, then can I suggest new names for the enums, which I think make the intent clearer?
dfdl:emptyElementPolicy = ( "treatAsMissing" | "treatAsEmpty" )
This only applies when parsing; maybe the names should reflect that also?
Further, "treatAsMissing" would imply that a default value is never used when parsing, as defaults are only used when the representation is empty. I think we can do away with the SDE clause for "treatAsMissing". The clause is only needed for "treatAsEmpty".
IBM DFDL does implement nillable processing, including use of %ES; as a nil literal value.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 03/05/2019 21:23
Subject: Re: [DFDL-WG] Action 306 - IBM DFDL behaviour when parsing empty strings
Under testing with the EDIFACT schema (from DFDLSchemas on GitHub) against new code in Daffodil, I see that my proposal was not sufficient.
Steve Hanson stated that IBM DFDL's current behavior for required empty strings includes "An empty occurrence with no default gives a Processing Error."
I misinterpreted this. I was thinking of a required occurrence of an array element (as in one with index <= minOccurs). But this should not be interpreted that narrowly; it means any required occurrence at all, including scalar elements.
The EDIFACT schema depends on this behavior, and the backtracking driven by it, in order to work.
So my suggestion for new properties to control this is revised to:
dfdl:emptyElementPolicy enum with values
noEmptyElements - matches current IBM DFDL behavior where
* required elements without default values that are empty (specifically, which satisfy the empty syntax, defined below) always cause Processing Errors.
** If a default value is specified, that is provided as the value instead. When a default value is specified, implementations that don't support default values when parsing must issue a runtime SDE here, not a Processing Error.
* optional elements which satisfy the empty syntax are not added to the infoset. Defaulting is never considered.
emptyElements - matches the current description in the DFDL spec where
* required elements: if the string/hexBinary satisfies the empty syntax, then required elements are created with an empty string or empty hexBinary as their value. If a default value is specified, that is substituted as the value instead. When a default value is specified, implementations that don't support default values when parsing must issue an SDE here, not a Processing Error.
* optional elements: if the string/hexBinary satisfies the empty syntax and emptyValueDelimiterPolicy is not 'none', then an empty string (or empty hexBinary) is added to the infoset. If emptyValueDelimiterPolicy is 'none', nothing is added to the infoset.
The term "satisfy the empty syntax" means that what is found in the data stream matches the empty representation. Depending on emptyValueDelimiterPolicy this may require the initiator and/or terminator, but if that is 'none' then it is satisfied by just the empty string (or no bytes for hexBinary).
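The revised proposal can be sketched in Python as one decision function. This is a hypothetical model, not Daffodil or IBM DFDL code: `StrElement`, `satisfies_empty_syntax`, `parse_empty` and the `ABSENT` sentinel are illustrative names, the empty-syntax check is simplified to comparing the element's raw text (delimiters included) with what dfdl:emptyValueDelimiterPolicy requires, and `supports_parse_defaults=False` models an implementation without parse-time defaults.

```python
from dataclasses import dataclass
from typing import Optional, Union


class ProcessingError(Exception):
    """Stand-in for a Processing Error (a parser may backtrack)."""


class SchemaDefinitionError(Exception):
    """Stand-in for a runtime SDE (fatal, no backtracking)."""


ABSENT = object()  # sentinel: nothing is added to the infoset


@dataclass
class StrElement:
    xsd_type: str                                # "xs:string" or "xs:hexBinary"
    required: bool                               # required vs optional occurrence
    default: Optional[Union[str, bytes]] = None  # XSD default value, if any
    evdp: str = "none"                           # dfdl:emptyValueDelimiterPolicy
    initiator: str = ""
    terminator: str = ""


def satisfies_empty_syntax(elem: StrElement, raw: str) -> bool:
    """Simplified: the raw text must be exactly the delimiters EVDP requires."""
    expected = {
        "none": "",
        "initiator": elem.initiator,
        "terminator": elem.terminator,
        "both": elem.initiator + elem.terminator,
    }[elem.evdp]
    return raw == expected


def parse_empty(policy: str, elem: StrElement, supports_parse_defaults: bool = True):
    """Infoset outcome for an element whose data satisfies the empty syntax."""
    if policy not in ("noEmptyElements", "emptyElements"):
        raise ValueError(f"unknown dfdl:emptyElementPolicy value {policy!r}")
    empty = "" if elem.xsd_type == "xs:string" else b""
    if elem.required:
        if elem.default is not None:
            if not supports_parse_defaults:
                # A runtime SDE here, not a Processing Error.
                raise SchemaDefinitionError("default values not supported when parsing")
            return elem.default
        if policy == "noEmptyElements":
            raise ProcessingError("required element is empty and has no default")
        return empty
    # Optional occurrence: defaulting is never considered.
    if policy == "emptyElements" and elem.evdp != "none":
        return empty
    return ABSENT
```

The two policies only diverge for a required element with no default (Processing Error vs empty value) and for an optional element with delimiters (absent vs empty value); the default-value path is shared.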
Having said the above, I believe we also have to consider nillable elements.
There are two topics:
1) Defaulting to nilled. For a nillable element whose data does NOT match the nil representation: anywhere in the above that a default value is specified and has behavior associated with it, if the element is nillable and dfdl:useNilForDefault='yes' is specified, then the element is defaulted to nilled instead.
When the element is nillable and dfdl:useNilForDefault='yes' is specified, implementations that don't support defaulting to nilled when parsing must issue an SDE here, not a processing error.
That takes care of the defaulting aspect of nillables.
The second topic is:
2) nillable, and dfdl:nilValue contains %ES; as one of the possible nil
representations. Hence, there is the possibility of empty string (or empty
hexBinary) matching the nil representation.
I think the DFDL spec is clear here that if the data stream satisfies the
nil syntax, then required or optional, you get a nilled element, period.
Does IBM DFDL implement that behavior? If so, great. If not, I think we may have to amend the above description of the noEmptyElements case for dfdl:emptyElementPolicy to specify the special cases.
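The precedence in the two topics can be sketched as a pre-check that runs before the dfdl:emptyElementPolicy rules. Everything here is hypothetical: `resolve_nil`, the `NILLED`/`NOT_NIL` sentinels and the `use_nil_for_default` flag (standing for the DFDL property that requests defaulting to nilled) are illustrative names, and nil literal matching is simplified so that only `%ES;` is treated as an entity (matching empty data) while other literals are compared as plain text.

```python
from typing import List


class SchemaDefinitionError(Exception):
    """Stand-in for a runtime SDE."""


NILLED = object()   # sentinel: the element is nilled
NOT_NIL = object()  # sentinel: continue with the non-nil empty/default rules


def nil_literal_matches(literal: str, data: str) -> bool:
    # Simplification: %ES; matches empty data; other literals are plain text.
    return data == "" if literal == "%ES;" else data == literal


def resolve_nil(data: str, nillable: bool, nil_values: List[str], required: bool,
                satisfies_empty: bool, use_nil_for_default: bool,
                supports_nil_defaulting: bool = True):
    if not nillable:
        return NOT_NIL
    # Topic 2: data satisfying the nil syntax gives a nilled element,
    # whether the occurrence is required or optional.
    if any(nil_literal_matches(lit, data) for lit in nil_values):
        return NILLED
    # Topic 1: the data is not nil; where a default would apply (a required
    # empty occurrence), default to nilled instead of to a value.
    if required and satisfies_empty and use_nil_for_default:
        if not supports_nil_defaulting:
            # A runtime SDE here, not a Processing Error.
            raise SchemaDefinitionError("defaulting to nilled not supported when parsing")
        return NILLED
    return NOT_NIL
```

Checking the nil representation first is what makes "required or optional, you get a nilled element" hold even under noEmptyElements.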
...mikeb
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
On Sun, Apr 28, 2019 at 9:36 AM Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
One clarification: is the IBM DFDL behavior the same for empty hexBinary
elements as it is for text strings?
I'm going to suggest we need a policy property e.g.,
dfdl:emptyElementPolicy which is an enum with at least these options:
noOptionalEmptyElements - matches current IBM DFDL behavior
optionalEmptyElementsWithSyntax - matches current description in the DFDL
spec where initiator and/or terminator found triggers creation of an empty
string value. (Daffodil implements this.)
This would apply (I think) to both types xs:string and xs:hexBinary.
I'm open to suggestions for better naming for the property and the property
values, but these are the two settings we need I think.
I do believe that the latter optionalEmptyElementsWithSyntax behavior is
what the DFDL spec describes, and is most consistent given the available
properties such as emptyValueDelimiterPolicy.
We can make implementation of optionalEmptyElementsWithSyntax a DFDL optional
language feature, thereby avoiding issues of conformance with the DFDL
standard.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
On Fri, Apr 5, 2019 at 12:43 PM Steve Hanson <smh@uk.ibm.com> wrote:
Daffodil to perform identical tests but the belief is that they implement
the spec as published (except maybe for one bug with default values for
strings).
So there is a mismatch between Daffodil and IBM DFDL. It sounds like a new property is going to be needed to toggle the way that empty strings are handled.
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: DFDL-WG <dfdl-wg@ogf.org>
Cc: "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>, Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Date: 03/04/2019 12:04
Subject: Action 306 - IBM DFDL behaviour when parsing empty strings
306 | Confirm IBM DFDL behaviour when parsing empty strings (Steve)
7/8: IBM DFDL has not fully implemented the behaviour changes arising from action 140 with respect to empty string elements. Daffodil is about to do so. IBM DFDL users have complained about lack of defaults when parsing but other than that appear happy. Are the rules in the spec for empty strings overcomplicated? Steve to document the behaviour for IBM DFDL to inform the discussion.
...
1/11: In progress - there are a lot of subtle scenarios
15/11: Not discussed
...
7/2/19: No further progress
Some progress :)
9.4.2.2 Simple element (xs:string or xs:hexBinary)
Required occurrence: If the element has
a default value then an item is added to the infoset using the default
value, otherwise an item is added to the Infoset using empty string (type
xs:string) or empty hexBinary (type xs:hexBinary) as the value.
Optional occurrence: If dfdl:emptyValueDelimiterPolicy
is not 'none' then an item is added to the Infoset using empty string (type
xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise
nothing is added to the Infoset.
IBM DFDL behaviour:
Required. IBM DFDL does not implement default values when parsing, so an empty occurrence with a default value gives an SDE (to prevent backtracking). An empty occurrence with no default gives a Processing Error. If you need to add an empty string to the infoset, you can add default="" (once default values are implemented, of course).
Optional. IBM DFDL adds nothing to the infoset regardless of the presence of an initiator and/or terminator. There is no way to get an empty string into the infoset.
9.4.2.3 Complex element
Required occurrence: An item is added to
the Infoset.
Optional occurrence: If dfdl:emptyValueDelimiterPolicy
is not 'none' then an item is added to the Infoset, otherwise nothing is
added to the Infoset.
For both required and optional occurrences,
the Infoset item may also have a child item.
1. If the first child element of the complex type is a required simple element, then an empty string (type xs:string), empty hexBinary (type xs:hexBinary), or default value will also be added to the Infoset.
2. If the first child element of the complex type is a required complex element, then an item is added to the Infoset (which may itself have a child via (1)).
IBM DFDL behaviour:
Required. IBM DFDL follows the spec (modulo rule 1 when an error would have been thrown, as per its 9.4.2.2 behaviour).
Optional. IBM DFDL follows the spec (modulo rule 1 when an error would have been thrown, as per its 9.4.2.2 behaviour).
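The 9.4.2.3 spec rules above can be sketched as a small recursive function. This is a hypothetical model for illustration, not code from the spec or either implementation: `SimpleDecl`, `ComplexDecl`, `empty_complex_item` and the dict-based infoset are made-up names, and raising `ProcessingError` for a required first child of another simple type with no default is an assumption carried over from the behaviour for other simple types mentioned in this note.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


class ProcessingError(Exception):
    """Stand-in for a DFDL Processing Error."""


EMPTY_VALUES = {"xs:string": "", "xs:hexBinary": b""}


@dataclass
class SimpleDecl:
    name: str
    xsd_type: str
    required: bool
    default: Optional[object] = None


@dataclass
class ComplexDecl:
    name: str
    required: bool
    children: List[Union[SimpleDecl, "ComplexDecl"]] = field(default_factory=list)
    evdp: str = "none"  # dfdl:emptyValueDelimiterPolicy


def _first_child_value(child):
    """Value added for the first child under rules (1) and (2), or None."""
    if child is None or not child.required:
        return None
    if isinstance(child, ComplexDecl):
        # Rule (2): an item is added, which may itself have a child via (1).
        return _empty_content(child)
    # Rule (1): the default value if any, else empty string / empty hexBinary.
    if child.default is not None:
        return child.default
    if child.xsd_type in EMPTY_VALUES:
        return EMPTY_VALUES[child.xsd_type]
    # Assumption: other simple types have no empty value.
    raise ProcessingError(f"required {child.xsd_type} child {child.name!r} is empty")


def _empty_content(decl: ComplexDecl) -> dict:
    first = decl.children[0] if decl.children else None
    value = _first_child_value(first)
    return {} if value is None else {first.name: value}


def empty_complex_item(decl: ComplexDecl) -> Optional[dict]:
    """Infoset item for an empty complex element, or None if nothing is added."""
    if not decl.required and decl.evdp == "none":
        return None
    return {decl.name: _empty_content(decl)}
```

For instance, a required complex element whose first child is a required xs:string yields `{"seg": {"code": ""}}`, while an optional one with dfdl:emptyValueDelimiterPolicy 'none' yields `None`.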
So ...
The spec today is consistent in one way, in that for both complex &
string elements a) a required empty occurrence always adds to the infoset;
& b) an optional empty occurrence adds to the infoset if initiator/terminator
present; & c) an optional empty occurrence does not add to the infoset
if no initiator/terminator present.
If the simple string behaviour were to change to match IBM DFDL, then that consistency is lost, but the string behaviour then matches that for other simple types. Section 9.4.2.2 disappears as the behaviour is the same as 9.4.2.1. Section 9.4.2.3 becomes as below. We lose the ability to get an empty string into the infoset for an optional string with an initiator/terminator.
9.4.2.3 Complex element
Required occurrence: An item is added to
the Infoset.
Optional occurrence: If dfdl:emptyValueDelimiterPolicy
is not 'none' then an item is added to the Infoset, otherwise nothing is
added to the Infoset.
For both required and optional occurrences,
the Infoset item may also have a child item.
1. If the first child element of the complex type is a required simple element, then a default value will also be added to the Infoset.
2. If the first child element of the complex type is a required complex element, then an item is added to the Infoset (which may itself have a child via (1)).
We also need to be sure that any other implementations have not yet implemented
the current spec behaviour. Need to check with DFDL4S and
IBM TPF.
To be discussed on next WG call ...
Regards
Steve Hanson
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg