Suman - only certain properties can be turned off using empty string. Those
that permit this say so explicitly in the spec.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Suman Kalia <kalia@ca.ibm.com>
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org, Steve Hanson/UK/IBM@IBMGB
Date: 17/01/2012 14:50
Subject: Re: [DFDL-WG] Proposed Errata Language: Issue - DFDL Expressions may not return empty string for ...
General question: if "" (empty string) is allowed for a
property, then what would be the mechanism to turn off that property?
Are we saying that such a property cannot be turned off?
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: dfdl-wg@ogf.org
Date: 01/17/2012 09:19 AM
Subject: Re: [DFDL-WG] Proposed Errata Language: Issue - DFDL Expressions may not return empty string for ...
Sent by: dfdl-wg-bounces@ogf.org
When a property is a DFDL String Literal, or List of DFDL String Literal,
the spec is usually silent about the validity of empty string. The assumption
has always been that empty string is only allowed when explicitly stated
in the property description (such as for initiator, terminator and separator).
Hence for textBooleanTrue/FalseRep, empty string is implicitly not permitted
as a value.
Is that really true? E.g., [,,,] with DFDL <element name="arrayOfBool"
type="xs:boolean" dfdl:textBooleanFalseRep="" dfdl:textBooleanTrueRep="1"
maxOccurs='4' minOccurs='4'/>
Now, I could do this with a default value also, so we can rule out empty
string for the textBooleanXYZRep properties, but it's not like it isn't
a sensible option.
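To make the [,,,] example concrete, here is a minimal Python sketch of what an empty textBooleanFalseRep would mean for that data. It is not DFDL and not how any DFDL processor works; the function name parse_bool_array is hypothetical, and it assumes the brackets are just notation around the content ",,," with comma as the separator.

```python
def parse_bool_array(data, true_rep="1", false_rep="", separator=","):
    """Split on the separator and map each field to a boolean by exact match.

    The defaults mirror the example: textBooleanTrueRep="1" and an empty
    textBooleanFalseRep. Everything else here is an illustrative assumption.
    """
    values = []
    for field in data.split(separator):
        if field == true_rep:
            values.append(True)
        elif field == false_rep:
            values.append(False)
        else:
            raise ValueError(f"field {field!r} matches neither representation")
    return values


# Three separators delimit four empty fields, so all four occurrences are false.
print(parse_bool_array(",,,"))  # [False, False, False, False]
```

With the empty representation allowed, the four empty fields read as four false values, which is the same result a default value would give.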
We should be explicit about this in section 6.3.1, and say empty string
not allowed as a value, unless explicitly stated in the individual property
description.
Agreed.
That then covers us for the case where a property is a union of DFDL String
Literal and DFDL Expression, as the rules for expression say it must return
a value compliant with the property type. So the only properties that need
to take your wording below are initiator, terminator and separator.
Is it a schema definition error or a processing error if an expression
evaluates to something that does not comply with the type of the property?
I don't think the spec says.
I suggest an SDE is the right choice.
Example: Consider path a/b/c. Suppose b is a choice.
One arm has elements p, q, r, and c where c is type string. The other arm
has elements x, y, z, and c where c is type int.
If I use the path a/b/c in a context where I can only have an Int e.g.,
as a length expression, then in principle I could get a type error
or not depending on how the choice is resolved.
I think this is a super bad idea, and we should make a type error be an
SDE to prevent people from modeling data this way. We should require the
schema author to use different field names for the different "c"
fields in this example (let's say cString and cInt as the names), so
that the path is a/b/cInt and then there is no question that if a/b/cInt
can't compute cInt's representation to an Int then it is an SDE.
This prevents different semantics for runtime type-checked and static-type-checked
implementations.
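The a/b/c example can be sketched as a static check, under assumptions of mine: a choice is modelled as a list of arms, each arm a mapping from element name to type name, and check_path, possible_types and SchemaDefinitionError are hypothetical names, not part of any DFDL implementation. The siblings p, q, r and x, y, z are left out because their types do not matter here.

```python
class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL schema definition error (SDE)."""


def possible_types(choice, name):
    """Collect every type the named child could have, over all arms."""
    return {arm[name] for arm in choice if name in arm}


def check_path(choice, name, required):
    """Reject the path unless every arm that defines `name` gives the required type."""
    found = possible_types(choice, name)
    if found != {required}:
        raise SchemaDefinitionError(
            f"{name} may be {sorted(found)}, but the context requires {required}"
        )


# Both arms call the element "c": the type depends on how the choice resolves.
ambiguous = [{"c": "string"}, {"c": "int"}]
# Distinct names remove the ambiguity, as proposed above.
renamed = [{"cString": "string"}, {"cInt": "int"}]

check_path(renamed, "cInt", "int")  # passes: only one possible type
try:
    check_path(ambiguous, "c", "int")
except SchemaDefinitionError as err:
    print("SDE:", err)
```

Because the check looks at all arms rather than the arm actually taken, it gives the same answer whether it runs at compile time or at runtime.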
We continue to have the issue that one implementation may throw an SDE at
compile time, where another implementation defers that check to runtime,
and hence, can parse some data if that data does not force the erroneous
part of the schema to be used. I am not sure what we can do about this.
The obvious solutions (no compile-time checking, or only compile time checking)
both preclude classes of implementations that we don't want to rule out.
We do have a one-sided behavior that when an SDE is detected at runtime
by any implementation it will also be detected at runtime or sooner by
all implementations.
However we don't have the inverse of this. It would be nice if when an
SDE is not detected by one implementation an SDE will not be detected by
other implementations. But unfortunately, that's just not the case.
Here's Suggested Errata Language:
Section 6.3.2 is amended to add the following sentences:
DFDL expressions are strongly typed. Incorrect types are schema definition
errors. DFDL expressions are always used in a context where there is an
intended result type for the expression. In the case of test expressions
(dfdl:assert and dfdl:discriminator) the result type is boolean. In the
case of DFDL expressions providing property values, the property's type
is the intended result type. In the case of inputValueCalc, and outputValueCalc,
the result type is the type of the corresponding element.
Note specifically that strings are not automatically converted to numbers.
That is, if an expression is used in a context where the result type must
be an integer, then the expression may not return a string, even if the
string contains only digit characters. In this situation an explicit construction
of an integer (such as by calling the xs:int() function) is required as
part of the expression.
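As a rough illustration of the "no automatic conversion" rule, here is a small Python sketch. It is only an analogy: require_result_type and SchemaDefinitionError are hypothetical names, Python's int() stands in for the xs:int() constructor, and the check runs when called, which says nothing about when a real processor would detect the error.

```python
class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL schema definition error (SDE)."""


def require_result_type(value, expected_type):
    """Accept the value only if it already has exactly the intended result type."""
    # Compare exact types: no coercion, and bool does not pass as int.
    if type(value) is not expected_type:
        raise SchemaDefinitionError(
            f"expected {expected_type.__name__}, got {type(value).__name__}"
        )
    return value


# Explicit construction of an integer is accepted.
length = require_result_type(int("42"), int)

# A string of digits is still a string, so it is rejected.
try:
    require_result_type("42", int)
except SchemaDefinitionError as err:
    print("rejected:", err)
```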
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 16/01/2012 22:17
Subject: Proposed Errata Language: Issue - DFDL Expressions may not return empty string for ...
Here's some proposed errata language:
The following property descriptions are amended to include this stipulation:
When a DFDL Expression is used, it may not produce empty string.
The affected properties are:
- textBooleanTrueRep
- textBooleanFalseRep
- initiator
- terminator
- separator
I did verify that the other properties that allow DFDL Expression to compute
a string do not need further clarification.
On Tue, Jan 10, 2012 at 11:52 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Hi Mike
As will be minuted, we agreed on the WG call today that we disallow expressions
that return empty string for properties where empty string turns off the
property. Please can you take a look through the spec and see if any properties
other than initiator, terminator, separator are impacted, then I can complete
the errata.
(I would expect that inputValueCalc and outputValueCalc are not affected
by this errata, as empty string is a legal value for the element in question
if it is of type xs:string).
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 10/01/2012 08:06
Subject: [DFDL-WG] spec clarification needed: is dfdl:terminator='{ ...returns empty string ... }' allowed?
Sent by: dfdl-wg-bounces@ogf.org
Let's use the example of terminator as a delimiter.
If I provide an expression so that I can compute terminator at runtime,
is it allowed to return empty string? I.e., equivalent to writing dfdl:terminator=""
which is effectively "turning off" use of terminator?
It seems very problematic to me if we allow this. Nor do I think
this generality is needed.
We should clarify that for initiator/terminator/separator, if a runtime
expression is used, then it must return at least one non-zero-length value.
So using a runtime expression for a delimiter is effectively saying "yes
there will be a delimiter", you are just not binding its specific
value.
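The proposed rule could be checked roughly as follows. This is a sketch under stated assumptions: check_delimiter_value is a hypothetical name, the expression result is treated as a whitespace-separated list of literals, and a plain ValueError is raised because this message does not say which kind of error it should be.

```python
def check_delimiter_value(property_name, result):
    """Return the literals in an expression result, rejecting an empty outcome.

    `result` is the string the runtime expression returned; splitting on
    whitespace is an assumption made for this illustration.
    """
    literals = result.split()
    if not literals:
        raise ValueError(
            f"{property_name}: expression must return at least one "
            "non-zero-length value"
        )
    return literals


print(check_delimiter_value("terminator", ";"))   # [';']
print(check_delimiter_value("separator", ", ;"))  # [',', ';']

# An empty result would amount to turning the delimiter off, so it is rejected.
try:
    check_delimiter_value("terminator", "")
except ValueError as err:
    print("rejected:", err)
```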
I believe this runtime expression capability for delimiters was intended
to allow the choice of the specific delimiter to be made based on data
containing the value. This is common practice in data formats.
However, turning on/off whether delimiters are present or not, is not something
I anticipated, and it has far bigger implications for the format. I mean
you really can't decide much about the data format statically if even the
existence of delimiters as part of the format or not can be postponed to
runtime.
Comments?
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU