
Before I left Seoul, Stephen showed me a simple solution that was structurally along the lines of what I imagined for the range-value element type.

RANGE-VALUE COMPLEX TYPE PSEUDO-SCHEMA
----------------------------------------

It is very straightforward to write in XSD. With some changes I recommend, it would look like this as a sequence:

  <...>
    <lowerBound>xsd:integer</lowerBound> ?
    <upperBound>xsd:integer</upperBound> ?
    <exact>xsd:integer</exact> *
    <range>
      <lowerBound>xsd:integer</lowerBound>
      <upperBound>xsd:integer</upperBound>
    </range> *
  </...>

I have changed the names of the pieces from what I think Stephen showed me, in order to make their meaning as obvious as possible.

The whole element is more or less a disjunction of its expression parts, but please note below the special treatment of the optional semi-space constraints (the top-level lower/upper bounds): they are treated identically to a regular range element when both are specified, and only act as true semi-spaces when one is omitted. All other exact and range expressions are treated independently (as a true disjunction).

The alternatives to this that I came up with all seemed unsatisfactory:

1) make it a complete disjunction, meaning any value over the lower bound OR under the upper bound is in range;

2) other hybrid conjunctions, meaning multiple bounds, ranges, and/or exact expressions have to match.

INTEGER OR FLOATING-POINT REPRESENTATION
-----------------------------------------

We had discussed using xsd:integer, which implies that any use of this type must define the "base" units that are being counted. The only problem among our base terms is (cpu-)time, where I wonder if we are better off using a floating-point type to allow the base units to be seconds and still allow fractional-second specification. Choosing some specific fractional second as the base unit seems unappealing to me.
If we provide a floating-point/fractional version, I think there needs to be an optional attribute on the exact element to specify a precision or epsilon value for equality tests, e.g.

  <exact jsdl:precision="0.001">3.1415927...</exact>

could match anything in the range (3.1405927..., 3.1425927...). Or, the consumer could treat it as undefined and do something appropriate if a precision is specified which it cannot support.

Perhaps a simpler solution is to stick with arbitrary-length integers and add an optional divisor attribute that defaults to 1:

  <exact jsdl:divisor="100000">314159</exact>

which would be exactly 3.14159 in decimal fractions? Of course, a different divisor like "1024" could be used for binary fractions. I am not a numerical analyst, so I would prefer we bounce any such proposal off of several before adopting it.

SEMANTICS
------------------------------------------

The matching semantics would be as follows for an element of this type:

  let booleans L and U be whether lower/upper bound values are specified;
  let integers l and u be the lower/upper bound values, respectively;
  let E be { e | e is specified in an exact element };
  let R be { <l,u> | <l,u> is specified in a range element };

  in_range(x) =    (!L || l <= x) && (!U || x <= u)
                || there exists e in E such that x = e
                || there exists <l,u> in R such that l <= x <= u.

INCLUSIVE VERSUS EXCLUSIVE RANGES
-----------------------------------------

Also, I suggest that we place an optional attribute on the boundary elements:

  jsdl:exclusiveBound=xsd:boolean

with default false, meaning that by default the range of acceptable values includes the boundary value in the element body. Setting it to true would mean that the boundary value is not part of the range. This supports any meaning captured before in the operator enumeration, I think.
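To make the evaluation rule concrete, here is a minimal sketch of the matching semantics in Python. All names are illustrative, not part of any agreed JSDL schema, and I have assumed one detail the formula leaves open: when neither top-level bound is specified, the bounds term drops out of the disjunction (read literally, the first disjunct would otherwise be vacuously true and every value would match).

```python
# Hypothetical sketch of the in_range(x) semantics; integer values,
# inclusive bounds (the proposed default). Names are illustrative only.
from typing import Optional, Sequence, Tuple

def in_range(x: int,
             lower: Optional[int] = None,       # top-level lowerBound (L/l)
             upper: Optional[int] = None,       # top-level upperBound (U/u)
             exacts: Sequence[int] = (),        # E: values of exact elements
             ranges: Sequence[Tuple[int, int]] = ()  # R: <l,u> range pairs
             ) -> bool:
    # The two top-level bounds act as a conjunction: both semi-spaces
    # must admit x. An omitted bound (None) places no constraint.
    bounds_ok = ((lower is None or lower <= x) and
                 (upper is None or x <= upper))
    # Assumption (not stated in the formula): with both bounds omitted,
    # the bounds term contributes nothing to the disjunction.
    if lower is None and upper is None:
        bounds_ok = False
    return (bounds_ok
            or any(x == e for e in exacts)              # exact disjuncts
            or any(l <= x <= u for l, u in ranges))     # range disjuncts
```

Note how the top-level bounds are the only conjunctive part; every exact and range element is an independent disjunct, exactly as described above.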
A minor note is that the elements have to appear in the sequence order, which I argue is a good thing for machine-machine communication as the parse tree will yield three monomorphic arrays of values with clear meanings, rather than one polymorphic array that the consumer has to traverse.

karl

--
Karl Czajkowski
karlcz@univa.com

Karl Czajkowski wrote:
It is very straightforward in terms of writing XSD. With some changes I recommend, it would look like this as a sequence:

  <...>
    <lowerBound>xsd:integer</lowerBound> ?
    <upperBound>xsd:integer</upperBound> ?
    <exact>xsd:integer</exact> *
    <range>
      <lowerBound>xsd:integer</lowerBound>
      <upperBound>xsd:integer</upperBound>
    </range> *
  </...>
Looks mostly good, except we should use xsd:nonNegativeInteger or xsd:double instead of xsd:integer. I can't currently think of a use case for negative bounds, but there might be a case to be made for allowing floating-point representations. (They become more useful when you start using extension resources, perhaps because they are modelling some scientific instrument, and we want to encourage the reuse of our types so that they can be tooled just once.) Your semantics section is exactly right IMO, modulo the exclusiveBound attribute discussion (which mostly just expands it out with lots of minor variations).
We had discussed using xsd:integer and this implies that any use of this type must just define the "base" units that are being counted. The only problem in our base terms is (cpu-)time, where I wonder if we are better off using a floating-point type to allow the base units to be seconds and still allow fractional second specification. Choosing some specific fractional second as the base unit seems unappealing to me.
If we provide a floating point/fractional version, I think there needs to be an optional attribute on the exact element to specify a precision or epsilon value for equality tests, e.g. <exact jsdl:precision="0.001">3.1415927...</exact>
Actually, the correct thing to do is for the caller to always use bounded intervals with floats unless they really know exactly what they are after, since only some floats (e.g. 1.25) are actually exactly representable in IEEE arithmetic. OK, it's punting the problem to the document creator (the JSDL processor just checks what it is told), but that's the right thing to do in my experience with processing floats.
Perhaps a simpler solution is to stick with arbitrary length integers and add an optional divisor attribute that defaults to 1: <exact jsdl:divisor="100000">314159</exact>
Ick. That really makes handling floats much nastier! There's no need to do this; xsd:double will be handled right (and tooled nicely) as long as callers don't have unrealistic expectations of float math. (OK, many people do have those unrealistic expectations, but that's not our fault and we can't fix the world.)
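Donal's point about exact representability can be demonstrated in a few lines; the behaviour below is standard IEEE-754 double arithmetic (which is also what xsd:double specifies), shown in Python purely for illustration.

```python
# 1.25 is a finite sum of powers of two (1 + 1/4), so an IEEE-754
# double stores it exactly; 0.1 has no finite binary expansion, so
# decimal-looking arithmetic drifts.
assert 1.25 + 1.25 == 2.5   # exact: binary fraction
assert 0.1 + 0.2 != 0.3     # inexact: classic rounding artifact

# Consequence for this proposal: an <exact> test on a double is only
# reliable for exactly-representable values; otherwise a bounded
# interval (lowerBound/upperBound) is the robust encoding.
```

This is why pushing the choice onto the document creator, as suggested above, is a reasonable division of labour: only the creator knows whether the value they intend is exactly representable.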
Also, I suggest that we place an optional attribute in the boundary elements: jsdl:exclusiveBound=xsd:boolean with default false, meaning that by default the range of acceptable values includes the boundary value in the element body. Setting it to true would mean that the boundary value is not part of the range. This supports any meaning captured before in the operator enumeration, I think.
I have absolutely no objection to adding this to UpperBound and LowerBound, and it makes working with real-like numbers much easier (there are some subtleties in there to do with the difference between open and closed interval bounds).
A minor note is that the elements have to appear in the sequence order, which I argue is a good thing for machine-machine communication as the parse tree will yield three monomorphic arrays of values with clear meanings, rather than one polymorphic array that the consumer has to traverse.
I'm not sure I'd enforce that, but as we don't need to define an algorithm for minimization or testing equivalence of range types, I'd just not bother. Say that doing it is recommended, not required. :^) Donal.

On Mar 18, Donal K. Fellows loaded a tape reading:
We had discussed using xsd:integer and this implies that any use of this type must just define the "base" units that are being counted. The only problem in our base terms is (cpu-)time, where I wonder if we are better off using a floating-point type to allow the base units to be seconds and still allow fractional second specification. Choosing some specific fractional second as the base unit seems unappealing to me.
If we provide a floating point/fractional version, I think there needs to be an optional attribute on the exact element to specify a precision or epsilon value for equality tests, e.g. <exact jsdl:precision="0.001">3.1415927...</exact>
Actually, the correct thing to do is for the caller to always use bounded intervals with floats unless they really know exactly what they are after, since only some floats (e.g. 1.25) are actually exactly representable in IEEE arithmetic. OK, it's punting the problem to the document creator (the JSDL processor just checks what it is told), but that's the right thing to do in my experience with processing floats.
Well, if we're going to support floats, my point was moot. My only question is whether float/integer is a choice made by the schema author for a term or a runtime choice made by the document creator. I'd rather see two versions of our type, e.g. jsdl:integerRangeValueType and jsdl:floatingRangeValueType, and a term element definition has to pick one. I could see offering more variants while we are at it, to capture non-negative types etc. Because I do not feel confident I understand all future resource ontologies, I am not comfortable saying that resource-selection metrics never use negative values. So I think we need to offer signed integer (and float) in a core set of types.
Ick. That really makes handling floats much nastier! There's no need to do this; xsd:double will be handled right (and tooled nicely) as long as callers don't have unrealistic expectations of float math. (OK, many people do have those unrealistic expectations, but that's not our fault and we can't fix the world.)
I'm happy to have floating point variants.
A minor note is that the elements have to appear in the sequence order, which I argue is a good thing for machine-machine communication as the parse tree will yield three monomorphic arrays of values with clear meanings, rather than one polymorphic array that the consumer has to traverse.
I'm not sure I'd enforce that, but as we don't need to define an algorithm for minimization or testing equivalence of range types, I'd just not bother. Say that doing it is recommended, not required. :^)
Donal.
Well, it is actually a matter of having to write a more complicated schema to support reordering! The basic one Stephen and I discussed is (adjusted for my latest proposal):

  complexType
    sequence
      lowerBound ?
      upperBound ?
      exact *
      range *

which exactly captures the cardinality requirements associated with the intended evaluation semantics. On the other hand, the mixed-order one (which results in nastier parse trees) is something more like:

  complexType
    sequence
      lowerBound ?
      upperBound ?
      choice *
        exact
        range

and it is even harder (or impossible?) to allow the two boundary elements to be reordered without relaxing the [0,1] cardinality constraint.

karl

--
Karl Czajkowski
karlcz@univa.com
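For concreteness, the ordered variant sketched above might be written out in full XSD roughly as follows. This is only an illustrative sketch: the type name and namespace conventions are assumptions, not anything the group has agreed on.

```xml
<!-- Hypothetical XSD for the ordered variant; names are illustrative. -->
<xsd:complexType name="rangeValueType">
  <xsd:sequence>
    <xsd:element name="lowerBound" type="xsd:integer" minOccurs="0"/>
    <xsd:element name="upperBound" type="xsd:integer" minOccurs="0"/>
    <xsd:element name="exact" type="xsd:integer"
                 minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="range" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="lowerBound" type="xsd:integer"/>
          <xsd:element name="upperBound" type="xsd:integer"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
```

The mixed-order variant would replace the two trailing particles with a single `<xsd:choice minOccurs="0" maxOccurs="unbounded">` containing exact and range, which is what produces the polymorphic parse tree discussed above.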