I agree with all of that.
The best way to specify the type of a DFDL string literal in the
'schema for DFDL annotations' would be:
- define a global simple type called 'DFDLStringLiteral' that is a
restriction of xs:string (not xs:token) and contains a pattern facet
that describes its lexical space.
- define a separate global simple type 'ListOfDFDLStringLiteral' that
is a list of DFDLStringLiteral
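To make the two proposed types concrete, here is a rough Python sketch of the checks they would imply. It is an illustration under assumptions, not the schema itself: the pattern shown only forbids XML whitespace and the empty string, whereas the real pattern facet would presumably be stricter, and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for the pattern facet on 'DFDLStringLiteral':
# one or more characters, none of which is XML whitespace
# (#x20, #x9, #xA, #xD). The real lexical space is assumed to be stricter.
DFDL_STRING_LITERAL = re.compile(r"[^ \t\n\r]+")


def is_dfdl_string_literal(value: str) -> bool:
    """Check a single value against the (assumed) pattern, no normalization."""
    return DFDL_STRING_LITERAL.fullmatch(value) is not None


def parse_list_of_dfdl_string_literal(value: str) -> list[str]:
    """Mimic an XSD list type: items are separated by runs of XML whitespace."""
    items = [item for item in re.split(r"[ \t\n\r]+", value) if item]
    for item in items:
        # Redundant for this simple pattern, but it is where a stricter
        # item pattern would be enforced.
        if not is_dfdl_string_literal(item):
            raise ValueError(f"not a DFDL string literal: {item!r}")
    return items
```

Because the item type is based on xs:string, a value such as "a b" is rejected as a single literal rather than silently normalized; only the list type treats whitespace as a separator.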
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 17/07/2013 20:35
Subject: Re: [DFDL-WG] Fw: DFDL String Literal type
Sent by: dfdl-wg-bounces@ogf.org
Well, it's looking to me like XML/XSD just doesn't have the right
predefined whitespace-handling concepts that DFDL needs for DFDL String
Literal or for DFDL Expression. The whitespace-separated list of DFDL
String Literals works, but this is almost by accident.
If XML/XSD isn't going to have the right thing for us, I think we should
state our own rules, and we should avoid deriving from the behavior of
xs:token because it collapses even quoted whitespace inside expressions,
which is very undesirable.
To me, given this value " { ../foo
eq ' . ' }
" the whitespace everywhere except between the single quotes is insignificant
and can be collapsed, but collapsing shouldn't mess with a schema author's
quoted strings.
Yes, we have dfdl:decodeDFDLEntities('%SP;%SP;.%SP;%SP;'), which could
be plugged in instead. But I think this is a hack.
So to me, from the point of view of the XSD schema for DFDL annotations,
a DFDL Expression is a whitespace-preserving string, and a DFDL String
Literal is as well.
The DFDL implementation must then provide the behavior for removal of
insignificant whitespace.
For DFDL Expressions, all whitespace is insignificant except that between
quotation marks, which is significant.
For DFDL String Literals, no whitespace is allowed, and DFDL Character
Entities must be used.
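As a rough illustration of the expression rule, the following Python sketch collapses whitespace outside quotation marks and leaves quoted text alone. It is only one possible reading of the rule: the function name is hypothetical, and it assumes quotes are escaped by doubling, as in XPath.

```python
XML_WHITESPACE = " \t\n\r"


def collapse_outside_quotes(expr: str) -> str:
    """Collapse runs of whitespace to one space, except inside '...' or "...".

    Leading and trailing whitespace is dropped. Text between matching
    quotation marks is copied through unchanged.
    """
    out = []
    quote = None           # the quote character we are inside, if any
    pending_space = False  # saw whitespace outside quotes, not yet emitted
    for ch in expr:
        if quote is not None:
            out.append(ch)
            if ch == quote:
                quote = None
        elif ch in XML_WHITESPACE:
            pending_space = True
        else:
            if pending_space and out:
                out.append(" ")
            pending_space = False
            out.append(ch)
            if ch in "'\"":
                quote = ch
    return "".join(out)
```

For comparison, an xs:token-style collapse applied to the same input would also squeeze the spaces inside the single quotes, which is the behavior being objected to here.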
On Wed, Jul 17, 2013 at 11:06 AM, Steve Hanson <smh@uk.ibm.com> wrote:
We discussed the correct XML schema type for DFDL String Literal on the
last WG call. I read up on xs:NMTOKEN, but it is not appropriate: it is
basically a name, so it does not allow the full range of characters we
need. Then I looked at restricting xs:token, but I could not work out
from the XML Schema 1.0 spec how whitespace facets were handled when
other facets were present. So I asked Sandy, and got the very useful
clarification below. Please review for next call.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 17/07/2013 16:01 -----
From: Sandy Gao/Toronto/IBM@IBMCA
To: Steve Hanson/UK/IBM@IBMGB
Date: 17/07/2013 13:33
Subject: Re: DFDL String Literal type
Hi Steve,
Yes, that should work. All other facet checking, including pattern, happens
*after* whitespace handling.
This was made clearer in Schema 1.1, where "whitespace" is called
a pre-lexical facet, and "pattern" etc. are called lexical facets.
Thanks,
Sandy Gao
Source Code Monitoring (SCMon)
IBM Canada
sandygao@ca.ibm.com
From: Steve Hanson/UK/IBM@IBMGB
To: Sandy Gao/Toronto/IBM@IBMCA
Date: 2013-07-17 06:07 AM
Subject: DFDL String Literal type
Hi Sandy
Please can I ask your advice on use of the whitespace facet in conjunction
with the pattern facet? This is in order to model the correct data
type for DFDL String Literals, which are defined as follows:
DFDL String Literal
DFDL String Literals represent a sequence of literal bytes or characters
which appear in the data stream. This presents the following challenges:
- the literal characters in the data stream might not be in the same
encoding as the DFDL schema
- it may be necessary to specify a literal character which is not valid
in an XML document
- it may be necessary to specify one or more raw byte values
A DFDL string literal can describe any of the following types of literal
data in any combination:
- a single literal character in any encoding
- a string of literal characters in any encoding
- a bi-directional character string
- one or more characters from a set of related characters (e.g.
end-of-line characters)
- a literal byte value
A DFDL string literal is therefore able to describe any arbitrary
sequence of bytes and characters.
Empty Strings: Empty string is not allowed as a DFDL string literal value
unless explicitly stated otherwise in the description of a property. In
this case the use of empty string provides some property specific behavior
different from simply using the empty string as a value. When the empty
string is to be used as a value, the entity %ES; must be used in the
corresponding DFDL string literal.
Whitespace: When whitespace must be used as part of a property value,
the DFDL string literal must use entities (such as %WSP;) to represent
the whitespace. (This allows a property to represent lists of DFDL string
literals by using literal spaces to separate list elements.)
The nearest match to an XSDL built-in
type is xs:token, but we require the additional constraint that no whitespace
can appear. My thought is to define a restriction of xs:token that
applies a pattern facet to disallow use of #x20, given that the whitespace
'collapse' implied by xs:token would have replaced #x9, #xA, #xD with #x20,
collapsed contiguous #x20, and trimmed leading/trailing #x20. Does
that sound right?
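To spell out the order of operations I have in mind, here is a small Python sketch: whitespace 'collapse' first, then the pattern check on the normalized value. The function names are only for illustration, and rejecting the empty string here is my assumption based on the Empty Strings rule above.

```python
import re


def xsd_collapse(value: str) -> str:
    """Apply XSD whiteSpace='collapse' as implied by xs:token."""
    # Step 1 (replace): #x9, #xA and #xD each become #x20.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: contiguous runs of #x20 become a single #x20.
    collapsed = re.sub(r" +", " ", replaced)
    # Step 3: leading and trailing #x20 are removed.
    return collapsed.strip(" ")


def is_valid_string_literal(value: str) -> bool:
    """Collapse first, then apply a pattern that disallows #x20.

    The '+' (rather than '*') also rejects the empty string; that part
    is an assumption, not something the pattern idea itself requires.
    """
    return re.fullmatch(r"[^ ]+", xsd_collapse(value)) is not None
```

With this ordering, a value like "a\tb" fails, because the tab has already become #x20 by the time the pattern is checked, while surrounding whitespace around a single literal is trimmed away and does not cause a failure.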
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com