Hi Martin,
Great to have you chime in on this.
The difficulty I have is that the XSD
"logical" types hexBinary and base64Binary suggest a representation.
Given this, I think it is better to eliminate the crazy combinations of
a hexBinary "logical" type with a base64 physical representation and the
other way round. XML formalisms like the PSVI already suggest that the
logical data for hexBinary and base64Binary are the same binary bytes,
not the encoded strings.
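To make that concrete, here is a quick Python sketch (mine, purely
illustrative): the two lexical forms decode to the same logical bytes.

    import base64, binascii

    # One logical value, two encoded string forms.
    from_hex    = binascii.unhexlify("DEADBEEF")
    from_base64 = base64.b64decode("3q2+7w==")

    # The PSVI-style view: both are the same four binary bytes.
    assert from_hex == from_base64 == b"\xde\xad\xbe\xef"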
The next issue is whether we need to support hex and base64 text
encodings as representations. It has been suggested that we can leave
these out, at least for V1.0 of DFDL. Obviously there are multi-layer
formats (like MIME) which make heavy use of encoded data, but for V1.0
of DFDL, describing these multi-layer encodings in a single schema is
already something we're putting off.
So assuming we put off encodings, you'd have two identical types: there'd
be no difference in DFDL between what xs:hexBinary and xs:base64Binary
mean. In that case it is conservative for us to leave one out, and since
base64Binary generates confusion by name alone, that's the one I'd drop.
So you arrive at having only hexBinary, and with only a binary (not text)
representation. Minimalist, but probably sufficient.
...mikeb
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
"Westhead, Martin
(Martin)" <westhead@avaya.com>
11/19/2007 02:13 PM
|
To
| Mike Beckerle/Worcester/IBM@IBMUS, "Steve
Hanson" <smh@uk.ibm.com>
|
cc
| <dfdl-wg@ogf.org>
|
Subject
| RE: [DFDL-WG] DFDL hexBinary and base64Binary |
|
I am way out of the loop here,
but I felt motivated to throw in a few cents on this discussion.
As far as scope goes, it seems to me that a reasonable goal would be to
include all the primitive types of XML Schema. That would suggest that
hexBinary and base64Binary should be included.
Regarding implementation, what concerns me about the discussion is the
confusion between data model and representation that I seem to be hearing.
(Perhaps I am bringing this to the discussion, in which case please set
me straight.)
The way it looks to me is that when you specify the XML Schema “type”
in the DFDL document, you are specifying the data model; another way to
put it is that you are specifying the form of the XML document that would
be output if your DFDL parser were producing a document. This should be
kept separate from the discussion of the representation of the data that
you are reading in.
So what I expect is that there are three different data models for this
kind of data:
1. sequence of bytes
2. hexBinary
3. base64
And there are three different underlying representations that the data
could be read from:
1. bytes
2. hex
3. base64
And ideally you should be able to choose the model and the representation
separately (IMO).
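A rough Python sketch of what I mean (the function names are mine, purely
illustrative): parsing the representation and presenting the model are
independent steps, so all nine combinations fall out for free.

    import base64, binascii

    def parse(data: bytes, representation: str) -> bytes:
        """Turn any physical representation into the logical bytes."""
        if representation == "bytes":
            return data
        if representation == "hex":
            return binascii.unhexlify(data)
        if representation == "base64":
            return base64.b64decode(data)
        raise ValueError(representation)

    def present(value: bytes, model: str):
        """Render the logical bytes in any of the three data models."""
        if model == "bytes":
            return value
        if model == "hexBinary":
            return binascii.hexlify(value).decode("ascii").upper()
        if model == "base64Binary":
            return base64.b64encode(value).decode("ascii")
        raise ValueError(model)

    # Any of the 3 x 3 combinations works:
    assert present(parse(b"3q2+7w==", "base64"), "hexBinary") == "DEADBEEF"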
Am I making sense?
Martin
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org]
On Behalf Of Mike Beckerle
Sent: Monday, November 19, 2007 8:34 AM
To: Steve Hanson
Cc: dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary
Steve, (& team)
What you are suggesting is the simplest of the simple: no 'text'
representation at all. Users who have actual hexadecimal strings in their
data can always model them as strings or, if they're small enough, as
integers in base-16 text.
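For instance (illustrative Python, not DFDL syntax):

    # A hex string in the data, modeled without any binary type:
    field = "DEADBEEF"
    as_string  = field            # model it as a string, or ...
    as_integer = int(field, 16)   # ... as an integer in base-16 text
    assert as_integer == 0xDEADBEEF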
In this case the only difference between hexBinary and base64Binary is
what happens if you coerce the infoset value to a string, and that falls
into the API space, which is outside the scope of DFDL.
To me this suggests that we leave out base64Binary entirely for V1.0 to
avoid confusion (it will be confusing to explain to people that hexBinary
and base64Binary are synonymous in DFDL).
So the net functionality for DFDL v1.0 would be this only:
type: xs:hexBinary
representation: binary (note: required; if 'text' is specified it causes
a schema definition error. This reserves the 'text' behavior for possible
future use.)

lengthKind                              | resulting length (in bytes) | other
----------------------------------------+-----------------------------+---------------
implicit                                | xs:length facet             |
explicit                                | dfdl:length                 | Validation (*)
endOfData, delimited, or nullTerminated | variable                    | Validation (*)

(*) The xs:length facet must equal the resulting length in bytes
    (TBD: similar range checks on xs:minLength, xs:maxLength).
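Here is a rough Python sketch of the parse-and-validate behavior the table
describes (names are mine; delimiter/terminator scanning is elided):

    def parse_hex_binary(stream: bytes, length_kind: str,
                         dfdl_length: int = 0,
                         xs_length_facet: int = 0) -> bytes:
        if length_kind == "implicit":
            value = stream[:xs_length_facet]   # facet supplies the length
        elif length_kind == "explicit":
            value = stream[:dfdl_length]       # dfdl:length supplies it
        else:                                  # endOfData etc.: variable
            value = stream
        # Validation: the facet must match the resulting length in bytes.
        if xs_length_facet and len(value) != xs_length_facet:
            raise ValueError("xs:length facet != resulting length in bytes")
        return value

    assert parse_hex_binary(b"\xde\xad\xbe\xef", "explicit",
                            dfdl_length=4,
                            xs_length_facet=4) == b"\xde\xad\xbe\xef"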
I'm very happy with this for V1.0.
Any further comments or should we go with this for V1.0?
...mikeb
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
From: Steve Hanson <smh@uk.ibm.com>
Sent by: dfdl-wg-bounces@ogf.org
Date: 11/19/2007 10:23 AM
To: dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary
My view: The logical type is binary, so the data in the information item
is binary, the length facets should always deal in bytes, and validation
checks the length of the binary data in bytes.
From the above, of the two simplifications below, I would rather disallow
the text representations of xs:hexBinary and xs:base64Binary. FYI, MRM today:
- does not support text reps for binary
- has not had such a request from users
- uses the length/minLength/maxLength facets to validate binary field
  length post-parse
- uses length/maxLength to populate the default for the physical length.
Regards, Steve
Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
From: Mike Beckerle <beckerle@us.ibm.com>
Sent by: dfdl-wg-bounces@ogf.org
Date: 16/11/2007 23:09
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] DFDL hexBinary and base64Binary
I'm trying to wrap up the opaque/hexBinary/base64Binary topic.
I need opinions on this discussion.
Currently we have a property, dfdl:binaryType :
Properties Specific to Binary Types (hexBinary, base64Binary)

Property Name | Description
--------------+---------------------------------------------------------------
binaryType    | Enum. Specifies the encoding method for the binary.
              | Valid values are ‘unspecified’, ‘hexBinary’, ‘base64Binary’,
              | ‘uuencoded’.
              | Annotation: dfdl:element (simple type ‘binary’, ‘opaque’)
This property speaks to the question: what kinds of representations can we
interpret and construct logical hexBinary values from? (Similarly for
base64Binary.)
I believe the above is not clear, and causes issues with the xs:length
facet of XSD.
I propose the 4 tables below, which describe the 4 cases:
hexbinary - binary
hexbinary - text
base64binary - binary
base64binary - text
I have specified these so that the meaning of the xs:length facet is always
interpreted exactly as in XSD. It always refers to the number of bytes
of the unencoded binary data, and never to the number of characters in
the encoded form.
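To spell out the arithmetic behind the text-representation tables below
(illustrative Python, mine): for n bytes of logical data, hex text uses
exactly 2 * n characters, while base64 text uses 4 * ceil(n / 3)
characters with '=' padding, which equals 8/6 * n exactly when n is a
multiple of 3.

    import math

    def hex_chars(n: int) -> int:
        return 2 * n

    def base64_chars(n: int) -> int:
        return 4 * math.ceil(n / 3)

    assert hex_chars(4) == 8       # e.g. "DEADBEEF"
    assert base64_chars(3) == 4    # exactly 8/6 * 3
    assert base64_chars(4) == 8    # e.g. "3q2+7w==", padding included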
type: xs:hexBinary
representation: binary

lengthKind                              | resulting length (in bytes) | other
----------------------------------------+-----------------------------+---------------
implicit                                | xs:length facet             |
explicit                                | dfdl:length                 | Validation (*)
endOfData, delimited, or nullTerminated | variable                    |

(*) The xs:length facet must equal the resulting length in bytes
    (TBD: similar range checks on xs:minLength, xs:maxLength).
type: xs:hexBinary
representation: text

lengthKind                              | resulting length (in characters) | other
----------------------------------------+----------------------------------+---------------
implicit                                | 2 * xs:length facet              |
explicit                                | dfdl:length                      | Validation (*)
endOfData, delimited, or nullTerminated | variable                         |

(*) xs:length facet * 2 must equal the resulting character length (after
    removing all non-hex characters)
    (TBD: similar range checks on xs:minLength, xs:maxLength).
type: xs:base64Binary
representation: binary

lengthKind                              | resulting length (in bytes) | other
----------------------------------------+-----------------------------+---------------
implicit                                | xs:length facet             |
explicit                                | dfdl:length                 | Validation (*)
endOfData, delimited, or nullTerminated | variable                    |

(*) The xs:length facet must equal the resulting length in bytes
    (TBD: similar range checks on xs:minLength, xs:maxLength).
type: xs:base64Binary
representation: text

lengthKind                              | resulting length (in characters) | other
----------------------------------------+----------------------------------+---------------
implicit                                | 8/6 * xs:length facet            |
explicit                                | dfdl:length                      | Validation (*)
endOfData, delimited, or nullTerminated | variable                         |

(*) xs:length facet * 8/6 must equal the resulting character length (after
    removing all non-base64-encoding characters)
    (TBD: similar range checks on xs:minLength, xs:maxLength).
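A sketch of the two text-representation validation checks above
(illustrative Python; the regexes are my guesses at the "non-hex" and
"non-base64-encoding" character classes, e.g. whitespace and '=' padding):

    import math, re

    def validate_hex_text(text: str, xs_length_facet: int) -> None:
        hex_chars = re.sub(r"[^0-9A-Fa-f]", "", text)
        if len(hex_chars) != 2 * xs_length_facet:
            raise ValueError("character count != xs:length facet * 2")

    def validate_base64_text(text: str, xs_length_facet: int) -> None:
        b64_chars = re.sub(r"[^A-Za-z0-9+/]", "", text)  # drops '=' padding
        if len(b64_chars) != math.ceil(xs_length_facet * 8 / 6):
            raise ValueError("character count != xs:length facet * 8/6")

    validate_hex_text("DE AD BE EF", 4)   # whitespace is ignored
    validate_base64_text("3q2+7w==", 4)   # 6 encoding chars = ceil(4 * 8/6)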
Looking at the above, one way to simplify things quite a bit is to
disallow the xs:length, xs:minLength, and xs:maxLength facets on the
hexBinary and base64Binary types in DFDL schemas. Then the implicit
lengthKind goes away, and the complex validation check for the xs:length
facet goes away. I recommend this.
Another simplification alternative is to disallow the text representation
altogether, but I am concerned that people whose data does contain hex- or
base64-encoded content will naturally want to use these types to model it.
I don't recommend this.
...mikeb
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg