I am way out of the loop here, but I felt
motivated to throw in a few cents on this discussion.
As far as scope goes, a reasonable goal would be to treat all the primitive types of XML
Schema as in scope. That would suggest that hexBinary and base64Binary should be
included.
Regarding implementation, what concerns me
about the discussion is the confusion I seem to be hearing between data model and
representation. (Perhaps I am bringing this confusion to the discussion myself, in
which case please set me straight.)
The way it looks to me is that when you
specify the XML Schema “type” in the DFDL document, you are
specifying the data model; put another way, you are specifying the
form of the XML document that would be output if your DFDL parser were
producing a document. This should be kept separate from the discussion of the
representation of the data that you are reading in.
So what I expect is that there are three
different data models for this kind of data:
And there are three different underlying
representations of the data that could be read from:
And ideally you should be able to choose
the model and the data separately (IMO).
Am I making sense?
Martin
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf Of Mike Beckerle
Sent: Monday, November 19, 2007 8:34 AM
To: Steve Hanson
Cc: dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] DFDL hexBinary and base64Binary
Steve, (& team)
What you are suggesting is the simplest of the simple: no 'text' representation at
all. Users who have actual hexadecimal strings in their data can always
model them either as strings or, if they're small enough, as integers in base-16
text.
In this case the only difference between hexBinary and base64Binary is what
happens if you coerce the infoset value to a string, and that falls into the API
space, which is outside the scope of DFDL.
To me this suggests that we leave out base64Binary entirely for V1.0 to avoid
confusion (it will be confusing to explain to people that hexBinary and base64Binary
are synonymous in DFDL).
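For what it's worth, the string forms really are the only visible difference. A minimal sketch (plain Python, not DFDL, and purely illustrative) of the two XSD lexical forms of the same bytes:

```python
import base64

# The same three bytes, coerced to the two XSD lexical forms.
data = bytes([0x01, 0xAB, 0xFF])

hex_form = data.hex().upper()                      # xs:hexBinary lexical form
b64_form = base64.b64encode(data).decode("ascii")  # xs:base64Binary lexical form

print(hex_form)  # 01ABFF
print(b64_form)  # Aav/
```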
So the net functionality for DFDL v1.0 would be this only:
| type | representation | lengthKind | resulting length (in bytes) | other |
| --- | --- | --- | --- | --- |
| xs:hexBinary | binary | implicit | xs:length facet | |
| | | explicit | dfdl:length | Validation: xs:length facet must equal the resulting length in bytes (TBD: similar range checks on xs:minLength, xs:maxLength) |
| | | endOfData, delimited, or nullTerminated | variable | Validation: xs:length facet must equal the resulting length in bytes (TBD: similar range checks on xs:minLength, xs:maxLength) |
I'm very happy with this for V1.0.
Any further comments, or should we go with this for V1.0?
...mikeb
Mike
Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
Steve Hanson <smh@uk.ibm.com> 11/19/2007 10:23 AM
My view: The logical type is binary, so the data in the information item is
binary, the length facets should always deal in bytes, and validation checks
the length of the binary data in bytes.
From the above, of the two simplifications below, I would rather disallow the
text representations of xs:hexBinary and xs:base64Binary. FYI, MRM today
- does not support text reps for binary
- has not had such a request from users
- uses length/minLength/maxLength facets to validate binary field length
post-parse
- uses length/maxLength to populate the default for the physical length.
Regards, Steve
Steve Hanson
WebSphere Message Brokers
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848
Mike Beckerle <beckerle@us.ibm.com> 16/11/2007 23:09
I'm trying to wrap up the opaque/hexBinary/base64Binary topic.
I need opinions on this discussion.
Currently we have a property, dfdl:binaryType :
Properties Specific to Binary Types (hexBinary, base64Binary)
| Property Name | Description |
| --- | --- |
| binaryType | Enum. This specifies the encoding method for the binary. Valid values are ‘unspecified’, ‘hexBinary’, ‘base64Binary’, ‘uuencoded’. Annotation: dfdl:element (simple type ‘binary’, ‘opaque’) |
This property speaks to the question: from what kinds of representations can we interpret and
construct logical hexBinary values (and similarly base64Binary values)?
I believe the above is not clear, and it causes issues with the xs:length facet of
XSD.
I propose the four tables below, which describe the four cases:
hexBinary - binary
hexBinary - text
base64Binary - binary
base64Binary - text
I have specified these so that the meaning of the xs:length facet is always
interpreted exactly as in XSD. It always refers to the number of bytes of the
unencoded binary data, and never to the number of characters in the encoded
form.
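As a sanity check on the ratios used in the tables below, here is a small sketch (plain Python, illustrative only) relating the unencoded byte length to the encoded character length. Note that for base64 the 8/6 ratio is exact only when the byte count is a multiple of 3, since standard base64 pads the final group out to 4 characters with '=':

```python
import math

def hex_char_length(n_bytes: int) -> int:
    # Each byte becomes exactly two hex characters.
    return 2 * n_bytes

def base64_char_length(n_bytes: int) -> int:
    # Base64 encodes each 3-byte group as 4 characters, padding the
    # final group with '='; this is what the 8/6 ratio approximates.
    return 4 * math.ceil(n_bytes / 3)

for n in (1, 2, 3, 6, 10):
    print(n, hex_char_length(n), base64_char_length(n))
```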
| type | representation | lengthKind | resulting length (in bytes) | other |
| --- | --- | --- | --- | --- |
| xs:hexBinary | binary | implicit | xs:length facet | |
| | | explicit | dfdl:length | Validation: xs:length facet must equal the resulting length in bytes (TBD: similar range checks on xs:minLength, xs:maxLength) |
| | | endOfData, delimited, or nullTerminated | variable | |
| type | representation | lengthKind | resulting length (in characters) | other |
| --- | --- | --- | --- | --- |
| xs:hexBinary | text | implicit | 2 * xs:length facet | |
| | | explicit | dfdl:length | Validation: xs:length facet * 2 must equal the resulting character length (after removing all non-hex characters) (TBD: similar range checks on xs:minLength, xs:maxLength) |
| | | endOfData, delimited, or nullTerminated | variable | |
| type | representation | lengthKind | resulting length (in bytes) | other |
| --- | --- | --- | --- | --- |
| xs:base64Binary | binary | implicit | xs:length facet | |
| | | explicit | dfdl:length | Validation: xs:length facet must equal the resulting length in bytes (TBD: similar range checks on xs:minLength, xs:maxLength) |
| | | endOfData, delimited, or nullTerminated | variable | |
| type | representation | lengthKind | resulting length (in characters) | other |
| --- | --- | --- | --- | --- |
| xs:base64Binary | text | implicit | 8/6 * xs:length facet | |
| | | explicit | dfdl:length | Validation: xs:length facet * 8/6 must equal the resulting character length (after removing all non-base64-encoding characters) (TBD: similar range checks on xs:minLength, xs:maxLength) |
| | | endOfData, delimited, or nullTerminated | variable | |
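The validation checks in the text-representation tables above could be sketched roughly as follows (plain Python; the function names and the exact treatment of '=' padding are my own assumptions, not part of the proposal):

```python
import re

def validate_hex_text(text: str, length_facet: int) -> bool:
    # Remove all non-hex characters (e.g. separators, whitespace), then
    # require: xs:length facet * 2 == resulting character length.
    hex_chars = re.sub(r'[^0-9A-Fa-f]', '', text)
    return len(hex_chars) == 2 * length_facet

def validate_base64_text(text: str, length_facet: int) -> bool:
    # Keep only base64 alphabet characters, then require:
    # xs:length facet * 8/6 == resulting character length
    # (compared cross-multiplied to stay in integer arithmetic).
    b64_chars = re.sub(r'[^0-9A-Za-z+/]', '', text)
    return len(b64_chars) * 6 == length_facet * 8

print(validate_hex_text("01 AB FF", 3))  # True: 6 hex chars == 2 * 3
print(validate_base64_text("Aav/", 3))   # True: 4 chars * 6 == 3 * 8
```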
Looking at the above, one way to simplify things quite a bit is to disallow the
xs:length, xs:minLength, and xs:maxLength facets on hexBinary and
base64Binary types in DFDL schemas.
Then the implicit lengthKind goes away, and the complex validation check for the
xs:length facet goes away. I recommend this.
Another simplification alternative is to disallow the text representation altogether, but I
am concerned that people whose data does contain hex or base64 data will
naturally want to use these types to model it. I don't recommend this.
...mikeb
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in
Registered office: