Some interesting discussion on hexBinary and base64Binary. The Schema authors also had reservations about having these two types with the same value space, and considered having a single binary type with an encoding facet to specify hex or base64, which would have been a much better solution. I am equally perplexed by the use case for the pattern (and also enumeration) facets on hexBinary and base64Binary, since these facets operate on the lexical space rather than the value space.
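A minimal Python sketch of why a lexical-space facet is confusing here. hexBinary accepts both upper- and lower-case digits for the same byte value, so a pattern facet can accept one spelling of a value and reject another. The upper-case-only pattern is an assumption chosen for illustration, and `fullmatch` stands in for the implicit anchoring of XSD patterns.

```python
import re

# Hypothetical pattern facet that only admits upper-case hex digits.
UPPER_HEX_PATTERN = re.compile(r"[0-9A-F]*")

def value_of(lexical: str) -> bytes:
    """Map a hexBinary lexical form to its value (a byte sequence)."""
    return bytes.fromhex(lexical)

def pattern_ok(lexical: str) -> bool:
    """A pattern facet is checked against the lexical form, not the value."""
    return UPPER_HEX_PATTERN.fullmatch(lexical) is not None

upper, lower = "F41306C0", "f41306c0"
print(value_of(upper) == value_of(lower))    # True: same value
print(pattern_ok(upper), pattern_ok(lower))  # True False: different verdicts
```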

Maybe the conservative approach for us is to treat these built-in schema types as is, and only allow the DFDL set of annotations pertaining to the allowable facets.

By copy to Sandy: do you know of a use case where the pattern facet is used on hexBinary or base64Binary?


http://lists.w3.org/Archives/Public/www-xml-schema-comments/2001AprJun/0067.html

Some excerpts from the discussion:

    There are two possible ways to fix this before going to recommendation:

    1. Return to the CR status of a binary datatype with an encoding facet that could be either 'hex' or 'base64'.
    2. Eliminate one of hexBinary or base64Binary, and call the remaining type 'binary'. *Note that this is the most conservative decision*. The missing encoding could always be added back, but once released as an official recommendation, parsers would be required to support both encodings for all time.
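The first option (one binary type plus an encoding facet) can be sketched in Python as follows. The function names and the 'hex'/'base64' facet values are hypothetical. The point is that both encodings map to one value space, a sequence of bytes, and differ only in the lexical form.

```python
import base64
import binascii

def parse_binary(lexical: str, encoding: str) -> bytes:
    """Hypothetical 'binary' type: the encoding facet selects the lexical form.

    Either way the resulting value is the same kind of thing: a byte sequence.
    """
    if encoding == "hex":
        return binascii.unhexlify(lexical)
    if encoding == "base64":
        return base64.b64decode(lexical, validate=True)
    raise ValueError(f"unsupported encoding facet: {encoding!r}")

def format_binary(value: bytes, encoding: str) -> str:
    """Inverse mapping: an upper-case hex or standard base64 lexical form."""
    if encoding == "hex":
        return binascii.hexlify(value).decode("ascii").upper()
    if encoding == "base64":
        return base64.b64encode(value).decode("ascii")
    raise ValueError(f"unsupported encoding facet: {encoding!r}")

value = parse_binary("F41306C0", "hex")
as_b64 = format_binary(value, "base64")
print(as_b64)                                   # 9BMGwA==
print(parse_binary(as_b64, "base64") == value)  # True
```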
                 




Suman Kalia
IBM Toronto Lab
WebSphere Business Integration Application Connectivity Tools
Tel : 905-413-3923  T/L  969-3923
Fax : 905-413-4850 T/L  969-4850
Internet ID : kalia@ca.ibm.com

----- Forwarded by Suman Kalia/Toronto/IBM on 11/21/2007 11:50 AM -----
Mike Beckerle <beckerle@us.ibm.com>
Sent by: dfdl-wg-bounces@ogf.org
11/21/2007 11:08 AM

To: dfdl-wg@ogf.org
cc:
Subject: [DFDL-WG] OGF DFDL WG call today 2007-11-21






We may or may not achieve quorum today because of the US holiday tomorrow; today is a big travel day.


I will join the call at 12 noon US ET, and if we have enough people by 12:05, I'd like to discuss one or more of these topics:


* hexBinary and base64Binary: I still find these confusing.
* array prefix and suffix: just review the resolution of this issue. They are left out for now, just to be conservative, and could be put back in fairly easily.
* choiceType and length properties on xs:choice


My latest musings on hexBinary and base64Binary ....


E.g., XSD allows pattern and enumeration facets on these:


    <element name="aThing"><simpleType>
      <restriction base="base64Binary"><length value="3"/>
        <pattern value="AAAA|////"/></restriction>
    </simpleType></element>


I think that pattern is a regexp for a base64 string matching 3 bytes of zeros or 3 bytes of all ones, but my regexp syntax is no doubt incorrectly escaped. ("A" encodes 6 zero bits and "/" encodes 6 one bits in base64.)
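A quick Python check of the base64 arithmetic: four base64 characters carry 24 bits, so "AAAA" and "////" each decode to exactly 3 bytes. The pattern itself is tested against the text, not the bytes. Python's `re` stands in here for the XSD regex engine, which is an assumption of this sketch.

```python
import base64
import re

# The pattern from the example above, applied to the lexical (base64 text) form.
PATTERN = re.compile(r"AAAA|////")

for lexical in ("AAAA", "////"):
    value = base64.b64decode(lexical, validate=True)
    matches = PATTERN.fullmatch(lexical) is not None
    print(lexical, value.hex(), len(value), matches)
# AAAA 000000 3 True
# //// ffffff 3 True
```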


To me this is marvelously confusing. It feels downright silly to allow pattern and enumeration on these things.


I'd like to adopt the strictest sensible rule:
  - hexBinary only, binary representation only, and no pattern or enumeration facets supported or allowed for this type.


There is still the issue of default/fixed.


E.g.,


    <element name="unknownStuff" type="hexBinary" fixed="F41306C0" dfdl:lengthKind="implicit" />


The user specifies the bytes as a hex string: since they declared the type hexBinary, they use hex to express the literal content. This is a way to say "right here in the data there's a blob of stuff I don't understand, but it always contains these data bytes". I've certainly seen the need for this sort of thing. The element is ignored on input, but on output it generates the fixed bytes, so as to create valid output even when you don't understand this part of the data format. In fact, the cases I've seen use this to skip over decimal numbers whose weird format isn't understood. The bytes above could be one or more decimal numbers in strange formats.
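The input/output asymmetry described above can be sketched in Python. This assumes the implicit length of such an element is the length of its fixed value, and the helper names are hypothetical, not part of any DFDL API. On input the bytes are skipped without being interpreted; on output exactly the fixed bytes are emitted.

```python
# Hypothetical helpers; names are illustrative, not part of any DFDL API.

def parse_fixed_hex(data: bytes, offset: int, fixed_hex: str) -> int:
    """Input side: skip the blob without interpreting it.

    Assumes the implicit length equals the length of the fixed value.
    Returns the offset just past the skipped bytes.
    """
    length = len(bytes.fromhex(fixed_hex))
    if offset + length > len(data):
        raise ValueError("not enough data for the fixed hexBinary element")
    return offset + length

def unparse_fixed_hex(fixed_hex: str) -> bytes:
    """Output side: always emit exactly the fixed bytes."""
    return bytes.fromhex(fixed_hex)

record = bytes.fromhex("01" "F41306C0" "02")  # a byte, the opaque blob, a byte
print(parse_fixed_hex(record, 1, "F41306C0"))        # 5
print(unparse_fixed_hex("F41306C0").hex().upper())   # F41306C0
```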


However, I can see the slippery slope from here to allowing pattern and enumeration facets, so I'd be happy to rule out default and fixed on the hexBinary type as well, just to make it even simpler.
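To make the strictest option concrete, here is a minimal Python sketch of a hypothetical schema checker. It flags base64Binary, pattern or enumeration facets on hexBinary, and default/fixed on hexBinary elements. It only looks at an element's own `type` attribute and at a directly nested `simpleType/restriction`, which is a simplifying assumption, not a full XSD type resolver.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DISALLOWED_FACETS = {"pattern", "enumeration"}  # facets on the lexical space
DISALLOWED_ATTRS = {"default", "fixed"}         # the even-simpler option

def local_name(qname: str) -> str:
    """Strip a namespace URI ('{...}pattern') or prefix ('xs:hexBinary')."""
    return qname.split("}")[-1].split(":")[-1]

def check_binary_rules(schema_xml: str) -> list:
    """Return violations of the strict proposal found in a schema fragment."""
    problems = []
    root = ET.fromstring(schema_xml)
    for elem in root.iter(f"{{{XS}}}element"):
        name = elem.get("name", "?")
        restriction = elem.find(f"{{{XS}}}simpleType/{{{XS}}}restriction")
        types = {local_name(elem.get("type", ""))}
        if restriction is not None:
            types.add(local_name(restriction.get("base", "")))
        if "base64Binary" in types:
            problems.append(f"{name}: base64Binary is not supported")
        if "hexBinary" not in types:
            continue
        if restriction is not None:
            for facet in restriction:
                if local_name(facet.tag) in DISALLOWED_FACETS:
                    problems.append(f"{name}: {local_name(facet.tag)} facet not allowed")
        for attr in sorted(DISALLOWED_ATTRS.intersection(elem.keys())):
            problems.append(f"{name}: {attr} not allowed")
    return problems

schema = f"""<schema xmlns="{XS}">
  <element name="ok" type="hexBinary"/>
  <element name="unknownStuff" type="hexBinary" fixed="F41306C0"/>
  <element name="aThing"><simpleType><restriction base="base64Binary">
    <pattern value="AAAA|////"/></restriction></simpleType></element>
</schema>"""

for problem in check_binary_rules(schema):
    print(problem)
```

With this fragment the checker reports `unknownStuff` (fixed) and `aThing` (base64Binary), and accepts `ok`.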


Also, the following element achieves the same end using our "%" escapes in strings:


  <element name="unknownStuff" type="string" dfdl:encoding="ascii" fixed="%F4%13%06%C0"  dfdl:lengthKind="implicit"/>


This works for any single-byte-wide character-set encoding.
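A small Python sketch of why the "%" trick depends on a single-byte-wide encoding. It assumes each %XX escape denotes the character whose code is 0xXX; the function name is hypothetical. Latin-1 is used as the single-byte encoding because it maps codes 0x00-0xFF one-to-one onto bytes, whereas Python's "ascii" codec rejects 0xF4 and 0xC0. Under UTF-8 the same text grows to 6 bytes.

```python
import re

def expand_percent_escapes(literal: str) -> str:
    """Replace each %XX escape with the character whose code is 0xXX.

    Hypothetical reading of the escape syntax; a literal '%' is not handled.
    """
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), literal)

text = expand_percent_escapes("%F4%13%06%C0")
print(text.encode("latin-1") == bytes.fromhex("F41306C0"))  # True: one byte per char
print(len(text.encode("utf-8")))  # 6: 0xF4 and 0xC0 each need two bytes in UTF-8
```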




Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan  
                priordan@us.ibm.com
                508-599-7046
--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 http://www.ogf.org/mailman/listinfo/dfdl-wg