
I'm trying to wrap up the opaque/hexBinary/base64Binary topic, and I need opinions on this discussion.

Currently we have a property, dfdl:binaryType:

Properties Specific to Binary Types (hexBinary, base64Binary)

Property Name   Description
binaryType      Enum
                This specifies the encoding method for the binary.
                Valid values are ‘unspecified’, ‘hexBinary’, ‘base64Binary’, ‘uuencoded’
                Annotation: dfdl:element (simple type ‘binary’, ‘opaque’)

This property speaks to the question: what kinds of representations can we interpret, and construct logical hexBinary values from? (Similarly for base64Binary.)

I believe the above is not clear, and it causes issues with the xs:length facet of XSD. I propose the 4 tables below, which describe the 4 cases:

  hexBinary - binary
  hexBinary - text
  base64Binary - binary
  base64Binary - text

I have specified these so that the meaning of the xs:length facet is always interpreted exactly as in XSD. It always refers to the number of bytes of the unencoded binary data, and never to the number of characters in the encoded form.
type: xs:hexBinary    representation: binary

lengthKind                                 resulting length (in bytes)   other
implicit                                   xs:length facet
explicit                                   dfdl:length                   Validation: xs:length facet must be equal to
                                                                         resulting length in bytes (TBD: similar range
                                                                         checks on xs:minLength, xs:maxLength)
endOfData or delimited or nullTerminated   variable

type: xs:hexBinary    representation: text

lengthKind                                 resulting length (in characters)   other
implicit                                   2 * xs:length facet
explicit                                   dfdl:length                        Validation: xs:length facet * 2 must be equal to
                                                                              resulting character length (after removing all
                                                                              non-hex characters) (TBD: similar range checks
                                                                              on xs:minLength, xs:maxLength)
endOfData, delimited, nullTerminated       variable

type: xs:base64Binary    representation: binary

dfdl:lengthKind                            resulting length (in bytes)   other
implicit                                   xs:length facet
explicit                                   dfdl:length                   Validation: xs:length facet must be equal to
                                                                         resulting length in bytes (TBD: similar range
                                                                         checks on xs:minLength, xs:maxLength)
endOfData or delimited or nullTerminated   variable

type: xs:base64Binary    representation: text

lengthKind                                 resulting length (in characters)   other
implicit                                   8/6 * xs:length facet
explicit                                   dfdl:length                        Validation: xs:length facet * 8/6 must be equal to
                                                                              resulting character length (after removing all
                                                                              non-base64-encoding characters) (TBD: similar
                                                                              range checks on xs:minLength, xs:maxLength)
endOfData, delimited, nullTerminated       variable

Looking at the above, one way to simplify things quite a bit is to disallow the xs:length, xs:minLength, and xs:maxLength facets on hexBinary and base64Binary types in DFDL schemas. Then the implicit lengthKind goes away, and the complex validation check for the xs:length facet goes away. I recommend this.

Another simplification alternative is to disallow representation text altogether, but I am concerned that people whose data does contain hex or base64 text will naturally want to use these types to model it. I don't recommend this.
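
To make the text-representation rules concrete, here is a minimal Python sketch of the implicit character length and the proposed validation check. The function names (implicit_text_length, check_text_length) are hypothetical and are not part of DFDL or of any proposal text. One assumption goes beyond the proposal: the sketch assumes padded base64 as produced by standard encoders, which emits 4 characters for every group of up to 3 bytes. That agrees with 8/6 * xs:length only when the byte length is a multiple of 3, so the padding case may need its own wording in the table.

```python
import base64
import binascii
import string

# Characters that count toward the encoded length; everything else
# (whitespace, line breaks) is removed before the comparison.
HEX_CHARS = set(string.hexdigits)
B64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def implicit_text_length(xsd_type, facet_length):
    """Character length implied by the xs:length facet (counted in bytes)."""
    if xsd_type == "hexBinary":
        return 2 * facet_length
    if xsd_type == "base64Binary":
        # Assumption: padded base64, 4 characters per 3-byte group.
        return 4 * ((facet_length + 2) // 3)
    raise ValueError("unsupported type: " + xsd_type)


def check_text_length(xsd_type, facet_length, text):
    """Validation sketch: compare the facet against the significant characters."""
    allowed = HEX_CHARS if xsd_type == "hexBinary" else B64_CHARS
    significant = [c for c in text if c in allowed]
    return len(significant) == implicit_text_length(xsd_type, facet_length)


if __name__ == "__main__":
    data = bytes(range(6))  # 6 bytes, so xs:length would be 6
    hex_text = binascii.hexlify(data).decode("ascii")
    b64_text = base64.b64encode(data).decode("ascii")
    print(len(hex_text), check_text_length("hexBinary", 6, hex_text))
    print(len(b64_text), check_text_length("base64Binary", 6, b64_text))
```

In both calls the facet value stays at 6, the number of unencoded bytes, while the character count of the encoded text differs by type. That is the distinction the tables are meant to pin down.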
...mikeb

Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan priordan@us.ibm.com 508-599-7046