Mike
Comments in-line.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 15/05/2018 22:10
Subject: [DFDL-WG] Action 292 - Write up of hexBinary with lengthUnits 'bits'
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
The proposed change is to allow lengthUnits 'bits' for hexBinary data. This turned out to be more complex to describe than I originally suspected because of the need to deal with the XSD minLength and maxLength facets, which are always measured in bytes. Those facets, in conjunction with dfdl:lengthKind 'explicit' and dfdl:lengthUnits 'bits', create some minor complexities.
The below changes match the Daffodil implementation of
this proposed feature.
These sentences in the description of dfdl:lengthUnits in Section 12.3 must change:
- 'bits' may only be used for xs:boolean, xs:byte, xs:short,
xs:int, xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and
xs:unsignedLong simple types with binary representation.
- 'bytes' must be used for type xs:hexBinary.
The text should read:
- 'bits' may only be used for xs:hexBinary and for xs:boolean,
xs:byte, xs:short, xs:int, xs:long, xs:unsignedByte, xs:unsignedShort,
xs:unsignedInt, and xs:unsignedLong simple types with binary representation.
Later, in Section 12.3.2, the paragraph:
When unparsing a simple element with binary representation,
then for hexBinary the length is the number of bytes in the infoset value
padded to the XSD minLength facet value using dfdl:fillByte, and for the
other types the length is the minimum number of bytes to represent the
value and any sign.
Must change to:
When unparsing a simple element with binary representation,
then for types other than hexBinary the length is the minimum number of
bytes to represent the value and any sign.
For type hexBinary, when dfdl:lengthUnits is 'bytes', the length is the number of bytes in the infoset value, padded to the XSD minLength facet value using dfdl:fillByte.
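As a minimal sketch of the 'bytes' case just described (the helper name is illustrative, not part of any DFDL implementation's API):

```python
def unparse_hexbinary_bytes(data: bytes, min_length: int, fill_byte: int) -> bytes:
    """Length is the number of bytes in the infoset value, padded up to
    the XSD minLength facet value using dfdl:fillByte."""
    if len(data) < min_length:
        data = data + bytes([fill_byte]) * (min_length - len(data))
    return data
```

Values already at or above minLength pass through unchanged; only shorter values are padded.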
For type hexBinary, when dfdl:lengthUnits is 'bits':
- First the data is padded to XSD minLength bytes as if dfdl:lengthUnits were 'bytes'.
- When dfdl:lengthKind is other than 'explicit', the length in bits is the number of bytes times 8.
- When dfdl:lengthKind is 'explicit', the value is further padded or truncated to fit the target length, in bits.
- If the data does not have sufficient bytes to supply the target length in bits, it is a processing error. <--- no, see 12.3.7.2.7 and you just said you padded it
- If the data is longer than the minimum number of bytes needed to supply the target length in bits, it is a processing error.
- If the explicit length in bits is not a multiple of 8, then the final byte is only partially unparsed according to the current dfdl:byteOrder <--- you mean bitOrder
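The proposed 'bits' rules can be sketched as follows. This is an illustrative reading of the proposal, not Daffodil code; the function and parameter names are made up, and the first error case is the one questioned in the inline review comment above.

```python
def unparse_hexbinary_bits(data: bytes, min_length: int,
                           fill_byte: int, explicit_bits=None) -> bytes:
    """Sketch of the proposed unparse rules for xs:hexBinary with
    dfdl:lengthUnits 'bits' (names are hypothetical)."""
    # Step 1: pad to XSD minLength bytes using dfdl:fillByte,
    # as if dfdl:lengthUnits were 'bytes'.
    if len(data) < min_length:
        data = data + bytes([fill_byte]) * (min_length - len(data))
    # lengthKind other than 'explicit': length in bits is bytes * 8.
    if explicit_bits is None:
        return data
    # lengthKind 'explicit': the value must supply the target length
    # using the minimum number of whole bytes.
    needed = (explicit_bits + 7) // 8
    if len(data) < needed:
        # first proposed error case (disputed in the review comment)
        raise ValueError("processing error: too few bytes for target length")
    if len(data) > needed:
        raise ValueError("processing error: too many bytes for target length")
    # If explicit_bits is not a multiple of 8, the final byte is only
    # partially unparsed; which bits depends on the bit order in effect.
    return data
```

For example, an 8-byte value with an explicit length of 63 bits is accepted, since 8 is the minimum number of whole bytes that can supply 63 bits.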
>>SMH: Section 12.3.2 is about dfdl:lengthKind 'delimited', so discussion should be limited to delimited only, and detail for other length kinds moved to their sections, or to 12.3.7.2.7 for stuff common to specified lengths.
>>SMH: Need to consider the impact of this for lengthKinds implicit, explicit, and prefixed, as they all use lengthUnits. And for explicit, need to cover when dfdl:length is an expression (so variable length on output).
In Section 12.3.7.2.7 Length of Binary Opaque Elements, the first sentence must be modified from:
"The dfdl:lengthUnits property must be 'bytes'. It is a schema definition error otherwise."
to:
"The dfdl:lengthUnits property must be 'bytes' or 'bits'. It is a schema definition error otherwise. Note that even when the dfdl:lengthUnits property is 'bits', the values of the XSD minLength and XSD maxLength facets are still always interpreted as constraining the length in units of bytes."
>>SMH: Earlier in 12.3.7.2 it says "The
dfdl:lengthUnits can be 'bytes' or 'bits' unless otherwise stated. It is
schema definition error if dfdl:lengthUnits is 'characters'. "
so the first two sentences can just be removed.
That's the end of the actual proposed language.
Note about alternatives: I considered the alternative of making it a schema definition error when dfdl:lengthUnits is 'bits', the type is xs:hexBinary, and the XSD minLength or maxLength facets are defined. I decided to go with the description above to support the use case where a data item is, in hex, some number of bytes long for a valid XML infoset, but the representation is explicitly a partial byte smaller. E.g., a hexBinary of 8 bytes, but fewer than 64 bits in the representation. Ex: 63 bits as the explicit target length, so 1 bit will be unused from the xs:hexBinary logical value, but the above rules ensure no more than 7 bits go unused from the final byte of the hexBinary logical value.
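The 63-bit example works out as follows (plain arithmetic under the rules proposed above; the variable names are just for illustration):

```python
# Explicit target length of 63 bits for an 8-byte xs:hexBinary value.
target_bits = 63
needed_bytes = (target_bits + 7) // 8          # minimum whole bytes supplying 63 bits
unused_bits = needed_bytes * 8 - target_bits   # bits of the final byte left unused
# needed_bytes is 8, so the 8-byte logical value is accepted;
# unused_bits is 1, within the at-most-7-unused-bits guarantee.
```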
This is trying to be consistent with the notion that we do not truncate data to fit into the available length, except for xs:string when properties explicitly allow it.
It is also trying to be consistent with our treatment of binary integers, where an xs:long value may be output into an element having room for any number of bits, and any extra bits in the logical value are ignored.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology
| www.tresys.com
Please note: Contributions to the DFDL Workgroup's email
discussions are subject to the OGF
Intellectual Property Policy
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU