If allowing lengthUnits 'bits' for a new logical/physical
combination has no effect on the infoset then that should be ok.
'binarySeconds' & 'binaryMilliseconds': these were designed to correspond to C data types and are always treated
as signed. Allowing 'bits' should be ok as long as the same rules as for signed
'int' and 'long' respectively are used.
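As a sketch of that, using real DFDL properties but in a combination that is illegal today (the element name is invented), a 33-bit binarySeconds timestamp might be written:

  <xs:element name="timestamp" type="xs:dateTime"
    dfdl:representation="binary"
    dfdl:binaryCalendarRep="binarySeconds"
    dfdl:binaryCalendarEpoch="1970-01-01T00:00:00+00:00"
    dfdl:lengthKind="explicit"
    dfdl:length="33" dfdl:lengthUnits="bits"/>

with the 33 bits extracted as a signed integer, following the same sign rules as signed 'int'.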
'hexBinary', as you note, causes a problem,
as values of the XSD type must be a multiple of 8 bits. That's why it has the restriction
of 'bytes' only today. If we allow 'bits', then on parsing DFDL would have
to pad, either with 0 bits or with the corresponding bits of dfdl:fillByte,
and on unparsing DFDL would have to trim off the excess, as long as it matched
0 bits or the corresponding bits of dfdl:fillByte. Today dfdl:fillByte is never
used for trimming.
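A sketch of how that might look if the restriction were lifted (element name and fillByte value are illustrative, not from the proposal):

  <xs:element name="blob" type="xs:hexBinary"
    dfdl:lengthKind="explicit"
    dfdl:length="17" dfdl:lengthUnits="bits"
    dfdl:fillByte="%#r00;"/>
  <!-- Parsing: the 17 data bits would be padded with 7 further bits
       (0s, or the corresponding bits of dfdl:fillByte) to form a 3-byte
       infoset value. Unparsing: the trailing 7 bits of the infoset value
       would be trimmed, allowed only if they are 0s or match dfdl:fillByte. -->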
Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM Hybrid Integration, Hursley, UK
smh@uk.ibm.com
tel: +44-1962-815848
mob: +44-7717-378890
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 26/01/2017 05:00
Subject: [DFDL-WG] suggest: need hexBinary with lengthUnits 'bits' with length not a multiple of 8.
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
We have users with binary blobs whose size is given in bits,
and these sizes are not a multiple of 8.
Today the DFDL spec doesn't allow hexBinary to have lengthUnits
'bits'.
I am wondering if this restriction should be lifted.
XSD constrains hexBinary to always have an even number
of hex digits, so we would have to do the same.
So, for example, a 17-bit hexBinary containing all 1 bits would
be FFFF80.
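Spelling that out: 17 one-bits, padded on the right to the next byte boundary with 7 zero bits:

  content bits : 11111111 11111111 1
  padded to 24 : 11111111 11111111 10000000
  hex          :    FF       FF       80     = FFFF80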
Erratum 5.15 extends the types that are allowed to have
length in bits to include packed calendars. So there is precedent for opening
this restriction up if need arises.
I claim we need to:
(a) allow lengthUnits 'bits' for all types;
(b) restrict the length to exactly 32 bits or 64 bits
for types xs:float and xs:double when representation is 'binary'
(see the sketch after this list);
(c) restrict packed decimal to lengths that are a multiple
of 4 bits (when specified in units 'bits').
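As a sketch of (b), assuming it were adopted (element name invented), a float whose format spec gives its length as 32 bits could then be written directly:

  <xs:element name="reading" type="xs:float"
    dfdl:representation="binary"
    dfdl:lengthKind="explicit"
    dfdl:length="32" dfdl:lengthUnits="bits"/>

instead of making the schema author convert to dfdl:length="4" with dfdl:lengthUnits="bytes".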
All other restrictions should be lifted, as they
just cause problems in some formats.
For example, Section 12.3.7.2.5 specifies that binary calendars
must be exactly 4 bytes or 8 bytes, and cannot be specified in units 'bits'.
This is just a mistake in DFDL. I have even seen binary calendars
with a 33-bit length (a seconds-since-1970-01-01 representation, aka binarySeconds).
That additional bit extends the end time substantially: a signed 33-bit
seconds count reaches the year 2106, versus 2038 for 32 bits.
These restrictions were put into DFDL because our experience
of many bit-granularity formats was limited.
What we've found is that there are plenty of data formats
where the notion of a "byte" is simply absent. Nothing uses multiples
of 8 bits for anything, and nothing is measured in those units. It's always
measured in bits. Even for things like float and double, which have
implicit lengths of 4 and 8 bytes respectively, many specifications
express those as 32 bits or 64 bits. Having to divide by 8 just makes the
DFDL schema awkward. Similarly, in these formats strings are given their length
in bits: 448 bits' worth of 7-bit packed ASCII characters is 64 characters,
occupying 56 bytes, but the format spec says 448.
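For instance, the 448-bit case might look like the sketch below; X-DFDL-US-ASCII-7-BIT-PACKED is an implementation-defined packed-ASCII encoding name used by some implementations, shown purely for illustration:

  <xs:element name="message" type="xs:string"
    dfdl:encoding="X-DFDL-US-ASCII-7-BIT-PACKED"
    dfdl:lengthKind="explicit"
    dfdl:length="448" dfdl:lengthUnits="bits"/>

448 bits at 7 bits per character is 64 characters, with no division by 8 anywhere in the schema.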
These changes are all backward-compatible. They make legal
property settings that previously had no meaning and caused Schema Definition Errors (SDEs).
Discussion?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy.