
I agree that bitOrder is needed, not byteOrder. If you wanted to parse the data as an integer, then fine, but that is not the case here: you are parsing the data as hexBinary. The analogy is with parsing text strings in an encoding where the character size is not a multiple of 8 bits; you use bitOrder but not byteOrder.

Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday

From: Stephen Lawrence <slawrence@tresys.com>
To: Steve Hanson <smh@uk.ibm.com>, "mbeckerle.dfdl@gmail.com" <mbeckerle.dfdl@gmail.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 30/11/2018 18:10
Subject: Re: [DFDL-WG] Action 292 - version 2 proposal for hexBinary with lengthUnits bits

As an example of why I feel bitOrder and byteOrder apply when supporting hexBinary with non-byte-size lengths or starting on non-byte boundaries, let's say we had the following data:

  11011111 11010001 = 0xDFD1

And we want to model this as one 12-bit unsigned int followed by one 4-bit unsigned int, all with bitOrder=LSBF and byteOrder=LE. We would have a schema like so:

  <dfdl:format lengthKind="explicit" lengthUnits="bits"
    bitOrder="leastSignificantBitFirst" byteOrder="littleEndian" />

  <xs:sequence>
    <xs:element name="foo" dfdl:length="12" type="xs:unsignedInt" />
    <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
  </xs:sequence>

The above data would parse as:

  <foo>479</foo> <!-- binary: 000111011111, hex 0x1DF -->
  <bar>13</bar>  <!-- binary: 1101, hex 0xD -->

This is because, given the bit/byteOrder, "foo" is made up of the last four bits of the second byte (0001) followed by the eight bits of the first byte (11011111), resulting in a value of 479. The bitPosition after "foo" is consumed is 12. Then "bar" consumes the remaining bits, which are the first four of the second byte, resulting in a value of 13. This all follows the specification as-is.
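The bit selection described above can be sketched in a few lines of Python. This is a simplified model of bitOrder=LSBF with byteOrder=LE, not Daffodil's implementation: the helper name read_bits_lsbf_le is hypothetical, and it assumes that bit position 0 is the least significant bit of the first byte and that earlier bits are less significant in the resulting value.

```python
def read_bits_lsbf_le(data: bytes, bit_pos: int, n_bits: int) -> int:
    """Read n_bits starting at bit_pos (hypothetical helper).

    Assumed model: position 0 is the least significant bit of byte 0,
    and each later position is one bit more significant in the result.
    """
    value = 0
    for i in range(n_bits):
        pos = bit_pos + i
        bit = (data[pos // 8] >> (pos % 8)) & 1  # count from the LSB of each byte
        value |= bit << i
    return value


data = bytes([0b11011111, 0b11010001])  # 0xDFD1, the example data

foo = read_bits_lsbf_le(data, 0, 12)   # low 4 bits of byte 2 above all of byte 1
bar = read_bits_lsbf_le(data, 12, 4)   # high 4 bits of byte 2

print(foo, bar)  # 479 13
```

Under this model every bit is consumed exactly once: "foo" takes positions 0 to 11 and "bar" takes positions 12 to 15.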
Now, let's assume we instead wanted to represent "foo" as xs:hexBinary that has a non-byte-size length, e.g.:

  <xs:sequence>
    <xs:element name="foo" dfdl:length="12" type="xs:hexBinary" />
    <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
  </xs:sequence>

If we ignored bitOrder/byteOrder when parsing "foo" and simply read the first 12 bits (essentially BE MSBF), the result would be:

  <foo>0DFD</foo>

But just like before, the bitPosition after "foo" is consumed is 12. And because the bit/byteOrder is LSBF LE, the bits that "bar" will consume are again the first four of the second byte, with the result:

  <bar>13</bar>

But this means that the last four bits in the data (0001) were never consumed, and the first four bits of the second byte were consumed twice, which must be wrong (a similar issue occurs when starting on a non-byte boundary). So bitOrder/byteOrder must be taken into account somehow in order to support hexBinary with non-byte-size lengths or starting on a non-byte boundary, primarily because of how bitOrder=LSBF works (which I believe was the original use case for non-byte-size, non-byte-boundary hexBinary).

If instead we do not ignore bit/byteOrder, there must be some way to determine how to get those bits into a hexBinary representation. There are probably a few different ways to handle this, but after some discussions and interpretations of the XSD spec, we determined that the best way to handle it was to just read the bits as if they were a nonNegativeInteger (which does take into account bit/byteOrder) and then convert those bits to a hex representation. For BE MSBF the result is exactly the same. For LE MSBF, it results in the hexBinary being flipped, which is where the Daffodil implementation is inconsistent with the spec.

On 11/29/18 10:19 AM, Steve Hanson wrote:
Mike
I'm a bit lost on this now. The concept of applying lengthUnits='bits' to xs:hexBinary is straightforward. It just counts bits. Bit order or byte order is irrelevant, in the same way that it is irrelevant when counting bytes for a hex binary. The only thing to note is that the fillByte needs to be used to make up whole bytes.
I'm missing something here.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 20/11/2018 17:33
Subject: [DFDL-WG] Action 292 - version 2 proposal for hexBinary with lengthUnits bits
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
--------------------------------------------------------------------------------
Users want a way to express an arbitrary unaligned string of bits, with the appearance in the infoset being hexadecimal, not base 10.
Right now the only way I can see to meet this requirement while retaining backward compatibility would be a new DFDL property.
So here's the new idea:
Property dfdl:hexBinaryRep with values 'bytes' or 'bits'. New property, so defaulting (with suppressible warning) to 'bytes' for backward compatibility in schemas not having the property.
When set to 'bits', type xs:hexBinary would behave just like xs:nonNegativeInteger: all properties relevant to that type would be applicable, and any use of XSD length facets on such elements would be an SDE. The hexBinary string would be exactly the same as if you took the numeric value for a nonNegativeInteger and, instead of presenting it as base 10 digits, presented it as base 16 digits.
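As a rough illustration of the 'bits' behaviour, the sketch below reuses the example data 0xDFD1 from Stephen Lawrence's reply and compares two readings of a 12-bit hexBinary element. All helper names are hypothetical, the LSBF/LE bit model is a simplified assumption rather than any implementation's code, and the output is left unpadded (one hex digit per 4 bits) because padding to an even number of digits is not specified in the text above.

```python
def read_bits_msbf(data: bytes, bit_pos: int, n_bits: int) -> int:
    """Naive read that ignores bitOrder/byteOrder (effectively BE MSBF)."""
    value = 0
    for i in range(n_bits):
        pos = bit_pos + i
        bit = (data[pos // 8] >> (7 - pos % 8)) & 1  # count from the MSB
        value = (value << 1) | bit
    return value


def read_bits_lsbf_le(data: bytes, bit_pos: int, n_bits: int) -> int:
    """Integer-style read under an assumed LSBF/LE model."""
    value = 0
    for i in range(n_bits):
        pos = bit_pos + i
        bit = (data[pos // 8] >> (pos % 8)) & 1  # count from the LSB
        value |= bit << i
    return value


def to_hex_digits(value: int, n_bits: int) -> str:
    """Render the numeric value in base 16, one digit per 4 bits (no padding)."""
    n_digits = (n_bits + 3) // 4
    return format(value, "0{}X".format(n_digits))


data = bytes([0b11011111, 0b11010001])  # 0xDFD1

naive = to_hex_digits(read_bits_msbf(data, 0, 12), 12)
as_integer = to_hex_digits(read_bits_lsbf_le(data, 0, 12), 12)

print(naive)       # DFD
print(as_integer)  # 1DF
```

The second value is simply 479 written in base 16, which is the behaviour the proposal describes for xs:hexBinary when the property is set to 'bits'.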
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com <http://www.tresys.com>
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
-- dfdl-wg mailing list dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU