Action 292 - version 2 proposal for hexBinary with lengthUnits bits

Users want a way to express an arbitrary unaligned string of bits, with the appearance in the infoset being hexadecimal, not base 10. Right now the only way I can see to meet this requirement while retaining backward compatibility would be a new DFDL property.

So here's the new idea:

Property dfdl:hexBinaryRep with values 'bytes' or 'bits'. New property, so defaulting (with suppressible warning) to 'bytes' for backward compatibility in schemas not having the property.

When set to 'bits', type xs:hexBinary would behave just like xs:nonNegativeInteger, all properties relevant to that type would be applicable, and any use of XSD length facets on such elements would be an SDE. The hexBinary string would be exactly the same as if you took the numeric value for a nonNegativeInteger and, instead of presenting it as base-10 digits, used base-16 digits.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
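A minimal Python sketch of the proposed 'bits' mapping (a hypothetical helper, not part of any DFDL implementation): once the field has been parsed to the numeric value a nonNegativeInteger would yield, the infoset string is simply that value rendered in base 16.

    def hex_binary_bits(value: int) -> str:
        # Same numeric value a nonNegativeInteger parse would produce,
        # rendered in base 16 rather than base 10.
        return format(value, "X")

    # A 12-bit field whose integer value is 479 would appear as "1DF":
    print(hex_binary_bits(479))  # 1DF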

Mike

I'm a bit lost on this now. The concept of applying lengthUnits='bits' to xs:hexBinary is straightforward. It just counts bits. Bit order or byte order is irrelevant, in the same way that it is irrelevant when counting bytes for a hexBinary. The only thing to note is that the fillByte needs to be used to make up whole bytes.

I'm missing something here.

Regards

Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com tel:+44-1962-815848 mob:+44-7717-378890
Note: I work Tuesday to Friday

As an example of why I feel bitOrder and byteOrder apply if supporting hexBinary with non-byte-size lengths or starting on non-byte boundaries, let's say we had the following data:

    11011111 11010001 = 0xDFD1

And we want to model this as one 12-bit unsigned int followed by one 4-bit unsigned int, all with bitOrder=LSBF and byteOrder=LE. We would have a schema like so:

    <dfdl:format lengthKind="explicit" lengthUnits="bits"
                 bitOrder="leastSignificantBitFirst" byteOrder="littleEndian" />

    <xs:sequence>
      <xs:element name="foo" dfdl:length="12" type="xs:unsignedInt" />
      <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
    </xs:sequence>

The above data would parse as:

    <foo>479</foo> <!-- binary: 000111011111, hex 0x1DF -->
    <bar>13</bar>  <!-- binary: 1101, hex 0xD -->

This is because, due to the bit/byte order, "foo" is made up of the last four bits of the second byte (0001) followed by the first eight bits of the first byte (11011111), resulting in a value of 479. The bitPosition after "foo" is consumed is 12. Then "bar" consumes the remaining bits, which are the first four of the second byte, resulting in a value of 13.

This all follows the specification as-is.

Now, let's assume we instead wanted to represent "foo" as an xs:hexBinary that has a non-byte-size length, e.g.:

    <xs:sequence>
      <xs:element name="foo" dfdl:length="12" type="xs:hexBinary" />
      <xs:element name="bar" dfdl:length="4" type="xs:unsignedInt" />
    </xs:sequence>

If we ignored bitOrder/byteOrder when parsing "foo" and read the first 12 bits (essentially BE MSBF), the result would be:

    <foo>0DFD</foo>

But just like before, the bitPosition after "foo" is consumed is 12. And because the bit/byte order is LSBF LE, the bits that "bar" will consume are again the first four of the second byte, with the result

    <bar>13</bar>

But this means that the last four bits in the data (0001) were never consumed, and the first four bits of the second byte were consumed twice, which must be wrong (a similar issue occurs when starting on a non-byte boundary). So bitOrder/byteOrder must be taken into account somehow in order to support hexBinary with non-byte-size lengths or starting on a non-byte boundary, primarily because of how bitOrder=LSBF works (which I believe was the original use case for non-byte-size, non-byte-boundary hexBinary).

If instead we do not ignore bit/byte order, there must be some way to determine how to get those bits into a hexBinary representation. There are probably a few different ways to handle this, but after some discussions and interpretations of the XSD spec, we determined that the best way to handle it was to just read the bits as if they were a nonNegativeInteger (which does take into account bit/byte order) and then convert those bits to a hex representation. For BE MSBF the result is exactly the same. For LE MSBF, it results in the hexBinary being flipped, which is where the Daffodil implementation is inconsistent with the spec.
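A minimal Python sketch of the LSBF/little-endian bit consumption in the example above (a hypothetical helper, not Daffodil's actual API); accumulating bit i at position i realizes the little-endian byte order, and it reproduces foo=479 and bar=13:

    def take_bits_lsbf(data: bytes, bit_pos: int, nbits: int):
        """Consume nbits starting at bit_pos (0-based), least significant
        bit of each byte first. Returns (value, new_bit_pos)."""
        value = 0
        for i in range(nbits):
            p = bit_pos + i
            bit = (data[p // 8] >> (p % 8)) & 1  # LSBF: start at the low bit
            value |= bit << i                    # little-endian accumulation
        return value, bit_pos + nbits

    data = bytes([0xDF, 0xD1])
    # foo: all 8 bits of 0xDF, plus the low 4 bits of 0xD1 as the high nibble
    foo, pos = take_bits_lsbf(data, 0, 12)
    # bar: the high 4 bits of 0xD1
    bar, pos = take_bits_lsbf(data, pos, 4)
    print(foo, bar)  # 479 13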

I agree that bitOrder is needed, not byteOrder. If you want to parse the data as an integer, then fine, but that is not the case here; you are parsing the data as hexBinary. The analogy is with your parsing of text strings where the encoding is one where the character size is not a multiple of 8 bits; you use bitOrder but not byteOrder.

Regards

Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com tel:+44-1962-815848 mob:+44-7717-378890
Note: I work Tuesday to Friday

I went over this issue again mentally. Here's what I came up with. Note I am using a fixed-width font because of some ASCII art in this email.

So one thing we realized the other day is we need at least this much amending of the proposal. Changing what xs:hexBinary means when dfdl:lengthUnits='bits' would be binary incompatible. Right now there are schemas with xs:hexBinary in them where dfdl:lengthUnits='bits' is in scope, but is being ignored because DFDL v1.0 says it doesn't apply to hexBinary. So at minimum we need a property to switch on bits-centric behavior for xs:hexBinary.

Next, we know that XSD constrains things. The length facets are applicable to hexBinary and are always measured in bytes. Hence, lexically, there should only EVER be an even number of hex digits in a hexBinary, and if the facets are present, then the length units cannot be bits or the values of the facets would be misleading. So even if the number of bits is 17, you should get 6 hex digits, not 5. (I think XML validators may fail on an odd number of hex digits. Not necessarily all of them, but some may.)

Third, there's no debate that bitOrder matters. The question is only about whether byteOrder should matter.

Given that, I think there are two possible interpretations of hexBinary. I'll call them the "byte string" way and the "binary number" way.

THE BYTE STRING WAY

The following would be invariants:

* byte order doesn't matter, ever
* if the hexBinary's representation is aligned to an 8-bit boundary and is a multiple of 8 bits long, then the logical value is the same regardless of bitOrder

Consider this data stream as hex bytes DE AD BE EF. Regardless of bit order, all 32 bits taken together, starting on a byte boundary, the only hexBinary rep would be

    <foo>DEADBEEF</foo>

Now consider we start at bit 5 (1-based numbering) and proceed for only 24 bits. So we're not going to consume the first 4 bits, nor the last 4 bits, where first and last here are relative to the bitOrder.

When bitOrder is MSBF, we would want the data to be

    <foo>EADBEE</foo>

When bitOrder is LSBF, we would want the data to be

    <foo>DDEAFB</foo>

(Write the whole bytes backwards, drop the first and last nibble, then reverse again.)

Now consider we start at bit 6 and proceed for 22 bits. When bitOrder is MSBF we would want the data to be

       D    E    A    D    B    E    E    F
    1101 1110 1010 1101 1011 1110 1110 1111
    xxxx x110 1010 1101 1011 1110 111x xxxx

    regrouping the 22 extracted bits:

    1101 0101 1011 0111 1101 1100
       D    5    B    7    D    C

    <foo>D5B7DC</foo>

Note that to get the final C, we had to extend the final byte with 2 zero bits, and this is done by shift left/pad on the right (least significant side).

When bit order is LSBF we would want the data to be

       D    E    A    D    B    E    E    F
    1101 1110 1010 1101 1011 1110 1110 1111

    reverse the bytes (not the nibbles, the bytes):

       E    F    B    E    A    D    D    E
    1110 1111 1011 1110 1010 1101 1101 1110
    xxxx x111 1011 1110 1010 1101 110x xxxx

    regrouping the 22 extracted bits:

    0011 1101 1111 0101 0110 1110
       3    D    F    5    6    E

    Now reverse the bytes again:

    <foo>6EF53D</foo>

Note that to get the 3 in the final byte we had to assume 2 zero bits on the left (most significant side).

In the above, we're effectively treating hexBinary as a sequence of 8-bit integers, followed by a less-than-8-bit integer if the length is not a multiple of 8 bits, and this less-than-8-bit integer gets adjusted to be a full byte in a bitOrder-aware way. We don't need byte order because we're never considering a number that occupies more than 8 bits at a time.
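A minimal Python sketch of the byte-string treatment just described (the function name and packing details are assumptions, not any implementation's API); it consults bitOrder only, never byteOrder, and reproduces the two 22-bit worked examples:

    def hexbinary_byte_string(data: bytes, start_bit: int, nbits: int,
                              lsbf: bool = False) -> str:
        """Extract nbits starting at start_bit (0-based), honoring only
        bitOrder; zero-extend the final partial byte in a bitOrder-aware
        way (right/LSB side for MSBF, left/MSB side for LSBF)."""
        def bit_at(p):
            b = data[p // 8]
            return (b >> (p % 8)) & 1 if lsbf else (b >> (7 - p % 8)) & 1
        bits = [bit_at(start_bit + i) for i in range(nbits)]
        bits += [0] * (-nbits % 8)  # zero-extend to a whole byte
        out = bytearray()
        for i in range(0, len(bits), 8):
            chunk = bits[i:i + 8]
            if lsbf:
                # first-consumed bit is the byte's least significant bit
                out.append(sum(bit << j for j, bit in enumerate(chunk)))
            else:
                # first-consumed bit is the byte's most significant bit
                byte = 0
                for bit in chunk:
                    byte = (byte << 1) | bit
                out.append(byte)
        return out.hex().upper()

    data = bytes.fromhex("DEADBEEF")
    print(hexbinary_byte_string(data, 5, 22))             # D5B7DC
    print(hexbinary_byte_string(data, 5, 22, lsbf=True))  # 6EF53D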
THE BINARY NUMBER WAY

The second way to do hexBinary would be to effectively treat it as a minor variation on an xs:nonNegativeInteger with binaryNumberRep='binary'.

In this case, if the bytes are DEADBEEF and the byte order is bigEndian, the string is <foo>DEADBEEF</foo>, but if byteOrder is littleEndian the string is <foo>EFBEADDE</foo>. In this case byteOrder matters. Bit order didn't matter because we were dealing with whole bytes.

We are always going to represent 2 digits for each byte of length (rounding up for the final byte). So for 3 bytes, it is as if the textNumberPattern were "000000", so there will be leading zeros sometimes. (Also, we use hex digits; that goes without saying.)

If we consider the first example above, DEADBEEF where we remove the first and last nibbles, then:

* when bitOrder MSBF and byteOrder bigEndian - no change from above
* when bitOrder MSBF and byteOrder littleEndian - <foo>EEDBEA</foo> (reversed from above)
* when bitOrder LSBF and byteOrder littleEndian - <foo>FBEADD</foo> (reversed from above)
* when bitOrder LSBF and byteOrder bigEndian (not allowed in DFDL now) - no change from above

Revisiting the 22-bit-long examples from above, but adding byteOrder to them:

* when bitOrder MSBF and byteOrder bigEndian - no change from above
* when bitOrder MSBF and byteOrder littleEndian - <foo>DCB7D5</foo> (reversed from above)
* when bitOrder LSBF and byteOrder littleEndian - <foo>3DF56E</foo> (reversed from above)
* when bitOrder LSBF and byteOrder bigEndian (not allowed in DFDL now) - no change from above

My evaluation of this is that the numeric treatment here is actually a bit problematic, because a hexBinary is not a number represented in base 16; conceptually it is a byte array. If I look at the first pair of hex digits (leftmost) in the XML infoset, I expect to be able to look at the data stream and find that bit pattern. True, I must know the bitOrder. But if I throw byte order into the mix, I potentially have to go to the end of the hexBinary (and these can be quite big; they could be screenfuls or megabytes of data away) to find the hex digits that correspond to the current location in the data stream. This is no different than for a base-10 number, but because those are base 10, I'm never going to be doing that for a giant base-10 number.

Conclusion: I see no advantage to the BINARY NUMBER way over the BYTE STRING way. It changes what you get based on byte order, which seems unnecessary. I think the added flexibility is not required.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
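A minimal Python sketch of the byteOrder dependence that THE BINARY NUMBER WAY introduces, for the whole-byte, byte-aligned DEADBEEF case (assuming the field has a known 4-byte length):

    data = bytes.fromhex("DEADBEEF")

    # Read the bytes as a nonNegativeInteger, then print base-16 digits,
    # two per byte of length:
    print(format(int.from_bytes(data, "big"), "08X"))     # DEADBEEF
    print(format(int.from_bytes(data, "little"), "08X"))  # EFBEADDE

The same data stream yields two different infoset strings depending on byteOrder, which is the flexibility the conclusion above argues is unnecessary.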
participants (3)
- Mike Beckerle
- Stephen Lawrence
- Steve Hanson