lengthUnits bits not allowed for complex type - Unnecessary restriction on length units?

In section 12.3.7.3 we have this sentence:

"The dfdl:lengthUnits may be 'bytes' or 'characters' and it is a schema definition error otherwise."

Does anyone recall why we have this?

I have data formats which are bit oriented, and they contain complex types whose length is naturally not a multiple of 8 bits, e.g., 1-bit field, 3-bit field, 10-bit field, 6-bit field = 20 bits.

I can't think of any reason for this restriction other than to explain how fillByte is used to fill in unused bits. But I think we can say that any unused bits are filled in with bits from the fillByte, and we don't have to be specific about which bits from the fillByte.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
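
The 20-bit example and the fillByte suggestion in the message above can be sketched in a few lines of Python. This is only an illustration, not DFDL behaviour as specified: the function name pack_fields is hypothetical, the most-significant-bit-first packing order is an assumption (bit order is an open question in this thread), and taking the padding from the low-order bits of the fill byte is one arbitrary choice among those the message says need not be pinned down.

```python
def pack_fields(fields, fill_byte=0x00):
    """Pack (value, width_in_bits) pairs and pad to a byte boundary.

    Assumptions (not from the DFDL spec): fields are packed
    most-significant-bit first, and padding bits are taken from the
    low-order end of fill_byte.
    Returns (content_length_in_bits, packed_bytes).
    """
    bits = ""
    for value, width in fields:
        if value < 0 or value >= (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        bits += format(value, f"0{width}b")

    content_bits = len(bits)
    pad = (-content_bits) % 8          # unused bits in the final byte
    fill_bits = format(fill_byte & 0xFF, "08b")
    bits += fill_bits[8 - pad:]        # empty string when pad == 0

    packed = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return content_bits, packed


# The 1 + 3 + 10 + 6 = 20 bit complex type from the message,
# with made-up field values.
length, data = pack_fields([(1, 1), (5, 3), (513, 10), (42, 6)], fill_byte=0xFF)
print(length, data.hex())
```

The complex type occupies 20 bits, and the remaining 4 bits of the third byte come from the fill byte, so the content length in bits and the storage length in bytes are tracked separately.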

I know on the call I agreed that complex types with length in bits wouldn't be needed, but I rechecked the format I am trying to implement, which is MIL-STD-2045-47001D (which is public), and I found this (emphasis mine):

5.6.42 Group Size field. This field shall be a 12-bit binary number indicating the size, *in bits*, of the Future Use Group in which this field is contained. A value of "0" should not be used for this field. If the parent group is specified present then this child field is mandatory.

It turns out that these things are all just binary blobs in this spec document. No sub-structure is provided, as these are "Future Use", but it's always possible they will get specified, or already are and I am just unaware of the document which gives the format of some of them. There is a flock of standards related to this one. Most are, unfortunately, not publicly available.

I am at a bit of a loss how to model these bit-length fields. The hexBinary type allows length only in bytes, to be compatible with XSD hexBinary, which has a string-of-hex representation. I suppose I could use an xs:nonNegativeInteger with up to 4096 bits as the blob. Is there any other viable option? (An array of bits is not viable.)

My concrete suggestion: this data format is the same one that motivates the proposed dfdl:bitOrder property. I suggest that dfdl:bitOrder, along with any other issues needed to implement this standard, all be addressed at once when I've had time to complete an initial implementation in the Daffodil code base. Lifting the restriction on lengthUnits 'bits' for complex types may well be required.

...mikeb

On Tue, Mar 25, 2014 at 9:01 AM, Mike Beckerle <mbeckerle.dfdl@gmail.com> wrote:
> In section 12.3.7.3 we have this sentence: "The dfdl:lengthUnits may be 'bytes' or 'characters' and it is a schema definition error otherwise." Does anyone recall why we have this?
> [...]
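
The workaround mentioned above, treating a bit-length blob as a single non-negative integer, might look like the following sketch. The helper name read_bits is hypothetical and most-significant-bit-first numbering is assumed. Python integers are arbitrary precision, so a blob of several thousand bits is not a problem here, though it may be for other implementations.

```python
def read_bits(data: bytes, bit_offset: int, nbits: int) -> int:
    """Return nbits starting at bit_offset as a non-negative integer.

    Assumption: bit 0 is the most significant bit of data[0].
    """
    total_bits = len(data) * 8
    if bit_offset < 0 or nbits < 0 or bit_offset + nbits > total_bits:
        raise ValueError("requested bits fall outside the data")
    whole = int.from_bytes(data, "big")
    shift = total_bits - bit_offset - nbits   # bits to the right of the span
    return (whole >> shift) & ((1 << nbits) - 1)


# A 10-bit span that starts 4 bits into the data and crosses a byte boundary.
sample = bytes([0xD8, 0x06, 0xAF])
print(read_bits(sample, 4, 10))
```

The integer keeps the bit content but loses any leading zero bits unless the length is carried alongside it, which is one reason this is a workaround rather than a real model of the blob.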

Mike,

I think the expectation was that such fields would be modelled as unsigned integers. The issue then becomes one of maximum size. From your MIL spec quote below, it looks like this type is dfdl:lengthKind 'prefixed', with dfdl:prefixLengthIncludesPrefix 'yes'?

Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 25/03/2014 18:08
Subject: Re: [DFDL-WG] lengthUnits bits not allowed for complex type - Unnecessary restriction on length units?
Sent by: dfdl-wg-bounces@ogf.org

> 5.6.42 Group Size field. This field shall be a 12-bit binary number indicating the size, in bits, of the Future Use Group in which this field is contained. A value of "0" should not be used for this field. If the parent group is specified present then this child field is mandatory.
> [...]

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
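
The prefixed-length reading suggested in the reply can be illustrated with a small parser sketch. It assumes, as the reply tentatively does, that the 12-bit Group Size counts the size field itself, and it assumes most-significant-bit-first order. The function name parse_future_use_group is hypothetical, and the blob is returned as an unsigned integer as discussed in the thread. A 12-bit field holds at most 4095, so under this reading the blob is at most 4083 bits.

```python
SIZE_FIELD_BITS = 12  # width of the Group Size field per the quoted paragraph


def parse_future_use_group(data: bytes, bit_offset: int = 0):
    """Parse one Future Use Group starting at bit_offset.

    Assumptions: the Group Size value includes the 12-bit size field
    itself, and bits are numbered most-significant-bit first.
    Returns (group_size_bits, content_bits, blob_as_int).
    """
    total_bits = len(data) * 8
    stream = int.from_bytes(data, "big")

    def take(offset: int, n: int) -> int:
        return (stream >> (total_bits - offset - n)) & ((1 << n) - 1)

    if bit_offset < 0 or bit_offset + SIZE_FIELD_BITS > total_bits:
        raise ValueError("not enough data for the Group Size field")
    group_size = take(bit_offset, SIZE_FIELD_BITS)
    if group_size < SIZE_FIELD_BITS:
        # Also rejects 0, which the quoted paragraph says should not be used.
        raise ValueError(f"invalid Group Size {group_size}")
    if bit_offset + group_size > total_bits:
        raise ValueError("Group Size runs past the end of the data")

    content_bits = group_size - SIZE_FIELD_BITS
    blob = take(bit_offset + SIZE_FIELD_BITS, content_bits)
    return group_size, content_bits, blob


# Group Size = 20: a 12-bit prefix, 8 content bits (0xAB), then 4 unused bits.
print(parse_future_use_group(bytes([0x01, 0x4A, 0xB0])))
```

If the size turns out to exclude the prefix, only the subtraction and the lower-bound check change.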
participants (2)
- Mike Beckerle
- Steve Hanson