Re: [DFDL-WG] clarification needed - unparser - do we use mandatory text alignment for zero-length strings?

Thanks. I think the simplest way to say this is that the element is aligned because its representation is text and the charset has a mandatory alignment; the element content simply has length 0. The alignment of the element and the content length are orthogonal.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>

On Fri, Jan 22, 2016 at 4:35 AM, Steve Hanson <smh@uk.ibm.com> wrote:
Hi Mike
If we are unparsing an occurrence then yes we apply alignment properties before we output the content, even if the content is zero length. I emphasise "if we are unparsing an occurrence" because if an element is optional and there is no occurrence in the infoset we will not unparse anything, so the alignment is not relevant and not used.
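As a rough illustration of the rule described above, here is a minimal Python sketch. It is not IBM DFDL or any real implementation: `BitWriter`, `unparse_string`, and `demo` are hypothetical names, the fill bits are assumed to be zero, and the element is assumed to be a UTF-8 string with no initiator or terminator, preceded by a 3-bit field.

```python
from typing import List, Optional


class BitWriter:
    """Hypothetical bit-level output stream; bits are kept as a list of 0/1."""

    def __init__(self) -> None:
        self.bits: List[int] = []

    def write_bits(self, value: int, count: int) -> None:
        # Emit the low `count` bits of `value`, most significant bit first.
        for shift in range(count - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def align(self, alignment_bits: int) -> None:
        # Pad with zero fill bits (an assumption) up to the next boundary.
        self.bits.extend([0] * ((-len(self.bits)) % alignment_bits))

    def write_text(self, text: str, encoding: str = "utf-8") -> None:
        for byte in text.encode(encoding):
            self.write_bits(byte, 8)


def unparse_string(out: BitWriter, occurrence: Optional[str]) -> None:
    """Unparse an optional UTF-8 string element that has no framing."""
    if occurrence is None:
        # No occurrence in the infoset: nothing is unparsed,
        # so the alignment is not relevant and not used.
        return
    out.align(8)                # mandatory text alignment for UTF-8
    out.write_text(occurrence)  # writes zero bits for an empty string


def demo(occurrence: Optional[str]) -> int:
    """Write a 3-bit field, then the string; return total bits written."""
    out = BitWriter()
    out.write_bits(0b101, 3)
    unparse_string(out, occurrence)
    return len(out.bits)


print(demo(""))    # empty occurrence: 3 bits + 5 fill bits
print(demo(None))  # no occurrence: only the 3-bit field
```

In this sketch the alignment step depends only on whether an occurrence exists, not on the length of its content.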
Regards
Steve Hanson
Architect, IBM DFDL <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
IBM Integration Bus <http://www-03.ibm.com/software/products/en/ibm-integration-bus>, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848 mob:+44-7717-378890
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 21/01/2016 22:37
Subject: [DFDL-WG] clarification needed - unparser - do we use mandatory text alignment for zero-length strings?
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
------------------------------
Suppose field 1 is 3 bits long.
Suppose field 2 is a string in UTF-8, with dfdl:emptyValueDelimiterPolicy='none'.
If field 2 has any contents, then those characters must begin on a byte boundary, so before the first character we will skip 5 bits.
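The 5-bit figure is plain modular arithmetic. A minimal Python sketch follows; the function name `alignment_skip_bits` is made up for illustration and is not part of any DFDL implementation.

```python
def alignment_skip_bits(bit_position: int, alignment_bits: int = 8) -> int:
    """Number of bits to skip to reach the next multiple of alignment_bits."""
    return (-bit_position) % alignment_bits


# After a 3-bit field the current position is bit 3, so 5 bits are skipped.
print(alignment_skip_bits(3))  # 5
# A position already on a byte boundary needs no skip.
print(alignment_skip_bits(8))  # 0
```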
What if field 2 is of length 0? Do we skip 5 bits anyway or not?
Note that if dfdl:emptyValueDelimiterPolicy is "both", and an initiator or terminator is defined, then even if the length is 0 we will definitely unparse at least one character, so the 5 bits of alignment are needed in that case.
But if there is no framing, and no content, do we align to the text mandatory alignment or not?
Now, I believe the answer to this should be "yes", we skip the 5 bits anyway. If field 2 is lengthKind='pattern', then when parsing we have to scan for a match to the pattern, so we have to know where to start the scan, and that has to be at a byte boundary. So even if there is no match, and the length at parse time is therefore zero, we still had to skip 5 bits before we could start scanning.
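To make the parse-side argument concrete, here is a small Python sketch. It only illustrates the reasoning and is not how any DFDL processor is implemented: `parse_pattern_string` is a hypothetical helper, and Python's `re` module stands in for the DFDL pattern matcher.

```python
import re
from typing import Tuple


def parse_pattern_string(data: bytes, bit_position: int, pattern: str) -> Tuple[str, int]:
    """Return (matched text, bit position after the element)."""
    # The scan can only start on a byte boundary, so skip to it first.
    start_bit = bit_position + ((-bit_position) % 8)
    text = data[start_bit // 8:].decode("utf-8")
    match = re.compile(pattern).match(text)
    # No match is treated as zero-length content.
    matched = match.group(0) if match else ""
    end_bit = start_bit + 8 * len(matched.encode("utf-8"))
    return matched, end_bit


# First byte holds the 3-bit field plus 5 skipped bits; text starts at byte 1.
data = bytes([0b10100000]) + b"abc123"
print(parse_pattern_string(data, 3, r"[a-z]+"))  # ('abc', 32)
print(parse_pattern_string(data, 3, r"[0-9]+"))  # ('', 8)
```

In the second call nothing matches, yet the position still advances from bit 3 to bit 8, because the skip happened before the scan could begin.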
So to be consistent with this, I believe unparsing must also output the 5 bits regardless of whether the string is length 0 or not.
So that is what I think it should do. Separately, if all the constructs for this work in IBM DFDL, I'm curious what it actually does.
Thanks
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg