Only question I have is on "endOfParent" not being allowed on root element.

If you have a DFDL implementation processing message buffers, and the root element's content ends at the end of the buffer/end-of-data, how do we express that?

I expected that to be end-of-parent, the notion being that there's an implicit parent for all content, which has an end which is the true end-of-data.

...mikeb

On Mon, Jan 23, 2012 at 6:19 AM, Steve Hanson <smh@uk.ibm.com> wrote:
----- Forwarded by Steve Hanson/UK/IBM on 23/01/2012 11:16 -----

From:        Steve Hanson/UK/IBM
To:        Mike Beckerle <mbeckerle.dfdl@gmail.com>, Tim Kimber/UK/IBM
Date:        18/01/2012 16:26
Subject:        * DFDL Errata* Clarification:  Limitations on use of endOfParent



As agreed on WG extra call on 18th Jan.

Will be raised as a separate issue on next DFDL WG call.

Constraints on element lengthKind 'endOfParent'...

- element maxOccurs = 1
- no terminator on element
- if element is in a sequence
    - separatorPosition of sequence must not be 'postfix'
    - sequenceKind of sequence must be 'ordered'
    - no floating elements in the sequence
    - must be the 'last' in the sequence statically **
- if element is in a choice it is always 'last' statically **
- parent element lengthKind must not be 'implicit' or 'delimited'
- if element is complex then all possible 'last' elements ** must also be 'endOfParent'
- not sensitive to any in-scope markup
- not allowed on root element

** Need a concise description of walking the content of a complex element and building the list of 'last elements'. Involves factoring out local sequences and coping with choices.
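
A minimal sketch of one way that walk might look, in Python. The Element, Sequence and Choice classes and the last_elements and violations helpers are hypothetical names used only for illustration; they are not part of any DFDL implementation. The sketch assumes every child is required, so it ignores minOccurs = 0 and empty branches, which a real rule would have to handle.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str
    length_kind: str = "delimited"
    content: Optional[object] = None  # Sequence or Choice; None means simple


@dataclass
class Sequence:
    children: List[object] = field(default_factory=list)


@dataclass
class Choice:
    branches: List[object] = field(default_factory=list)


def last_elements(term) -> List[Element]:
    """Return the elements that can statically be 'last' within term.

    Local sequences are factored out by recursing into their final child;
    a choice contributes the 'last' elements of every branch.
    Assumption: all children are required (optionality is ignored).
    """
    if isinstance(term, Element):
        return [term]
    if isinstance(term, Sequence):
        return last_elements(term.children[-1]) if term.children else []
    if isinstance(term, Choice):
        result: List[Element] = []
        for branch in term.branches:
            result.extend(last_elements(branch))
        return result
    raise TypeError(f"unexpected term: {term!r}")


def violations(parent: Element) -> List[str]:
    """Names of 'last' elements of a complex element that are not endOfParent."""
    if parent.content is None:
        return []
    return [e.name for e in last_elements(parent.content)
            if e.length_kind != "endOfParent"]


# Hypothetical schema: a sequence whose final term is a choice, one branch
# of which is itself a local sequence.
record = Element("record", "endOfParent", Sequence([
    Element("header", "explicit"),
    Choice([
        Element("text", "endOfParent"),
        Sequence([Element("code", "explicit"), Element("rest", "delimited")]),
    ]),
]))

print([e.name for e in last_elements(record.content)])  # ['text', 'rest']
print(violations(record))                               # ['rest']
```

Here 'rest' would be flagged because it can end up last inside 'record' but is not itself 'endOfParent'.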

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From:        Steve Hanson/UK/IBM
To:        Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc:        Tim Kimber/UK/IBM@IBMGB
Date:        08/11/2011 18:22
Subject:        Re: Coping with Character code U+0000 - and how to end-of-parent in an array



Hi Mike

For the record, I knew we said something about U+0000 - it's in section 5...

        String – In DFDL a string can contain any character codes. None are reserved. (Including the character with character code U+0000, which is not permitted in XML documents.)

After discussion with Sandy Gao, this is what we wrote in the DFDL to XDM mapping document:

Note: SimpleElement [datavalue] values may contain characters that are illegal in XML, for example, DFDL strings can contain the character code 0 (zero) within them, but XML does not allow this character code in any XML content even if it is represented as a character entity. Nevertheless, a DFDL described string is mapped to an XDM string data value.

and later for the actual mapping to XDM:

SimpleElement: If the value of [datavalue] is special value “nil”, then the empty string, otherwise the value of [datavalue] converted to its canonical lexical representation.


On to your examples (and assuming separatorPosition is 'infix')...

You are right about the first (endOfParent) example being odd. This example would work fine if the lengthKind was 'delimited'. Remember that the 'explicit' length of the parent element creates a box which scopes the delimited behaviour.  

The second (delimited) works fine.

endOfParent and delimited behave almost identically most of the time. When the element is an array, this looks to be one of the differences. I am thinking that endOfParent should not be allowed when maxOccurs > 1.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848




From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, Tim Kimber/UK/IBM@IBMGB
Date: 08/11/2011 17:02
Subject: Coping with Character code U+0000 - and how to end-of-parent in an array





We have this incompatibility with the XML infoset around U+0000, aka NUL.

However, one can model data containing character code U+0000 in the content as an array of strings with NUL termination. That is, we split the string on the NUL characters so as to avoid putting them in the infoset.

So I tried to do this and ran into issues. E.g., if data contains a string of length 80, but inside it the character code 0 can appear, then this could be modeled as:

<element name="stringsWithNul" dfdl:lengthKind="explicit" dfdl:length="80">
  <sequence dfdl:separator="%x0000;">
    <element name="substring" type="string" maxOccurs="80" dfdl:lengthKind="endOfParent" />  <!-- ????  -->
  </sequence>
</element>

Problem: is that use of endOfParent length kind right? It's the last thing in the group, but the same element decl also describes the prior elements. If there are no NULs in the string, then endOfParent is exactly what you want. There will be only one substring, it will have all 80 characters. But if there are NULs in the middle, then you want the earlier array elements to be delimited by the sequence's separators, and only the last element to be delimited by endOfParent.

These semantics, where an occurrence's length is normally bounded by the sequence's separator and only the last occurrence is bounded by the end of the parent, are not something we can express, I believe.
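
To make the intended parse concrete, here is a minimal sketch in Python of the behaviour I am after. It is not DFDL and not how any DFDL processor works; split_on_nul is a hypothetical helper, and the sample data is made up. Every substring except the last ends at a NUL separator, and the last one simply runs to the end of the 80-character parent.

```python
def split_on_nul(field: str, length: int = 80) -> list:
    """Split a fixed-length field into substrings at each NUL (U+0000).

    The parent's fixed length is the outer box: the final substring is
    bounded by the end of that box rather than by a separator.
    """
    if len(field) != length:
        raise ValueError(f"expected {length} characters, got {len(field)}")
    return field.split("\x00")


# No NUL present: a single substring holding all 80 characters.
no_nul = split_on_nul("A" * 80)
print(len(no_nul), len(no_nul[0]))  # 1 80

# One NUL present: the first substring ends at the separator, the second
# runs to the end of the parent (padding included).
with_nul = split_on_nul("HELLO\x00WORLD".ljust(80))
print(with_nul[0], len(with_nul[1]))  # HELLO 74
```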

I was actually even unclear on this one: If the data 'string' has a terminator of ! then perhaps:

<element name="stringsWithNul" dfdl:lengthKind="delimited" dfdl:terminator="!">
  <sequence dfdl:separator="%x0000;">
    <element name="substring" type="string" maxOccurs="unbounded" dfdl:lengthKind="delimited"/> <!-- delimited entirely by ancestor/enclosing-specified delimiters. -->
  </sequence>
</element>

Is the array element delimited? Is that the right length kind for this situation?

Thanks for comments

...mikeb

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg



--
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412