Here are the updates to the restrictions that we discussed on the call. Action 165 raised.
Constraints on element lengthKind 'endOfParent'...
- element maxOccurs = 1
- no terminator on element
- no trailingSkip on element
- if element is in a sequence
  - sequence must be the content of a complex type
  - element must be the last object in the sequence
  - separatorPosition of sequence must not be 'postfix'
  - sequenceKind of sequence must be 'ordered'
  - no terminator on sequence
  - no trailingSkip on sequence
  - no floating elements in the sequence
- if element is in a choice
  - if choiceLengthKind is 'implicit'
    - choice must be the content of a complex type
    - no terminator on choice
    - no trailingSkip on choice
- parent element lengthKind must not be 'implicit' or 'delimited'
- not sensitive to any in-scope markup
Notes:
- complex element can have 'endOfParent', & its last child element can be any lengthKind including 'endOfParent'
- element must be the last thing in its box
- a box is defined as a portion of the data stream that has an established length prior to the parsing of its children
  - element with lengthKind 'explicit', 'prefixed', 'pattern' & no sequence/choice right framing
  - choice with choiceLengthKind 'explicit'
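As a rough illustration of the constraints above, a schema fragment that appears to satisfy them might look like the following. It is only a sketch: the element names, the lengths, the comma separator, the `xs:` prefix and `dfdl:lengthUnits` are all assumptions made for the example, and any remaining properties (encoding and so on) are assumed to come from a default format that is not shown.

```xml
<!-- Hypothetical record. The explicit length on the parent establishes the box. -->
<xs:element name="record" dfdl:lengthKind="explicit"
            dfdl:length="80" dfdl:lengthUnits="characters">
  <xs:complexType>
    <!-- Ordered sequence that is the content of the complex type:
         infix separator, no terminator, no trailingSkip, no floating elements. -->
    <xs:sequence dfdl:sequenceKind="ordered" dfdl:separator=","
                 dfdl:separatorPosition="infix">
      <xs:element name="code" type="xs:string" dfdl:lengthKind="explicit"
                  dfdl:length="4" dfdl:lengthUnits="characters"/>
      <!-- Last object in the sequence, maxOccurs left at 1, no terminator:
           it takes whatever remains of the parent's 80 characters. -->
      <xs:element name="rest" type="xs:string" dfdl:lengthKind="endOfParent"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Changing the parent's lengthKind to 'implicit' or 'delimited' would break the rule on the parent element, since the parent would then no longer form a box.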
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 31/01/2012 16:54 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org
Date: 23/01/2012 17:53
Subject: Re: [DFDL-WG] Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent
I have typically used 'implicit' for the root element, because the children
of the root element should match the data, leaving no bytes unconsumed.
Alternatively, using 'delimited' should be equivalent, since you are never
actually scanning. However, when the last child is endOfParent, according to
the rules below I can't use 'implicit' or 'delimited', so in that scenario
I am stuck. Need a rethink.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 23/01/2012 17:00
Subject: Re: [DFDL-WG] Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent
Only question I have is on "endOfParent" not
being allowed on root element.
If you have a DFDL implementation processing message buffers, and the root
element's content ends at the end of the buffer/end-of-data, how do we
express that?
I expected that to be end-of-parent, the notion being that there's an implicit
parent for all content, which has an end which is the true end-of-data.
...mikeb
On Mon, Jan 23, 2012 at 6:19 AM, Steve Hanson <smh@uk.ibm.com> wrote:
----- Forwarded by Steve Hanson/UK/IBM on 23/01/2012 11:16 -----
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>, Tim Kimber/UK/IBM
Date: 18/01/2012 16:26
Subject: * DFDL Errata* Clarification: Limitations on use of endOfParent
As agreed on WG extra call on 18th Jan.
Will be raised as a separate issue on next DFDL WG call.
Constraints on element lengthKind 'endOfParent'...
- element maxOccurs = 1
- no terminator on element
- if element is in a sequence
  - separatorPosition of sequence must not be 'postfix'
  - sequenceKind of sequence must be 'ordered'
  - no floating elements in the sequence
  - must be the 'last' in the sequence statically **
- if element is in a choice it is always 'last' statically **
- parent element lengthKind must not be 'implicit' or 'delimited'
- if element is complex then all possible 'last' elements ** must also be 'endOfParent'
- not sensitive to any in-scope markup
- not allowed on root element
** Need a concise description of walking the content of a complex element
and building the list of 'last elements'. Involves factoring out local
sequences and coping with choices.
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: Tim Kimber/UK/IBM@IBMGB
Date: 08/11/2011 18:22
Subject: Re: Coping with Character code U+0000 - and how to end-of-parent in an array
Hi Mike
For the record, I knew we said something about U+0000 - it's in section
5...
String – In DFDL a string can contain any
character codes. None are reserved. (Including the character with character
code U+0000, which is not permitted in XML documents.)
After discussion with Sandy Gao, this is what we wrote in the DFDL to XDM
mapping document:
Note: SimpleElement [datavalue] values
may contain characters that are illegal in XML, for example, DFDL strings
can contain the character code 0 (zero) within them, but XML does not allow
this character code in any XML content even if it is represented as a character
entity. Nevertheless, a DFDL described string is mapped to an XDM string
data value.
and later for the actual mapping to XDM:
SimpleElement: If the value of [datavalue]
is special value “nil”, then the empty string, otherwise the value
of [datavalue] converted to its canonical lexical representation.
On to your examples (and assuming separatorPosition is 'infix')...
You are right about the first (endOfParent) example being odd. This example
would work fine if the lengthKind was 'delimited'. Remember that the 'explicit'
length of the parent element creates a box which scopes the delimited behaviour.
The second (delimited) works fine.
endOfParent and delimited behave almost identically most of the time. When
the element is an array, this looks to be one of the differences. I am
thinking that endOfParent should not be allowed when maxOccurs > 1.
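As a sketch of the suggestion above, the first example (quoted further down this thread) might be rewritten with the array element using 'delimited' instead of 'endOfParent'. This is only one reading of the reply. The `xs:` prefix, the `complexType` wrapper and `dfdl:lengthUnits` are assumptions added to make the fragment more complete; the separator is copied as written in the original example and has not been checked against the DFDL entity syntax; remaining properties are assumed to come from a default format that is not shown.

```xml
<!-- Sketch only: the parent's explicit length of 80 forms the box. -->
<xs:element name="stringsWithNul" dfdl:lengthKind="explicit"
            dfdl:length="80" dfdl:lengthUnits="characters">
  <xs:complexType>
    <xs:sequence dfdl:separator="%x0000;" dfdl:separatorPosition="infix">
      <!-- Each occurrence is expected to end at the next separator;
           the last one ends where the parent's 80-character box ends. -->
      <xs:element name="substring" type="xs:string" maxOccurs="80"
                  dfdl:lengthKind="delimited"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```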
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB, Tim Kimber/UK/IBM@IBMGB
Date: 08/11/2011 17:02
Subject: Coping with Character code U+0000 - and how to end-of-parent in an array
We have this incompatibility with the XML infoset around U+0000, aka NUL.
However, one can model data containing character code U+0000 in the content
as an array of strings with NUL termination. That is, we split the string
on the NUL characters so as to avoid putting them in the infoset.
So I tried to do this and ran into issues. E.g., if data contains a string
of length 80, but inside it the character code 0 can appear, then this
could be modeled as:
<element name="stringsWithNul" dfdl:lengthKind="explicit" dfdl:length="80">
  <sequence dfdl:separator="%x0000;">
    <element name="substring" type="string" maxOccurs="80"
             dfdl:lengthKind="endOfParent" /> <!-- ???? -->
  </sequence>
</element>
Problem: is that use of endOfParent length kind right? It's the last thing
in the group, but the same element decl also describes the prior elements.
If there are no NULs in the string, then endOfParent is exactly what you
want. There will be only one substring, it will have all 80 characters.
But if there are NULs in the middle, then you want the earlier array elements
to be delimited by the sequence's separators, and only the last element
to be delimited by endOfParent.
These semantics, where the parent provides the constraint on length,
sometimes through its separator and, just for the last element, through
endOfParent, are not something we can express, I believe.
I was actually even unclear on this one: If the data 'string' has a terminator
of ! then perhaps:
<element name="stringsWithNul" dfdl:lengthKind="delimited" dfdl:terminator="!">
  <sequence dfdl:separator="%x0000;">
    <element name="substring" type="string" maxOccurs="unbounded"
             dfdl:lengthKind="delimited"/> <!-- delimited entirely by ancestor/enclosing-specified delimiters. -->
  </sequence>
</element>
Is the array element delimited? Is that the right length kind for this
situation?
Thanks for comments
...mikeb
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412