Re: [DFDL-WG] Action 260

25 Jun 2014

I prefer choice (a) for two reasons:

* It is more restrictive and therefore more conservative (preserving
freedom to change in future if needed)
* If a user has a positional data format, you don't want them to even have
to understand the concept of speculation in order to model their data. So
choice (a) allows a simpler description that doesn't need to introduce the
notion that the parser might be speculating.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>

On Wed, Jun 25, 2014 at 5:20 AM, Steve Hanson <smh@uk.ibm.com> wrote:
...
*260*
*Positional and non-positional sequences (All)*
10/6: Spec defines the above but also allows different occursCountKinds
within the same sequence which may have different (implied)
separatorSuppressionPolicy, which results in a sequence which is a mixture
of both. Should this be allowed? If so what are the rules? Can certain
combinations be disallowed?
17/6: IBM have discussed internally and will submit a proposal.
In the spec we define Positional Sequence and Non-Positional Sequence:
*Positional sequence* - *Each occurrence in the sequence can be
identified by its position in the data. Typically the components of such a
sequence do not have an initiator. In some such sequences, the separators
for optional zero-length occurrences may or must be omitted when at the end
of the group. A positional sequence can be modelled by setting
dfdl:separatorSuppressionPolicy to 'never', 'trailingEmptyStrict' or
'trailingEmpty'.*
*Non-positional sequence* - *Occurrences in the sequence cannot be
identified by their position in the data alone. Typically the components of
such a sequence have an initiator. Such sequences allow the separator to be
omitted for optional zero-length occurrences anywhere in the sequence.
Speculative parsing is employed by the parser to identify each occurrence.
A non-positional sequence can be modelled by setting
dfdl:separatorSuppressionPolicy to 'anyEmpty'.*
The problem is that the setting of dfdl:separatorSuppressionPolicy is only
examined for child elements with dfdl:occursCountKind 'implicit'.  For
other dfdl:occursCountKinds, there is the concept of an 'implied'
dfdl:separatorSuppressionPolicy:
*When dfdl:occursCountKind is 'fixed' then ... the implied behaviour is
'never'.*
*When dfdl:occursCountKind is 'expression' ... the implied behaviour is
'never'.*
*When dfdl:occursCountKind is 'parsed' ... the implied behaviour is
'anyEmpty'.*
*When dfdl:occursCountKind is 'stopValue' ... the implied behaviour is
'anyEmpty'.*
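To make the mixing problem concrete, here is a rough Python sketch of the implied-policy lookup quoted above. It is only an illustration, not how any DFDL processor is implemented; the names IMPLIED_POLICY, effective_policy and is_mixed are made up for this example.

```python
# Implied dfdl:separatorSuppressionPolicy per dfdl:occursCountKind, taken
# from the spec excerpts quoted above. 'implicit' is deliberately absent:
# it is the only kind that uses the policy set on the sequence.
IMPLIED_POLICY = {
    "fixed": "never",
    "expression": "never",
    "parsed": "anyEmpty",
    "stopValue": "anyEmpty",
}


def effective_policy(occurs_count_kind, sequence_policy):
    """Return the policy that actually governs one child element."""
    if occurs_count_kind == "implicit":
        return sequence_policy
    return IMPLIED_POLICY[occurs_count_kind]


def is_mixed(sequence_policy, child_kinds):
    """True if the children end up with both positional and
    non-positional ('anyEmpty') behaviour in the same sequence."""
    flags = {
        effective_policy(kind, sequence_policy) == "anyEmpty"
        for kind in child_kinds
    }
    return len(flags) > 1
```

For example, is_mixed("trailingEmpty", ["implicit", "parsed"]) is True, which is exactly the mixture this action is asking about.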
So if a Positional sequence as defined above contains children with
dfdl:occursCountKind 'parsed' or 'stopValue' then surely it is no longer a
Positional sequence.
A solution to this is to prevent the appearance of certain values of
dfdl:occursCountKind within a Positional sequence. However, precisely which
values to outlaw is subject to interpretation of the phrase "*Each
occurrence in the sequence can be identified by its position in the data*".
Is this intended to mean:
*a) an observer of the raw data can identify an occurrence of an element
in the sequence solely by counting separators *
=> SDE if 'parsed', 'stopValue' or 'expression' ** appeared in a
Positional sequence;
** Although 'expression' would appear to be like 'fixed', it actually
breaks a), so it must be included in the SDE list.
or
*b) a parser does not have to speculate to identify an occurrence of an
element in the sequence*
=> SDE only if 'parsed' appeared in a Positional sequence.
Note that it is possible to wrap a 'parsed' (or similar) element in a local
sequence or another element to avoid an SDE. But this could still be seen
as a violation of a) if the separators of both are the same, as the
observer cannot count the separators. So should the rule be applied
recursively, i.e., a Positional sequence cannot contain a non-Positional
sequence unless the separators are different?
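The difference between interpretations a) and b) can be sketched as a simple check over the direct children of a sequence. This is a hypothetical illustration only: POSITIONAL_POLICIES, DISALLOWED and sde_kinds are invented names, and the sketch does not attempt the recursive question about nested sequences.

```python
# Policies that make a sequence positional, per the spec definition above.
POSITIONAL_POLICIES = {"never", "trailingEmptyStrict", "trailingEmpty"}

# occursCountKind values that would be an SDE inside a positional sequence
# under each interpretation described above.
DISALLOWED = {
    "a": {"parsed", "stopValue", "expression"},
    "b": {"parsed"},
}


def sde_kinds(sequence_policy, child_kinds, interpretation):
    """Return the sorted child occursCountKinds that would be flagged."""
    if sequence_policy not in POSITIONAL_POLICIES:
        return []  # non-positional sequence: neither rule applies
    return sorted(set(child_kinds) & DISALLOWED[interpretation])
```

With a 'never' sequence whose children are 'fixed', 'expression' and 'stopValue', interpretation a) flags 'expression' and 'stopValue', while interpretation b) flags nothing.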
Regards
Steve Hanson
Architect, *IBM DFDL*
<http://www.ibm.com/developerworks/library/se-dfdl/index.html>
Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848