I agree with all of Steve's description,
and all of Mike's response. And I still think that in an ideal world we
would include in the specification a set of grammars that describe the
various 'styles' of group, including groups with no separator, positional
separators and non-positional separators.
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 11/02/2013 18:16
Subject: Re: [DFDL-WG] Backtracing behavior for optional elements
Sent by: dfdl-wg-bounces@ogf.org
This sounds right.
Let me run an array scenario past you. Tell me if you think I am interpreting
it consistently with your rules.
What you've said here is that we distinguish positional and non-positional
separators. They are very different.
Positional separators are greedy and drive the parser decision. Once matched,
they no longer tolerate failure to parse. So, if I have an array with occursCountKind='parsed',
then finding a positional separator means I am NOT at the end of the physical
array. I will have syntax for one more element to be parsed successfully,
though I may suppress its value being added to the infoset if it is optional
and I get the appropriate empty representation after the separator. Failure
means the array is broken. Success means I will look for yet another element
(because occursCountKind is 'parsed').
The above makes sense to me. This is what 'separators' means to me for
the most part, that they are a driving part of the syntax/format.
The non-positional separators case is 100% different.
In that case, the decision that a separator was found is revisited on failure.
An occursCountKind='parsed' array or optional element will be ended, and the
thing after it in the sequence will be attempted next.
This makes sense. I almost wish we didn't have to call it a 'separator',
but I think it is certainly a useful behavior, and the right interpretation
of the properties we have in the spec and 140 stuff today.
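The array scenario above can be illustrated with a small toy parser. This is only a sketch under stated assumptions, not how any DFDL processor is implemented: the names ParseError, parse_int and parse_array are hypothetical, elements are plain runs of digits, the first element is required, and "empty representation" is taken to mean a separator followed directly by another separator or by the end of the data.

```python
class ParseError(Exception):
    """Hypothetical processing error raised by this toy parser."""


def parse_int(data, pos):
    """Parse a run of digits at pos; return (value, new_pos) or raise ParseError."""
    end = pos
    while end < len(data) and data[end].isdigit():
        end += 1
    if end == pos:
        raise ParseError(f"expected digits at offset {pos}")
    return int(data[pos:end]), end


def parse_array(data, pos, sep=",", positional=True):
    """Toy model of a separated array with occursCountKind='parsed'.

    Returns (values, new_pos). The first element is required (an assumption
    made to keep the sketch short).
    """
    value, pos = parse_int(data, pos)
    values = [value]
    while data.startswith(sep, pos):
        after_sep = pos + len(sep)
        # Empty representation: the syntax is consumed, but no value is
        # added to the result (the infoset, in DFDL terms).
        if after_sep == len(data) or data.startswith(sep, after_sep):
            pos = after_sep
            continue
        try:
            value, new_pos = parse_int(data, after_sep)
        except ParseError:
            if positional:
                # Positional separator: once matched, failure is a hard error.
                raise
            # Non-positional separator: revisit the decision that a separator
            # was found, end the array, and leave pos before the separator.
            return values, pos
        values.append(value)
        pos = new_pos
    return values, pos
```

With the input "1,2,x", the positional variant raises ParseError (the array is broken), while the non-positional variant returns the two parsed values and the offset of the last separator, so that whatever follows the array in the sequence can be attempted from there.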
On Mon, Feb 11, 2013 at 9:56 AM, Steve Hanson <smh@uk.ibm.com>
wrote:
If a processing error occurs for an
optional element in a sequence, the speculative behaviour of the DFDL parser
says that the optional element is assumed not to be present, and the next
alternative in the sequence is tried. That is fine when there are no separators
involved, but we need to be clear on what happens when there are separators.
1) Positional separators (separatorSuppressionPolicy is 'never', 'trailingEmpty'
or 'trailingEmptyStrict').
The key point about positional separators is that they are expected in
the data, so if an error occurs while parsing the optional element, it
does not make sense to backtrack to the start offset of the element and try
to match the next element. Yes, there's a point of uncertainty in the sense
that the element is either there or it has empty representation, but if
an error occurs I think it must be treated as a hard error, and not cause
backtracking.
2) Non-positional separators (separatorSuppressionPolicy is 'anyEmpty').
This behaves like the non-separator case and the next alternative in the
sequence is tried from the start offset. However, because 'anyEmpty' behavior
is lax, it is possible that the next thing in the data is a separator,
so the parser must cater for that when the element is found to have empty
representation. But if an error occurs establishing representation, I think
the parser should just backtrack and try to match the next element.
Does that sound correct?
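As a rough illustration of the two rules proposed above, the decision on a processing error might be sketched as follows. The helper names (ProcessingError, parse_digits, parse_optional) are hypothetical and the element parser is a stand-in; only the separatorSuppressionPolicy values come from the discussion itself.

```python
# Policies that the discussion above classes as positional separators.
POSITIONAL_POLICIES = {"never", "trailingEmpty", "trailingEmptyStrict"}


class ProcessingError(Exception):
    """Hypothetical error raised when an element fails to parse."""


def parse_digits(data, start):
    """Stand-in element parser: a run of digits. Returns (text, end_offset)."""
    end = start
    while end < len(data) and data[end].isdigit():
        end += 1
    if end == start:
        raise ProcessingError(f"no digits at offset {start}")
    return data[start:end], end


def parse_optional(parse_element, data, start, policy):
    """Try an optional element; decide what a processing error means.

    Returns (value, end_offset), or (None, start) when the element is
    assumed absent and the parser backtracks to the start offset.
    """
    try:
        return parse_element(data, start)
    except ProcessingError:
        if policy in POSITIONAL_POLICIES:
            # Rule 1: the separator is expected in the data, so the error
            # is treated as a hard error with no backtracking.
            raise
        # Rule 2 ('anyEmpty') and the no-separator case: assume the element
        # is absent and try the next alternative from the same offset.
        return None, start
```

For example, parse_optional(parse_digits, "abc", 0, "anyEmpty") returns (None, 0), whereas the same call with policy "never" lets the ProcessingError propagate.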
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com