Re: [DFDL-WG] Backtracing behavior for optional elements

12 Feb 2013

Glad that we are all in agreement.

The particular use case that motivated me to write down the rules is the 
IBM 4690 TLOG format. This is a binary, separated, positional format so it 
is modeled with dfdl:separatorPolicy 'trailingEmptyLax'. However, I 
encountered two records that each contained a pair of unbounded arrays. 
The end of the first is indicated by the next field having value x95 - 
x99, the end of the second is encountering the end of record delimiter. 
The separator does not change throughout. I was struggling to model this 
correctly until Tim pointed out that although the rest of the record was 
positional, the arrays were non-positional. Wrapping the arrays in a 
sequence with dfdl:separatorPolicy 'anyEmpty' solved the problem. 

I think this would be a good subject for a DFDL tutorial lesson.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From:   Tim Kimber/UK/IBM@IBMGB
To:     dfdl-wg@ogf.org
Date:   11/02/2013 19:21
Subject:        Re: [DFDL-WG] Backtracing behavior for optional elements
Sent by:        dfdl-wg-bounces@ogf.org

I agree with all of Steve's description, and all of Mike's response. And I 
still think that in an ideal world we would include in the specification a 
set of grammars that describe the various 'styles' of group, including 
groups with no separator, positional separators and non-positional 
separators. 

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert@uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

From:        Mike Beckerle <mbeckerle.dfdl@gmail.com> 
To:        Steve Hanson/UK/IBM@IBMGB 
Cc:        dfdl-wg@ogf.org 
Date:        11/02/2013 18:16 
Subject:        Re: [DFDL-WG] Backtracing behavior for optional elements 
Sent by:        dfdl-wg-bounces@ogf.org 

This sounds right.

Let me run an array scenario past you. Tell me if you think I am 
interpreting consistently with your rules.

What you've said here is that we distinguish positional and non-positional 
separators. They are very different.

Positional separators are greedy and drive the parser decision. Once 
matched, they no longer tolerate failure to parse. So, if I have an array 
with occursCountKind='parsed', then finding a positional separator means I 
am NOT at the end of the physical array. I will have syntax for one more 
element to be parsed successfully, though I may suppress its value being 
added to the infoset if it is optional and I get the appropriate empty 
representation after the separator. Failure means the array is broken. 
Success means I will look for yet another element (because 
occursCountKind is 'parsed'). 

The above makes sense to me. This is what 'separators' means to me for the 
most part, that they are a driving part of the syntax/format.

The non-positional separators case is 100% different. 

In that case, the decision that a separator was found is revisited on 
failure. An array or optional element with occursCountKind='parsed' is 
ended, and the thing after it in the sequence is attempted next. 

This makes sense, I almost wish we didn't have to call it 'separator', but 
I think it is a useful behavior certainly, and the right interpretation of 
the properties we have in the spec and 140 stuff today.
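
To make the array scenario above concrete, here is a minimal Python 
sketch of the two behaviours. It is only an illustration, not how any 
DFDL processor is implemented. The names parse_array, parse_int, 
ParseError and HardError are hypothetical, the separator is assumed to 
be a comma, and a failure on the very first element (before any 
separator has been matched) is assumed to simply end the array.

```python
class ParseError(Exception):
    """Recoverable processing error from an element parser."""


class HardError(Exception):
    """Processing error that ends the parse instead of causing backtracking."""


def parse_int(text):
    # Hypothetical element parser: accepts unsigned decimal digits only.
    if not text.isdigit():
        raise ParseError(f"not an integer: {text!r}")
    return int(text)


def parse_array(data, parser, positional, sep=","):
    """Parse an occursCountKind='parsed' array from the start of data.

    Returns (items, rest), where rest is the unconsumed text that the
    next thing in the sequence would be attempted against.
    """
    fields = data.split(sep)
    items = []
    for index, text in enumerate(fields):
        after_separator = index > 0
        if text == "":
            # Empty representation: the syntax is present, but no value
            # is added to the infoset.
            continue
        try:
            items.append(parser(text))
        except ParseError as err:
            if positional and after_separator:
                # A matched positional separator commits the parser to
                # one more element, so failure means a broken array.
                raise HardError("array is broken") from err
            # Non-positional: the separator decision is revisited, the
            # array ends, and the caller tries the next thing in the
            # sequence against the text from this offset.
            return items, sep.join(fields[index:])
    return items, ""
```

With this sketch, parse_array("1,2,x", parse_int, positional=False) 
returns ([1, 2], "x"), leaving "x" for whatever follows the array, 
while the same call with positional=True raises HardError.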

On Mon, Feb 11, 2013 at 9:56 AM, Steve Hanson <smh@uk.ibm.com> wrote: 
If a processing error occurs for an optional element in a sequence, the 
speculative behaviour of the DFDL parser says that the optional element is 
assumed not to be present, and the next alternative in the sequence is 
tried. That is fine when there are no separators involved, but we need to 
be clear on what happens when there are separators. 

1) Positional separators (separatorSuppressionPolicy is 'never', 
'trailingEmpty' and 'trailingEmptyStrict'). 
The key point about positional separators is that they are expected in the 
data, so if an error occurs while parsing the optional element, it does 
not make sense to backtrack to the start offset of the element and try to 
match the next element. Yes, there's a point of uncertainty in the sense 
that the element is either there or it has empty representation, but if an 
error occurs I think it must be treated as a hard error, and not cause 
backtracking. 

2) Non-positional separators (separatorSuppressionPolicy is 'anyEmpty'). 
This behaves like the non-separator case and the next alternative in the 
sequence is tried from the start offset. However, because 'anyEmpty' 
behavior is lax, it is possible that the next thing in the data is a 
separator, so the parser must cater for that when the element is found to 
have empty representation. But if an error occurs establishing 
representation, I think the parser should just backtrack and try to match 
the next element. 

Does that sound correct? 
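
As a rough illustration of rules 1) and 2), the following Python 
sketch parses a comma-separated sequence containing an optional 
element. It is a simplification under stated assumptions, not a 
definitive implementation: parse_sequence, parse_int, parse_alpha, 
ParseError and HardError are hypothetical names, the separator is 
assumed to be a comma, and leftover data after the last element is 
ignored.

```python
POSITIONAL = {"never", "trailingEmpty", "trailingEmptyStrict"}


class ParseError(Exception):
    """Recoverable processing error from an element parser."""


class HardError(Exception):
    """Processing error that must not cause backtracking."""


def parse_int(text):
    # Hypothetical element parser: unsigned decimal digits only.
    if not text.isdigit():
        raise ParseError(f"not an integer: {text!r}")
    return int(text)


def parse_alpha(text):
    # Hypothetical element parser: alphabetic characters only.
    if not text.isalpha():
        raise ParseError(f"not alphabetic: {text!r}")
    return text


def parse_sequence(data, elements, policy, sep=","):
    """elements is a list of (name, parser, optional) tuples."""
    fields = data.split(sep)
    infoset = {}
    pos = 0  # index of the field the next element is tried against
    for name, parser, optional in elements:
        text = fields[pos] if pos < len(fields) else ""
        if optional and text == "":
            # Empty representation: the element is absent, and its
            # slot (with the following separator) is consumed.
            pos += 1
            continue
        try:
            infoset[name] = parser(text)
            pos += 1
        except ParseError as err:
            if not optional or policy in POSITIONAL:
                # Rule 1: positional separators make this a hard error.
                raise HardError(f"cannot parse {name!r}") from err
            # Rule 2: backtrack. pos is unchanged, so the next element
            # is tried from the same start offset.
    return infoset


elements = [("a", parse_int, False),
            ("b", parse_int, True),
            ("c", parse_alpha, False)]
print(parse_sequence("1,x", elements, "anyEmpty"))        # {'a': 1, 'c': 'x'}
print(parse_sequence("1,,x", elements, "trailingEmpty"))  # {'a': 1, 'c': 'x'}
```

Calling parse_sequence("1,x", elements, "trailingEmpty") raises 
HardError, because the field after the separator is expected to be 
element b or its empty representation.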

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
  dfdl-wg mailing list
  dfdl-wg@ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg 

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com