Bug taken: https://opensource.ncsa.illinois.edu/jira/browse/DFDL-1443


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy


On Fri, Nov 6, 2015 at 1:00 PM, Steve Hanson <smh@uk.ibm.com> wrote:
Mike

I think that is a bug in Daffodil. The DFDL spec says that dfdl:escapeSchemeRef applies to simple types with text representation, so Daffodil is evaluating the escape scheme properties prematurely. Note that in the UNA declaration, the simple elements at the start of the UNA that carry the delimiters and escape character all have dfdl:escapeSchemeRef="" to avoid tripping the check.
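
As a rough illustration of that rule (a sketch only, not Daffodil or IBM DFDL code), the Python below walks a toy element tree and reports where the escape scheme properties would need to be evaluated: only at simple text elements whose escapeSchemeRef is non-empty. The Element class, the element names, and the scheme name "EDIEscapeScheme" are all hypothetical and are not taken from the actual EDIFACT schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Toy stand-in for a schema element (hypothetical, for illustration)."""
    name: str
    is_simple_text: bool = False
    escape_scheme_ref: str = ""  # "" models dfdl:escapeSchemeRef=""
    children: List["Element"] = field(default_factory=list)


def elements_needing_escape_scheme(root: Element) -> List[str]:
    """Return, in document order, the names of the elements at which the
    escape scheme properties would have to be evaluated."""
    needing = []
    stack = [root]
    while stack:
        elem = stack.pop()
        # Complex elements never trigger evaluation; simple text elements
        # do so only when they actually reference an escape scheme.
        if elem.is_simple_text and elem.escape_scheme_ref:
            needing.append(elem.name)
        stack.extend(reversed(elem.children))
    return needing


# Hypothetical tree: the UNA children carry the delimiters and escape
# character, so they opt out of the escape scheme with an empty reference.
interchange = Element(
    "Interchange",
    escape_scheme_ref="EDIEscapeScheme",
    children=[
        Element(
            "UNA",
            escape_scheme_ref="EDIEscapeScheme",
            children=[
                Element("FieldSeparator", is_simple_text=True),
                Element("EscapeChar", is_simple_text=True),
            ],
        ),
        Element("DataField", is_simple_text=True,
                escape_scheme_ref="EDIEscapeScheme"),
    ],
)
```

In this toy tree, neither the root nor the UNA element nor the UNA's children would force evaluation; the first element that does is the data field that follows the UNA.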

Regards
 
Steve Hanson
Architect,
IBM DFDL
Co-Chair,
OGF DFDL Working Group
IBM Integration Bus, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890

From:        Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:        "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date:        04/11/2015 17:17
Subject:        [DFDL-WG] EDIFACT schema - daffodil bug or non-bug
Sent by:        "dfdl-wg" <dfdl-wg-bounces@ogf.org>

I'm trying to get EDIFACT working on Daffodil.

I have a somewhat interesting chicken-and-egg problem.

This schema specifies dfdl:escapeCharacter and dfdl:escapeEscapeCharacter as expressions. For example, there is a top-level dfdl:defineVariable named "EscapeChar" which has a default value, and the expression for the dfdl:escapeCharacter property is { $ibmEdiFmt:EscapeChar }.

The default format that is in effect for the root element has dfdl:lengthKind='delimited'.

When Daffodil starts parsing the top-level root/document element, it enters a parser for delimited elements with an escape scheme in effect. The first thing this parser does is get the escape scheme, which evaluates the expressions for escapeCharacter and escapeEscapeCharacter. This picks up the default values of those variables, and the variables are then marked as "already evaluated", because DFDL specifies that once a variable's default value has been used, it cannot subsequently be set via dfdl:setVariable.

Then, when the very first UNA is encountered, the parser reads the various delimiters and escape characters from the data and tries to set the variables.

But the variables have already been evaluated on the way into parsing the delimited top-level element, and again on entry to the UNA element itself.

So it fails with a runtime SDE (schema definition error): default value has already been used.
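
The sequence above can be reproduced with a small model of the variable rule. This is only a sketch under stated assumptions, not Daffodil code: the Variable class, VariableError, and both parse functions are hypothetical, and the default value "?" is assumed for illustration rather than taken from the schema.

```python
class VariableError(Exception):
    """Stands in for the runtime schema definition error (hypothetical name)."""


class Variable:
    """Toy model of a DFDL variable that has a default value."""

    def __init__(self, name, default):
        self.name = name
        self._value = default
        self._read = False  # becomes True once the value has been used

    def read(self):
        # Using the value (default or set) freezes the variable.
        self._read = True
        return self._value

    def set(self, value):
        # Models dfdl:setVariable: not allowed after the value has been used.
        if self._read:
            raise VariableError(
                f"{self.name}: default value has already been used")
        self._value = value


def parse_eager(una_escape):
    """Escape scheme evaluated on entry to the delimited root element."""
    escape_char = Variable("EscapeChar", default="?")  # assumed default
    escape_char.read()           # evaluates { $ibmEdiFmt:EscapeChar } early
    escape_char.set(una_escape)  # the UNA then tries to set it: raises
    return escape_char.read()


def parse_deferred(una_escape):
    """Escape scheme evaluated only once a text element actually needs it."""
    escape_char = Variable("EscapeChar", default="?")  # assumed default
    escape_char.set(una_escape)  # the UNA sets the variable first
    return escape_char.read()    # later text elements see the UNA's value
```

With these definitions, parse_eager raises VariableError for any input, while parse_deferred returns the escape character supplied by the UNA. The only difference between the two is the order of the read and the set.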

So the questions:

Is this a schema bug in the EDIFACT schema, or is there a principle at work here indicating that Daffodil cannot evaluate the escape scheme on entry to an element of length kind delimited unless delimiters are actually defined?

It gets worse, though. How late-bound does this have to be? I can imagine it being so late as to be after the last child element/group has been parsed, when the parser unwinds the stack back up to the complex-type element's tier. Only at that point, when it scans for the terminating markup, would it force the evaluation of the escape scheme. That seems difficult to implement. However, it would allow the delimiter for the complex-type element to be stored within the children of that same complex-type element. But is this needed?

One could argue that the EDIFACT schema should have dfdl:lengthKind='implicit' on these global elements until the UNA has been parsed. However, I think that makes authoring schemas harder, because a user thinks of EDIFACT as "a delimited format" and will naturally want to put dfdl:lengthKind="delimited" at global scope for all the schema components.

Thoughts?
