IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 01/10/2018 20:31
Subject: [DFDL-WG] Clarification needed: sequence terminator that exists or not depending on expression
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
· ES must not appear as the only DFDL string literal in the property. It can only appear as a member of a list.
· Neither the ES entity nor the WSP* entity may appear on their own as one of the string literals in the list when the parser is determining the length of a component by scanning for delimiters.
The second bullet doesn't apply to my example.
Re: first bullet, I think my terminator expression is
illegal... because the '%ES;' is a list of literals containing ES as the
only DFDL string literal.
But this is a really flawed constraint, as "%ES;%ES;"
and "%ES; %ES;" both skirt the constraint, but mean the same
thing as just "%ES;" which is illegal.
So, if we don't want to allow these hack workarounds,
we need a statement that says runs of %ES; adjacent mean the same thing
as one %ES;, and that more than one identical-meaning delimiter specified
in a list of string literals means the same as just one. Or we can make
these hack workarounds illegal.
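
To make the loophole concrete, here is a rough Python sketch (not taken from any DFDL implementation). The function names are hypothetical, and it assumes a delimiter property is a whitespace-separated list of string literals and ignores the %% escape. It shows a literal reading of the first bullet letting both workarounds through, and a normalizing check rejecting them:

```python
import re

def naive_es_check(prop: str) -> bool:
    """Literal reading of the first bullet: reject only a list that is exactly one '%ES;'."""
    return prop.split() != ["%ES;"]

def normalize(prop: str) -> list:
    """Collapse runs of %ES; inside each literal, then drop duplicate literals.
    Assumption: this is the 'runs of %ES; mean one %ES;' statement suggested above."""
    literals = [re.sub(r"(?:%ES;)+", "%ES;", lit) for lit in prop.split()]
    return list(dict.fromkeys(literals))  # keeps first occurrence, preserves order

def strict_es_check(prop: str) -> bool:
    """Same rule as the naive check, applied after normalization."""
    return normalize(prop) != ["%ES;"]

for candidate in ["%ES;", "%ES;%ES;", "%ES; %ES;", "%ES; ;"]:
    print(repr(candidate), naive_es_check(candidate), strict_es_check(candidate))
```

With the naive check only "%ES;" is rejected; with normalization all three spellings of "just %ES;" are treated alike, while a list such as "%ES; ;" is still accepted.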
However, why are we disallowing these?
The above construct in my example is very useful, and
really hard to work around unless we can have a terminator that is '%ES;'
as the only string literal. In fact I have no workaround for this.
I am guessing I could come up with something, but the various things
I've guessed at so far either don't pan out or prevent the string named
'value' above from being modeled as a simple type.
I know we don't want lengthKind='delimited' with terminator="%ES;"
as that is most likely just a schema-definition error, but when we're not
dealing with a lengthKind, we really do seem to need to specify situations
where conditionally the terminator region will be empty.
So I think we need to do:
1) clarify that %ES; cannot be used in combination with
any other character or entity as a member of a list of string literals.
1a) At the same time I would also disallow
combinations of WSP* that are misleading and unnecessary i.e., disallow
%WSP*; adjacent to any other WSP, WSP+, or WSP*.
2) clarify that the constraint that %ES; for terminator
and separator cannot appear as the only string literal in a list of string
literals... applies only when the parser is determining the length of a
component by scanning for delimiters. This is just rephrasing the two bullets
above so the clause about scanning applies to both, not just the second.
I believe this preserves the intent that when lengthKind="delimited"
and we are scanning for delimiters, there must be *some* delimiter that
is potentially not zero length. You still have to cope with the possible
match being zero length due to %ES; being in the list of terminating markup,
or WSP* similarly, with no whitespace found. But the notion that there
is NO scanning to be done can't happen. That is, the notion that the schema
specifies lengthKind delimited, but also specifies no delimiters at all,
is still ruled out.
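
As a way of checking that proposals 1, 1a and 2 hang together, here is a minimal Python sketch of them as a validator. It is only an illustration under stated assumptions: the names (check_delimiter, the scanning flag) are hypothetical, the property is taken to be a whitespace-separated list of string literals, and the existing second bullet about ES and WSP* is deliberately not modeled.

```python
import re

# One token is the %% escape, a %NAME; entity, or a single raw character.
TOKEN = re.compile(r"%%|%[^;\s]+;|.", re.DOTALL)
WSP_FAMILY = {"%WSP;", "%WSP+;", "%WSP*;"}

def tokens(literal: str) -> list:
    return TOKEN.findall(literal)

def check_delimiter(prop: str, scanning: bool) -> list:
    """Return a list of violation messages for a terminator/separator value.
    'scanning' is True when length is determined by scanning for delimiters."""
    errors = []
    literals = prop.split()
    for lit in literals:
        toks = tokens(lit)
        # Proposal 1: %ES; may not be combined with anything else in a literal.
        if "%ES;" in toks and len(toks) > 1:
            errors.append(f"{lit!r}: %ES; combined with other characters or entities")
        # Proposal 1a: %WSP*; may not sit next to another WSP-family entity.
        for a, b in zip(toks, toks[1:]):
            if "%WSP*;" in (a, b) and a in WSP_FAMILY and b in WSP_FAMILY:
                errors.append(f"{lit!r}: %WSP*; adjacent to {a if b == '%WSP*;' else b}")
                break
    # Proposal 2: only when scanning, the list must offer something besides %ES;.
    # Assumption: a list of repeated %ES; literals counts the same as a single one.
    if scanning and literals and all(lit == "%ES;" for lit in literals):
        errors.append("%ES; is the only string literal while scanning for delimiters")
    return errors

print(check_delimiter("%ES;", scanning=False))
print(check_delimiter("%ES;", scanning=True))
print(check_delimiter("%ES;%ES;", scanning=False))
print(check_delimiter("%WSP*;%WSP+;", scanning=False))
```

Under this reading a lone "%ES;" terminator is accepted when no scanning is involved and rejected when scanning, which is the distinction proposal 2 is after.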
Comments?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy