1. Let's drop tables 2 and 4 and replace
them with references to the appendix, as suggested.
2. Agreed
3. Good point. I think the intention
of %ES; was that it should be used on its own. I don't see any point in
allowing it to be a part of a non-zero-length DFDL string literal. So I
think your modification to the grammar should be put into the spec.
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: Tim Kimber/UK/IBM@IBMGB
Date: 02/09/2013 17:46
Subject: Re: First draft of appendix describing string literal matching
Good description. My comments:
1) Apart from the first three rows,
the grammar table is pretty much duplicating existing tables 2 and 4 in
section 6.3.1. Suggest either that the table is dropped from the appendix
and anything that is missing is added back into 6.3.1, or tables 2 and
4 are dropped and replaced by refs to appendix. I think the latter is preferable
as everything is then in a single table.
2) There is a bug in the grammar for
DfdlStringLiteral - there should not be '{' and '}' - that's expression
syntax.
3) For recognising ES, you say "The
string part is recognized if the data available for matching is zero-length".
That's true if we insist that ES, if present, must be present on
its own. I'm not sure we actually say that. If that is the intent, we should
police this in the grammar. (Note that IBM DFDL does not give an error if it
finds '%ES;abc'.)
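To illustrate the rule under discussion in point 3, here is a minimal Python sketch of a check that %ES; appears only on its own. It is not taken from the spec or from any DFDL implementation: the function name es_used_alone is hypothetical, it looks at a single string literal rather than a whitespace-separated list, and it assumes that "%%" is the escape for a literal percent sign.

```python
def es_used_alone(literal: str) -> bool:
    """Return True if the literal omits %ES; or consists of %ES; only.

    Hypothetical helper: sketches the proposed grammar restriction for
    one DFDL string literal, not a full literal parser.
    """
    # Assumption: "%%" escapes a literal percent sign, so remove those
    # pairs first to avoid mistaking "%%ES;" for the %ES; entity.
    unescaped = literal.replace("%%", "")
    if "%ES;" not in unescaped:
        return True
    # The entity is present, so it must be the entire literal.
    return literal == "%ES;"


if __name__ == "__main__":
    for candidate in ["%ES;", "%ES;abc", "abc", "%%ES;"]:
        print(repr(candidate), es_used_alone(candidate))
```

Under this sketch '%ES;abc' would be rejected, which is the case IBM DFDL currently accepts without an error.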
It still needs an erratum, as it is a
change to the spec document.
It also needs references from 6.3.1.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Tim Kimber/UK/IBM@IBMGB, Steve Hanson/UK/IBM@IBMGB
Date: 30/08/2013 00:16
Subject: Re: First draft of appendix describing string literal matching
I added this in current form as appendix D.
Will be in draft r14.4.
I did not create an erratum for this. It's a whole new
section, not an error correction or clarification. But we can add one if
we think it useful to point out this section.
There are no cross references to this section currently
in the document. We might find a few places we want to reference
this from.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Wed, Aug 28, 2013 at 10:43 AM, Tim Kimber <KIMBERT@uk.ibm.com>
wrote:
Thanks Mike.
I agree that the wording could be misinterpreted. Revised draft attached:
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Tim Kimber/UK/IBM@IBMGB
Cc: Steve Hanson/UK/IBM@IBMGB
Date: 20/08/2013 17:33
Subject: Re: First draft of appendix describing string literal matching
I'm not sure I agree with the algorithm in the 1.3 section for the string
literal part "LiteralString".
I believe this algorithm is independent of the encoding the schema itself
is written in, i.e., what appears on the <?xml encoding="..." ?> declaration
line at the top of the schema file.
What you write in the schema file is read into memory, and all characters are
converted to Unicode code points by that reading process.
So these two statements in the Recognition Algorithm for LiteralString
are of concern:
"The characters in the DFDL schema will be encoded using the defined
encoding for the schema in which they appear."
I think this just muddies the waters. Elsewhere we should state that the
encoding used when authoring a DFDL schema file does not affect the behavior
of the schema. All schemas behave as if authored in utf-8, etc.
"The recognition algorithm must be able to compare character sequences
that are encoded using different encodings."
To me that says that if I write my schema in EBCDIC but dfdl:encoding="ascii",
then some algorithm other than mapping both into Unicode code points first
and then comparing them is needed. I don't think this is or should be true.
I think the division of things into what you call string literal parts
is needed due to raw byte entities and character class entities. Outside
of that I think translation of everything to Unicode should be sufficient.
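A minimal Python sketch of the comparison argued for above: decode both the schema text and the data to Unicode first, then compare code points. The function name literal_matches and the sample values are hypothetical, and the sketch ignores raw byte and character class entities, which are the cases said to need separate handling.

```python
def literal_matches(schema_bytes: bytes, schema_encoding: str,
                    data_bytes: bytes, data_encoding: str) -> bool:
    """Return True if the data begins with the literal from the schema.

    Hypothetical helper: both inputs are decoded to Unicode strings, so
    the encoding the schema file was authored in has no effect on the
    result.
    """
    # Reading the schema file converts its characters to code points.
    literal = schema_bytes.decode(schema_encoding)
    # The data is decoded using its own encoding (dfdl:encoding).
    data = data_bytes.decode(data_encoding)
    # The comparison is now purely between Unicode code points.
    return data.startswith(literal)


if __name__ == "__main__":
    # Schema authored in EBCDIC (code page 037), data encoded in ASCII.
    schema_literal = "END".encode("cp037")
    print(literal_matches(schema_literal, "cp037", b"END of record", "ascii"))
```

The same literal authored in UTF-8 would give the same result, which is the point being made: only the decoded code points take part in the match.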
...mike
On Thu, Aug 15, 2013 at 7:19 PM, Tim Kimber <KIMBERT@uk.ibm.com>
wrote:
Steve, Mike,
Please take a look. Comments on high-level stuff like structure/level of
detail are welcome.
regards
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU