Thanks for reviewing.

1. Let's drop tables 2 and 4 and replace them with refs to the appendix, as suggested.
2. Agreed
3. Good point. I think the intention of %ES; was that it should be used on its own. I don't see any point in allowing it to be a part of a non-zero-length DFDL string literal. So I think your modification to the grammar should be put into the spec.

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742

From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>,
Cc: Tim Kimber/UK/IBM@IBMGB
Date: 02/09/2013 17:46
Subject: Re: First draft of appendix describing string literal matching

Good description. My comments:

1) Apart from the first three rows, the grammar table is pretty much duplicating existing tables 2 and 4 in section 6.3.1. Suggest either that the table is dropped from the appendix and anything that is missing is added back into 6.3.1, or tables 2 and 4 are dropped and replaced by refs to appendix. I think the latter is preferable as everything is then in a single table.

2) There is a bug in the grammar for DfdlStringLiteral - there should not be '{' and '}' - that's expression syntax.

3) For recognising ES, you say "The string part is recognized if the data available for matching is zero-length". That's true if we insist that ES, if present, must be present on its own. I'm not sure we actually say that. If that is the intent, we should police this in the grammar. (Note that IBM DFDL does not give an error if it finds '%ES;abc'.)

For 2) and 3) that would give:

DfdlStringLiteral ::= (DfdlStringLiteralPart)+ | DfdlESEntity

DfdlCharClassName ::= DfdlNLEntity | DfdlWSPEntity | DfdlWSPStarEntity | DfdlWSPPlusEntity
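
As an illustration of the rule this grammar would enforce, here is a minimal Python sketch that accepts a string literal only if %ES; is either absent or is the whole literal. The function name es_used_alone is hypothetical, and the sketch assumes that '%%' is the escape for a literal percent sign; it does not validate the other entities.

```python
def es_used_alone(literal: str) -> bool:
    """Return True if %ES; is absent from the literal or is the entire literal.

    Hypothetical helper for illustration only; it checks nothing else
    about the literal's syntax.
    """
    if literal == "%ES;":
        return True
    i = 0
    while i < len(literal):
        if literal.startswith("%%", i):
            # Assumed escape for a literal percent sign, so not an entity.
            i += 2
        elif literal.startswith("%ES;", i):
            # ES combined with other parts (or repeated) is rejected.
            return False
        else:
            i += 1
    return True


if __name__ == "__main__":
    for candidate in ["%ES;", "abc", "%ES;abc", "abc%ES;", "%%ES;"]:
        print(candidate, es_used_alone(candidate))
```

Under this check '%ES;abc' would be rejected, which differs from the current IBM DFDL behaviour noted in 3).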

It still needs an erratum, as it is a change to the spec document.

Needs references from 6.3.1.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Tim Kimber/UK/IBM@IBMGB, Steve Hanson/UK/IBM@IBMGB,
Date: 30/08/2013 00:16
Subject: Re: First draft of appendix describing string literal matching

I added this in current form as appendix D.

Will be in draft r14.4.

I did not create an erratum for this. It's a whole new section, not an error correction or clarification. But we can add one if we think it useful to point out this section.

There are no cross references to this section currently in the document. We might find a few places we want to reference this from.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy

On Wed, Aug 28, 2013 at 10:43 AM, Tim Kimber <KIMBERT@uk.ibm.com> wrote:
Thanks Mike.

I agree that the wording could be misinterpreted. Revised draft attached:

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Tim Kimber/UK/IBM@IBMGB,
Cc: Steve Hanson/UK/IBM@IBMGB
Date: 20/08/2013 17:33
Subject: Re: First draft of appendix describing string literal matching

I'm not sure I agree with the algorithm in the 1.3 section for the string literal part "LiteralString".

I believe this algorithm is independent of what encoding the schema itself is written in, i.e., what is on the <? xml encoding="..." ?> slug line at the top of the schema file.

What you write in the schema file is read into memory, and all characters are converted to Unicode code points by that reading process.

So these two statements in the Recognition Algorithm for LiteralString are of concern:

"The characters in the DFDL schema will be encoded using the defined encoding for the schema in which they appear."

I think this just muddies the waters. Elsewhere we should state that the encoding used when authoring a DFDL schema file does not affect the behavior of the schema. All schemas behave as if authored in utf-8, etc.

"The recognition algorithm must be able to compare character sequences that are encoded using different encodings."

To me that says that if I write my schema in EBCDIC but the dfdl:encoding="ascii", then some algorithm other than mapping both into Unicode code points first and then comparing them is needed. I don't think this is or should be true.

I think the division of things into what you call string literal parts is needed due to raw byte, and due to character class entities. Outside of that I think translation of everything to unicode should be sufficient.
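
To make the point concrete, here is a rough Python sketch of comparing a plain-character literal part against data by mapping both sides to Unicode first. It is only an illustration under stated assumptions: the function name literal_part_matches is hypothetical, "cp037" stands in for EBCDIC, and raw byte and character class entities are deliberately out of scope.

```python
def literal_part_matches(literal: str, data: bytes, data_encoding: str) -> bool:
    """Compare a plain-character literal part with data, code point by code point.

    `literal` is already Unicode text, as it would be after the schema file
    has been read, whatever encoding the file was authored in.
    `data_encoding` plays the role of dfdl:encoding for the data.
    Hypothetical helper; raw bytes and character classes are not handled.
    """
    try:
        decoded = data.decode(data_encoding)
    except UnicodeDecodeError:
        # The bytes are not valid in the data encoding, so there is no match.
        return False
    return decoded == literal


if __name__ == "__main__":
    # The same literal read from an EBCDIC-authored or a UTF-8-authored schema
    # ends up as the same sequence of code points in memory.
    from_ebcdic_schema = "ABC".encode("cp037").decode("cp037")
    from_utf8_schema = "ABC".encode("utf-8").decode("utf-8")
    print(from_ebcdic_schema == from_utf8_schema)

    # Matching then depends only on the data's own encoding.
    print(literal_part_matches(from_ebcdic_schema, b"ABC", "ascii"))
    print(literal_part_matches(from_utf8_schema, "ABC".encode("cp037"), "cp037"))
```

Nothing in the comparison needs to know how the schema file itself was encoded.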

...mike

On Thu, Aug 15, 2013 at 7:19 PM, Tim Kimber <KIMBERT@uk.ibm.com> wrote:
Steve, Mike,

Please take a look. Comments on high-level stuff like structure/level of detail are welcome.

regards

Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
