Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 16/08/2011 15:28 -----
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Date: 27/07/2011 15:00
Subject: Re: pattern based lengths - suggested revised language





I support what you call the conservative approach, i.e. require text when patterns are used.

On Jul 27, 2011 5:53 AM, "Steve Hanson" <smh@uk.ibm.com> wrote:
> Hi Mike
>
> I don't think we can reduce the wording that much. The second paragraph
> is needed because it covers the binary case, where encoding is not
> actually used.
>
> I think we either need to be conservative and disallow the combination of
> binary & pattern, or leave the second paragraph as-is and effectively say
> that if you use binary with pattern then that is the behaviour.
>
> If we are to be conservative then:
>
> - For a simple element or simple type, disallow lengthKind="pattern" with
> binary rep.
>
> - For a complex element with lengthKind = "pattern", all children must
> have lengthUnits = "characters" (so text only) and the encoding of the
> children must be the same as the encoding of the parent. (We already have
> a similar rule for complex elements with specified length and lengthUnits
> = "characters").
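The two conservative rules above can be sketched as a small checker. This is only an illustration, not how any DFDL processor is implemented: the `Element` class, its field names, and `check_pattern_rules` are all hypothetical, and it assumes the rule for complex elements applies to direct children only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for a DFDL element declaration."""
    name: str
    length_kind: str = "implicit"
    representation: str = "text"      # "text" or "binary" (simple elements)
    length_units: str = "characters"
    encoding: str = "utf-8"
    children: List["Element"] = field(default_factory=list)

    @property
    def is_complex(self) -> bool:
        # Assumption: an element with children is treated as complex.
        return bool(self.children)


def check_pattern_rules(elem: Element) -> List[str]:
    """Collect schema definition errors under the conservative rules."""
    errors: List[str] = []
    if elem.length_kind == "pattern":
        if not elem.is_complex:
            # Rule 1: simple element with binary rep may not use 'pattern'.
            if elem.representation == "binary":
                errors.append(
                    f"{elem.name}: lengthKind 'pattern' with binary rep")
        else:
            # Rule 2: children must be character-based and share the
            # parent's encoding.
            for child in elem.children:
                if child.length_units != "characters":
                    errors.append(
                        f"{child.name}: lengthUnits must be 'characters'")
                if child.encoding.lower() != elem.encoding.lower():
                    errors.append(
                        f"{child.name}: encoding differs from {elem.name}")
    # Check nested elements that carry their own lengthKind 'pattern'.
    for child in elem.children:
        errors.extend(check_pattern_rules(child))
    return errors
```

For example, a complex element with lengthKind "pattern" whose child has lengthUnits "bytes" and a different encoding would be reported twice, once per rule violation.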
>
> We also allow asserts and discriminators to carry patterns which are
> applied straight at the current position in the data stream. It would be
> difficult to police the conservative rules here. But we need to say what
> encoding is used and we currently do not. I would say it must be the
> encoding of the element or group that carries the assert/discriminator.
>
> I said on the call that we had extended DFDL regular expressions so that
> raw hex bytes could be specified. However I don't see any evidence of this
> in the DFDL spec. This facility was something we added to IBM MRM for a
> retail format called TLOG which consists of delimited packed decimal data
> with hex indicator bytes, so we needed a way to match the hex indicator
> bytes as part of the regexp. However, I think this was only necessary
> because MRM has neither speculation nor discriminators, and in a DFDL
> version of TLOG I would use a discriminator. So I think my statement was
> in error, and I don't believe raw hex in DFDL regexps is needed.
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group
> IBM SWG, Hursley, UK
>
> smh@uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From: "Mike Beckerle" <mbeckerle.dfdl@gmail.com>
> To: Steve Hanson/UK/IBM@IBMGB
> Date: 26/07/2011 17:30
> Subject: pattern based lengths - suggested revised language
>
>
>
> I suggest this language to tighten up this whole section (replace both
> paragraphs). Given Tim's concern that DFDL implementations should not
> have to reimplement regexp matching, I think this is sufficient.
>
> 1.1.1.1 Pattern-Based Lengths - Scanability
> Any element (complex, simple text, simple binary) may have
> dfdl:lengthKind 'pattern'. When an element contains binary data and
> dfdl:lengthKind 'pattern' is used, it is a schema definition error if the
> character set encoding is not iso-8859-1.
>
>
> (Possible generalization 1: allow other character sets, e.g., iso-8859-15
> as well. This is OK because iso-8859-15 still maps all 256 byte values to
> characters. But this is a slippery slope.)
>
> (Possible generalization 2: allow any character set: ASCII, EBCDIC,
> utf-16be, etc. Note that using any character encoding other than one that
> maps every 8-bit byte value to a valid character creates ambiguity. E.g.,
> we normally think of the regexp “.” as meaning “any character”, but do we
> really mean “any byte”? If the character set encoding doesn’t have a given
> byte as a codepoint, then this question really matters.)
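The "any character" versus "any byte" question can be made concrete with a small experiment. The sketch below uses Python's built-in codecs and `re` module, not a DFDL regular expression engine, so it only illustrates the encoding issue; the helper names `decodable_bytes` and `dot_matches_every_byte` are made up for this example.

```python
import re

ALL_BYTES = bytes(range(256))


def decodable_bytes(encoding: str) -> int:
    """Count the single byte values that decode to a character."""
    count = 0
    for value in ALL_BYTES:
        try:
            bytes([value]).decode(encoding)
            count += 1
        except UnicodeDecodeError:
            pass
    return count


def dot_matches_every_byte(encoding: str) -> bool:
    """True if the regexp "." matches the decoding of every byte value."""
    # DOTALL is needed so that "." also matches the newline character
    # (byte 0x0A in these encodings).
    any_char = re.compile(".", re.DOTALL)
    for value in ALL_BYTES:
        try:
            text = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            return False  # this byte is not a character in the encoding
        if any_char.fullmatch(text) is None:
            return False
    return True


if __name__ == "__main__":
    for name in ("iso-8859-1", "iso-8859-15", "ascii"):
        print(name, decodable_bytes(name), dot_matches_every_byte(name))
```

With a codec that covers all 256 byte values, "any character" and "any byte" coincide. With one that does not, such as ASCII, some bytes have no character at all, which is the ambiguity described above.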
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU