Fw: Action 148: pattern based lengths - suggested revised language
For discussion on today's call...
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 20/09/2011 11:01 -----
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB, mbeckerle.dfdl@gmail.com,
Date: 20/09/2011 10:17
Subject: Re: Action 148: pattern based lengths - suggested revised language
I'd like to discuss on the WG call today. I think the conservative
approach I outline below is consistent with what we do for complex
elements and specified lengths, and I'd prefer to stick with that for 1.0.
Regards
Steve Hanson
From: Tim Kimber/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Date: 25/08/2011 16:23
Subject: Re: Action 148: pattern based lengths - suggested revised language
I think we can afford to be a little less conservative, actually. Let's
suppose that we allow patterns regardless of dfdl:representation and
regardless of the encoding. That will provide users with maximum
flexibility, at the (not very large) risk that they will occasionally do
something silly. We can put a note into the specification to the effect
that patterns should usually be used only with character data, but can
(with care) be used to match bytes if that is the only way to achieve the
desired result. I may be missing something, but I don't see what harm we
can cause ourselves or our users by doing this.
My concern is that we could take away a lot of the power that patterns
provide (particularly for discriminators / asserts) and then end up
regretting it when some strange format pops up.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM@IBMGB
Date: 25/08/2011 15:00
Subject: Action 148: pattern based lengths - suggested revised language
Hi Tim
Please could you have a think about my conservative proposal below?
Firstly, can we get away with restricting patterns to text, or will we
need to use patterns to grab large amounts of data that may include binary
content?
Secondly, are we able to apply the same validation criteria to use of
testKind pattern on an assert or discriminator as we are to use of
lengthKind pattern?
Regards
Steve Hanson
----- Forwarded by Steve Hanson/UK/IBM on 25/08/2011 14:44 -----
From: "Mike Beckerle"
To: Steve Hanson/UK/IBM@IBMGB
Date: 26/07/2011 17:30
Subject: pattern based lengths - suggested revised language
I suggest this language to tighten up this whole section (replace both
paragraphs). Given Tim's concern that we make sure DFDL implementations
don’t have to reimplement regexp matching, I think this is sufficient.
1.1.1.1 Pattern-Based Lengths - Scanability
Any element (complex, simple text, simple binary) may have a
dfdl:lengthKind 'pattern'. When an element contains binary data, and
lengthKind=’pattern’ is used, then it is a schema definition error if the
character set encoding is not iso-8859-1.
(Possible generalization 1: allow other character sets, e.g., iso-8859-15
as well. This is ok because 8859-15 still maps all 256 codepoints. But this
is a slippery slope.)
(Possible generalization 2: allow any character set, ASCII, EBCDIC,
utf-16be, etc. Note that using any character encoding that does not map
every 8-bit byte value to a valid character creates ambiguity: e.g., we
normally think of the regexp “.” as meaning “any character”. But do we
really mean “any byte”? If the character set encoding doesn’t have a given
byte as a codepoint, then this question really matters.)
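The encoding concern above can be illustrated with a minimal sketch. It assumes Python's standard `re` module and codecs stand in for a DFDL implementation's regexp engine and character set decoders, which is purely an assumption for illustration; the helper names `decodable` and `pattern_length` are hypothetical and not part of DFDL.

```python
import re

# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# ISO-8859-1 assigns a character to each of the 256 byte values,
# so decoding never fails and the round trip is lossless.
text = all_bytes.decode("iso-8859-1")
assert text.encode("iso-8859-1") == all_bytes

# Under that mapping "any character" and "any byte" coincide.
# re.DOTALL makes "." match newline as well, so all 256 values match.
any_char = re.compile(".", re.DOTALL)
matched = sum(1 for ch in text if any_char.fullmatch(ch))


def decodable(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if every byte has a character in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def pattern_length(data: bytes, pattern: str) -> int:
    """Hypothetical helper: number of bytes matched at the start of `data`.

    The data is viewed as ISO-8859-1 text, so one character is one byte
    and the match length in characters equals the length in bytes.
    Returns -1 when the pattern does not match.
    """
    m = re.compile(pattern, re.DOTALL).match(data.decode("iso-8859-1"))
    return len(m.group(0)) if m else -1
```

With a strict US-ASCII decoder the same input fails to decode, because bytes 0x80 to 0xFF have no character. That is the situation in which the question of whether "." means "any character" or "any byte" has no obvious answer.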
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number
741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU