I've updated my comments based on yesterday's call and Mike's mail.
Suman, please can you help with item 4 below?
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, Andrew Edwards/UK/IBM@IBMGB
Date: 22/10/2012 17:29
Subject: Re: [DFDL-WG] One email or a flock or... - re: 10.03 draft - open review items - update
Suggested wording for issue 1 below. The change (originally in italics) is the final sentence in parentheses:
When the separator and terminator on a group have the same value, then at a point where either separator or terminator could be found, the separator is tried first. (Speculative execution may try the terminator subsequently.)
On Wed, Oct 17, 2012 at 6:56 AM, Steve Hanson <smh@uk.ibm.com> wrote:
I have added some comments in-line to reflect the WG call on Tuesday; we will continue on Friday. (Andy, please see item 5 below.)
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Date: 02/10/2012 18:41
Subject: One email or a flock or... - re: 10.03 draft - open review items
Steve,
I've got the issues below left after your review pass on 10.03, minus two that I sent you emails about separately.
Should I issue this email to the WG, do you want me to decompose it into separate emails, or do you just want to list these as agenda topics for the next call? I think it is good if people get to look at them in advance of a call.
...mikeb
--------------------------------------------------------
This is a list of items left open after a review pass by SMH (on the draft in preparation, r010.03).
These items need specific WG discussion on a call. They may be small enough to resolve there, or may be escalated into action items. (A couple of issues that are already clearly related to action items are not listed here.)
Note: please ignore the identifiers like SMH107 or m236 that I'm tagging these with. Those are just for my editing of the text. (They change, grrr, if someone inserts a comment into the document, so they're not good issue identifiers.)
1. SMH107 Spec says:
When the separator and terminator on a group have the same value, then at a point where either separator or terminator could be found, the separator is tried first.
- The issue is that this language still feels ambiguous. E.g., suppose it tries the separator first and finds it. Will a subsequent processing error cause it to backtrack, revisit this point and try the terminator? Or does finding the separator confirm that it IS a separator, resolving that point of uncertainty for good? I believe the latter is what was intended (delimiter decisions drive parsing and are not revisited), but we need to state this (or do we already state it somewhere else?)
SMH: Mike will add the following after the sentence in question: "(Speculative execution may try the terminator subsequently.)". Agreed that encountering a separator does not resolve a point of uncertainty.
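A minimal Python sketch of the agreed semantics, using an entirely hypothetical toy format: a group of digit items where ';' is both the separator and the terminator, followed by a trailing field of capital letters. At each ';' the separator interpretation is tried first; if everything after it fails, the parser backtracks and retries the same ';' as the terminator, so finding a separator does not settle the point of uncertainty. The names `parse_group` and `parse_trailer` are illustrative only and are not part of DFDL or any implementation.

```python
import re
from typing import Callable, List, Optional, Tuple

ITEM = re.compile(r"[0-9]+")


def parse_trailer(data: str, pos: int) -> Optional[str]:
    """Hypothetical content after the group: capital letters up to end of data."""
    m = re.fullmatch(r"[A-Z]+", data[pos:])
    return m.group() if m else None


def parse_group(data: str, pos: int, delim: str,
                rest: Callable[[str, int], Optional[str]]
                ) -> Optional[Tuple[List[str], str]]:
    """Parse digit items where `delim` is both separator and terminator.

    Returns (items, result_of_rest), or None on a processing error.
    """
    m = ITEM.match(data, pos)
    if m is None:
        return None  # processing error: no item here
    item, pos = m.group(), m.end()
    if not data.startswith(delim, pos):
        return None  # in this toy model every item is followed by the delimiter
    after = pos + len(delim)

    # Point of uncertainty: try the separator interpretation first.
    sub = parse_group(data, after, delim, rest)
    if sub is not None:
        items, result = sub
        return [item] + items, result

    # The separator path failed later on, so backtrack and speculatively
    # treat the same delimiter as the terminator instead.
    result = rest(data, after)
    if result is None:
        return None
    return [item], result


print(parse_group("1;2;END", 0, ";", parse_trailer))  # (['1', '2'], 'END')
```

For "1;2;END", the final ';' is first tried as a separator, fails because "END" is not a digit item, and is then accepted as the terminator.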
2. SMH169 - Some numeric types are signed, others unsigned. Some representations are sign-capable, some are not (BCD specifically). Right now the spec draft says you can't have BCD as the rep for the signed integer types long, int, short and byte. But you CAN have BCD as the rep for decimal and integer. We could allow BCD only for the nonNegativeInteger type, but there is no nonNegativeDecimal type, so... how to resolve? I would suggest that we simply allow BCD as the rep for both signed and unsigned types, and make it a processing error to unparse a negative value into BCD rep.
SMH: Noted that for a decimal, the property decimalSigned is used to indicate whether the logical value is signed or not. So we could disallow BCD for integer, and for decimal when decimalSigned is 'yes'. Interestingly, section 3.7.1 states "Signed numbers with dfdl:binaryNumberRep 'bcd' are always positive. On unparsing it is a processing error if the data is negative.", which admits that BCD can be used with signed types. IBM DFDL currently implements the table in the description of binaryNumberRep, and so allows BCD for integer and decimal regardless of decimalSigned, but does not allow long, short, int or byte. Agreed to leave the rules as they are today.
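To make the "always positive" rule concrete, here is a minimal Python sketch of packed BCD, assuming two decimal digits per byte, high nibble first, and no sign nibble. `unparse_bcd` and `parse_bcd` are hypothetical helpers, not IBM DFDL APIs; a negative value on unparse is reported as a processing error, as section 3.7.1 describes.

```python
def unparse_bcd(value: int, length: int) -> bytes:
    """Encode a non-negative integer as `length` bytes of packed BCD (no sign nibble)."""
    if value < 0:
        # BCD cannot carry a sign: unparsing a negative value is a processing error.
        raise ValueError("processing error: negative value cannot be unparsed as BCD")
    digits = str(value).rjust(2 * length, "0")
    if len(digits) > 2 * length:
        raise ValueError("processing error: value does not fit in the BCD length")
    # Pack two decimal digits per byte, high nibble first.
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def parse_bcd(data: bytes) -> int:
    """Decode packed BCD; the result is always non-negative."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("processing error: invalid BCD digit")
        value = value * 100 + high * 10 + low
    return value


print(unparse_bcd(1234, 3).hex())  # 001234
```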
3. m229 - textStandardZeroRep - should this allow %ES; as one of the list of possible values?
SMH: Decided not to allow %ES;, because it adds some complexity to the 'empty representation' processing rules, in the same way that xs:string and xs:hexBinary do. One can always make the element required and use a default of 0.
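A minimal Python sketch of the decision, assuming textStandardZeroRep holds a whitespace-separated list of literal strings: the hypothetical `check_zero_rep` rejects %ES; as a schema definition error, and `parse_text_integer` maps any listed zero representation to 0. Both helpers and the `SchemaDefinitionError` class are illustrative, not taken from any DFDL implementation.

```python
from typing import List


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


def check_zero_rep(prop_value: str) -> List[str]:
    reps = prop_value.split()
    if "%ES;" in reps:
        # Agreed: %ES; would entangle zero rep with the empty-representation rules.
        raise SchemaDefinitionError("%ES; is not allowed in textStandardZeroRep")
    return reps


def parse_text_integer(text: str, zero_reps: List[str]) -> int:
    if text in zero_reps:
        return 0
    return int(text)


reps = check_zero_rep("zero ZERO")
print(parse_text_integer("ZERO", reps), parse_text_integer("42", reps))  # 0 42
```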
4. m236 - are V (virtual decimal point position) and also P allowed in the textNumberPattern for the double and float types?
SMH: Post-call investigation: Errata 2.80 says they are allowed, but not in conjunction with the E, @ and * symbols. This is reflected in the BNF as:
subpattern := prefix? ((number exponent?) | vpinteger) suffix?
This may be too restrictive. COBOL supports 'external floating point' numbers, which contain +/-, E and . characters, and so are DFDL text standard floats. See http://publib.boulder.ibm.com/infocenter/comphelp/v7v91/index.jsp?topic=%2Fcom.ibm.aix.cbl.doc%2Fcpari09.htm and its child topic for some examples of these floats. Note that one example shows use of the V symbol. Also, experimentation with the IBM DFDL COBOL importer shows that both the V and P symbols are allowed in the PIC clause. This suggests that the BNF should be revised. It should be noted, though, that IBM DFDL COBOL imports these floats as xs:string and makes no attempt to convert them to a number. Need to ask Suman why this is.
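To illustrate what V and P do to the value, here is a minimal Python sketch under simplifying assumptions: the pattern is a plain vpinteger made of '0' digit positions plus either a single V or a run of leading or trailing P characters (no prefix, suffix, sign or exponent), and the data holds digits only. `scale_vp` is a hypothetical helper; in this model V and P occupy no characters in the data and only move the implied decimal point, following the COBOL PIC meaning.

```python
from decimal import Decimal


def scale_vp(digits: str, pattern: str) -> Decimal:
    """Apply a V or P scaling from `pattern` to a string of data digits."""
    value = Decimal(digits)
    if "V" in pattern:
        # Pattern positions to the right of V are fractional digits.
        fraction_positions = len(pattern.split("V", 1)[1])
        return value.scaleb(-fraction_positions)
    leading_p = len(pattern) - len(pattern.lstrip("P"))
    trailing_p = len(pattern) - len(pattern.rstrip("P"))
    digit_positions = len(pattern) - leading_p - trailing_p
    if leading_p:
        # e.g. PP000: the decimal point sits two places left of the three digits.
        return value.scaleb(-(digit_positions + leading_p))
    # e.g. 000PP: scale up by one power of ten per trailing P.
    return value.scaleb(trailing_p)


print(scale_vp("12345", "000V00"))  # 123.45
print(float(scale_vp("123", "PP000")))  # 0.00123
```

For a float or double element, the scaled Decimal would then be converted to a binary floating-point value, as the float() call shows.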
5. m237 - Do we check that the various symbols used for infinity, digits, grouping separators and decimal separators are properly distinct to allow parsing? E.g., that the decimal separator and grouping separator aren't the same, and that the positive and negative pattern variants are distinguishable? The ICU library supposedly doesn't do this checking. Do we state that this is an SDE in DFDL? If so, is this checking required? Can we make it possible for implementations to not check somehow? Other grammar ambiguity situations, like separator and terminator being ambiguous, are specifically NOT checked for, because determining whether a grammar is ambiguous is hard or undecidable, and would have to be done at runtime because delimiters can be computed at runtime. But for the syntax components of text numbers, do we require checking or not?
SMH: Post-call investigation: IBM DFDL gives an error if the decimal and grouping separators are the same, but does not check any of the other characters for uniqueness. (Andy, please can you check what ICU does if you set several of the text number characters to the same value, e.g., decimal sep, grouping sep, exponent and (for floats) NaN and Inf reps, in both strict and lax modes?)
ICU does not appear to check whether the symbol values overlap when they are set. It has precedence rules when matching (e.g., match NaN before Inf, match the decimal separator before the grouping separator), and it gives parse errors if the data does not work with its rules. To avoid having to understand these rules, it was agreed that the decimal separator, grouping separator, exponent rep, Inf rep, NaN rep and zero rep must all be distinct; otherwise it is a schema definition error. Noted that if the decimal separator, grouping separator and exponent rep are expressions, this checking must be deferred until parsing/unparsing.
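A minimal Python sketch of the agreed check. As a simplifying assumption, each symbol property is modelled as a tuple of literal strings, or as None when its value is an expression and the check must be deferred to parse/unparse time; `check_distinct_symbols` and `SchemaDefinitionError` are hypothetical names. Only clashes between different properties are reported.

```python
from typing import Dict, Optional, Tuple


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


def check_distinct_symbols(props: Dict[str, Optional[Tuple[str, ...]]]) -> None:
    """Raise an SDE if two different text number symbols share a representation.

    A property mapped to None is expression-valued, so its check is deferred.
    """
    owner: Dict[str, str] = {}
    for name, reps in props.items():
        if reps is None:
            continue  # expression-valued: check at parse/unparse time instead
        for rep in reps:
            other = owner.get(rep)
            if other is not None and other != name:
                raise SchemaDefinitionError(f"{name} and {other} both use {rep!r}")
            owner[rep] = name


check_distinct_symbols({
    "textStandardDecimalSeparator": (".",),
    "textStandardGroupingSeparator": (",",),
    "textStandardExponentRep": ("E",),
    "textStandardInfinityRep": ("Inf",),
    "textStandardNaNRep": ("NaN",),
    "textStandardZeroRep": ("zero",),
})  # passes: all representations are distinct
```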
6. m370 - multiple PoU resolutions:
If you have initiatedContent, AND a choiceBranchRef, AND a discriminator all on the same element, and there are 3 enclosing nested PoUs, which one controls which? Precedence is the issue. Or... do we really need to allow this? Why don't we just disallow this kind of piling-on of complexity and make the user choose which PoU resolution technique they want?
SMH: a) initiatedContent 'yes' on a choice/sequence, and a discriminator on a child of the choice/sequence. Allowed at the moment, and it is quite possible that users of IBM DFDL have this combination. Agreed not to make this an SDE (it's a difficult check to get right anyway due to the nature of discriminator placement), nor to issue a warning.
b) initiatedContent 'yes' and choiceBranchRef together on a choice. Yes, one is redundant. Agreed to make the combination an SDE. Noted that this might force users to explicitly set initiatedContent to 'no' if initiatedContent 'yes' is in scope.
c) choiceBranchRef on a choice and a discriminator on a child of the choice. Agreed not to make this an SDE (same reasoning as a) above), nor to issue a warning.
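The agreed outcome for b) amounts to a single static check. In this minimal Python sketch the `Choice` dataclass, its field names and `check_choice_pou` are hypothetical; `initiated_content` stands for the in-scope initiatedContent value, and combinations a) and c) pass with neither an error nor a warning, as agreed.

```python
from dataclasses import dataclass
from typing import Optional


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL schema definition error (SDE)."""


@dataclass
class Choice:
    initiated_content: str = "no"            # in-scope initiatedContent value
    choice_branch_ref: Optional[str] = None  # hypothetical model of choiceBranchRef
    child_has_discriminator: bool = False


def check_choice_pou(choice: Choice) -> None:
    # b) initiatedContent 'yes' together with choiceBranchRef is an SDE.
    if choice.initiated_content == "yes" and choice.choice_branch_ref is not None:
        raise SchemaDefinitionError(
            "initiatedContent 'yes' and choiceBranchRef cannot be combined on a choice")
    # a) and c): a discriminator on a child is neither an SDE nor a warning,
    # so child_has_discriminator is deliberately not examined here.


# Overriding an in-scope initiatedContent 'yes' with 'no' makes b) legal.
check_choice_pou(Choice(initiated_content="no", choice_branch_ref="branchA"))
```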
7. m396 - is BCD representation a mandatory feature, or optional?
SMH: BCD calendars and BCD numbers are independent optional features. Will update the errata document to make this clear.
8. m398 - portability is at risk if subset processors ignore properties they don't implement. We relaxed this from a more rigid policy, and now allow subsets to not validate properties they don't implement. However, is there a better compromise, e.g., requiring a warning about all unimplemented/unrecognized properties? E.g., dfdl:textBiDi='no' yields SDE "unrecognized property 'textBiDi' with value 'no'".
SMH: Agreed that this had been relaxed too much and that implementations MUST issue a warning.
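A minimal Python sketch of the agreed behaviour for a subset processor, assuming it keeps a set of the property names it implements. `check_properties` is a hypothetical helper; it uses Python's standard `warnings` module to issue a warning, rather than an SDE or silence, for every unimplemented or unrecognized property.

```python
import warnings
from typing import Dict, Set


def check_properties(props: Dict[str, str], implemented: Set[str]) -> None:
    """Warn for each property this (subset) processor does not implement."""
    for name, value in props.items():
        if name not in implemented:
            warnings.warn(f"unrecognized property '{name}' with value '{value}'",
                          stacklevel=2)


# Emits: UserWarning: unrecognized property 'textBiDi' with value 'no'
check_properties({"encoding": "utf-8", "textBiDi": "no"}, implemented={"encoding"})
```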
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg