For discussion on today's call
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 16/10/2012 13:13 -----
From: Mike Beckerle
To: Steve Hanson/UK/IBM@IBMGB
Date: 02/10/2012 18:41
Subject: One email or a flock or... - re: 10.03 draft - open review
items
Steve,
I've got the issues below left after your review pass on 10.03, minus two
that I sent you separate emails about.
Should I issue this email to the WG, or do you want me to decompose this
into separate emails, or do you just want to list these as agenda topics
for next call? I think it is good if people get to look at them in advance
of a call.
...mikeb
--------------------------------------------------------
This is a list of items left open after a review pass by SMH (on the draft
in preparation, r010.03).
These items need specific WG discussion on a call. They may be small
enough to resolve there, or may be escalated into action items. (A couple
of issues that are already clearly action-item related are not listed here.)
Note: please ignore the identifiers like SMH107 or m236 that I'm tagging
these with. Those are just for my own use in editing the text. (They change
...grrr... if someone inserts a comment into the document, so they're not
good issue identifiers.)
1. SMH107 Spec says: When the separator and terminator on a group
have the same value, then at a point where either separator or terminator
could be found, the separator is tried first.
The issue is that this language still feels ambiguous. E.g., it tries the
separator first; let’s say it finds it. Will a subsequent processing error
cause it to backtrack, revisit this point, and try the terminator? Or does
finding the separator confirm that it IS a separator and resolve that point
of uncertainty for good? I believe the latter is what was intended
(delimiter decisions drive parsing and are not revisited), but we need to
state this (or do we already state it somewhere else?).
2. SMH169 - Some numeric types are signed, others unsigned. Some
representations are sign-capable, some are not (BCD specifically). Right
now the spec draft says you can't have bcd as the rep for the signed integer
types long, int, short, byte. But you CAN have bcd as the rep for decimal and
integer. We could allow bcd only for the nonNegativeInteger type, but there
is no nonNegativeDecimal type, so... how to resolve? I would suggest that we
simply allow bcd as the rep for both signed and unsigned types, and make it
a processing error to unparse a negative value into bcd rep.
3. m229 - textStandardZeroRep - should this allow %ES; as one of the
list of possible values?
4. m236 - are V (virtual decimal point position) and P allowed in
the textNumberPattern for double and float types?
5. m237 - Do we check that the various symbols used for infinity,
digits, grouping separators, decimal separators are properly distinct to
allow parsing? E.g., that the decimal separator and grouping separator
aren't the same, and that the positive and negative pattern variants are
distinguishable? The ICU library supposedly doesn't do this checking. Do we
state this is an SDE in DFDL? If so, then is this checking required? Can we
make it possible for implementations to not check somehow? Other grammar
ambiguity situations like separator and terminator being ambiguous are
specifically NOT checked for, because determining if a grammar is
ambiguous is hard or undecidable, and would have to be done at runtime
because delimiters can be run-time computed. But for the syntax components
of text numbers, do we require checking or not?
6. m370 - multiple PoU resolutions: If you have initiatedContent, AND
a choiceBranchRef, AND a discriminator all on the same element, and there
are 3 enclosing nested PoU, which one controls which? Precedence is the
issue. Or..... do we really need to allow this? Why don’t we just disallow
this kind of piling-on of complexity and make the user choose which PoU
resolution technique they want?
7. m396 - is BCD representation a mandatory feature, or optional?
8. m398 - portability at risk if subset processors ignore properties
they don't implement. We relaxed this from a more rigid policy, and now
allow subsets to not validate properties they don't implement. However, is
there a better compromise, e.g., require a warning about all
unimplemented/unrecognized properties? E.g., dfdl:textBiDi='no' yields SDE
"unrecognized property 'textBiDi' with value 'no'".
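
To make the suggestion in item 2 a bit more concrete for the call, here is a rough sketch of what "bcd allowed for signed types, negative value is a processing error on unparse" could look like. It assumes plain BCD with two decimal digits per byte and no sign nibble. The names pack_bcd, unpack_bcd and ProcessingError are made up for illustration and are not from the spec or any implementation.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def pack_bcd(value: int) -> bytes:
    """Unparse a non-negative integer as BCD, two decimal digits per byte."""
    if value < 0:
        # BCD has no sign nibble, so a negative value cannot be represented.
        raise ProcessingError(f"cannot unparse negative value {value} as bcd")
    digits = str(value)
    if len(digits) % 2:
        digits = "0" + digits  # pad on the left to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def unpack_bcd(data: bytes) -> int:
    """Parse BCD bytes back into a non-negative integer."""
    digits = []
    for b in data:
        hi, lo = b >> 4, b & 0x0F
        if hi > 9 or lo > 9:
            raise ProcessingError(f"invalid bcd nibble in byte 0x{b:02x}")
        digits.append(f"{hi}{lo}")
    return int("".join(digits)) if digits else 0
```

With this shape the logical type (int, long, decimal, ...) does not matter to the bcd rep itself; the only failure specific to signed types is the negative-value check at unparse time.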
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412