I have added some comments
in-line to reflect the WG call on Tuesday; we will continue on Friday.
(Andy - please see 5 below)
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Date: 02/10/2012 18:41
Subject: One email or a flock or... - re: 10.03 draft - open review items
Steve,
I've got the issues below left after your review pass on 10.03, minus 2
that I am emailing you about separately.
Should I issue this email to the WG, or do you want me to decompose this
into separate emails, or do you just want to list these as agenda topics
for next call? I think it is good if people get to look at them in advance
of a call.
...mikeb
--------------------------------------------------------
This is a list of items left open after a review pass by SMH (on draft in
preparation r010.03).
These items need specific WG discussion on a call. They may be small enough
to resolve there, or may be escalated into action items. (A couple issues
already clearly action-item related are not listed here.)
Note: please ignore the identifiers like SMH107 or m236 that I'm tagging these
with. Those are just for me editing the text. (Those change ...grrr...
if someone inserts a comment into the document, so they're not good issue
identifiers).
1. SMH107
Spec says: When the separator and terminator on a group have the
same value, then at a point where either separator or terminator could
be found, the separator is tried first.
- Issue is that this language still feels ambiguous. E.g.,
So it tries the separator first, let’s say it finds it. Will a subsequent
processing error cause it to backtrack and revisit this and try the terminator?
Or does finding the separator confirm that it IS a separator, resolve forever
that point of uncertainty? I believe the latter is what was intended (delimiter
decisions drive parsing and are not revisited), but we need to state this
(or do we somewhere else already?)
SMH:
Would like to discuss this when Tim is present, as we need to see what IBM
DFDL does with separators when backtracking.
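To make the two readings of item 1 concrete, here is a minimal sketch, not taken from the spec or from any implementation. It assumes a toy group of digit-only items whose separator and terminator are the same string; the names parse_group, parse_digits and the backtrack flag are hypothetical. With backtrack=False, finding the separator resolves the point of uncertainty for good, so a later failure is a processing error. With backtrack=True, a failure in the next item causes the delimiter to be reinterpreted as the terminator.

```python
class ProcessingError(Exception):
    """Stands in for a DFDL processing error in this toy model."""


def parse_digits(data, pos):
    # Toy item parser: one or more ASCII digits starting at pos.
    end = pos
    while end < len(data) and data[end] in "0123456789":
        end += 1
    if end == pos:
        raise ProcessingError("no digits at offset %d" % pos)
    return int(data[pos:end]), end


def parse_group(data, delim, max_items, backtrack):
    """Parse items where delim is both the separator and the terminator.

    Returns (items, offset just past the terminator).
    """
    value, pos = parse_digits(data, 0)
    items = [value]
    while True:
        if not data.startswith(delim, pos):
            raise ProcessingError("expected delimiter at offset %d" % pos)
        pos += len(delim)
        if len(items) == max_items:
            return items, pos  # no further item possible: it was the terminator
        # Point of uncertainty: the separator interpretation is tried first.
        try:
            value, next_pos = parse_digits(data, pos)
        except ProcessingError:
            if backtrack:
                return items, pos  # revisit: treat the delimiter as the terminator
            raise  # committed: the separator decision is never revisited
        items.append(value)
        pos = next_pos
```

For the input "1;2;abc" with delimiter ";", the backtracking reading yields the items 1 and 2 and stops after the second ";", while the committed reading reports a processing error at "abc". The sketch only models a failure in the immediately following item, not errors arbitrarily deep in later content.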
2. SMH169
- Some numeric types are signed, others unsigned. Some representations
are sign-capable, some are not (BCD specifically). Right now spec draft
says you can't have bcd as rep for signed integer types long, int, short,
byte. But you CAN have bcd for rep of decimal, integer. We could allow
bcd only for nonNegativeInteger type, but there is no nonNegativeDecimal
type, so....how to resolve? I would suggest that we simply allow bcd as
rep for both signed and unsigned types, and it's a processing error to
unparse a negative value into bcd rep.
SMH:
Noted that for a decimal, property decimalSigned is used to indicate whether
the logical value is signed or not. So we could disallow BCD for integer
and for decimal when decimalSigned is 'yes'. Interestingly, section 3.7.1
states "Signed numbers with dfdl:binaryNumberRep 'bcd' are always positive.
On unparsing it is a processing error if the data is negative.",
which is admitting that BCD can be used with signed types. IBM DFDL currently
implements the table in the description of binaryNumberRep, and so allows
BCD for integer and decimal regardless of decimalSigned, but does not allow
long, short, int, byte.
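The suggestion in item 2 (allow BCD for signed types, and make a negative value a processing error on unparse) could look roughly like the sketch below. It assumes packed BCD with two decimal digits per byte and a leading zero nibble for odd digit counts; unparse_bcd and ProcessingError are hypothetical names, not part of any DFDL implementation.

```python
class ProcessingError(Exception):
    """Stands in for a DFDL processing error in this toy model."""


def unparse_bcd(value):
    """Encode a non-negative integer as packed BCD, two digits per byte."""
    if value < 0:
        # BCD has no sign nibble, so a negative logical value cannot be represented.
        raise ProcessingError("cannot unparse negative value %d as BCD" % value)
    digits = str(value)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )
```

Under these assumptions the logical type may be signed, and the restriction is enforced per value at unparse time rather than per type at schema compile time.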
3. m229
- textStandardZeroRep - should this allow %ES; as one of the list of possibles?
SMH:
Decided not to allow %ES; because it adds some complexity to the 'empty
representation' processing rules, in the same way that xs:string and xs:hexBinary
do. Can always make an element required and use default of 0.
4. m236
- are V (virtual decimal point position) and P allowed in the textNumberPattern
for double and float types?
SMH:
Post-call investigation: Errata 2.80 says they are allowed, but not in
conjunction with E, @ and * symbols. This is reflected in BNF as
    subpattern := prefix? ((number exponent?) | vpinteger) suffix?
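As a rough illustration of the restriction in item 4, the sketch below flags a pattern that uses V or P together with E, @ or *. It is deliberately naive: it assumes the pattern contains no quoted literals and no prefix or suffix text that happens to include these letters, and the function name vp_conflicts is hypothetical.

```python
def vp_conflicts(pattern):
    """Return the symbols in pattern that may not be combined with V or P.

    Naive check: ignores quoting and prefix/suffix text in the pattern.
    """
    if "V" not in pattern and "P" not in pattern:
        return []
    return [symbol for symbol in ("E", "@", "*") if symbol in pattern]
```

A real check would work on the parsed pattern according to the BNF rather than on raw characters.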
5. m237
- Do we check that the various symbols used for infinity, digits, grouping
separators, decimal separators are properly distinct to allow parsing?
E.g., that the decimal separator and grouping separator aren't the same,
and that the positive and negative pattern variants are distinguishable?
ICU library supposedly doesn't do this checking. Do we state this is an
SDE in DFDL? If so then is this checking required? Can we make it possible
for implementations to not check somehow? Other grammar ambiguity situations
like separator and terminator being ambiguous are specifically NOT checked
for, because determining if a grammar is ambiguous is hard or undecidable,
and would have to be done at runtime because delimiters can be run-time
computed. But for the syntax components of text numbers do we require checking
or not?
SMH:
Post-call investigation: IBM DFDL gives an error if the decimal & grouping
separators are the same, but does not check any of the other characters
for uniqueness. (Andy - please can you check what ICU does if you set
various of the text number characters to be the same value, e.g.,
decimal sep, grouping sep, exponent and (for floats) NaN and Inf reps,
in both strict & lax modes?)
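If the WG did decide to require a distinctness check for item 5, a minimal version might look like the sketch below. It only detects symbols with identical values (not prefix overlaps or indistinguishable positive/negative patterns), and the dictionary keys are illustrative labels rather than actual DFDL property names.

```python
def find_symbol_clashes(symbols):
    """Return sorted pairs of symbol names that share exactly the same value."""
    names = sorted(symbols)
    clashes = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if symbols[first] == symbols[second]:
                clashes.append((first, second))
    return clashes


# Example configuration where decimal and grouping separators collide.
example = {
    "decimal_separator": ".",
    "grouping_separator": ".",
    "exponent": "E",
    "nan": "NaN",
    "infinity": "Inf",
}
clashes = find_symbol_clashes(example)
```

An implementation could report each returned pair as an SDE or as a warning, depending on what the WG decides. Because these values can be computed at run time, the check may need to run when the values are resolved rather than at schema compile time.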
6. m370
- multiple PoU resolutions: If you have initiatedContent, AND a choiceBranchRef,
AND a discriminator all on the same element, and there are 3 enclosing
nested PoU, which one controls which? Precedence is the issue. Or.....
do we really need to allow this? Why don’t we just disallow this kind
of piling-on of complexity and make the user choose which PoU resolution
technique they want?
SMH:
a) initiatedContent 'yes' on a choice/sequence, and a discriminator on
a child of the choice/sequence. Allowed at the moment, and it is quite
possible that users of IBM DFDL have this combination, so I would prefer
not to make this an SDE (it's a difficult check to get right anyway due
to the nature of discriminator placement). Issue a warning if a discriminator
is found on a direct child of the choice?
b) initiatedContent 'yes' and choiceBranchRef together on a choice. Yes, one is
redundant, but remember that initiatedContent could have been obtained via
scoping rules, so if we made the combination an SDE then we would be forcing
users to explicitly set initiatedContent to 'no'. I'd be ok with ignoring
initiatedContent's PoU resolution behaviour if choiceBranchRef was present on
a choice (note that choiceBranchRef cannot be scoped).
c) choiceBranchRef on choice and a discriminator on a child of the choice.
Issue a warning if a discriminator is found on a direct child of the choice?
7. m396
- is BCD representation a mandatory feature, or optional?
SMH:
BCD calendars and BCD numbers are independent optional features. Will update
errata document to make this clear.
8. m398
- portability at risk if subset processors ignore properties they don't
implement. We relaxed this from a more rigid policy, and now allow subsets
to not validate properties they don't implement. However, is there a better
compromise, e.g., require a warning about all unimplemented/unrecognized
properties? E.g., dfdl:textBiDi='no' yields SDE "unrecognized property
'textBiDi' with value 'no'".
SMH:
Agreed that this had been relaxed too much and a warning MUST be issued
by implementations.
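A minimal sketch of the agreed behaviour for item 8, assuming a subset processor that keeps a set of the property names it implements. The set contents and the function name check_properties are hypothetical; the warning text follows the wording used in the example above.

```python
import warnings

# Hypothetical set of properties that this subset processor implements.
IMPLEMENTED = {"separator", "terminator", "initiator"}


def check_properties(props, implemented=IMPLEMENTED):
    """Warn about every property the processor does not implement.

    Returns the names of the unimplemented properties, in input order.
    """
    unknown = []
    for name, value in props.items():
        if name not in implemented:
            warnings.warn(
                "unrecognized property '%s' with value '%s'" % (name, value)
            )
            unknown.append(name)
    return unknown
```

The processor carries on after warning, so schemas remain usable on a subset implementation while the user is still told which properties were not honoured.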
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412