· Tracker issue: codepoints outside BMP, as literals and in data.
· If I put in a value that requires a high/low surrogate pair, is that an error, or does it require two separate %#...; entities, one for each of the surrogates (in which case these are not really code points in ISO 10646)? If I put in a code point for one of the supplementary characters and the schema itself is written in UTF-16, then that has to translate into a literal surrogate pair. OK, but I'm very uncertain about all this.
· Tracker Issue: illegal character encodings for parsing and unparsing. TBD: how do these make it into the infoset, or are they replaced, and if so, how? TBD: can one represent these in the infoset for output? Ideally not, but…
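To make the two questions above concrete, here is a small Python sketch (not DFDL, and not a proposed resolution). It shows how one supplementary code point becomes two UTF-16 code units, why a lone surrogate is not a code point that can be encoded on its own, and one possible "replace" policy for illegal input bytes. The code point U+1F600 and the helper name `to_surrogates` are chosen only for illustration.

```python
# Illustration only: a code point outside the BMP and its UTF-16 form.
cp = 0x1F600  # arbitrary supplementary-plane code point picked for this example

utf16 = chr(cp).encode("utf-16-be")       # 4 bytes = two 16-bit code units
high = int.from_bytes(utf16[0:2], "big")  # first code unit (high surrogate)
low = int.from_bytes(utf16[2:4], "big")   # second code unit (low surrogate)

def to_surrogates(code_point):
    """Hypothetical helper: split a supplementary code point into (high, low)."""
    v = code_point - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

# The arithmetic agrees with what the UTF-16 encoder produced.
assert (high, low) == to_surrogates(cp)

# A lone surrogate is not a valid scalar value, so strict encoding rejects it.
try:
    chr(0xD83D).encode("utf-8")
    lone_ok = True
except UnicodeEncodeError:
    lone_ok = False

# Illegal input bytes: one possible policy is to substitute U+FFFD.
bad = b"\xff\x41"
replaced = bad.decode("utf-8", errors="replace")
```

Whether DFDL should reject, replace, or preserve such data is exactly the open question; the sketch only shows what each option looks like at the character level.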
· Tracker Issue: Processing-time Schema Definition Errors
This section (2.3.1 in this draft) is problematic: we are trying to allow simple DFDL implementations to skip much of the static checking, yet if implementations differ on when Schema Definition Errors are detected, then the second paragraph says they are converted to processing errors. This lets different implementations do very different things in terms of how speculative parsing backtracks.
Grammar ambiguity is a very tricky case. Unless a DFDL implementation can prove a grammar to be unambiguous, it is very hard to say that any particular combination of delimiters makes up a legal DFDL schema definition. If the parser simply fails because the grammar was ambiguous, there is no way to tell the difference between this and just broken data without proving the grammar is unambiguous. In general it is formally undecidable whether a context-free grammar is ambiguous or unambiguous. (http://books.google.com/books?id=lIuu53IcKWoC&pg=PT217&lpg=PT217&dq=proving+a+grammar+is+unambiguous&source=bl&ots=wie8TAt-MT&sig=ZSD7tIwnXZIT8Ic91BWMH2H2dKg&hl=en&ei=hAQ5S5vPOIri7APc37CKBg&sa=X&oi=book_result&ct=result&resnum=10&ved=0CDAQ6AEwCQ#v=onepage&q=proving%20a%20grammar%20is%20unambiguous&f=false)
Since DFDL v1.0 doesn't allow recursive declarations/definitions, it may be possible to prove the ambiguity or unambiguity of a DFDL schema (or rather, of the data syntax grammar described by it, if you want to bother to distinguish the two), but recursion isn't something we want to rule out for the future, so
Type checking is decidable in DFDL's expression language, so we could always detect type errors before run time. However, if we allow a simplistic DFDL implementation to check types only at run time, then by the definition in this section (2.3.1) it would issue processing errors when it detects them, thereby allowing backtracking of the speculative parser to be driven by type checks in the expression language. It seems to me that we need to find a way to put this problem back into the hands of the user, and say that a schema where this actually matters (one where a type error causes a backtrack, which ultimately leads to a successful parse) is illegal, but implementations are allowed to not detect this particular illegality.
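The concern about type errors driving backtracking can be sketched with a toy speculative parser in Python. Everything here is hypothetical (the exception classes, `branch_a`, `branch_b`, `parse_choice`, and the `static_checks` flag are invented for illustration and do not come from the draft). The point is only that the same schema and data give a successful parse in a lazily checking implementation and a Schema Definition Error in a statically checking one.

```python
class ProcessingError(Exception):
    """Recoverable: a speculative parser may backtrack and try another branch."""

class SchemaDefinitionError(Exception):
    """Fatal: the schema itself is wrong, so no backtracking happens."""

def branch_a(data, static_checks):
    # Hypothetical branch whose expression is ill-typed (string plus integer).
    if static_checks:
        # An implementation that type-checks up front rejects the schema.
        raise SchemaDefinitionError("type error in branch_a expression")
    try:
        return {"a": data[:2] + 1}  # str + int raises TypeError at run time
    except TypeError as e:
        # A late-detected schema error is reported as a processing error.
        raise ProcessingError(str(e)) from e

def branch_b(data, static_checks):
    # Well-typed alternative: interpret the first two characters as an integer.
    return {"b": int(data[:2])}

def parse_choice(data, static_checks):
    """Try each alternative in order, backtracking on ProcessingError only."""
    for branch in (branch_a, branch_b):
        try:
            return branch(data, static_checks)
        except ProcessingError:
            continue  # backtrack and try the next alternative
    raise ProcessingError("no branch matched")

lazy_result = parse_choice("42", static_checks=False)

try:
    parse_choice("42", static_checks=True)
    strict_outcome = "parsed"
except SchemaDefinitionError:
    strict_outcome = "schema definition error"
```

With `static_checks=False` the ill-typed branch silently fails and the second branch succeeds; with `static_checks=True` the same input is rejected outright. That divergence is the behaviour the proposal above would declare illegal without requiring implementations to detect it.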
· Tracker Issue: "round trip" for infoset. Should we omit the whole point?
· Tracker Issue: [schema] is an absolute or relative SCD. Why bother allowing absolute?
· Tracker Issue: Glossary. Should the glossary be the one place for centralized definitions? Should definitions be repeated there but also introduced at the point of first use? Or should we put the definitions only at the places where they are discussed, and xref from the glossary?
· TBD: Issue - semantics of expressions containing relative paths that are inherited via ref to a dfdl:defineFormat. (also section 10.3)
· TBD: Issue - XPath term - we are not consistent about using the term "XPath" or "expression" when referring to our expression language. I prefer to call it our expression language, and then in the section that defines it state that it is a strict subset of XPath 2.0.
· TBD: Issue - fn:position is unclear given that we've just said we don't support sequences in the expression language.
· TBD: Issue - order of sections. Scoping rules section should come before variables section, which uses these concepts.
· Issue: dfdl:representation - strings in binary rep. I see no reason why elements of type xs:string should examine dfdl:representation. They shouldn't care what it is; they are always "text". I should be able to specify a bunch of intermixed binary number and string elements without having to specify dfdl:representation="text" just to avoid an error on the string-type elements. I believe xs:string type ignores dfdl:representation (always behaves as if dfdl:representation is "text"). (If we change this, then the property precedence section for simple types changes slightly, as representation="text" is implied if the type is string.)
That will make it impossible to introduce a binary representation of text later.
· TBD: Issue - \n in regular expressions - clarify the relationship of this to entities like the NL entity. Also, if I include an entity like WSP* in a regular expression (can I?), does it then match accordingly?
It appears that some of our multi-valued entities like WSP+ create conditional "matching" behavior without having to use regular expressions, e.g., when WSP+ is used as a separator. But can you use entities like WSP+ in a regular expression? It seems you should be able to use regular "single-valued" entities in a regular expression; it's these multi-valued ones that have tricky semantics.
Added Unicode values to \n, \t, \r. Disallow DFDL entities in regular expressions.
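As a rough illustration of the distinction drawn above, the following Python sketch uses Python's `re` module, not DFDL's regular expression dialect. It checks that \n, \t and \r each denote a single fixed code point, and models a multi-valued entity such as WSP+ as an ordinary "one or more of a set" pattern. The whitespace set used here is an assumption made for the example and is not the set the spec defines for WSP.

```python
import re

# Single-valued escapes: each pattern matches exactly one fixed code point.
escapes = {r"\n": 0x000A, r"\t": 0x0009, r"\r": 0x000D}
all_match = all(
    re.fullmatch(pattern, chr(cp)) is not None
    for pattern, cp in escapes.items()
)

# Hypothetical stand-in for a multi-valued entity like WSP+ used as a separator.
# Assumption: this character set is illustrative, not the spec's WSP definition.
wsp_plus = re.compile(r"[ \t\n\r]+")

# Any run of one or more of those characters acts as a single separator.
fields = wsp_plus.split("a \t b\n\nc")
```

A plain regular expression can already express the "one or more" behaviour, which is one reason keeping DFDL entities out of regular expressions loses little.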