No
| Action |
045
| 20/05 AP: Speculative Parsing
27/05: Pseudo code has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way.
29/07: No Progress
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate update and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
14/10: Alan to update proposal to include array scenario where minOccurs > 0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed, see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce examples
11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.
04/11: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephaine took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.
20/01: Mike will write up
27/01: further discussion of discriminators
29/01: Alan had emailed both proposals but not enough time to discuss |
049
| 20/05 AP: Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress. The predefined formats do not need to be available when the spec is published. Suman said that he had been mapping COBOL structures to DFDL and it didn't look as though the way text numbers are defined is very usable. He will document for next call |
066
| Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress |
077
| SKK: mapping of COBOL numbers to textNumberFormats. |
A few comments in-line below
On Wed, Jan 20, 2010 at 7:01 AM, Alan Powell <alan_powell@uk.ibm.com>
wrote:
I have answered most of the issues and comments raised by Steve and Mike
but some need further discussion.
Issues from Steve H
General. Although dfdl:encoding enums are case insensitive, we should
stick to UC throughout in examples.
2. I agree with the existing comment that the RFC2119 key words should
be upper case.
14.3.4. There are type/rep combinations where lengthKind="implicit"
is not allowed - so saying that 'pattern' is replaced by 'implicit' on
unparsing does not work.
TBD
We covered this on the most recent wg call.
16.2. I'm not sure that scannability in this constant encoding sense
is necessary for patterns. I can create a regular expression that extracts
all characters up to hex value xXX or all characters up to xYY, thereby
treating the content as an encoding-insensitive black box.
If your byte pattern happens to be a legal part of a multi-byte
character sequence, then you'll get a false recognition, or you won't get
what you expect.
Example: You are searching for byte 0xAA, but that can
legally appear as byte 3 of a 3-byte UTF-8 encoded character. When you
say you are looking for hex AA in a string, DFDL is currently defined to
mean you are looking for the character represented by that raw byte. If
the encoding is UTF-8, that isn't even a legal character encoding sequence,
so the decoder should cause an error or something.
Even for a fixed-length single-byte character set, there
must be no unused codes that lack a mapping to ISO 10646, because
our infoset is defined in terms of translations into that.
I think we need encoding="none" or encoding="bytes"
or something if you really want to scan bytes without encoding causing
problems.
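
To make the false-recognition point concrete, here is a minimal Python sketch. The helper names (naive_byte_scan, char_aware_scan) are hypothetical and not part of DFDL; they only contrast scanning raw bytes with scanning decoded characters. The sample uses U+20AA (NEW SHEQEL SIGN), whose UTF-8 encoding is E2 82 AA, so byte 0xAA shows up as byte 3 of a 3-byte character.

```python
def naive_byte_scan(data: bytes, needle: int) -> list[int]:
    """Return every offset where the raw byte value occurs, ignoring encoding."""
    return [i for i, b in enumerate(data) if b == needle]


def char_aware_scan(data: bytes, needle: str, encoding: str = "utf-8") -> list[int]:
    """Decode first, then return character offsets where `needle` occurs."""
    text = data.decode(encoding)  # raises UnicodeDecodeError on broken input
    return [i for i, ch in enumerate(text) if ch == needle]


# "A", U+20AA, "B" encoded as UTF-8: 41 E2 82 AA 42
sample = "A\u20aaB".encode("utf-8")

# The byte scan reports a hit inside the 3-byte character (a false recognition).
false_hits = naive_byte_scan(sample, 0xAA)

# The character scan finds nothing, because U+00AA itself is not in the data.
true_hits = char_aware_scan(sample, "\u00aa")

# A lone 0xAA byte is not a legal UTF-8 sequence, so a strict decoder rejects it.
try:
    b"\xaa".decode("utf-8")
    lone_byte_is_legal = True
except UnicodeDecodeError:
    lone_byte_is_legal = False
```

This is only an illustration of the problem as described above, not a proposal for how a DFDL processor should scan.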
Issues from Mike B
· Tracker issue: codepoints outside BMP, as literals and in data.
· If I put in a value that requires use of a high/low surrogate pair, is that an error, or does it require me to put in two separate %#...; thingys, one for each of the surrogates (in which case these are not really code points in ISO10646)? If I put in a codepoint for one of the supplemental characters and the schema itself is written in UTF-16 then that has to translate into a literal surrogate pair. Ok, but I’m very uncertain about all this stuff.
The above item had two issues glommed together. There really
are two separate issues. The above is about these crazy codepoints that
use surrogate pairs. That's a minor corner case given the amount of use
those get.
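
For reference, the surrogate-pair arithmetic can be sketched in a few lines of Python. The function name utf16_surrogates is hypothetical; the example code point U+1D11E (MUSICAL SYMBOL G CLEF) is just a convenient supplementary character and does not come from the spec.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# One code point in ISO 10646 terms...
clef = chr(0x1D11E)

# ...but two 16-bit code units once written out as UTF-16.
high, low = utf16_surrogates(0x1D11E)
encoded = clef.encode("utf-16-be")
```

So a single entity naming the code point and a pair of entities naming the two surrogates describe the same bytes in UTF-16; which of the two forms a schema author is allowed to write is the open question.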
The bigger issue is the one below, which is about things
that are in strings but are broken character encodings, where we still
need to be able to process the data. There's also the matter of recovery
from errors in decoding, and what we put out when the infoset contains
a character code for which there is no valid encoding, or just a character
code which isn't even in ISO 10646 (e.g., character code 0xFFFFFFFF, which
is not a valid character at all).
Tracker Issue: illegal character
encodings for parsing and unparsing. TBD: how do these make it into the
infoset, or are they replaced, and if so how? TBD: can one represent these
in the infoset for output? Ideally not, but…
· Tracker Issue: Processing-time Schema Definition Errors
This section (2.3.1 in this draft), is problematic as we’re trying to allow simple DFDL implementations to not do a bunch of static checking, yet if implementations differ on when Schema Definition errors are detected, then the second paragraph says they are converted to processing errors. This lets different implementations do very different things in terms of how the speculative parsing back-tracks around.
Grammar ambiguity is a very tricky case. Unless a DFDL implementation can prove a grammar to be unambiguous, it is very hard to say that any particular combination of delimiters makes up a legal DFDL schema definition. If the parser simply fails because the grammar was ambiguous, there’s no way to tell the difference between this and just broken data without proving the grammar is unambiguous. In general it is formally undecidable whether a grammar is ambiguous or unambiguous. (http://books.google.com/books?id=lIuu53IcKWoC&pg=PT217&lpg=PT217&dq=proving+a+grammar+is+unambiguous&source=bl&ots=wie8TAt-MT&sig=ZSD7tIwnXZIT8Ic91BWMH2H2dKg&hl=en&ei=hAQ5S5vPOIri7APc37CKBg&sa=X&oi=book_result&ct=result&resnum=10&ved=0CDAQ6AEwCQ#v=onepage&q=proving%20a%20grammar%20is%20unambiguous&f=false)
Since DFDL v1.0 doesn’t allow recursive declarations/definitions, it may be possible to prove the ambiguity or unambiguity of a DFDL schema (or rather, the data syntax grammar described by it – if you want to bother to distinguish the two), but recursion isn’t something we want to rule out for the future, so
Type checking is decidable in DFDL’s expression language, so we could always detect type safety before run time; however, if we allow a simplistic DFDL implementation to just check types at run time then this would, by the definition in this section (2.3.1), issue processing errors when it detects these at run time, thereby allowing backtracking of the speculative parser to be driven off of type-checks in the expression language. It seems to me that we need to find a way to put this problem back into the hands of the user, and say that a schema where this actually matters (one where a type error causes a backtrack, which ultimately causes a successful parse) is illegal but implementations are allowed to not detect this particular illegality.
It seems to me we need to put this problem back into the hands of the user.
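
The concern can be sketched as a toy speculative parser in Python. Everything here is hypothetical (the exception classes, the branch functions, and the runtime_sde_is_processing_error flag); it is not the algorithm from the spec, only an illustration of how two implementations could diverge on the same schema and data.

```python
class ProcessingError(Exception):
    """Data did not match; a speculative parser may backtrack."""


class SchemaDefinitionError(Exception):
    """The schema itself is wrong; normally fatal."""


def branch_with_type_error(data: bytes) -> int:
    # Hypothetical branch whose length expression mixes a string and a number.
    try:
        length = "4" + 1  # type error, only discovered at run time
    except TypeError as exc:
        raise SchemaDefinitionError("type error in expression") from exc
    return int(data[:length])


def branch_text(data: bytes) -> str:
    # Hypothetical fallback branch: treat the data as ASCII text.
    return data.decode("ascii")


def parse_choice(branches, data, runtime_sde_is_processing_error):
    """Try each branch in order, backtracking to the next on a processing error."""
    for name, branch in branches:
        try:
            return name, branch(data)
        except ProcessingError:
            continue  # ordinary backtrack
        except SchemaDefinitionError:
            if runtime_sde_is_processing_error:
                continue  # lenient implementation: the type error drives a backtrack
            raise  # strict implementation: the schema is rejected
    raise ProcessingError("no branch matched")


branches = [("number", branch_with_type_error), ("text", branch_text)]

# Lenient implementation: the parse "succeeds" via the second branch.
lenient_result = parse_choice(branches, b"1234", True)

# Strict implementation: the same schema and data produce a schema definition error.
try:
    parse_choice(branches, b"1234", False)
    strict_rejected = False
except SchemaDefinitionError:
    strict_rejected = True
```

The lenient run is exactly the case described above: a type error causes a backtrack that ultimately yields a successful parse.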
· Tracker Issue: "round trip" for infoset. Should we omit the whole point?
· Tracker Issue: [schema] is an absolute or relative SCD. Why bother allowing absolute?
· Tracker Issue: Glossary as the place for centralized definitions, or should they be repeated there, but also introduced at point of first use, or should we put the definitions only at the places where they are discussed, and xref from the glossary?
· TBD: Issue - semantics of expressions containing relative paths that are inherited
via ref to a dfdl:defineFormat. (also section 10.3)
· TBD: Issue - XPath term - we
are not consistent about using the term XPath or "expression"
when referring to our expression language. I prefer to call it our expression
language, and then in the section that defines it state that it is a strict
subset of XPath 2.0.
· TBD: Issue - fn:position is unclear given that we've just said we don't support sequences in the expression language.
· TBD: Issue - order of sections. Scoping rules section should come before variables section, which uses these concepts.
· Issue: dfdl:representation
- Strings in binary rep. I see no reason why elements of type xs:string
should examine dfdl:representation. They shouldn't care what it is; they
are always "text". I should be able to specify a bunch of inter-mixed
binary number and string elements without having to specify dfdl:representation="text"
just to avoid an error on the string type elements. I believe xs:string
type ignores dfdl:representation (always behaves as if dfdl:representation
is 'text'). (If we change this then the property precedence section for
simple types changes slightly, as representation="text" is implied
if type is string.)
That will make it impossible to introduce a binary representation of text
later
What is "a binary representation of text"? Is
there a real issue here? This is a primary convenience and clarity issue
for me. I do not want to have to change to representation="text"
for every string inside a COBOL structure, which is ultimately a binary
representation object. To me type="string" is enough. I want
to put in the file scope level of the schema a representation="binary",
and then decorate the elements with the specifics of their types, but I
do not expect to have to put representation="text" on anything.
I do not understand what you are trying to achieve by
requiring representation="text" for things that are already textual
based on the type.
The rest of the issues below I think we need to discuss
on calls.
It appears that some of our multi-valued
entities like WSP+ create conditional "matching" behavior without
having to use regular expressions, e.g., when WSP+ is used as a separator.
But can you use entities like WSP+ in a regular expression? It seems you
should be able to use regular "single valued" entities in a regular
expression; it's these multi-valued ones that have tricky semantics.
Added Unicode values to /n, /t,/r. Disallow DFDL entities in regular
expressions.
Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com