[DFDL-WG] Minute for OGF DFDL Working Group Call, February 10-2010

11 Feb 2010

      Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, February 10-2010

Attendees
Mike Beckerle (Oco) 
Steve Hanson (IBM) 
Alan Powell (IBM) 
Suman Kalia (IBM) 
Peter Lambros (IBM) 
Tim Kimber (IBM) 
Stephanie Fetzer (IBM)

Apologies
Steve Marting (Progeny) 

1. Comments on latest discriminators doc. 
Stephanie has sent some editorial comments. Action will be closed next 
week subject to comments from others.

2. Remaining 037 review issues 

16.2 scannability with lengthKind pattern:   
Confirm that this is what we agreed 
In summary, you can use a data pattern on any element (complex, simple 
text, simple binary) as long as the bytes are legal in the stated 
encoding, which, where binary data is involved, in practice means an 
8-bit ASCII encoding. 

By 8-bit ASCII we really mean an encoding where all the codepoints from 
0-255 map to the equivalent value. Subsequent investigation indicates that 
'all' 8-bit ASCII encodings have gaps, i.e. byte values for which there 
isn't a valid character.
Mike has suggested:
1) for all ASCII-based character sets, we say that bytes 0x00 to 0xFF all 
map to exactly those codepoints in ISO 10646 for the infoset, and vice 
versa.

2) define dfdl:encoding="bytes" as a special character set name which has 
the above property.
Action raised.
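
As a rough illustration of the property Mike describes: ISO-8859-1 
(Latin-1), as implemented in Python, maps every byte 0x00 to 0xFF to the 
codepoint with the same value, with no gaps. The sketch below uses it as a 
stand-in for the proposed "bytes" encoding. The sample data and the 
pattern are made up for illustration, and this is not a statement of how a 
DFDL processor would implement it.

```python
import re

# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 stands in here for the proposed dfdl:encoding="bytes":
# each byte decodes to the codepoint with the same numeric value.
text = all_bytes.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))

# ... and vice versa: encoding gives back exactly the original bytes.
assert text.encode("latin-1") == all_bytes

# By contrast, plain 7-bit ASCII has no mapping for bytes above 0x7F.
try:
    bytes([0x80]).decode("ascii")
    ascii_has_gap = False
except UnicodeDecodeError:
    ascii_has_gap = True

# A length pattern can then be applied to binary data via the decoded text.
# Hypothetical data: a binary field terminated by a 0xFF byte.
data = bytes([0x01, 0x80, 0x7F, 0xFF, 0x02])
match = re.match(r"[^\xff]*\xff", data.decode("latin-1"))
field_length = len(match.group(0))  # one character per byte, so this is bytes
```

Because one character corresponds to exactly one byte, the length of the 
pattern match can be read directly as a length in bytes.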

·         Tracker Issue: [schema] is an absolute or relative SCD. Why 
bother allowing absolute? 

Both absolute and relative SCDs are allowed. No change needed.

·         Tracker Issue: Glossary as the place for centralized 
definitions, or should they be repeated there, but also introduced at 
point of first use, or should we put the definitions only at the places 
where they are discussed, and xref from the glossary? 
The glossary will be used for definitions.
·         TBD: Issue - semantics of expressions containing relative paths 
that are inherited via ref to a dfdl:defineFormat. (also section 10.3)

This issue applies to expressions used in any global component, not just 
defineFormats. It was agreed that relative paths would be allowed in 
global components except for the dfdl:format annotation on an xs:schema. 
The relative paths are resolved where the global component is referenced, 
not where it is defined.

·         TBD: Issue - XPath term - we are not consistent about using the 
term XPath, or "expression" when referring to our expression language. I 
prefer to call it our expression language, and then in the section that 
defines it state that it is a strict subset of XPath 2.0. 
The term 'DFDL expression' will be used.
·         TBD: Issue - fn:position is unclear given that we've just said 
we don't support sequences in the expression language. 
Action: clarify semantics of fn:position and fn:count. Relax single 
sequence restriction.
·         TBD: Issue - order of sections. Scoping rules section should 
come before variables section, which uses these concepts. 

Expressions and regular expressions will be moved to the back of the spec 
as they are 'advanced' features.

TBD: Issue: Case sensitivity of enum names - did we say whether this is 
case sensitive or not? I believe it should be case sensitive. 

Already agreed that they are case sensitive
·          Issue: dfdl:representation - Strings in binary rep. I see no 
reason why elements of type xs:string should examine dfdl:representation. 
They shouldn't care what it is, they are always "text". I should be able 
to specify a bunch of inter-mixed binary number and string elements 
without having to specify dfdl:representation="text" just to avoid an 
error on the string type elements. I believe xs:string type ignores 
dfdl:representation (always behaves as if dfdl:representation is 
'text'). (If we change this then the property precedence section for 
simple types changes slightly as representation="text" is implied if type 
is string.)
That will make it impossible to introduce a binary representation of text 
later 

What is "a binary representation of text"? Is there a real issue here? 
This is primarily a convenience and clarity issue for me. I do not want to 
have to change to representation="text" for every string inside a COBOL 
structure, which is ultimately a binary representation object. To me 
type="string" is enough. I want to put in the file scope level of the 
schema a representation="binary", and then decorate the elements with the 
specifics of their types, but I do not expect to have to put 
representation="text" on anything. 

I do not understand what you are trying to achieve by requiring 
representation="text" for things that are already textual based on the 
type. 

Agreed that dfdl:representation 'text' is implied for strings and 
dfdl:representation 'binary' is implied for hexBinary.

The rest of the issues below I think we need to discuss on calls. 
textStringPadCharacter textNumberPadCharacter - did we agree that this 
character must be a "minimum width" character if the char set encoding is 
variable width? (i.e., the pad char must be 1 byte if the encoding is 
UTF-8.) 

textStringPadCharacter textNumberPadCharacter  must be a 1 byte character 
if the char set encoding is variable width?
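
A minimal sketch of what such a check could look like, assuming the rule 
is simply "the pad character must encode to exactly one byte in the stated 
encoding". The function name is hypothetical and is not taken from the 
spec.

```python
def is_single_byte_pad(pad_char: str, encoding: str) -> bool:
    """Return True if pad_char is one character that encodes to one byte."""
    return len(pad_char) == 1 and len(pad_char.encode(encoding)) == 1


# A space is one byte in UTF-8, so it would be acceptable as a pad character.
space_ok = is_single_byte_pad(" ", "utf-8")

# U+3000 IDEOGRAPHIC SPACE takes three bytes in UTF-8, so it would be rejected.
wide_ok = is_single_byte_pad("\u3000", "utf-8")
```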

numberInfinityRep numberNanRep - Is this applicable only to xs:double and 
xs:float? Also, what I've seen requires a distinction of sign, i.e., there 
are positive and negative infinities, often printing as -inf and +inf. 

The description is the way ICU behaves but needs clarification. It isn't 
clear how Inf and NaN are represented in the infoset. Need to investigate 
if XML allows these values. Action raised.
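
For reference while the action is open: XML Schema 1.0 defines the 
literals INF, -INF and NaN in the lexical space of xs:float and xs:double. 
The sketch below maps between those literals and floating point values. It 
only illustrates the XML side; the helper names are hypothetical, and it 
does not settle how the DFDL infoset should represent these values.

```python
import math


def to_xsd_double(value: float) -> str:
    """Render a float using the XML Schema xs:double special-value literals."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "INF" if value > 0 else "-INF"
    return repr(value)


def from_xsd_double(lexical: str) -> float:
    """Parse an xs:double literal, handling INF, -INF and NaN explicitly."""
    special = {"INF": math.inf, "-INF": -math.inf, "NaN": math.nan}
    return special[lexical] if lexical in special else float(lexical)
```

Note that the sign distinction raised above is carried by INF and -INF, 
while NaN has no signed form.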

·         TBD: Issue - \n in regular expressions - clarify relationship of 
this to entities like NL entity. Also, if I include an entity like WSP* in 
a regular expression (can I?) does it then match accordingly? 
It appears that some of our multi-valued entities like WSP+ create 
conditional "matching" behavior without having to use regular expressions, 
e.g., when WSP+ is used as a separator. But can you use entities like WSP+ 
in a regular expression? It seems you should be able to use regular 
"single valued" entities in a regular expression, it's these multi-valued 
ones that have tricky semantics. 
Added Unicode values to \n, \t, \r. Disallow DFDL general (multi-option) 
entities in regular expressions.
14.1 Alignment - TBD: Issue - zero-based thinking here. But all the bits 
stuff and everything else in DFDL uses 1-based reasoning. Need to revisit 
to make this sensible for a 1-based world. 
Added implicit alignment table. TBD zero-based 

It was felt that it was more natural to have alignment 0, 2, 4, etc. rather 
than 1, 3, 5, etc. MB to rewrite section. Action raised.

finalTerminatorCanBeMissing - spec is not clear. Also is there a 
finalSeparatorCanBeMissing? 
Changed to finalDocumentTerminatorCanBeMissing and 
finalDocumentSeparatorCanBeMissing. Not sure where 
finalDocumentSeparatorCanBeMissing should be specified. Looks odd on 
'distinguished root'. These properties operate differently from other 
properties as they are defined on the 'distinguished root' but affect some 
element lower down. Effectively they are put in scope by a different 
mechanism.

We discussed whether the properties should be on the distinguished root and 
its sequence, but decided that because these are really global properties 
they would only be allowed in the 'default' format block and be 'scoped' 
over the whole schema.

3. Go through Actions 

Meeting closed, 14:40

Next call: Wednesday 17 February 2010, 13:00 UK 

Next action: 083
Actions raised at this meeting

No
Action 
079
AP: Encoding for binary fields when lengthKind is pattern
080
AP: Clarify semantics of fn:position and fn:count
081
AP: Inf and NaN
The description is the way ICU behaves but needs clarification. It isn't 
clear how Inf and NaN are represented in the infoset. Need to investigate 
if XML allows these values.
082
MB: Should alignment be 0 or 1 based

Current Actions:
No
Action 

045
20/05 AP: Speculative Parsing
27/05: Pseudocode has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented, 
need to find a better way.
29/07: No Progress 
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress 
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion. AP 
to incorporate updates and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into 
the next version.
14/10: Alan to update proposal to include array scenario where minOccurs > 
0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed, see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce 
examples
11/11: Absorbing action 033 into 045.  Maybe decorated discriminator kinds 
are needed after all. MB and SF to continue with examples. 
18/11: Went through WTX implementation of example. SF to gather more 
documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to 
confirm that no changes are needed to the Resolving Uncertainty doc.
04/11: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B 
Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephanie took us through a description of WTX identifiers. Mike 
agreed to write up in DFDL terms.
20/01: Mike will write up
27/01: further discussion of discriminators
29/01: Alan had emailed both proposals but not enough time to discuss
02/02: Agreed to adopt 'component exists' semantics for discriminators
10/02: 'component exists' proposal updated. comments by next call.
049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use 
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It 
seemed that the main value is that it defines a schema location for 
downloading 'known' defaults from the web. 
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can 
be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress.  The predefined formats do not need to be available 
when the spec is published.
Suman said that he had been mapping COBOL structures to DFDL and it didn't 
look as though the way text numbers are defined is very usable. He will 
document for next call 
03/02: No progress
10/02: No progress
066
Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
03/02: IBM is still investigating
10/02: IBM is still investigating
079
AP: Encoding for binary fields when lengthKind is pattern
080
AP: Clarify semantics of fn:position and fn:count
081
AP: Inf and NaN
The description is the way ICU behaves but needs clarification. It isn't 
clear how Inf and NaN are represented in the infoset. Need to investigate 
if XML allows these values.
082
MB: Should alignment be 0 or 1 based

Closed actions
No
Action 
077
SKK:  mapping of COBOL numbers to textNumberFormats.
03/02: Suman documented the problem. Agreed to remove textNumberFormat and 
textCalendarFormat.
10/02: closed
078
MB: Reword section 2.3.1 incorporating markup order rules.
10/02: closed

Work items:
No
Item
target version
status
005
Improvements on property descriptions 

not started
012
Reordering the properties discussion: move representation earlier, improve 
flow of topics 

not started 
036
Update dfdl schema with change properties 
ongoing

042
Mapping of the DFDL infoset to XDM 
none
not required for V1 specification
069
ICU fractional seconds
039

070
Write DFDL primer 

071
Write test cases.

072
it is a processing error if the number of occurrences in the data does not 
match the value of the expression or prefix
039

073
Rename dfdl:separatorPolicy="required" to "always". 
039
Deferred until action 071 agreed
078
document UPA checks
039

079
Semantics of length=0, nil handling and defaults. (A071)
039

080
Tlog: Allow LengthKind delimited for packed/bcd (A074)
039

081
Update empty sequence section (A075)
039

082
semantics of minOccurs= 0 on choice branches (A076)
039

083
Implement RFC2116

084
LengthKind pattern scannability rules

085
Invalid character substitution

086
Infoset round trip: Rephrase sentence 'It is possible to define a schema 
so that when infoset unparsed and the datastream reparsed, the same 
infoset will be produced'

087
Clarify use of relative paths in global components.

088
'DFDL expression'

089
Agreed that dfdl:representation 'text' is implied for strings and 
dfdl:representation 'binary' is implied for hexBinary

091
textStringPadCharacter textNumberPadCharacter must be a 1 byte character 
if the char set encoding is variable width?

092
finalDocumentTerminatorCanBeMissing and 
finalDocumentSeparatorCanBeMissing allowed only in 'default' format 

093
Remove textNumberFormat and textCalendarFormat.

Regards

Alan Powell

Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com