Alan

I've made a couple of changes to item 3 to reflect what was discussed.

Regards

Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK,
smh@uk.ibm.com,
tel +44-(0)1962-815848



From: Alan Powell/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 08/07/2010 16:08
Subject: [DFDL-WG] Minutes for OGF DFDL Working Group Call, July 07-2010
Sent by: dfdl-wg-bounces@ogf.org







Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, July 07-2010

Attendees

Steve Hanson (IBM)
Alan Powell (IBM)  
Stephanie Fetzer (IBM)

Tim Kimber (IBM)
Suman Kalia (IBM)


Apologies

Mike Beckerle (Oco)


1. Current Actions

Updated below


2. Nils and Defaults

Discussed Alan's updates and Tim's and Steve's comments. Some corrections and updates are still needed. The section needs an introductory paragraph.


3. DFDL property types and other issues

Not discussed

Tim has proposed more specific types for some properties. In particular separating the different kinds of entities.

- Modify the meaning and usage of the type 'DFDL String Literal' and modify the remainder of the specification accordingly

- Improve the description of DFDL entities to avoid confusion over the intended usage of raw byte values.

- Clarify the standard sentence about forward references in DFDL expressions - the current text implied that the restrictions only applied to the unparser.

On parsing: No forward references are allowed in expressions that provide property values. Asserts and discriminators may have downward references (which need to be defined), and the expression must be evaluated by the time the element has been processed.

On unparsing: outputValueCalc may have a forward reference.

1.1        DFDL Properties

Properties on DFDL annotations may be one or more of the following types

·        DFDL string literal
The property value is a string that represents a sequence of literal bytes or characters which appear in the data stream.

·        List of DFDL string literals
The property value is a space-separated list of DFDL string literals. When parsing, if more than one string literal in the list matches the portion of the data stream being evaluated then the longest matching string literal in the list must be used. When unparsing, the first string literal in the list must be used.

·        DFDL expression
The property value is a DFDL subset XPath 2.0 expression that returns a value derived from other property values and/or from the DFDL infoset.

·        DFDL regular expression
The property value is a regular expression that can be used as a pattern to calculate the length of an element by applying that pattern to the sequence of literal bytes or characters which appear in the data stream.

·        Enumeration
The property value is one of the allowed values listed in the property description.

·        Logical Value Simple Type
The property value is a string that describes a logical value. The type of the logical value is one of the XML Schema simple types.

·        QName
The property value is an XML Qualified Name as specified in “Namespaces in XML”.
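
A minimal sketch of the selection rule for a list of DFDL string literals, written in Python for illustration only. It assumes the literals are plain text with no DFDL entities to expand, and the function names are hypothetical rather than taken from the specification.

```python
def split_literals(property_value):
    """Split a space-separated property value into its string literals."""
    return property_value.split()


def choose_for_parse(property_value, data, position=0):
    """Return the longest literal matching the data at position, or None."""
    matches = [lit for lit in split_literals(property_value)
               if data.startswith(lit, position)]
    # When more than one literal matches, the longest match must be used.
    return max(matches, key=len) if matches else None


def choose_for_unparse(property_value):
    """Return the literal to write: always the first one in the list."""
    return split_literals(property_value)[0]
```

With the list `; ;;` and data beginning `;;abc`, both literals match at position 0 and the parsing side selects `;;`, while the unparsing side always writes `;`.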

Identify which properties can be a list of an above type, or a union of more than one of the above types


  1.1.1.1         DFDL Entities in String Literals

XML entities should not be listed as they are valid in any property value.
A statement concerning use of XML entities should be added to section 5 of the spec, if not already there.

     1.1.1.1        Character classes in DFDL String literals

     1.1.1.1        Raw byte values in DFDL String Literals


4. Using textStringPadCharacter with charRef '%#r' on multi-byte encoding

Not discussed
textStringPadCharacter DFDL String literal

The padding character or byte value that is used when justifying or trimming text elements.

A pad character can be specified using DFDL entities.

A pad byte value must be specified using the %#r entity.


DFDL validation rules

- if a pad byte value is specified when lengthUnits='characters' then the encoding must be a fixed-width encoding.

- if a pad character is specified when lengthUnits='bytes' then the pad character must be a single-byte character.


If a pad byte value is specified when lengthUnits='characters' then padding and trimming must be applied
using an array of N pad byte values, where N is the width of a character in the fixed-width encoding.
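
The two validation rules and the N-byte padding rule above can be sketched in Python. This is an illustration under stated assumptions, not specification text: the `FIXED_WIDTHS` table and the `pad_unit` function are hypothetical, a pad byte (the `%#r` form) is modelled as an `int`, and a pad character as a one-character `str`.

```python
# Hypothetical table of fixed-width encodings and their width in bytes.
FIXED_WIDTHS = {"ascii": 1, "latin-1": 1, "utf-32-be": 4}


def pad_unit(pad, encoding, length_units):
    """Return the bytes written for one unit of padding.

    pad is an int (raw byte value) or a one-character str (pad character).
    """
    if isinstance(pad, int):
        if length_units == "characters":
            if encoding not in FIXED_WIDTHS:
                raise ValueError("pad byte with lengthUnits='characters' "
                                 "requires a fixed-width encoding")
            # One padded character is N copies of the pad byte,
            # where N is the character width of the encoding.
            return bytes([pad]) * FIXED_WIDTHS[encoding]
        return bytes([pad])
    encoded = pad.encode(encoding)
    if length_units == "bytes" and len(encoded) != 1:
        raise ValueError("pad character with lengthUnits='bytes' "
                         "must be a single-byte character")
    return encoded
```

For example, a pad byte of 0x00 with a 4-byte fixed-width encoding and lengthUnits='characters' yields four zero bytes for each padded character.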


Annotation: dfdl:element, dfdl:simpleType




Would adding fillByte to the dfdl:padChar enumeration make it clearer?
padChar is a pad character (%#r not allowed)

fillByte is a fill byte


5. nilIndicatorPath and nilIndicatorIndex properties
 

These properties seem a bit of an anomaly. Tim has suggested they can be simplified.

Tim had emailed Mike with a proposal and Steve had responded with 3 alternatives

1) Leave things as they are, perhaps renaming 'nilIndicator' to 'indicator'

2) Collapse nilIndicatorPath and nilIndicatorIndex to a single property. It means the path is repeated, but it makes nils consistent with length and occurs. For all, you can always use a string variable to hold the constant part of the path and concatenate with the index. And if we improve usability in a future DFDL release, we would improve it for nils, lengths and occurs at the same time.

3) Drop the properties for 1.0 altogether on the grounds that it is a rare case. I've never seen such a format, but we know Mike has.

Not discussed.

Meeting closed, 17:00

Next call: Wednesday 14 July 2010, 15:00 UK (10:00 ET)

Next action: 100

Actions raised at this meeting
No
Action

Current Actions:
No
Action
066
Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.

04/12: no update

...

17/02: IBM is willing in principle to publish the test case format and some of the test cases. May need some time to build a 'compliance suite'

24/03: No progress

03/03: Discussions have been taking place on the subset of tests that will be provided.

10/03: work is progressing

17/03: work is progressing

31/03: work is progressing

14/04: An XML test case format has been defined and is being tested.

21/04: Schema for TDML defined. Need to define how this and the test cases will be made public.

05/05: Work still progressing

12/05: Work still progressing

02/06: Work still progressing on technical and legal considerations

16/06: work continues

23/06: work continues

30/06: work continues

07/07: work continues
085
ALL: publicize Public comments phase to ensure a good review.
14/04: see minutes

21/04: Press release, OMG and other standards bodies.

05/05: Alan and Steve H have contacted other standards bodies. Will ask them to add comments on spec

15/05: still no public comments

02/06: No public comments

16/06: Public comments period has ended with no external comments. Alan had posted changes made in draft 041. Steve suggested sending a note to the WG highlighting these changes. Steve also suggested requesting an extension as other IBM groups may review. We discussed whether this was necessary as changes will need to be made during the implementation phase anyway. Alan to ask OGF what the process is for changes post public comment.

23/06: Still no comments. Alan will contact OGF to understand the rest of the process.

30/06: Alan has emailed Joel asking what the process is now that the public comment period is over and whether we can update the published version with WG updates. No response yet.

07/07: No response. Alan will chase up.
086
AP: Nils and Defaults during unparsing - update table
31/03: TK to document use cases for parsing

14/04: Investigate new property to control empty string behaviour.

21/04: After investigation a new property is not required. New rules developed and tables updated.

Need examples of complexTypes to confirm tables apply.
Review Nils, defaulting spec section.

05/05: Discussed defaulting complex elements. Tables updated but need to add terminator.

SH: to confirm WMB behaviour when infoset item has no value on unparsing

Need to describe defaulting choices.

15/05: More discussion. Alan updating sections

26/05: Discussed draft updates. Stephanie to confirm asserts do not make an element required.
Alan will update draft. All: review rest of draft.

02/06: Alan updated description. Please review.
Discussed Stephanie's example using discriminators. Decided no changes needed.

16/05: Went through Steve's comments. Steve to update draft.

23/06: Steve's updates to the rules discussed. See minutes. Rest of document needs updating.

30/06: Discussed Alan's updates. Some corrections. Alan will send out updated copy for review before next call.

07/07: Discussed Alan's updates and Tim's and Steve's comments. Still some corrections and updates.
088
define semantics of choiceKind 'fixedLength'
31/03: TK to provide definition of calculable length.

Investigate PL/I varchars and COBOL occurs depending on.

14/04: Tim had distributed a document starting the definition of calculable length for the longest choice member.

Alan had done some investigation of COBOL occurs depending on (ODO): when used in the working section of a program the maximum storage was reserved, but when used in the linkage section the dependent number was used. We need to understand how the WMB COBOL importer deals with ODO.

21/04: Need to define 'calculable length' and WMB importer ODO behaviour.

05/05: TK: Still need definition of calculable length.

SKK: WMB COBOL importer behaviour with ODO

15/05: Suman sent an example of an imported COBOL ODO which suggested that the maximum space was reserved. He will extend the example.

02/06: no progress

16/06: no progress

23/06: no progress

30/06: Alan looked at Tim's description of calculable length and suggested that the real use case may be much simpler. If the real use case is the COBOL and C importers then it would be cleaner to require the 'fixed length' to be specified on the enclosing complex element and remove choiceKind 'fixed length'. Ask Suman if the COBOL and C importers can be enhanced to provide length on the complex element.

07/07: Steve had proposed 5 possible approaches to supporting the main use case.
i) Change the importer to add a parent element for each group. Con: This changes the logical model by inserting an extra level.
ii) Allow dfdl:length to be carried on embedded xs:choice (along with dfdl:lengthUnits & dfdl:lengthKind).
iii)  New dfdl:choiceLength property to be carried on embedded xs:choice (always bytes).
iv) dfdl:choiceKind as today. Con: length from TD model is lost and must be recalculated.

v) dfdl:choiceKind but with limitations to make the calculation easier. Con: length from TD model is lost and must be recalculated.

Agreed a modified version of iii)

dfdl:choiceKind becomes dfdl:choiceLengthKind with enums 'implicit' (length of selected branch) and 'explicit' (length specified by dfdl:choiceLength)

New property dfdl:choiceLength
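
A small Python sketch of how the two agreed enum values could select the length of a choice. It is illustrative only: the function name is hypothetical, and lengths are assumed to be in bytes, following option iii) above.

```python
def resolve_choice_length(choice_length_kind, branch_length, choice_length=None):
    """Return the length of a choice in bytes.

    branch_length is the length of the selected branch;
    choice_length is the value of dfdl:choiceLength, if specified.
    """
    if choice_length_kind == "implicit":
        # The choice is as long as whichever branch was selected.
        return branch_length
    if choice_length_kind == "explicit":
        if choice_length is None:
            raise ValueError("'explicit' requires dfdl:choiceLength")
        # The choice always occupies the specified length.
        return choice_length
    raise ValueError("unknown choiceLengthKind: %r" % choice_length_kind)
```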
096
AP: using textStringPadCharacter with charRef '%#r' on multi-byte encoding
07/07: not discussed
097
nilIndicatorPath and nilIndicatorIndex properties
07/07: Tim had emailed Mike with a proposal and Steve had responded with 3 alternatives

1) Leave things as they are, perhaps renaming 'nilIndicator' to 'indicator'

2) Collapse nilIndicatorPath and nilIndicatorIndex to a single property. It means the path is repeated, but it makes nils consistent with length and occurs. For all, you can always use a string variable to hold the constant part of the path and concatenate with the index. And if we improve usability in a future DFDL release, we would improve it for nils, lengths and occurs at the same time.

3) Drop the properties for 1.0 altogether on the grounds that it is a rare case. I've never seen such a format, but we know Mike has.

Not discussed.
099
Splitting the specification into simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a separate call.

Closed actions
No
Action
092
AP: Confirm behaviour of defaulting with various occursCountKinds and separator policies.
16/06: no progress

23/06: discussed
- whether it is an error when the number of instances doesn't match the specified number of occurrences, or whether missing instances should be defaulted. Decided it is an error.

- defaulting occurs up to minOccurs unless separatorPolicy is 'required', in which case defaulting is up to maxOccurs and unbounded is an error.

30/06: Decided that defaulting for variable occurrence arrays should always be to minOccurs. If separatorPolicy is 'required' then just the separators will be output up to maxOccurs and unbounded is an error.

07/07: Agreed last call's decision. Closed
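
The 30/06 decision can be sketched in Python as a count of what is output for one variable-occurrence array. This is one reading of the rule, not specification text: the function name is hypothetical, `None` stands for an unbounded maxOccurs, and any separatorPolicy value other than 'required' is treated the same way.

```python
def plan_defaulting(actual_count, min_occurs, max_occurs, separator_policy):
    """Return (defaulted_elements, separator_only_positions) for one array.

    max_occurs is an int, or None for 'unbounded'.
    Assumes actual_count does not exceed max_occurs.
    """
    # Defaulting always fills the array up to minOccurs, never beyond.
    defaulted = max(0, min_occurs - actual_count)
    if separator_policy != "required":
        return defaulted, 0
    if max_occurs is None:
        raise ValueError("separatorPolicy 'required' with unbounded "
                         "maxOccurs is an error")
    # Remaining positions up to maxOccurs carry only a separator.
    occupied = actual_count + defaulted
    return defaulted, max_occurs - occupied
```

For an array with minOccurs 3 and maxOccurs 5 holding one instance, two elements are defaulted; with separatorPolicy 'required' a further two positions are output as separators only.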

Work items:
No | Item | Target version | Status
005 | Improvements on property descriptions | | not started
012 | Reordering the properties discussion: move representation earlier, improve flow of topics | | not started
036 | Update dfdl schema with change properties | | ongoing
042 | Mapping of the DFDL infoset to XDM | none | not required for V1 specification
070 | Write DFDL primer | |
071 | Write test cases. | |
083 | Implement RFC2116 | |
105 | AP: Describe trailingSkipBytes for delimited formats. Alan suggested 'dfdl:terminator must be specified and not empty if dfdl:lengthKind is delimited or endOfParent.' | |
106 | AP: Skip Bytes should allow bits. Agreed that it should be possible to specify bits. LSB and TSB renamed to dfdl:leadingSkip, dfdl:trailingSkip; units are specified by dfdl:alignmentUnits. | |
107 | Remove timing from dfdl:assert | |
108 | AP: Confirm behaviour of defaulting with various occursCountKinds and separator policies. 30/06: Decided that defaulting for variable occurrence arrays should always be to minOccurs. If separatorPolicy is 'required' then just the separators will be output up to maxOccurs and unbounded is an error. | |

 
Regards
 
Alan Powell
 
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com







Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU





--
 dfdl-wg mailing list
 dfdl-wg@ogf.org
 
http://www.ogf.org/mailman/listinfo/dfdl-wg







