Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, January-20-2010

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Steve Marting (Progeny)
Suman Kalia (IBM)
Peter Lambros (IBM)

Apologies
Stephanie Fetzer (IBM)
Tim Kimber (IBM)

1. 045 - Discriminators
Mike has started to write the proposal.

2. Unparsing lengthKind = 'pattern'
The specification currently says that on unparsing dfdl:lengthKind='pattern' behaves like dfdl:lengthKind='implicit', but since we limited 'implicit' to certain logical/representation combinations this is no longer correct.

Proposals
1. When unparsing complex elements with dfdl:lengthKind='pattern', the length is the length of the children (same as 'implicit').

When unparsing simple elements with dfdl:lengthKind='pattern', the length is the implicit length for those types that have one. For the rest, it is the length of the data supplied in the infoset, converted to the representation with no padding or filling.

For string/text and hexBinary/binary that is reasonable
For number/text/standard and zoned it is governed by the numberPattern
For number/binary/binary use 'implicit' lengths
For number/binary/packed or bcd use the minimum number of bytes
For calendar/text it is governed by the calendarPattern
For calendar/binary/packed or bcd use the minimum number of bytes
For calendar/binary/binaryMilliseconds or binarySeconds use implicit lengths
For boolean/text use the length of the true/false representation
For boolean/binary use the implicit length (32 bits)

2. A new property dfdl:patternOutputLengthKind

3. We could limit dfdl:lengthKind 'pattern' to complex elements so that there is always a lengthKind for child simple elements, but this would introduce 'unnecessary' complex items in the infoset when there is only one child.

The WG decided that option 1 was preferable but it needs to be written in terms of a set of 'base' unparsing rules which are augmented depending on the dfdl:lengthKind.
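
For illustration only, the option 1 rules listed above could be written down as a simple lookup, as in the Python sketch below. The function name, the dictionary keys and the rule labels are hypothetical and are not DFDL property values; the 'base' unparsing rules that Alan is to write up may well be structured differently.

```python
# Hypothetical rule labels; these are descriptive strings, not DFDL enums.
IMPLICIT = "implicit length"
DATA = "length of the infoset data converted to the representation, no padding/filling"
PATTERN = "governed by the number/calendar pattern"
MIN_BYTES = "minimum number of bytes"
TRUE_FALSE = "length of the true/false representation"

# (logical type, representation) -> rule, following the option 1 list.
SIMPLE_RULES = {
    ("string", "text"): DATA,
    ("hexBinary", "binary"): DATA,
    ("number", "text"): PATTERN,           # standard and zoned
    ("number", "binary"): IMPLICIT,
    ("number", "packed"): MIN_BYTES,
    ("number", "bcd"): MIN_BYTES,
    ("calendar", "text"): PATTERN,
    ("calendar", "packed"): MIN_BYTES,
    ("calendar", "bcd"): MIN_BYTES,
    ("calendar", "binarySeconds"): IMPLICIT,
    ("calendar", "binaryMilliseconds"): IMPLICIT,
    ("boolean", "text"): TRUE_FALSE,
    ("boolean", "binary"): IMPLICIT,       # 32 in the list above
}


def unparse_length_rule(logical_type, representation, is_complex=False):
    """Return the unparse length rule for an element with lengthKind='pattern'."""
    if is_complex:
        # Complex elements: same as 'implicit', the length of the children.
        return "length of the children"
    try:
        return SIMPLE_RULES[(logical_type, representation)]
    except KeyError:
        raise ValueError(
            f"no rule for {logical_type}/{representation}"
        ) from None
```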

There was some discussion about whether the pattern should be reapplied to the data stream produced to ensure that it could be parsed by the same schema. Perhaps this should be part of validation. There was no agreement.
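
As a minimal sketch of what such a re-application check might look like (no behaviour was agreed), the Python below tests whether a length pattern, applied at the start of the unparsed text for one element, consumes exactly that text. Python's re module is only a stand-in for the DFDL regular expression dialect, and pattern_reparses is a hypothetical helper name.

```python
import re


def pattern_reparses(length_pattern: str, unparsed: str) -> bool:
    """True if the pattern, matched at the start of the unparsed text,
    consumes exactly the whole of that text."""
    match = re.match(length_pattern, unparsed)
    return match is not None and match.end() == len(unparsed)
```

For example, with the pattern [0-9]{1,5} the output '12345' would re-parse to the same length, whereas '123456' would not, because the pattern stops after five digits.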

MB suggested that dfdl:lengthKind 'pattern' should only be allowed for elements with representation 'text' and for complex elements whose children are all 'text'. Steve H felt that it was needed for complex elements containing binary data.

Alan will write up 'base' unparsing length rules.

3. TLOG
Steve H took us through his further investigation of the TLOG format.

Finally had some time to look at the TLOG stuff again (the format emitted by IBM 4680 & 4690 POS controllers). This time I've looked at the MRM code, & spoken with domain expert David Bennett, to establish the full behaviour. Unfortunately there are some issues which mean DFDL 1.0 is not capable of handling it.

1. The individual fields are a mixture of ASCII strings, proprietary packed strings (rare), proprietary packed decimals, binary integers (rare). All fields are delimited by a separator.

2. The fields are all defined with a length in bytes, but most of the string and decimal fields are actually variable length. If the data exceeds the length in bytes when parsing or unparsing the MRM throws an error. However, talking to David B, the length is really intended to be used if there was ever a fixed length equivalent format, so is really for validation only. To parse the current TLOG formats it is sufficient to use the delimiter.
Note to WG: Validation using the specified length only works for strings. Should we allow dfdl:length to be specified when dfdl:lengthKind=delimited or pattern, and use it as an extra constraint when parsing/unparsing?

3. Scanning for the separator, or maybe use of a data pattern, is needed to extract the variable length data, including packed decimal data (which we would consider a binary type).
Note to WG: Should we allow binary scanning when the user says it is safe to do so?

4. Packed strings. This is a packed data type used when the range of possible chars in the string is limited to 0-9, A-F. You can view this as a BCD that can also carry A-F. MRM does not try to turn this into an integer; instead it treats it as character data. That is, x12x34 would result in the string '1234'. Odd numbers of digits are padded with a x0 nibble. On unparsing the MRM throws an error if a character other than 0-9, A-F is encountered. In practice, these packed strings are rare and invariably only carry 0-9 anyway. I'm still trying to establish why this data type is needed, as it could be treated as a logical integer and modelled as BCD.

5. Packed decimals. Like a packed decimal in the IBM sense, except that negative numbers use a leading xD sign nibble and positive numbers have no sign nibble. An odd number of nibbles (digits plus the sign nibble, if present) is padded with a leading xF nibble. This is best illustrated using examples.
1234 => x12x34
123 => xF1x23
-1234 => xFDx12x34
-123 => xD1x23
Note to WG: Should we support this data type natively?

6. Some of the packed decimals are interpreted as an array of bit flags. I assume that DFDL would model these using dfdl:hidden.
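
To make items 4 and 5 above concrete, here is a minimal Python sketch of the two proprietary packed types as described in these notes. The function names are hypothetical. The packed decimal layout follows the four examples given in item 5. For packed strings the notes do not say on which side the x0 pad nibble goes, so the leading pad used here is an assumption.

```python
HEX_CHARS = "0123456789ABCDEF"


def pack_tlog_decimal(value: int) -> bytes:
    """Encode an integer: leading xD nibble if negative, then the digits,
    with a leading xF pad nibble when the nibble count is odd."""
    nibbles = ("D" if value < 0 else "") + str(abs(value))
    if len(nibbles) % 2:
        nibbles = "F" + nibbles
    return bytes.fromhex(nibbles)


def unpack_tlog_decimal(data: bytes) -> int:
    """Decode the layout produced by pack_tlog_decimal."""
    nibbles = data.hex().upper()
    if nibbles.startswith("F"):      # drop the pad nibble, if present
        nibbles = nibbles[1:]
    negative = nibbles.startswith("D")
    if negative:                     # drop the sign nibble
        nibbles = nibbles[1:]
    if not nibbles.isdigit():
        raise ValueError(f"not a TLOG packed decimal: {data.hex()}")
    value = int(nibbles)
    return -value if negative else value


def pack_tlog_string(text: str) -> bytes:
    """Encode a packed string of 0-9, A-F characters, one per nibble."""
    if any(ch not in HEX_CHARS for ch in text):
        raise ValueError("packed strings may only carry 0-9 and A-F")
    if len(text) % 2:
        text = "0" + text            # ASSUMPTION: the x0 pad nibble is leading
    return bytes.fromhex(text)


def unpack_tlog_string(data: bytes) -> str:
    """Decode a packed string as character data; any pad nibble is kept,
    since it cannot be told apart from a real '0' without the logical length."""
    return data.hex().upper()
```

With these definitions, pack_tlog_decimal(-1234) gives the bytes xFD x12 x34 and unpack_tlog_string applied to x12 x34 gives '1234', matching the examples in the notes.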

Steve H is going to make a proposal about how DFDL can parse TLog

4. Empty Sequences
Is the current section correct?

1.1 Empty Sequences

A sequence having no children is syntactically legal in DFDL; however, a sequence having no children must have content length zero. It can still have non-zero length Prefix and Suffix regions, but the SequenceContent region in between must be of length zero. It is a schema definition error if the SequenceContent region of an empty sequence is not length zero.

Steve H will re-word.

5. Mike/Steve review issues
Mike B had to leave but a number of issues were discussed.

- Case of enumerations. We should follow the XSDL convention, which is that enumerations are case sensitive.

- dfdl:lengthKind='pattern': DFDL entities will be allowed in the lengthPattern regular expression, except for generic entities such as WSP and NL.

- dfdl:lengthKind='pattern' scannability: A complex element with lengthKind='pattern' will use its dfdl:encoding property as the encoding when scanning its children, irrespective of each child's encoding property.

Mike to comment on remaining review issues.

6. Go through remaining actions
Updated below

Action 071 Semantics of length=0, nil handling and defaults.
1) Steve H to propose new name for dfdl:defaultValueInitiatorPolicy

Looking more at this, we should be consistent with the similar property for missing separators. So I've changed the enums too: they are now 'require' and 'suppress'. I have also changed the unparsing behaviour. We must honour the property, because the existing behaviour of always writing the initiator means we cannot successfully re-parse if writing empty content and the enum is 'suppress'. When reading, assume that section 15.13 has been updated to include complex as well as simple elements.

missingValueInitiatorPolicy Enum
Valid values 'require', 'suppress'
Specifies whether to expect an initiator when an element is missing. Ignored unless dfdl:initiator is specified and is not "" (empty string).
'require' - Indicates that the dfdl:initiator followed by empty content is the required syntax to indicate that the element is missing.
'suppress' - Indicates that empty content is the required syntax to indicate that the element is missing. The presence of an initiator implies that real content must follow.
Use of 'suppress' implies an ordered sequence. If used on an initiated element of an unordered group it is a schema definition error.
If the element is required, defaulting occurs as defined above.
This property also applies on unparsing, when the data to be written (after nil value and default value processing) is empty content.
Annotation: dfdl:element

We should similarly change the enums for nilValueInitiatorPolicy to 'require' and 'suppress'.
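
The proposed 'require' and 'suppress' behaviour of missingValueInitiatorPolicy can be sketched in Python as below. This is only an illustration of the wording in the proposal, treating an element as a plain string. The function names are hypothetical, and nil and default processing are assumed to have already happened.

```python
POLICIES = ("require", "suppress")


def unparse_element(initiator: str, content: str, policy: str) -> str:
    """Write one element, honouring the policy when the content is empty."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if initiator and content == "" and policy == "suppress":
        # 'suppress': empty content alone indicates a missing element.
        return ""
    # 'require', non-empty content, or no initiator (property ignored).
    return initiator + content


def is_missing(field: str, initiator: str, policy: str) -> bool:
    """Decide on parsing whether the text of one element means 'missing'."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if initiator == "" or policy == "suppress":
        return field == ""
    # 'require': the initiator followed by empty content.
    return field == initiator
```

The point of honouring the property on unparsing is visible here: whatever unparse_element writes for empty content under a given policy is recognised as missing by is_missing under the same policy.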

2) Not recorded in minutes, but there was a discussion around my bullet on choices.

Worth noting that the concept of 'required' for the elements of a choice does not apply, even if minOccurs > 0.

The issue was on unparsing. Which branch of a choice do we output when a complex element is required but missing from the infoset? I think it should be the first branch of the choice that does not result in a processing error.
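
The suggestion above (not yet agreed) amounts to trying the branches in schema order and keeping the first one that unparses cleanly. A minimal Python sketch, in which ProcessingError and the branch callables are hypothetical stand-ins for whatever a DFDL processor would use:

```python
class ProcessingError(Exception):
    """Stand-in for a DFDL processing error raised while unparsing."""


def unparse_missing_choice(branches):
    """Try each branch in order; return the output of the first branch
    that unparses without a processing error."""
    for unparse_branch in branches:
        try:
            return unparse_branch()
        except ProcessingError:
            continue  # this branch cannot be written; try the next one
    raise ProcessingError("no branch of the choice could be unparsed")
```

Each entry in branches is a zero-argument callable that either returns the unparsed output for that branch or raises ProcessingError.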

NOTE: Work Item 071 changes dfdl:separatorPolicy enumeration from require to always. Need to make sure that is consistent with this proposal.

7. Review Schedule
OGF prereview is confirmed to take about 4 weeks assuming no document updates are required. We are behind schedule to be available for public review by March. Draft 038 will be available at the end of this week.

Activity                                                                     Schedule             Who
Complete Action items                                                        - 18 Dec 2009        WG
Complete Spec                        Write up work items                     - 23 Dec 2009        AP
                                     Restructure and complete specification  - 23 Dec 2009        AP
                                     Issue Draft 038                         23 Dec 2009
WG review                            WG review                               7 Dec - 08 Jan 2010  WG
                                     Incorporate review comments             4 Jan - 29 Jan 2010  AP +
                                     Issue Draft 039                         15 Jan 2010
                                     Incorporate review comments             4 Jan - 29 Jan 2010  AP +
                                     Issue Draft 040                         29 Jan 2010
Initial OGF Editor Review            Initial Editor review                   1 Feb - 1 Mar 2010   OGF
                                     Initial GFSG review                     1 Feb - 1 Mar 2010
                                     Issue Draft 041                         1 Mar 2010
OGF Public Comment period (60 days)                                          1 Mar - 30 Apr 2010  OGF
OGF 28 Munich                                                                15-19 March 2010
Incorporate comments                 Incorporate comments                    28 May 2010
                                     Issue Draft 042                         28 May 2010
Final OGF Editor Review              Final Editor review                     June 2010            OGF
                                     Final GFSG review                       June 2010
                                     Issue Final specification               30 June 2010
Publish proposed recommendation                                              1 July 2010
Grid recommendation process                                                  1 Jan - 1 April 2011

Meeting closed, 14:30

Next call Tuesday 26 January 2010 13:00 UK and Wednesday 27th January 2010 13:00 UK.

Next action: 076

Actions raised at this meeting

No
Action

074
SH: Proposal for parsing TLog

075
SH: rewrite empty sequences section

Current Actions:

No
Action

045
20/05 AP: Speculative Parsing
27/05: Pseudo code has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented, need to find a better way.
29/07: No Progress
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion. AP to incorporate updates and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into the next version.
14/10: Alan to update proposal to include array scenario where minOccurs > 0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce examples
11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds are needed after all. MB and SF to continue with examples.
18/11: Went through WTX implementation of example. SF to gather more documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to confirm that no changes are needed to the Resolving Uncertainty doc.
04/12: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephanie took us through a description of WTX identifiers. Mike agreed to write up in DFDL terms.
20/01: Mike will write up

049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It seemed that the main value is that it defines a schema location for downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can be deferred
13/01: no progress
20/01: no progress

064
MB/SH Request WG presentation at OGF 28
25/11: Session requested
04/12: no update
09/12: no update
16/12: SH has changed request to a general session rather than a WG session in the hope of attracting more people.
23/12: no update
06/01: not heard anything yet
13/01: no update
20/01: no update

066
Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update

071
Semantics of length=0, nil handling and defaults.
23/12: SH no update
06/01: SH has started
13/01: SH proposal review. Minor updates to be made
20/01: Reviewed updated proposal. Need to agree on unparsing empty choices.

074
SH: Proposal for parsing TLog

075
SH: rewrite empty sequences section

Closed actions

No
Action

056
MB Resolve lengthUnits=bits including fillbytes
12/08: No Progress
...
28/10: no progress
04/11: MB to look at lengthUnits = bits
11/11: no update
18/11: no update
25/11: no update
04/12: no update. Alan will set up a separate call to progress this action.
09/12: no update. Alan will set up a separate call to progress this action.
16/12: MB, SH and AP had a separate call. MB to distribute proposal
23/12: Discussed proposal. MB will update
06/01: V4 discussed and approved
13/01: Mike updated proposal. Closed

073
SH: Control of overpunching zoned positive sign
13/01: no update
20/01: Proposal agreed. Closed

Work items:

No
Item, target version, status

005
Improvements on property descriptions
Status: not started

011
How speculative parsing works (combining choice and variable-occurrence - currently these are separate) (from action 045)
Status: awaiting completion of action 045

012
Reordering the properties discussion: move representation earlier, improve flow of topics
Status: not started

036
Update DFDL schema with changed properties
Status: ongoing

038
Improve length section including bit handling
Target version: 038
Status: some improvement in 036

042
Mapping of the DFDL infoset to XDM
Target version: none
Status: not required for V1 specification

069
ICU fractional seconds
Target version: 038

070
Write DFDL primer

071
Write test cases.

072
It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix
Target version: 038

073
Rename dfdl:separatorPolicy="required" to "always".
Target version: 038

074
- Last 'postFix' separator is not optional
- Terminators are mandatory.
- dfdl:documentFinalTerminatorCanBeMissing
- dfdl:documentFinalSeparatorCanBeMissing (Action (70))
Target version: 038

075
Remove occursCountKind="useAvailableSpace".
Target version: 038

076
dfdl:documentRoot will be defined, and can only be set on global elements.
The DFDL spec does not have to define the format of parameters to the DFDL processor but will indicate that it must be possible to address any element.
Agreed that ANY element within the schema can be the starting point for parsing or unparsing.
dfdl:documentRoot no longer required
Target version: 038

077
'delimited' means the item is delimited by the item's terminator (if specified) or an enclosing construct's separator or end of the enclosing construct designated by its known length or its terminator.
The definition of EndOfParent also needs improving.
Target version: 038

078
Document UPA checks
Target version: 038

079
Restrictions on use of 'special' entities in regular expressions
Target version: 038

080
LengthUnit=bits (A056)
Target version: 038

081
Case of enumerations. We should follow the XSDL convention, which is that enumerations are case sensitive
Target version: 038

082
dfdl:lengthKind='pattern': DFDL entities will be allowed in the lengthPattern regular expression, except for generic entities such as WSP and NL
Target version: 038

083
dfdl:lengthKind='pattern' scannability: A complex element with lengthKind='pattern' will use its dfdl:encoding property as the encoding when scanning its children, irrespective of each child's encoding property.
Target version: 038

084
Control of overpunching zoned positive sign
Target version: 038

Alan Powell

MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898

