New subject: Infoset codepage

29 Apr 2009

      Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, April-29-2009

Meeting opened, 14:00 UK

Attendees
Steve Hanson (IBM)
Mike Beckerle (Oco)
Suman Kalia (IBM)
Alan Powell (IBM)

Apologies
Dave Glick (drac)

Agenda:

1. Go through actions. 
Actions updated below

2. LengthKind on Sequences and choices. 

LengthKind on sequences and choices and their parent element has proved 
confusing to new users of DFDL. It is proposed that lengthKind be removed 
from groups and allowed only on the parent element. See email from SH.

Not discussed. Action raised. Please review SH email and make comments 
before next call 

3. Discuss UnorderedInitiated email from SH 

Not discussed. Action raised. Please review SH email and make comments 
before next call 

4. Infoset codepage and encoding 

The spec does not say which codepage and encoding are used for string 
fields. 

5. AOB 
Next version (034) 

6. Next call 6 May 14:00 UK

Meeting closed, 15:10 UK

Actions raised at this meeting

No
Action 
040
SH: LengthKind on complex objects. 
29/04: All to send comments before next call
041
AP: UnorderedInitiated
29/04: All: Review for next call
042
MB: Complete variable specification.
To include how properties such as encoding can be set externally. Must be 
a known variable name.

Current Actions:
No
Action 
012
AP/SH: Update decimalCalendarScheme
10/9: Not allocated yet
17/9: No update
24/9: Add calendar binary formats to actions
22/10: No progress
16/1: proposal distributed and discussed. Will be redistributed
21/1: add locale, 
04/02: changed from locale to specific properties
18/2: Need more investigation of ICU strict/lax behaviour.
08/04: Not discussed
22/04: AP to complete asap once the ICU strict/lax behaviour is 
understood. 
29/04: No progress
020
SH: Resolve how packedDecimalSignCodes behaviour depends on NumberCheckPolicy 
22/10: No progress
10/12: added how to decide to overpunch and sign position
11/02: proposal largely agreed. SH to make minor changes
18/02: AP to document unsigned type behaviour
25/02: no progress
08/04: Not discussed
22/04: SH to complete last remaining issue, which is the behaviour when 
logical type is signed/unsigned and the physical type is unsigned/signed.
29/04: SH has identified a problem with definition values and types in the 
infoset and will email a proposal. DG to be asked to accelerate action 032 
to see if that helps.
024
<No owner> String XML type
08/04: Not discussed
22/04: Need to allocate owner. Work is to describe the semantics of using 
dfdl:representation="xml" to model a well-formed XML fragment in an 
overall non-XML document described by a DFDL schema.
29/04: As no resource is available to progress this action, agreed to defer 
it from V1. Will close next week if no objections.

026
SH: Envelopes and Payloads
08/04: Not discussed explicitly, but recursive use of DFDL is tied up with 
this
22/04: Two aspects. Firstly compositional - do sufficient mechanisms exist 
to model an envelope with a payload that varies. Secondly markup syntax - 
this might be defined in the envelope. 
The second of these is very much tied up with the variable markup action 
028, so will be considered there. SH to verify the composition aspect.
29/04: SH and AP working on proposal. Related to action 028.
027
SH: Property precedence tables
08/04: Not discussed
22/04: Two things missing from the existing precedence trees. Firstly, 
does not show alternates (eg, initiator v initiatorkind). Secondly, need a 
tree per concrete DFDL object (eg, element). SH to update.
29/04: No progress
028
SH: Variable markup 
08/04: Discussed briefly at end of call. IBM to see whether there are any 
use cases that require recursive use of DFDL.
15/04: Use case was distributed and will be discussed on next call.
22/04: The use case in question is EDI where the terminating markup for 
the payload segments is defined in the ISA envelope segment. The markup is 
modelled as an element of simple type where the allowable markup values 
are defined as enums on the type. But we need to handle two cases. 
Firstly, where the envelope is present, the value used by the payload is 
taken from the envelope. Secondly, where only the payload is present, we 
need a way of scanning for all the enum values and adopting the one we 
actually find when parsing, and of using a default when unparsing. SH to 
explore use of a DFDL variable, where the variable has a default but also 
has a type that is the same as the markup element. That way we get to use 
the enums without defining everything twice.
29/04: SH and AP working on proposal.
029
MB: valueCalc (output length calculation)
08/04: Not discussed
22/04: Action allocated to MB, this is to complete the work started at the 
Hursley WG F2F meeting.
29/04: No progress
032
DG: Investigate compatibility between DFDL infoset and XDM
08/04: No update
22/04: No update
29/04: No update
033
AP/TK: Assert/Discriminator semantics. AP to document. TK to check uses of 
discriminator besides choice.
08/04: In progress within IBM
22/04: Waiting for TK to return from leave to complete. 
29/04: TK has sent examples showing the need for discriminators beyond 
choice. Agreed. MB to respond to TK.
036
SH: Provide use case for floating component in a sequence
08/04: Raised
15/04: Use case sent and discussed. SH to do further investigation
22/04: IBM feedback from WTX team is that alternate suggested ways of 
modelling the EDI floating NTE segment have significant usability issues. 
The DFDL principle is that for a problem that can be expressed as 
two-layered, then two DFDL models are needed.  The EDI NTE segment does 
not fall into this though, as its use is on a per sequence basis. Ongoing. 
29/04: Agreed that this needs to be in V1. SH to make a proposal.
037
All: Approach for XML Schema 1.0 UPA checks.
22/04: Several non-XML models, when expressed in their most obvious DFDL 
Schema form, would fail XML Schema 1.0 Unique Particle Attribution checks 
that police model ambiguity.  And even re-jigging the model sometimes 
fails to fix this. Note this is equally applicable to XML Schema 1.1 and 
1.0. While the DFDL parser/unparser can happily resolve the ambiguities, 
the issue is one of definition. If an XSD editor that implements UPA 
checks is used to create DFDL Schema, then errors will be flagged. DFDL 
may have to adopt the position that: 
a) DFDL parser/unparser will not implement some/all UPA checks (exact 
checks tbd)
b) XML Schema editors that implement UPA checks will not be suitable for 
all DFDL models
c) If DFDL annotations are removed, the resulting pure XSD will not always 
be valid (ie, the equivalent XML is ambiguous and can't be modelled by XML 
Schema 1.0)
Ongoing in case another solution can be found.
29/04: Will ask DG and S Gao for opinion before closing
038
MB: Submit response to OMG RFI for non-XML standardization
22/04: First step is for MB to mail the OGF Data Area chair to say that we 
want to submit
29/04: MB has been in contact with OMG and will submit DFDL.
039
SKK: Approach for creating Schema-For-DFDL xsds. 
22/04: Resolve issue around multiple declarations needed for DFDL 
properties, perhaps using MB's meta approach
29/04: Don't like qualified attributes in long form. SKK to check there 
are no code gen implications, eg EMF.
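
Illustration for action 028 (variable markup): the two cases described in 
the 22/04 entry can be sketched roughly as below. This is only a sketch 
under assumptions, not a proposal from the call. The function names, the 
candidate terminator values and the default are all hypothetical, and plain 
Python string handling stands in for whatever a DFDL variable would do.

```python
from typing import List, Optional, Sequence

# Hypothetical enum values for the markup element (not taken from any spec).
CANDIDATE_TERMINATORS = ("~", "'", "\n")
# Hypothetical default used when unparsing without an envelope.
DEFAULT_TERMINATOR = "~"


def detect_terminator(payload: str,
                      candidates: Sequence[str] = CANDIDATE_TERMINATORS
                      ) -> Optional[str]:
    """Scan for every candidate and adopt the one that occurs first."""
    best = None
    best_pos = len(payload)
    for cand in candidates:
        pos = payload.find(cand)
        if pos != -1 and pos < best_pos:
            best, best_pos = cand, pos
    return best


def parse_segments(payload: str,
                   envelope_terminator: Optional[str] = None) -> List[str]:
    """Case 1: use the envelope's value. Case 2: scan the payload for it."""
    terminator = envelope_terminator or detect_terminator(payload)
    if terminator is None:
        raise ValueError("no known terminator found in payload")
    return [seg for seg in payload.split(terminator) if seg]


def unparse_segments(segments: Sequence[str],
                     envelope_terminator: Optional[str] = None) -> str:
    """Use the envelope's value when present, otherwise the default."""
    terminator = envelope_terminator or DEFAULT_TERMINATOR
    return "".join(seg + terminator for seg in segments)
```

With these assumptions, a payload-only message terminated by an apostrophe 
is parsed by scanning, and written back out with the default terminator 
unless an envelope value is supplied.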

Closed actions:

025
AP: Escape schemes 
21/1: discussed requirements
04/02: AP/SH to describe behaviour for known length text fields. Need to 
discuss if comment escapes should be supported.
11/02: New draft distributed.
18/02: SH to document concerns
25/02: SH and AP have refined proposal ready for approval.
04/03: SH and AP have further refined proposal.
11/03: discussed. suggested a simplified proposal be evaluated.
18/03: SH and AP had further discussions on simplified proposal
08/04: See minutes, review in detail for next call 
15/04: See minutes, review for next call 
22/04: MB mailed answers to the mailing list in response to AP's last few 
questions. Following agreed:
1. Should data containing the escapeEscapeCharacter cause escaping to be 
used and, if so, how should it be escaped?
EEC alone isn't an active character. It has to be followed by the EC to be 
interpreted at all. That said, if the pair EEC EC appears in the data, 
then yes, we must escape the EC, with another EEC. 
2. Should we only look for escapeStartString at the beginning of the data?
Yes, we will be restrictive/conservative for v1.0.
3. Property names (everyone has their own favourite so let's just pick one.) 
Only changes are escapeBlockStart and escapeBlockEnd. 
AP to incorporate the agreed scheme into draft 0.34.
29/04: closed. Moved to work items
034
AP: Remove redundant properties, correct old examples
08/04: No update
22/04: In progress as part of draft 0.34. 
29/04: closed. Moved to work item
035
AP: Add validation ranges to spec, update specialized annotations in spec.
08/04: Raised. For draft 0.34
22/04: In progress as part of draft 0.34. 
29/04: closed. Moved to work item

Work items:
No  | Item | Target version | Status
001 | String XML type (Ian P) - Apr 30, 2008 | |
002 | Escape schemes (Ian P) - Apr 30, 2008 | 034 |
003 | Variables - ??, 2008 (Mike) | |
005 | Improvements on property descriptions - ??, 2008 (All - split TBD) | |
006 | Envelopes and Payloads (Steve) - Apr 30, 2008 | |
007 | (from draft 32) valueCalc (Mike) - ??, 2008 | | mostly complete
008 | (from draft 32) Property precedence for writing (Steve) - | | under review
009 | (from draft 32) Variable markup (Steve) - Mar 31, 2008 | | proposal needs writing up
010 | (from draft 32) Assertions, discriminators and choice, including discussion of timing option (Suman) - Mar 31, 2008 (A033) | 034 | in progress
011 | (from draft 32) How speculative parsing works (combining choice and variable-occurrence - currently these are separate) ??, 2008 (IBM) | | in progress
012 | (from draft 32) Reordering the properties discussion: move representation earlier, improve flow of topics ??, 2008 (Alan) | | not started
025 | Augmented infoset and unparsing (Alan) | 034 | added but needs work
026 | Remove duration | 034 |
027 | Calendar schemes | 034 |
028 | Validation ranges (A035) | 034 |
029 | Decimals (A020) - document unsigned type behaviour - packedDecimalSignCodes behaviour depends on NumberCheckPolicy | 034 |
030 | Remove redundant properties, fix examples. (A036) | 034 |
031 | Specialized annotations | 034 |
032 | Floating components | |
033 | Specialized annotations | 034 |

Alan Powell

 MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
 Notes Id: Alan Powell/UK/IBM     email: alan_powell@uk.ibm.com 
 Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898

