
Here is a list of things that are important to resolve for DFDL but which I don't recall seeing discussed or in a spec.

1) Rules for application of default values. XML Schema has rules for default value application when handling XML instance documents (they are different for elements and attributes). These rules could/should apply in some non-XML circumstances but are not applicable to others. I think we need to agree the rules that apply to different non-XML circumstances on both input and output. For example:
- Fixed length data (e.g. COBOL): here the data must be present on input; default values could be added if missing on output.
- When a separator is present: this changes things, as a double separator can occur, but does this indicate missing or empty? Customers use both semantics, and the Schema rules distinguish the two cases.
- When an initiator is present: here missing is observably different from empty, so we could probably use the Schema rules here.
I have a draft proposal I am working on for this which I could share.

2) Properties: on the object or on the object's inclusion. I didn't see anywhere in Mike's properties spec that said whether a property occurs on an element/attribute per se, or on its use in a structure. E.g. offset is clearly something that is only applicable to a local element or element reference; you would not put it on a global element. But some other properties are perhaps not so clear cut.

3) Mike's properties spec does not impose any restrictions on the use of different ways of identifying an element in the bitstream. Examples:
- Optionality: are e.g. COBOL fixed length elements allowed to be optional? If so, how can you identify that one is missing? Here the IBM model mandates that maxOccurs must appear.
- Unordered content: what should a DFDL parser do when faced with an xsd:all group? Unless an initiator is present, should xsd:all be treated as an xsd:sequence when parsing?

4) Wildcards and 'self-defining' content: what are the rules that apply here?
While this might seem unimportant if your starting point is a fixed file, in the messaging world this is frequently encountered. E.g. HL7 or X12 users will agree their own private extensions to the standard and add extra data; we must be able to model and parse/write this.

5) Truncation/omission rules when separators are involved. We have some extra options in the IBM model and several parsing rules, which we find necessary to cope with our users' CSV-style messages.

I should note at this point that the IBM model has the concept of 'separation type', which is a property of a group. It stipulates that all members of that group follow a certain pattern; examples are 'fixed length', 'separated', 'tagged and separated', and 'use a regular expression'. We have found this a convenient way to define rules for the processing of default values, unordered content, wildcards, open content, etc. These 'separation types' can be considered as specializations of a general case where the members of a group do not all follow a pattern. Clearly we need to define rules for the general case, but I think we should also consider whether such specializations are a useful addition to the DFDL model. I would observe that the majority of customers' element groups do indeed follow one pattern or another (e.g. COBOL: fixed length; CSV: separated).

Apologies if any of these have been discussed already, e.g. at the f-2-f, or prior to GGF12, and I have missed it.

If it is more convenient we could start up a series of discussion documents on the forum rather than the usual e-mail chain. I was certainly having trouble tracking the various mails about multi-dimensional arrays.

Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848