I've thought further about this. I think any subsets need to be fairly wide-ranging and independent, so that implementations (runtime and tooling) can sensibly offer support. If it becomes too fragmented, users will find it difficult to know which constructs may be used when. I realise this means a subset takes more effort to implement in terms of content, but I think it will be easier to understand how to do it. Accordingly I've revised the strawman. Note that I have introduced choices and unordered sequences at the same point as initiators, because initiators provide a way of resolving uncertainty without speculation; speculation itself is introduced under an advanced expression subset.

Regards

Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK,
smh@uk.ibm.com,
tel +44-(0)1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 04/08/2010 13:50 -----

From:	Steve Hanson/UK/IBM
To:	Suman Kalia <kalia@ca.ibm.com>
Cc:	dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date:	07/07/2010 18:34
Subject:	Re: [DFDL-WG] Subsetting the DFDL spec

Hi Suman

I added hidden elements because they allow things to be omitted from the infoset, which is a very useful technique. I removed them from the expression subset because you only need hidden elements plus expressions when, for example, you use a hidden complex element to return a synthesised simple value. Hidden elements on their own, just to skip things, are useful, easy to implement, and do no harm in core.

Defaults are a core capability; without them you can't create a sparse infoset on output. If we can separate out nils, then perhaps nils could go in a separate subset. I started off that way, then changed my mind, but we can revisit it.

I originally had choices in core but removed them because, without initiators or expressions, how can you resolve a choice? You can't. Choices are not as common as you might think in the non-XML world, for precisely this reason. However, as I write, I realise that I've not allocated uncertainty (i.e., choice or 'optionality') to any of the subsets, a major omission on my part. I intended core to be fixed occurrences, thereby avoiding the need to implement backtracking, a significant item in any implementation. I'll think more on this.
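To illustrate why initiators help, here is a minimal Python sketch (not taken from the DFDL specification) of a choice resolved by initiator: the parser inspects the data at the current position and picks the branch whose initiator matches, so no branch is attempted and then abandoned. The `Branch` class, the branch names, the initiators and the fixed content lengths are all hypothetical.

```python
# Hypothetical sketch: resolving a choice by initiator, with no speculative
# parsing. Branch names, initiators and fixed lengths are assumptions.
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    initiator: str  # literal text that must open this branch
    length: int     # fixed length of the content after the initiator


def parse_choice(data: str, pos: int, branches: list[Branch]):
    """Pick the branch whose initiator appears at `pos`.

    The decision looks only at the initiator, so no branch is tried and then
    abandoned: there is nothing to backtrack over.
    Returns (branch name, content, new position).
    """
    for b in branches:
        if data.startswith(b.initiator, pos):
            start = pos + len(b.initiator)
            end = start + b.length
            if end > len(data):
                raise ValueError(f"branch {b.name!r}: not enough data")
            return b.name, data[start:end], end
    raise ValueError(f"no branch initiator matches at offset {pos}")


branches = [Branch("header", "HDR:", 6), Branch("trailer", "TRL:", 3)]
record = "HDR:ABC123TRL:007"

name, value, pos = parse_choice(record, 0, branches)
print(name, value)  # header ABC123
name, value, pos = parse_choice(record, pos, branches)
print(name, value)  # trailer 007
```

Without the initiators, the only way to choose would be to try each branch and back out on failure, which is the speculation that core was meant to avoid.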

My rationale for omitting delimiters from core was to keep core for fixed-length data. Many scientific users will never need delimiter support, and they are the OGF folk most likely to write an implementation. Once you add separators you pull in a huge amount of implementation: all the scanning, escaping, etc. However, the uncertainty issue could well force us to split initiators from the other delimiters because of their role in resolving uncertainty.
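A rough Python sketch of the difference in implementation effort, assuming a comma separator and a backslash escape character chosen purely for illustration (the function names are hypothetical, and real DFDL escape schemes are richer than this). Fixed-length fields are simple slices at known offsets, while delimited fields need a character-by-character scan that honours escapes.

```python
# Hypothetical sketch contrasting fixed-length and delimited field parsing.
# The separator and escape character are assumptions for illustration.


def parse_fixed(data: str, lengths: list[int]) -> list[str]:
    """Fixed-length fields: each field is a slice at a known offset."""
    fields, pos = [], 0
    for n in lengths:
        fields.append(data[pos:pos + n])
        pos += n
    return fields


def parse_delimited(data: str, sep: str = ",", escape: str = "\\") -> list[str]:
    """Delimited fields: scan every character, honouring an escape character
    so that an escaped separator is treated as data."""
    fields, current, i = [], [], 0
    while i < len(data):
        ch = data[i]
        if ch == escape and i + 1 < len(data):
            current.append(data[i + 1])  # take the escaped character literally
            i += 2
            continue
        if ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


print(parse_fixed("ABC00042X", [3, 5, 1]))  # ['ABC', '00042', 'X']
print(parse_delimited(r"ABC,4\,2,X"))       # ['ABC', '4,2', 'X']
```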

Thanks for your input though, I'll have a think and send out an update before next call.

Regards

Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK,
smh@uk.ibm.com,
tel +44-(0)1962-815848

From:	Suman Kalia <kalia@ca.ibm.com>
To:	Steve Hanson/UK/IBM@IBMGB
Cc:	dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date:	07/07/2010 17:50
Subject:	Re: [DFDL-WG] Subsetting the DFDL spec

Steve - some comments

I suggest we create a category, DFDL Advanced Features, and put support for hidden elements under it, as not many users would need it or implement it. One can also make the case for putting "Nils and defaults" under DFDL Advanced Features, as this is one of the complex parts of the specification.

Core should have support for the choice construct, as this is the most common building block. I would also like to see support for delimited data; the basic and most widely used form is comma-separated records, which would require lengthKind=delimited and separators to be moved to the core specification.

Suman Kalia
IBM Toronto Lab
WebSphere Message Broker Toolkit Architect and Development Lead
WebSphere Business Integration Application Connectivity Tools

http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html

Tel : 905-413-3923 T/L 969-3923
Fax : 905-413-4850 T/L 969-4850
Internet ID : kalia@ca.ibm.com

From: Steve Hanson <smh@uk.ibm.com>
To: dfdl-wg@ogf.org
Date: 07/07/2010 09:47 AM
Subject: [DFDL-WG] Subsetting the DFDL spec
Sent by: dfdl-wg-bounces@ogf.org

Some thoughts about subsetting the DFDL spec to make it more consumable for readers and implementors.

We need to decide how the use of a subset is indicated in a DFDL xsd. It can be implicit, determined by the properties referenced, or explicit, declared up front. The difference is best illustrated by an example. Let's say Bidi support is a subset and I don't want to use Bidi. If using the implicit method, I still need the dfdl:textBidi property to be set to 'no' even in subset mode, because the same xsd could be used by a full DFDL processor, which will expect a value. If using the explicit method, I don't need to set the dfdl:textBidi property at all, because the DFDL processor will never look for it unless the xsd is switched to include that subset.
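The difference between the two methods can be sketched as the set of properties a processor demands from a schema. This Python sketch is hypothetical: the subset names and the grouping of properties into subsets are assumptions made for illustration, although the property names themselves are DFDL properties. Passing `None` models the implicit method (the schema must satisfy a full processor); passing a list of declared subsets models the explicit method.

```python
# Hypothetical sketch: which properties a processor demands under the
# implicit vs explicit ways of indicating a subset. Subset names and the
# grouping of properties are illustrative assumptions.

FULL_SPEC_SUBSETS = {
    "core": {"dfdl:encoding", "dfdl:byteOrder", "dfdl:lengthKind"},
    "bidi": {"dfdl:textBidi"},
}


def required_properties(declared_subsets=None):
    """Return the set of properties the processor will look for.

    None models the implicit method: nothing is declared up front, so every
    property of every subset is required. A list models the explicit method:
    only the properties of the declared subsets are required.
    """
    subsets = FULL_SPEC_SUBSETS.keys() if declared_subsets is None else declared_subsets
    required = set()
    for name in subsets:
        required |= FULL_SPEC_SUBSETS[name]
    return required


def missing_properties(schema_props, declared_subsets=None):
    """Properties the processor needs but the schema does not set."""
    return required_properties(declared_subsets) - set(schema_props)


# A schema that uses no Bidi and therefore never sets dfdl:textBidi.
schema = {"dfdl:encoding": "UTF-8", "dfdl:byteOrder": "bigEndian",
          "dfdl:lengthKind": "explicit"}

print(missing_properties(schema))            # implicit: {'dfdl:textBidi'}
print(missing_properties(schema, ["core"]))  # explicit: set()
```

Under the implicit method the schema must carry dfdl:textBidi anyway; under the explicit method it can omit it until the Bidi subset is declared.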

Here's a straw man for some subsets.

Regards

Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK,
smh@uk.ibm.com,
tel +44-(0)1962-815848

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg

#### Subset_proposal_v1.ppt moved to MyAttachments Repository V3.8 (Link) on 13 July 2010 by Steve Hanson.