
We might be able to retain xs:token for DFDL expression on the grounds that users can use the dfdl:decodeDFDLEntities() function if they need to use white space in XPath string literals.

Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 09/07/2013 14:52 -----

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 09/07/2013 12:02
Subject: [DFDL-WG] Action 205: whitespace in DFDL annotations

For discussion on today's WG call.

Action 205 was raised to ensure that DFDL 'property types' are declared with XML Schema types that provide the correct whitespace handling behaviour. The XML Schema types of the various DFDL 'property types' are given in Part 1 of IBM's Schemas-for-DFDL.

The question boils down to whether a 'property type' should be an xs:string or xs:token. The former preserves whitespace, the latter normalizes and trims. (Note that xs:NMTOKEN is intended for attributes only, so should not be used for DFDL properties as they can be expressed in attribute or element forms.)
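
The difference between the two types can be sketched in a few lines of Python. This is an illustration of the XML Schema whiteSpace facet values (preserve for xs:string, replace for xs:normalizedString, collapse for xs:token), not code from any DFDL processor; the function name normalize and the sample expression are made up for the example.

```python
import re


def normalize(value: str, mode: str) -> str:
    """Apply an XML Schema whiteSpace facet value to a lexical string."""
    if mode == "preserve":  # xs:string: the value is left untouched
        return value
    # "replace": every tab, line feed and carriage return becomes a space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":  # xs:normalizedString
        return replaced
    if mode == "collapse":  # xs:token: squeeze runs of spaces, then trim both ends
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode}")


# Hypothetical property value: an expression containing a string literal
# with two spaces inside it, plus surrounding whitespace.
raw = "  { ../a  eq 'x  y' }\n"
as_string = normalize(raw, "preserve")
as_token = normalize(raw, "collapse")
```

Here as_string is identical to raw, while as_token is "{ ../a eq 'x y' }": the surrounding whitespace is trimmed, but the two spaces inside the string literal have also been reduced to one, which is the non-ignorable whitespace concern for expressions.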
My recommendation is:

- Enumeration changed from xs:string to xs:token (reason: to match XSDL more closely and trim leading/trailing whitespace)
- DFDL regular expression stays as xs:string (reason: regex may contain literal white space)
- DFDL string literal changed from xs:string to xs:token (reason: currently inconsistent with List of DFDL string literal)
- List of DFDL string literal stays as list of xs:token
- DFDL expression changed from xs:token to xs:string (reason: XPath may contain non-ignorable whitespace)

Further:

- DFDL regular expression should not trim leading/trailing whitespace
- DFDL expression should trim leading whitespace before { and trailing whitespace after }
- The enum of DFDL property names should be based on xs:token

The xs:unions for DFDL properties that can be two or more of the above may need the member ordering reviewed. Example:

    <xsd:simpleType name="BinaryFloatRepEnum_Or_DFDLExpression">
      <xsd:union>
        <xsd:simpleType>
          <xsd:restriction base="dfdl:DFDLExpression" />
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:restriction base="dfdl:BinaryFloatRepEnum"/>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>

Usually in a union, the most restrictive member is placed first. With the current types, the above has xs:token followed by xs:string, in accordance with this practice. But the recommendation changes the types of both members, so that the above becomes xs:string followed by xs:token.
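
Why member order matters can be sketched as follows. XML Schema validates a union value against the member types in the order they are declared, and the whitespace handling that applies is that of the member that accepts the value. The Python below models only that rule. The member predicates are simplified stand-ins, the enumeration values "ieee" and "ibm390Hex" are assumed for BinaryFloatRepEnum, and the AnyString member is hypothetical, included to show an unrestricted first member shadowing the rest.

```python
import re
from typing import Callable, List, Optional, Tuple

# (member name, whiteSpace facet value, predicate on the normalized value)
Member = Tuple[str, str, Callable[[str], bool]]


def apply_whitespace(value: str, mode: str) -> str:
    """Normalize per the member's whiteSpace facet ('preserve' or 'collapse')."""
    if mode == "preserve":
        return value
    return re.sub(r" +", " ", re.sub(r"[\t\n\r]", " ", value)).strip(" ")


def is_expression(value: str) -> bool:
    # Simplified stand-in for dfdl:DFDLExpression: "{ ... }" but not the "{{" escape.
    return value.startswith("{") and not value.startswith("{{") and value.endswith("}")


def is_float_rep(value: str) -> bool:
    # Assumed enumeration values for dfdl:BinaryFloatRepEnum.
    return value in {"ieee", "ibm390Hex"}


def resolve(value: str, members: List[Member]) -> Optional[Tuple[str, str]]:
    """Return (member name, normalized value) for the first member that accepts."""
    for name, mode, accepts in members:
        candidate = apply_whitespace(value, mode)
        if accepts(candidate):
            return name, candidate
    return None  # no member accepts: the value is invalid for the union


# Current types: expression is xs:token, enum is xs:string.
current = [
    ("DFDLExpression", "collapse", is_expression),
    ("BinaryFloatRepEnum", "preserve", is_float_rep),
]
# Recommended types: expression is xs:string, enum is xs:token.
recommended = [
    ("DFDLExpression", "preserve", is_expression),
    ("BinaryFloatRepEnum", "collapse", is_float_rep),
]
# Hypothetical union whose first member is an unrestricted string.
shadowing = [
    ("AnyString", "preserve", lambda v: True),
    ("BinaryFloatRepEnum", "collapse", is_float_rep),
]
```

Under these assumptions, resolve(" ieee ", current) returns None because the string-based enum keeps the surrounding spaces, while resolve(" ieee ", recommended) returns ("BinaryFloatRepEnum", "ieee"). With the shadowing list the same input resolves to ("AnyString", " ieee "), so the enum member is never reached. That is the risk of placing a less restrictive xs:string member first.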

From: Steve Hanson/UK/IBM
To: Suman Kalia/Toronto/IBM@IBMCA
Cc: dfdl-wg@ogf.org, Mike Beckerle <mbeckerle.dfdl@gmail.com>
Date: 27/03/2013 16:38
Subject: Re: [DFDL-WG] whitespace in DFDL annotations: right now regex is xs:string, expression is xs:token

Suman

Looking at the XML schema-for-schemas, and doing a test in the XSD editor in Eclipse, XSD enumeration facets are modelled as xs:NMTOKEN, not as xs:string as DFDL enums are. XSD is perfectly happy to strip/collapse white space. I think therefore that we should be doing the same for DFDL enum properties. I don't see any harm in this: an enum is a contiguous sequence of non-whitespace characters anyway, so any leading/trailing whitespace is harmless.

It looks like the XSD pattern facet is modelled as xs:string, preserving white space. We should do the same for DFDL regex properties.

From: Steve Hanson/UK/IBM
To: Suman Kalia <kalia@ca.ibm.com>
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org, Mike Beckerle <mbeckerle.dfdl@gmail.com>
Date: 19/03/2013 17:41
Subject: Re: [DFDL-WG] whitespace in DFDL annotations: right now regex is xs:string, expression is xs:token

The type called DFDLExpressionOrPatternOrNothing only makes sense for use in one place: the element value of an assert or discriminator.

From: Suman Kalia <kalia@ca.ibm.com>
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 19/03/2013 17:29
Subject: Re: [DFDL-WG] whitespace in DFDL annotations: right now regex is xs:string, expression is xs:token
Sent by: dfdl-wg-bounces@ogf.org

Mike - I am not sure, but my gut feeling is that it would start with the most restrictive one first, i.e. an empty string (assuming it has length facet 1) would match Nothing, then xsd:token, which is a restricted form of xs:string. I think you are going to get a string with white space collapsed (xsd:token) if it is not an empty string. You can run a few tests to see the behavior.

Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia@ca.ibm.com
For info on Message broker http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.ht...

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 03/19/2013 12:43 PM
Subject: [DFDL-WG] whitespace in DFDL annotations: right now regex is xs:string, expression is xs:token
Sent by: dfdl-wg-bounces@ogf.org

This came up on the call.

The schemas I have for DFDL annotations have DFDLRegularExpression as an xs:string, and DFDLExpression as an xs:token.

I have no clue how a union of these types behaves. But we have a union called DFDLExpressionOrPatternOrNothing, which is a 3-way union of DFDLExpression, DFDLRegularExpression, and EmptyString (which is also derived from xs:string but has a length facet of 0 as well).

--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU