
Suman,

If you think about the intended usage of this function, a) do you really think that a data stream will contain DFDL entities, and b) in the unlikely event that there is, do you think that the modeller will not know in advance whether a particular dynamic delimiter is using DFDL entity syntax? I think you are trying to solve a problem that does not exist. If you have a different use case in mind, then please can you spell it out, because I'm not seeing it.

Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Suman Kalia <kalia@ca.ibm.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 19/06/2012 02:56
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor
Sent by: dfdl-wg-bounces@ogf.org

I am on course and cannot attend the call, here are my comments..

Use this function when the value of a DFDL property is obtained from the data stream using an expression, and the type of the property is DFDL String Literal or List of DFDL String Literals and the values extracted from the data stream could contain '%' or space characters. If the data already contains DFDL entities, this function should not be used.

SKK -- The last statement worries me from usability perspective. User will have to check for presence of entities in the stream before calling this function and is going to add unnecessary conditional logic in user's code. I can understand the performance argument but that is just an implementation issue which can be made efficient.. If we still want to go with this, then I want to request a function that returns true/false if the source string contains the DFDL entities; As a user I should not have to write such function.

Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia@ca.ibm.com
For info on Message broker http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.ht...

From: Steve Hanson <smh@uk.ibm.com>
To: dfdl-wg@ogf.org
Date: 06/18/2012 01:28 PM
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor
Sent by: dfdl-wg-bounces@ogf.org

Here's an updated signature. Let's approve on tomorrow's call, plus need to decide whether DFDL's function names should be camel case as per spec, or hyphenated like XPath's.

dfdl:stringLiteralFromString ($arg)

Returns a DFDL string literal constructed from the $arg string argument. If $arg contains any '%' and/or space characters, then the return value replaces each '%' with '%%' and each space with '%SP;', otherwise $arg is returned unchanged.

Use this function when the value of a DFDL property is obtained from the data stream using an expression, and the type of the property is DFDL String Literal or List of DFDL String Literals, and the values extracted from the data stream could contain '%' or space characters. If the data already contains DFDL entities, this function should not be used.

Regards
Steve Hanson

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org, Suman Kalia <kalia@ca.ibm.com>
Date: 18/06/2012 17:00
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor

I am convinced by this. The right behavior is *not* idempotency.

On Tue, Jun 12, 2012 at 12:09 PM, Steve Hanson <smh@uk.ibm.com> wrote:

If I had %% in the data, it's far more likely that this is real data and not an already-escaped DFDL string literal. So it should be returned as %%%%.

Regards
Steve Hanson

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: Suman Kalia <kalia@ca.ibm.com>, dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 12/06/2012 13:32
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor

This thread is so long now that i've lost track of whose comment is last.

These kinds of quoting functions have always bothered me because programmers are never sure whether they need to call it or not i.e., maybe it's already been called, etc.

So, making the function idempotent, i.e., detects well-formed entities and already-doubled percents, and leaves them alone,... is helpful and prevents the "you escaped everything twice" mistake.

...mike

On Mon, Jun 11, 2012 at 10:36 AM, Steve Hanson <smh@uk.ibm.com> wrote:

Marginally and dubiously...but at the expense of performance and complexity of implementation.

Regards
Steve Hanson

From: Suman Kalia <kalia@ca.ibm.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 11/06/2012 15:14
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor
Sent by: dfdl-wg-bounces@ogf.org

We should make this enhancement, it improves usability of the function..

Suman Kalia

From: Steve Hanson <smh@uk.ibm.com>
To: Suman Kalia/Toronto/IBM@IBMCA
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org, Tim Kimber <KIMBERT@uk.ibm.com>
Date: 06/11/2012 08:47 AM
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor

I see what you are saying. The function should be able to detect if a well-formed DFDL entity is present in the data and leave it alone. Well, yes, it could do. But in practical terms that will make no difference as no real world format will contain DFDL entities in the data.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Suman Kalia <kalia@ca.ibm.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org, Tim Kimber/UK/IBM@IBMGB
Date: 11/06/2012 13:14
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor

Use this function when the value of a DFDL delimiter property (initiator, terminator, separator) is obtained from the data stream using an expression, and the data might contain '%' or space characters. Do not use if the input string already contains DFDL entities.

SKK - Tim I think the function needs to be enhanced. User should not have to scan the string to determine if it contains % or space characters etc before calling this function.. If the input string contains DFDL entities then they should be left unchanged..
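
The behaviour in the quoted description is small enough to sketch. The following is only an illustration in Python, not text from the DFDL specification or from any implementation; the name string_literal_from_string is a hypothetical stand-in for dfdl:stringLiteralFromString. It shows that, under the description as written, input that happens to look like an entity is escaped like any other data rather than left unchanged.

```python
def string_literal_from_string(arg: str) -> str:
    """Sketch of the described escaping (hypothetical Python stand-in).

    Each '%' becomes '%%' and each space becomes '%SP;'.
    A string with neither character is returned unchanged.
    """
    # Escape '%' first, so the '%' introduced by '%SP;' is not doubled.
    return arg.replace("%", "%%").replace(" ", "%SP;")


print(string_literal_from_string("a b"))   # a%SP;b
print(string_literal_from_string("%"))     # %%
print(string_literal_from_string("%SP;"))  # %%SP;
```

Because the function always escapes, calling it twice escapes twice; it is deliberately not idempotent in this sketch.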
Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia@ca.ibm.com
For info on Message broker http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.ht...

From: Steve Hanson <smh@uk.ibm.com>
To: Tim Kimber <KIMBERT@uk.ibm.com>
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 06/11/2012 06:33 AM
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor
Sent by: dfdl-wg-bounces@ogf.org

Here's an attempt at putting Tim's description in the form used in the spec. Changed the name to drop the 'DFDL'.

dfdl:stringLiteralFromString ($arg)

Returns a DFDL string literal constructed from the $arg string argument. If $arg contains any '%' and/or space characters, then the return value replaces each '%' with '%%' and each space with '%SP;', otherwise $arg is returned unchanged.

Use this function when the value of a DFDL delimiter property (initiator, terminator, separator) is obtained from the data stream using an expression, and the data might contain '%' or space characters. Do not use if the input string already contains DFDL entities.

Also, I have just noticed that our DFDL function names are not in keeping with the style of XPath's own function names. XPath uses a hyphen to link words, instead of camel case. Should we change DFDL function names to match?

Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Tim Kimber/UK/IBM@IBMGB
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 25/05/2012 10:59
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor
Sent by: dfdl-wg-bounces@ogf.org

Proposed function signature :

    /**
     * Given a string, returns a DFDL String literal that matches that string.
     * Intended to be used when a delimiter ( initiator, terminator, separator )
     * has been extracted from the data stream, and the value might contain
     * the % or space character.
     * Each occurrence of '%' will be replaced by '%%'
     * Each space character will be replaced by '%SP;'
     * Do not use if the input string already contains DFDL entities.
     */
    String DFDLStringLiteralFromString( String delimiter )

Note that my proposed description of behaviour omits any mention of %ES;, because it is allowed only in the nilValue property, but that property cannot be set via a DFDL expression.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742

From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 25/05/2012 07:07
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor
Sent by: dfdl-wg-bounces@ogf.org

Agreed today that such a function is needed, errata taken. Proposals for function name welcome.

Regards
Steve Hanson

From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org
Date: 20/04/2012 09:04
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor

I think what we need is a function that takes a string and returns the equivalent DFDL string literal. It looks for the characters that cause problems (%, space) and replaces them (with '%%' and '%SP;' respectively), and if the string is the empty string replaces it with '%ES;'. I think it's just those that are problematic, as all other characters will be interpreted correctly (the entity syntax is primarily just for convenience of data entry).
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 19/04/2012 14:27
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof

Ok, these are very good reasons.

So we need the function(s) in our xpath library to make doing this same substitution easy (e.g., the replace function discussed elsewhere in this thread would do it I believe because all we have to do is replace "%" with "%%".)

On Thu, Apr 19, 2012 at 9:20 AM, Steve Hanson <smh@uk.ibm.com> wrote:

The reason that % needs escaping is that most entities start with just % - eg %NUL; - and it means we have simple rules: if you want to use a literal % then escape it, and if you use just % then we expect to see an entity next. If we don't do it this way, then we will not be able to extend the list of entities in the future without breaking existing expressions, and we won't detect the very common error of leaving off the trailing semi-colon by mistake.

Regards
Steve Hanson

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: Tim Kimber/UK/IBM@IBMGB, dfdl-wg@ogf.org
Date: 19/04/2012 14:01
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof

I don't think % by itself requires any escaping. There is only a need for escaping when the characters after the % match the syntax for one of our entities. I don't expect dfdl:terminator="%done%" to require any escaping.
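
The rule argued for in the 9:20 AM message above (a bare % must be followed by an entity, and a literal % is written %%) can be sketched as a check. This is a rough Python illustration under stated assumptions: the token pattern is a simplified guess at the shape of DFDL entities (any upper-case mnemonic, plus decimal, #x and #r numeric forms), it does not check mnemonics against the real entity list, and is_well_formed is a hypothetical name.

```python
import re

# Assumed, simplified token shapes: '%%', a mnemonic such as %NUL; or %WSP*;,
# or a numeric form such as %#65; / %#x41; / %#rFF;.
_TOKEN = re.compile(
    r"%(?:%|[A-Z][A-Z0-9]*[*+]?;|#[0-9]+;|#x[0-9A-Fa-f]+;|#r[0-9A-Fa-f]{2};)"
)


def is_well_formed(literal: str) -> bool:
    """Return True if every '%' starts '%%' or something shaped like an entity."""
    i = 0
    while i < len(literal):
        if literal[i] == "%":
            match = _TOKEN.match(literal, i)
            if match is None:
                return False  # bare '%', or an entity missing its ';'
            i = match.end()
        else:
            i += 1
    return True


print(is_well_formed("%done%"))    # False
print(is_well_formed("%%done%%"))  # True
print(is_well_formed("%NUL"))      # False
```

Under this rule "%done%" is rejected and has to be written "%%done%%", and a forgotten trailing semi-colon as in "%NUL" is caught rather than silently treated as data.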
On Apr 19, 2012 7:16 AM, "Steve Hanson" <smh@uk.ibm.com> wrote:

Tim, thinking some more on this:

- A DFDL expression is sometimes allowed to *return* a DFDL String Literal. In this case, the returned value is an xs:string that conforms to the DFDL String Literal syntax

That is indeed how properties like initiator, terminator and separator behave today. But there is a problem. Let's say I have dynamically defined a separator at the start of my data. The value in the data is %. My dfdl:separator expression therefore returns %. That will give an error as a badly formed DFDL entity. DFDL string literal rules say that you must use %% to represent a single % character. The expression itself can work around this by checking for % and if so substituting %%, but that's a bit unfriendly especially as fn:replace() is not in the DFDL XPath subset - I think this is because it comes under this category http://www.w3.org/TR/xpath-functions/#string.match and not http://www.w3.org/TR/xpath-functions/#substring.functions. Perhaps we should include fn:replace() or provide a DFDL function that handles %?

I started wondering whether these properties' expression should return a list of String. I can envisage no format that has in its data a value that contains DFDL entity syntax and intends it to mean a DFDL entity! That is too contrived to be real. However I can certainly envisage an expression like this:

dfdl:separator = "{if ../version eq 1 then %CR;%LF; else %LF;}"

So I think String is not sufficient and DFDL String Literal must be allowed.

Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Tim Kimber/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: Mike Beckerle <mbeckerle.dfdl@gmail.com>, dfdl-wg@ogf.org
Date: 19/04/2012 11:32
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof

I agree with all of that.

One refinement, though. I don't think it's necessary to *require* an implementation to auto-cast the result of a DFDL Expression into the target type. If an implementation wants to be picky about the return type AND issue a clearly-worded Schema Definition Error stating what the problem is then I think we should allow it. Arguably, this would reduce the portability of DFDL schemas, but there is precedent for defining a portable subset of a language while allowing conveniences for users who don't need portability ( e.g. ANSI 'C' ). We already take that line for the regular expression syntax, so there is precedent for this in the DFDL specification too.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742

From: Steve Hanson/UK/IBM
To: Tim Kimber/UK/IBM
Cc: Mike Beckerle <mbeckerle.dfdl@gmail.com>, dfdl-wg@ogf.org
Date: 19/04/2012 11:04
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof

Hi Tim

Firstly, both your bulleted assertions are correct, but your conclusion is not.

Secondly, let me flesh out my earlier reply about constructor functions. The last paragraph of Section 23 says: "The result of evaluating the expression must be a single atomic value of the type expected by the context, and it is a schema definition error otherwise". This is where the XPath constructors come into play. Eg:

<element name="myHexBin" type="xs:hexBinary" dfdl:inputValueCalc="{ xs:hexBinary (...) }"/>

These xs: constructors, plus the special fn:dateTime() constructor that DFDL adds, allow the correct types to be created. Note that you don't always need the constructors. An expression that returns a quoted value is returning an XPath string literal so that is automatically xs:string. An expression that returns an unquoted number is returning an XPath number literal so that can be xs:decimal, xs:integer, xs:double (depends whether the number contains a '.' or 'e' or 'E'). This is described here: http://www.w3.org/TR/xpath20/#id-literals

So simply returning the literal 'DEADBEEF' will return an xs:string and if the context requires xs:hexBinary that is a schema definition error according to DFDL spec.

A clarification is worthwhile though. Take the following expression: {if ../type eq 'A' then 10000 else 20000}. That returns xs:integer.

- What if my context was xs:decimal? xs:integer is a restriction of xs:decimal so the value will always be in range, so is that 'auto-cast' allowed?
- What if my context was xs:long or another restriction of xs:integer? The value may or may not be in range, so is that 'auto-cast' iff value in range?

I think that we should auto-cast when type restrictions are involved, and clarify that in the spec.

We *could* change the spec to say that the result of the expression is always automatically cast to the type expected by the context. That takes some of the burden off the modeler and makes it much more likely that expressions written by XPath novices will return the correct results. But it could also hide accidental errors. Note this proposal I shall call (d) as it is not the same as Mike's (c). If we made this change, then returning the literal 'DEADBEEF' for xs:hexBinary would succeed. I don't think it affects the desire for expressions to be statically type checkable - because it is known whether type X can be cast to type Y, so a cast mismatch can be statically detected.
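
The literal-typing rule referred to in this message (an unquoted number is xs:integer, xs:decimal or xs:double depending on whether it contains a '.' or an 'e'/'E') can be sketched as follows. This is an illustrative Python approximation of the XPath 2.0 numeric literal grammar, not code from any DFDL processor; numeric_literal_type is a hypothetical name.

```python
import re


def numeric_literal_type(literal: str) -> str:
    """Classify an unsigned XPath 2.0 numeric literal by its lexical form."""
    # An exponent marks a double literal, with or without a decimal point.
    if re.fullmatch(r"(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)[eE][+-]?[0-9]+", literal):
        return "xs:double"
    # A decimal point with no exponent marks a decimal literal.
    if re.fullmatch(r"\.[0-9]+|[0-9]+\.[0-9]*", literal):
        return "xs:decimal"
    # Digits only: an integer literal.
    if re.fullmatch(r"[0-9]+", literal):
        return "xs:integer"
    raise ValueError(f"not a numeric literal: {literal!r}")


print(numeric_literal_type("10000"))  # xs:integer
print(numeric_literal_type("1.5"))    # xs:decimal
print(numeric_literal_type("1e3"))    # xs:double
```

So both branches of {if ../type eq 'A' then 10000 else 20000} are xs:integer literals, which is why the question of casting to xs:decimal or xs:long arises at all.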
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Tim Kimber/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org, Steve Hanson/UK/IBM@IBMGB
Date: 19/04/2012 09:35
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof

I'm pretty sure that the rules are:

- DFDL expressions must not *contain* DFDL String Literals. They must be valid XPath 2.0 expressions except that the list of allowable function names includes the DFDL extension functions.
- A DFDL expression is sometimes allowed to *return* a DFDL String Literal. In this case, the returned value is an xs:string that conforms to the DFDL String Literal syntax.

But that does not apply to your example because the dfdl:inputValueCalc must return a value ( an XML value ) that is valid for the type of the element. I think that corresponds to your answer a) ; 'DEADBEEF' is a valid xs:hexBinary lexical value.

regards,
Tim Kimber, Common Transformation Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org
Date: 19/04/2012 07:42
Subject: [DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof
Sent by: dfdl-wg-bounces@ogf.org

What is the DFDL string literal syntax for a hexBinary type value?

E.g., I want a hex binary whose value is the 4 bytes described by this hex: DE AD BE EF.

<element name="myHexBin" type="xs:hexBinary" dfdl:inputValueCalc="{ ... }"/>

So, what can one syntactically put, for literal constant values, in the input value calculation expression?

Note that this is legal pure (non-DFDL) XSD (I think)

<element name="aHexBin" type="xs:hexBinary" fixed="DeadBeef"/>

That is, the fixed/default are allowed and one specifies these values as just strings of hex digits. Notice no special escaping or anything. You just use a string that just so happens to contain hex digits.

I think there are three possibilities:

(a) we allow "DEADBEEF" i.e., because the type of the expression is hexBinary, a string is cast to hexBinary by interpreting it as hex nibbles.

(b) we require a special kind of string literal - a bytes-only string literal, so for example: "%#rDE;%#rAD;%#rBE;%#rEF;" is the way you create 4 bytes. If you just put characters, then that's a processing error - like a cast failure. Only raw-bytes allowed.

(c) Anything you return from the expression is converted to a hexBinary by unparsing it to bytes (using current properties), then using the bytes as the hexBinary data. So you could have an expression that returns a double, and that would create 8 bytes if representation="binary". In this case the decimal number 3735928559 (hex 0xdeadbeef) as a binary bigEndian int would produce the 4 bytes I want.

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412