Re: [DFDL-WG] Simplified Escape Scheme V3

Steve, thanks for the comments. Wording changes accepted. Can you discuss this on this week's WG call? Unfortunately I cannot attend. We need to agree:

1. Should data containing the escapeEscapeCharacter cause escaping to be used, and if so, how should it be escaped?
2. Should we only look for escapeStartString at the beginning of the data?
3. Property names (everyone has their own favourite, so let's just pick one).

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898

From: Steve Hanson/UK/IBM
To: Alan Powell/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, dfdl-wg-bounces@ogf.org
Date: 19/04/2009 12:24
Subject: Re: [DFDL-WG] Simplified Escape Scheme V3

Alan

Comments:
- I think escapeBlockStart and escapeBlockEnd are better names; that way you can immediately see they are for use with escapeBlock.
- escapeKind. Clarification to escapeBlock parsing behaviour: "On parsing, the escapeStartString is removed from the beginning of the data, the escapeEndString is removed from the end of the data, and any escapeEscapeCharacters are removed when they precede any other occurrences of the escapeEndString in the data."
- extraEscapedCharacters. Clarification: "A space-separated list of single characters that must be escaped in addition to in-scope markup."
- generateEscape. The behaviour when escapeKind = escapeCharacter and the value is 'always' is not defined. I would prefer that:
  a) the descriptions of 'whenNeeded' behaviour are moved into the escapeKind property, to keep all the rules in one place;
  b) generateEscape is renamed generateEscapeBlock and only applies to escapeKind = escapeBlock, as that is the only case where it has an effect.
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

From: Alan Powell/UK/IBM@IBMGB
Sent by: dfdl-wg-bounces@ogf.org
Date: 17/04/2009 15:22
To: dfdl-wg@ogf.org
cc:
Subject: [DFDL-WG] Simplified Escape Scheme V3

Attached is the latest version of escape schemes. It includes Steve's and Mike's comments (although not the renaming of properties); I have also removed escapeBlock2 and added use cases in section 5, which you might like to start with. The use cases confirm that the syntax works with some minor clarifications, but highlight two questions:

1. Should data containing the escapeEscapeCharacter cause escaping to be used, and if so, how should it be escaped?
2. Should we only look for escapeStartString at the beginning of the data?

Alan Powell

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

[attachment "ggf-dfdl-simplified-escape-scheme-v3.doc" deleted by Alan Powell/UK/IBM]

--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg
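Steve's proposed wording for escapeBlock parsing amounts to a small string transformation. The Python sketch below is one possible reading of that wording, not the DFDL specification. It assumes the complete field text has already been isolated, so the trailing escapeEndString is located by position rather than by scanning the data stream. The helper names parse_escape_block and unparse_escape_block are hypothetical, and the unparse direction is an assumed inverse, included only so that a round trip can be checked.

```python
def parse_escape_block(data: str, start: str, end: str, eec: str) -> str:
    """Hypothetical helper: strip an escape block from a complete field value."""
    # Assumption: 'data' is the whole field, so start/end are matched by position.
    if (len(data) >= len(start) + len(end)
            and data.startswith(start) and data.endswith(end)):
        # Remove escapeStartString from the beginning and escapeEndString from the end.
        body = data[len(start):len(data) - len(end)]
        # Remove each escapeEscapeCharacter that precedes another escapeEndString.
        return body.replace(eec + end, end)
    # Not wrapped in an escape block: return the text unchanged.
    return data


def unparse_escape_block(value: str, start: str, end: str, eec: str) -> str:
    """Assumed inverse: wrap the value and escape any inner escapeEndString."""
    return start + value.replace(end, eec + end) + end


wrapped = unparse_escape_block("a*/b", "/*", "*/", "\\")
print(wrapped)
print(parse_escape_block(wrapped, "/*", "*/", "\\"))
```

With escapeStartString "/*", escapeEndString "*/" and escapeEscapeCharacter "\", the value a*/b is written as /*a\*/b*/ and read back as a*/b.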

My views on these three weighty issues:

1. Should data containing the escapeEscapeCharacter cause escaping to be used, and if so, how should it be escaped?

No. I think the EEC (escapeEscapeCharacter) alone isn't an active character; it has to be followed by the EC (escapeCharacter) to be interpreted at all. That said, if the pair EEC EC appears in the data, then yes, we must escape the EC with another EEC, to avoid it being misinterpreted at read time. This results in EEC EEC EC in the final data stream that we output. When we read it back, we get EEC (the first EEC is not followed by an EC, so it is literal), and then the second EEC is followed by an EC, so we get a literal EC.

Trick: if the EEC and EC are the same character, then you have to escape both of them, with themselves... er, ah. So, taking "\" as an example: if "\" is in the data item, then we must output "\\", and if "\\" is in the data item, then we must output "\\\\" (for some reason Microsoft Outlook keeps removing my surrounding quotes from these; must be some sort of escape sequence for them!).

The rule is consistent, though. The above "trick" isn't really a special case: just apply the rule uniformly that whenever you find the EC in the data, you must precede it by the EEC on output.

2. Should we only look for escapeStartString at the beginning of the data?

I'd prefer that we respect them anywhere, but the canonical form, when generated, is at the beginning of the data. However, if we want to be more restrictive/conservative for v1.0, I'm fine with that.

3. Property names (everyone has their own favourite, so let's just pick one).

Don't care. (Recall: I wanted to call these things quoting schemes...)
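The uniform EC/EEC rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a normative algorithm: EC and EEC are assumed to be single characters, the helper names escape_ec and unescape_ec are hypothetical, and escaping of delimiters or other in-scope markup with the EC is deliberately left out so that only the EC/EEC interaction is shown.

```python
def escape_ec(value: str, ec: str, eec: str) -> str:
    """Hypothetical unparse helper: precede every EC found in the data with the EEC."""
    # Delimiter escaping is intentionally omitted; only the EC/EEC rule is shown.
    return "".join(eec + ch if ch == ec else ch for ch in value)


def unescape_ec(text: str, ec: str, eec: str) -> str:
    """Hypothetical parse helper: EEC followed by EC yields a literal EC."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == eec and i + 1 < len(text) and text[i + 1] == ec:
            out.append(ec)       # EEC EC -> literal EC
            i += 2
        else:
            out.append(text[i])  # anything else, including a lone EEC, is literal
            i += 1
    return "".join(out)


# EEC "/" and EC "\": the data "/\" (EEC EC) is written as "//\" (EEC EEC EC).
print(escape_ec("/\\", "\\", "/"))
# EEC and EC both "\": "\" becomes "\\" and "\\" becomes "\\\\".
print(escape_ec("\\", "\\", "\\"), escape_ec("\\\\", "\\", "\\"))
```

Because the same rule is applied whether or not EEC and EC are the same character, the "trick" case needs no special handling, which matches the point made above.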
participants (2)
- Alan Powell
- Mike Beckerle