
So if every boundary value that includes an = has quotes around it, then the escape scheme solution is good. But if any of them does not include the quotes, then the escape scheme solution is not sufficient.

Very helpful, guys, thank you.

From: Mike Beckerle [mailto:mbeckerle.dfdl@gmail.com]
Sent: Wednesday, June 19, 2013 10:04 AM
To: Steve Hanson
Cc: Garriss Jr., James P.; dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] clarification on when escape characters are needed

I would state it this way: Your format requires that you can have an unescaped, unquoted appearance of the 'separator', which is '=', inside the data value of an element. That is inconsistent with separators.

(Albeit there is debate about whether it should be ok for a known-statically-last element when there are only infix separators, but that is a subtlety that will make the format fragile if you depend on it. E.g., what if a subsequent field is added somehow?)

On Wed, Jun 19, 2013 at 9:53 AM, Steve Hanson <smh@uk.ibm.com> wrote:

James

Escape schemes work by specifying special character(s) that indicate that other characters in the data are not to be treated as delimiters.

The line in question is:

    password=f82+=7&%q

There are no quotes around the data part of it. That's why you can't use an escape scheme here.

Your other example:

    boundary="----=_Part_150709_149622714.1370937621731"

There are quotes around the data part of it. That's why you can use an escape scheme (that specifies quotes as start/end characters) and it works.

Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 19/06/2013 14:12
Subject: Re: [DFDL-WG] clarification on when escape characters are needed
Sent by: dfdl-wg-bounces@ogf.org
________________________________

The origin of this issue is the Content-Type header in email, where the parameters can be quoted, but sometimes are not:

    Content-Type: text/html; charset="UTF-8"
    Content-Type: text/html; charset=UTF-8

This was not a big deal until I ran into a parameter that included an = in the value:

    Content-Type: multipart/alternative; boundary="----=_Part_150709_149622714.1370937621731"

When confronted with this issue, I was told:
there's a pretty simple fix: specify an escape scheme that says that anything inside quotes is not a delimiter. And fortunately your DefaultProperties.xsd file actually comes with an escape scheme that does exactly that.
So all you have to do is add this:
dfdl:escapeSchemeRef="DefaultPropertiesEscapeScheme"
to this:
<xsd:element name="value" type="xsd:string" />
You may well recognize this scheme, as it's yours:

<dfdl:defineEscapeScheme name="DefaultPropertiesEscapeScheme">
  <dfdl:escapeScheme escapeBlockEnd="&quot;" escapeBlockStart="&quot;"
      escapeCharacter="&quot;" escapeEscapeCharacter="&quot;"
      escapeKind="escapeBlock" extraEscapedCharacters=", %#x0D; %#x0A;"
      generateEscapeBlock="whenNeeded">
  </dfdl:escapeScheme>
</dfdl:defineEscapeScheme>

I used this solution for the parameters of the Content-Type header, which are key/value pairs.

<xsd:sequence dfdl:separator="=">
  <!-- this init is a workaround for Daffodil 0.10 bug (see ContentType element above) -->
  <xsd:element name="key" dfdl:initiator="%WSP*;">
    <xsd:annotation>
      <xsd:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
        <dfdl:assert test="{ dfdl:checkConstraints(.) }"
            message="The parameter key must match one of the values on the enumerated list."/>
      </xsd:appinfo>
    </xsd:annotation>
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="charset"/>
        <xsd:enumeration value="name"/>
        <xsd:enumeration value="boundary"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
  <!-- Daffodil 0.10.1 fails here if there's an = in the value. -->
  <xsd:element name="value" type="xsd:string" dfdl:escapeSchemeRef="DefaultPropertiesEscapeScheme"/>
</xsd:sequence>

Without the scheme, I get an error. With it, it works great.

So is this an inappropriate use of an escape scheme?

From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, June 19, 2013 8:47 AM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org; Mike Beckerle
Subject: RE: [DFDL-WG] clarification on when escape characters are needed

James

I don't see how an escape scheme helps here. The "f82+=7&%q" is all data, there's no escape character.

Regards
Steve Hanson

From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: Steve Hanson/UK/IBM@IBMGB, Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 19/06/2013 12:33
Subject: RE: [DFDL-WG] clarification on when escape characters are needed
________________________________
The DFDL 1.0 spec implies the behaviour where you get...
If this is the direction the WG goes, can you please make this explicit rather than implicit? Using Mike's excellent example below would go a long way to making the issue clear.

As for a solution, would it not be better to use an escape scheme, like this?

<sequence dfdl:separator="=" dfdl:separatorPosition="infix">
  <element name="a" type="xs:string"/>
  <element name="b" type="xs:string" dfdl:escapeSchemeRef="DefaultPropertiesEscapeScheme"/>
</sequence>

(Cred to Taylor)

If so, it would be helpful to include that in the example.

From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf Of Steve Hanson
Sent: Wednesday, June 19, 2013 5:29 AM
To: Mike Beckerle
Cc: dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] clarification on when escape characters are needed

The DFDL 1.0 spec implies the behaviour where you get:

<a>password</a>
<b>f82+</b>

followed by a processing error. There is no special casing of the last element in the group.

Changing the model to the following achieves the desired infoset:

<sequence dfdl:separator="=" dfdl:separatorPosition="infix">
  <element name="a" type="xs:string"/>
  <sequence dfdl:separator="">
    <element name="b" type="xs:string"/>
  </sequence>
</sequence>

Regards
Steve Hanson

From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 19/06/2013 09:37
Subject: Re: [DFDL-WG] clarification on when escape characters are needed
Sent by: dfdl-wg-bounces@ogf.org
________________________________

In the IBM implementation we have taken the view that the separator defines the format for all of the group's content. That means that all separators are counted as being significant, even if they occur within the content region of the final group member.
I agree that other interpretations are possible - the MRM parser in earlier versions of WebSphere Message Broker takes an infix separator out of scope when it encounters the final declared child of a group.

I intend to address this point when I write up the rules for matching string literals and delimiters.

regards,
Tim Kimber, DFDL Team, Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 19/06/2013 03:52
Subject: [DFDL-WG] clarification on when escape characters are needed
Sent by: dfdl-wg-bounces@ogf.org
________________________________

Suppose I have a sequence. It has an infix separator which is "=".

<sequence dfdl:separator="=" dfdl:separatorPosition="infix">
  <element name="a" type="xs:string"/>
  <element name="b" type="xs:string"/>
</sequence>

Now, consider this data:

    password=f82+=7&%q

I want

<a>password</a>
<b>f82+=7&%q</b>

Notice how the b element contains an '=' which was not escaped in any way in the sequence. Element b is statically known to be last, the separator is infix; hence, things are unambiguous even if there is no escaping.

However, there is an alternative interpretation, which is that the above data should fail, because it produces

<a>password</a>
<b>f82+</b>

but then does not find the expected stuff next. Rather it finds the '=7&%q' data. In other words, the sequence separator divides the sequence content into 3 content regions, but there aren't 3 things to consume those, so it is a processing error.

Which is correct?
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
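
The behaviours discussed in this thread can be mimicked with a small Python sketch. It is not DFDL and not how Daffodil or the IBM parser is implemented; the function names are hypothetical, the separator is assumed to be a single character, and the quote handling is a simplification of an escapeBlock scheme (a real escape block must start at the beginning of the field). It only shows why the unquoted password line fails under the "all separators are significant" reading, why it would pass if the separator went out of scope at the last field, and why quoting rescues the boundary parameter.

```python
from typing import List


def split_strict(data: str, sep: str, n_fields: int) -> List[str]:
    """Every separator counts, even inside the last field's content."""
    parts = data.split(sep)
    if len(parts) != n_fields:
        raise ValueError(
            f"expected {n_fields} content regions, found {len(parts)}"
        )
    return parts


def split_last_field_greedy(data: str, sep: str, n_fields: int) -> List[str]:
    """The separator goes out of scope once the final declared field starts."""
    parts = data.split(sep, n_fields - 1)
    if len(parts) != n_fields:
        raise ValueError(
            f"expected {n_fields} content regions, found {len(parts)}"
        )
    return parts


def split_with_quote_block(
    data: str, sep: str, n_fields: int, quote: str = '"'
) -> List[str]:
    """Strict split, except a separator inside a quoted block is plain data.

    Assumption: sep and quote are single characters, and the quote
    characters themselves are dropped from the resulting value.
    """
    parts: List[str] = []
    current: List[str] = []
    in_block = False
    for ch in data:
        if ch == quote:
            in_block = not in_block
        elif ch == sep and not in_block:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    if in_block:
        raise ValueError("unterminated quoted block")
    if len(parts) != n_fields:
        raise ValueError(
            f"expected {n_fields} content regions, found {len(parts)}"
        )
    return parts
```

With these definitions, split_strict("password=f82+=7&%q", "=", 2) raises an error because three content regions are found, split_last_field_greedy on the same input keeps "f82+=7&%q" as the second field, and split_with_quote_block accepts the quoted boundary parameter while still rejecting the unquoted password line.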