Re: [DFDL-WG] Action 259 - Consider allowing more flexible escapeBlock schemes

WG call 3rd June: the DFDL spec will change so that an escape block end does not have to be the last thing in the data (after trimming). It must always be present, however. A new erratum will be raised.

Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 20/05/2014 17:50
Subject: Re: [DFDL-WG] Action 259 - Consider allowing more flexible escapeBlock schemes

As discussed on the call, there is an important case that is not covered in the table, namely where quotes surround a delimiter but the opening quote is not at the start of the data. I imported the following text string into Excel:

This is "," two separate fields

And indeed two columns were created, meaning the comma was treated as a delimiter and not escaped. This matches DFDL, so that is good. Interestingly, the first column was as expected...

This is "

...but the second was not:

two separate fields

Notice that the leading quote was removed without error, meaning that the absence of the closing quote is permitted!

Regards
Steve Hanson

From: Tim Kimber/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 13/05/2014 15:37
Subject: Re: [DFDL-WG] Action 259 - Consider allowing more flexible escapeBlock schemes
Sent by: dfdl-wg-bounces@ogf.org

That looks fairly conclusive to me. DFDL should fall into line with established practice.

regards,
Tim Kimber, IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742

From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 13/05/2014 11:50
Subject: [DFDL-WG] Action 259 - Consider allowing more flexible escapeBlock schemes
Sent by: dfdl-wg-bounces@ogf.org

Action 259 was raised on the last call to decide what to do about the following, as minuted:

Steve has an example of an escape block where the escape block end is not at the end of the un-trimmed data. This gives a processing error. Another IBM product accepts this usage. Should DFDL allow this? Or should there be a new escapeKind that allows escapeBlockStart/End anywhere?

I tried importing these values from a CSV file into an Excel spreadsheet and a Symphony spreadsheet (i.e., the successor to Lotus 1-2-3), and also accessing them via ODBC using a Microsoft driver, to compare with IBM DFDL and IBM Cast Iron behaviour.

Row  Test data              IBM DFDL           IBM Cast Iron      MS Excel           Lotus Symphony     ODBC
1    This is normal         This is normal     This is normal     This is normal     This is normal     This is normal
2    "This is OK"           This is OK         This is OK         This is OK         This is OK         This is OK
3    "This| is expected"    This| is expected  This| is expected  This| is expected  This| is expected  This| is expected
4    This too "is OK"       This too "is OK"   This too "is OK"   This too "is OK"   This too "is OK"   This too
5    Even "this" is OK      Even "this" is OK  Even "this" is OK  Even "this" is OK  Even "this" is OK  Even
6    "This" is NOT OK       PARSE FAILED       This is NOT OK     This is NOT OK     This is NOT OK     This
7    "This"" is still OK"   This" is still OK  This" is still OK  This" is still OK  This" is still OK  This" is still OK

The data under discussion is row 6. It looks like DFDL is out of step with the behaviour of the Excel and Symphony spreadsheets, and Cast Iron has adopted that behaviour too.

Out of interest, I also checked Excel's output behaviour. It escaped all embedded quotes in the same way as DFDL, so there are no issues there.

Regards
Steve Hanson

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
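
The rule agreed on the 3rd June call can be illustrated with a short Python sketch. It is a minimal, hypothetical helper, not the DFDL specification's algorithm or the API of IBM DFDL or any other processor: `split_record` and its parameters are illustrative names. It assumes no padding or trimming, an escape block that is recognised only at the very start of a field, and a doubled quote as the escape-escape sequence (as in row 7 of the table). An escape block with no end always raises an error; `strict_end=True` reproduces the earlier behaviour, in which any text after the block end (other than the delimiter) is also an error.

```python
from typing import List


def split_record(
    record: str,
    delim: str = ",",
    block_start: str = '"',
    block_end: str = '"',
    escape_escape: str = '"',
    strict_end: bool = False,
) -> List[str]:
    """Split one record into fields using escape-block rules.

    Hypothetical helper for illustration only; not any DFDL processor's API.
    strict_end=True  -> the escape block end must be followed by the delimiter
                        or the end of the record (the behaviour questioned
                        in Action 259).
    strict_end=False -> text may follow the block end (the 3rd June decision),
                        but the block end itself must still be present.
    """
    fields: List[str] = []
    i, n = 0, len(record)
    while True:
        value: List[str] = []
        # An escape block is only recognised at the very start of a field.
        if record.startswith(block_start, i):
            i += len(block_start)
            while True:
                if i >= n:
                    # Unlike Excel, a missing block end is always an error.
                    raise ValueError("escape block end not found")
                # Escape-escape character followed by the block end -> literal block end.
                if record.startswith(escape_escape, i) and record.startswith(
                    block_end, i + len(escape_escape)
                ):
                    value.append(block_end)
                    i += len(escape_escape) + len(block_end)
                elif record.startswith(block_end, i):
                    i += len(block_end)
                    break
                else:
                    # Delimiters inside the block are taken literally.
                    value.append(record[i])
                    i += 1
            if strict_end and i < n and not record.startswith(delim, i):
                raise ValueError("text found after escape block end")
        # Outside a block (or after its end), read literally up to the next delimiter.
        j = record.find(delim, i)
        if j == -1:
            value.append(record[i:])
            fields.append("".join(value))
            return fields
        value.append(record[i:j])
        fields.append("".join(value))
        i = j + len(delim)
```

With the default `strict_end=False`, row 6 (`"This" is NOT OK`) parses to `This is NOT OK`, as in Excel, Symphony and Cast Iron, while with `strict_end=True` it fails as IBM DFDL did in the table. The 20 May example `This is "," two separate fields` still raises an error: the comma splits the record, and the second field opens an escape block that is never closed, which Excel tolerates but the agreed rule does not.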