Re: [DFDL-WG] how to trim inside of escape block?

I don't think there is a way to achieve what you want. As you say, trimming pad chars takes precedence over applying the escape scheme.
I wondered if you could define the escapeBlockStart and escapeBlockEnd as '"%WSP*;' and '%WSP*;"' respectively (a quote followed by optional whitespace, and optional whitespace followed by a quote), but the whitespace entities are not allowed as the escape character or in escape block start/end.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 22/11/2017 01:28
Subject: [DFDL-WG] how to trim inside of escape block?
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>

I have a CSV file. Some lines look like this:

a,b," started with spaces, appearing right after the escape block start ",c,d,e

I reviewed the spec, and I see that pad characters appear outside of the quotation marks (escape block start/end).
What I'm trying to do is remove the whitespace after the escape block start, and before the escape block end. This is just spurious whitespace; it appears because some of these CSV files were edited by people.
In my data the quoting characters are not always present. They are only there if a comma appears in the data string.
Is there a technique for getting rid of the leading/trailing whitespace inside the escape block start/end that I have forgotten?
...mikeb
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
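The gap Steve describes can be illustrated outside DFDL. The following Python sketch is only an approximation, not DFDL or Daffodil behavior: it assumes the pad character is a space, that '"' is both escape block start and end, and that doubled quotes never occur inside a block. split_raw, pad_then_unescape and trim_inside are hypothetical helpers. pad_then_unescape follows the order described above (pad chars trimmed outside the quotes, then the escape block removed), so the spaces just inside the quotes survive; trim_inside adds the extra post-parse trim Mike is asking for.

```python
from typing import List

QUOTE = '"'  # assumed escape block start and end
SEP = ","    # assumed field separator


def split_raw(line: str) -> List[str]:
    """Split on SEP, except inside a QUOTE...QUOTE escape block.
    Returns raw fields with any quotes still attached.
    Sketch only: doubled quotes ("") inside a block are not handled."""
    fields: List[str] = []
    buf: List[str] = []
    in_block = False
    for ch in line:
        if ch == QUOTE:
            in_block = not in_block  # toggle at block start/end
        if ch == SEP and not in_block:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    fields.append("".join(buf))
    return fields


def pad_then_unescape(raw: str) -> str:
    """Approximates the order in the thread: pad chars (assumed to be
    spaces) are trimmed first, outside the quotes, then the escape block
    is removed, so whitespace inside the quotes is kept."""
    s = raw.strip(" ")
    if len(s) >= 2 and s[0] == QUOTE and s[-1] == QUOTE:
        s = s[1:-1]
    return s


def trim_inside(raw: str) -> str:
    """The behavior being asked for: also trim whitespace just inside
    the quotes. Done here as a post-processing step, outside DFDL."""
    return pad_then_unescape(raw).strip(" ")
```

Because the quotes only appear when a field contains a comma, applying trim_inside to every field gives the same result whether or not that field was quoted.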

Another related problem:

a, b, notList, c, d
a, b, "list1, list2, list3",c,d

Here the 3rd field is a comma-separated list. It is quoted only if there is more than one list item.

I think to parse this I have to treat the quotation marks as initiator/terminator and set dfdl:separator="", but since the quotes are optional for the single-list-item case, I'm going to need a choice. I think the best I can do is

<ignore:ListOf1__XMLSchemaMakesMeHaveThisForUPA/>
<List>notList</List>

and

<ignore:ListOfN__XMLSchemaMakesMeHaveThisForUPA/>
<List>list1</List><List>list2</List><List>list3</List>

as the XML representations. Are there any better/cleaner solutions?

I did think of this way (note: I've omitted xs:annotation and xs:appinfo for brevity), but it isn't exactly "clean". This is what I call "modeling syntax as data"....

<dfdl:defineVariable name="foundOpenQuote" type="xs:boolean"/>

<xs:group name="optionalOpenQuote">
  <xs:choice>
    <xs:sequence dfdl:initiator='"'>
      <dfdl:setVariable ref="foundOpenQuote" value="{ fn:true() }"/>
    </xs:sequence>
    <xs:sequence dfdl:initiator=""/>
  </xs:choice>
</xs:group>

<xs:group name="matchingCloseQuote">
  <xs:choice>
    <xs:sequence dfdl:terminator='"'>
      <dfdl:discriminator>{ $foundOpenQuote eq fn:true() }</dfdl:discriminator>
    </xs:sequence>
    <xs:sequence/>
  </xs:choice>
</xs:group>

<!-- The main sequence for the data would then have this as the list element: -->
<xs:sequence>
  <dfdl:newVariableInstance ref="foundOpenQuote" defaultValue="false"/>
  <xs:sequence dfdl:hiddenGroupRef="optionalOpenQuote"/>
  <xs:sequence dfdl:separator=",">
    <xs:element name="List" type="xs:string" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:sequence dfdl:hiddenGroupRef="matchingCloseQuote"/>
</xs:sequence>

I'd try this out, except that we haven't got dfdl:newVariableInstance yet.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
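For the optionally-quoted list field in Mike's follow-up, here is a hedged Python sketch of the grammar being described. It is not a DFDL schema and not how a DFDL processor would resolve the choice. It assumes the list is always the 3rd field, that doubled quotes do not occur, and that spaces around fields and items are insignificant; split_fields, parse_list_field and parse_line are hypothetical names. found_open_quote plays the role of the foundOpenQuote variable, and the unmatched-quote check stands in for requiring that a close quote only appears after an open quote.

```python
from typing import List, Union

QUOTE = '"'  # assumed quote character around multi-item lists


def split_fields(line: str) -> List[str]:
    """Split on ',' except inside a QUOTE...QUOTE block (quotes kept).
    Sketch only: doubled quotes are not handled."""
    fields: List[str] = []
    buf: List[str] = []
    in_block = False
    for ch in line:
        if ch == QUOTE:
            in_block = not in_block
        if ch == "," and not in_block:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    fields.append("".join(buf))
    return fields


def parse_list_field(raw: str) -> List[str]:
    """Either one unquoted item, or a quoted comma-separated list."""
    s = raw.strip()
    found_open_quote = s.startswith(QUOTE)  # role of $foundOpenQuote
    if not found_open_quote:
        return [s]  # the "ListOf1" case
    # A close quote is required only because an open quote was found.
    if len(s) < 2 or not s.endswith(QUOTE):
        raise ValueError(f"unmatched quote in list field: {raw!r}")
    return [item.strip() for item in s[1:-1].split(",")]  # "ListOfN"


def parse_line(line: str) -> List[Union[str, List[str]]]:
    """Hypothetical record layout: the 3rd field is the list."""
    raw = split_fields(line)
    out: List[Union[str, List[str]]] = [f.strip() for f in raw]
    out[2] = parse_list_field(raw[2])
    return out
```

With this sketch, parse_line('a, b, notList, c, d') returns ['a', 'b', ['notList'], 'c', 'd'], and the quoted line yields ['list1', 'list2', 'list3'] in the 3rd position, so both cases end up with the same list shape and need no "ignore" marker element.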
participants (2)
- Mike Beckerle
- Steve Hanson