
Correction below.

Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1]) by localhost (Postfix) via Exchange Front-End Server webmail.afmc.af.mil ([131.28.34.85]) with SMTP id 0A8791F116E for <jgarriss@mitre.org>; Tue, 4 Jun 2013 14:03:24 -0400 (EDT)

<xs:element name="Received_Header" dfdl:initiator="Received:%WSP*;" dfdl:terminator="%CR;%LF;">
  <xs:complexType>
    <xs:sequence dfdl:separator="%CR;%LF;%SP;" dfdl:separatorPosition="infix">
      <xs:element name="data" type="xs:string" maxOccurs="unbounded" dfdl:lengthKind="delimited" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

DFDL consumes the initiator, then starts processing the content of the header as an array of records. The CR+LF+SP is consumed as the separator, because that is the longest match. The CR+LF (no SP) is consumed as the terminator of the header. Clearly that only works if there is no SP straight after the CR+LF that ends the last line of a header.

So you don't need a discriminator. You will have to stitch the data together post-parse. Alternatively, I guess you could make the sequence hidden and get DFDL to stitch the data lines together into one long string via an element with dfdl:inputValueCalc.

Ah, I think I see where Mike's earlier append to the mailing list was coming from?

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 05/06/2013 16:25
Subject: Re: [DFDL-WG] Ignore extraneous CRLF w/ space?
Sent by: dfdl-wg-bounces@ogf.org

> Is the problem that the dfdl:terminator '%CR;%LF;' for the end of the header record is firing prematurely when it encounters the CRLF in the data?

Exactly.

> I would model the data as unbounded repeating records, and use a discriminator to distinguish the repeats from the next header.
Uh, could you repeat that in English? Maybe with a small example? I freely admit that I don’t understand what you just said. Thanks!

From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, June 05, 2013 5:21 AM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] Ignore extraneous CRLF w/ space?

James

Is the problem that the dfdl:terminator '%CR;%LF;' for the end of the header record is firing prematurely when it encounters the CRLF in the data? If so, then I'm not sure that DFDL can ignore the extra %CR;%LF; without using an escape scheme, but there isn't an escape scheme to use.

I would model the data as unbounded repeating records, and use a discriminator to distinguish the repeats from the next header.

Regards

Steve Hanson

From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 04/06/2013 19:56
Subject: [DFDL-WG] Ignore extraneous CRLF w/ space?
Sent by: dfdl-wg-bounces@ogf.org

Long IMF headers, such as Received, can be wrapped onto the next line by using a CRLF and then a space. This example has 3 such wrappings:

Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1]) by localhost (Postfix) via Exchange Front-End Server webmail.afmc.af.mil ([131.28.34.85]) with SMTP id 0A8791F116E for <jgarriss@mitre.org>; Tue, 4 Jun 2013 14:03:24 -0400 (EDT)

How do I get DFDL to ignore these wrappings? For most of the header it’s not an issue, because I can use a lengthPattern to look ahead to the ';' before the date starts. But once the date starts, I have no way of knowing where it ends, so I need to ignore any CRLF followed by a space.

TIA

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
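Steve's correction above relies on DFDL picking the longest delimiter that matches at each point in the data. The Python sketch below is not DFDL and does not use any DFDL processor; it only imitates that scan for the Received_Header schema in the correction, assuming the separator (CR LF SP) and terminator (CR LF) are the only delimiters in scope and simplifying %WSP*; to spaces and tabs. The names parse_received and stitch are hypothetical; stitch stands in for the "stitch the data together post-parse" step.

```python
from __future__ import annotations

# Delimiters from the Received_Header schema in Steve's correction.
SEPARATOR = "\r\n "   # %CR;%LF;%SP;  (infix separator between data items)
TERMINATOR = "\r\n"   # %CR;%LF;      (terminator of the whole header)
# Longest match wins, so try the longer delimiter first.
DELIMITERS = sorted((SEPARATOR, TERMINATOR), key=len, reverse=True)


def parse_received(text: str) -> tuple[list[str], int]:
    """Split a Received header into its delimited data items.

    Returns the items and the index just past the terminator, i.e. where
    the next header would start.
    """
    initiator = "Received:"
    if not text.startswith(initiator):
        raise ValueError("initiator 'Received:' not found")
    pos = len(initiator)
    # Simplified %WSP*;: skip spaces and tabs only.
    while pos < len(text) and text[pos] in " \t":
        pos += 1

    items: list[str] = []
    start = pos
    while pos < len(text):
        match = next((d for d in DELIMITERS if text.startswith(d, pos)), None)
        if match is None:
            pos += 1  # ordinary content character
            continue
        items.append(text[start:pos])  # one delimited xs:string "data" item
        pos += len(match)
        if match == TERMINATOR:
            return items, pos  # CR LF not followed by SP: header ends here
        start = pos  # CR LF SP: folded line, next item follows
    raise ValueError("terminator %CR;%LF; not found")


def stitch(items: list[str]) -> str:
    """Post-parse step: rejoin the folded lines with the single SP that
    each separator consumed."""
    return " ".join(items)


if __name__ == "__main__":
    sample = "Received: from a (b)\r\n by c\r\nSubject: x\r\n"
    items, end = parse_received(sample)
    print(items)          # ['from a (b)', 'by c']
    print(stitch(items))  # from a (b) by c
```

Because the separator is exactly CR LF SP, a continuation line folded with a tab (which RFC 5322 folding also allows) would match the terminator instead and end the header early under this schema.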
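Steve's first suggestion, before his correction, was to treat each physical line as a repeating record and use a discriminator to tell a continuation line from the start of the next header. A rough Python analogue (not DFDL) uses the first character of each line as that test, since RFC 5322 folding means a continuation line begins with WSP (a space or horizontal tab); joining a header's lines after dropping the CRLFs is RFC 5322 "unfolding". The function name group_headers is hypothetical, and stopping at the first empty line is a simplification that treats everything before it as the header section.

```python
from __future__ import annotations

WSP = (" ", "\t")  # RFC 5322 WSP: space or horizontal tab


def group_headers(block: str) -> list[tuple[str, str]]:
    """Group CRLF-separated lines into (name, unfolded value) pairs."""
    groups: list[list[str]] = []
    for line in block.split("\r\n"):
        if not line:
            break  # an empty line ends the header section
        # "Discriminator": a line starting with WSP continues the previous
        # header; any other line starts a new header.
        if line[0] in WSP and groups:
            groups[-1].append(line)
        else:
            groups.append([line])

    headers = []
    for lines in groups:
        # Unfolding: the CRLFs were already removed by split(); keep the
        # leading WSP of each continuation line.
        unfolded = "".join(lines)
        name, _, value = unfolded.partition(":")
        headers.append((name, value.strip()))
    return headers
```

Unlike the separator in the corrected schema, this first-character test also accepts tab-folded lines.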