Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1])
 by localhost (Postfix) via Exchange Front-End Server webmail.afmc.af.mil
 ([131.28.34.85]) with SMTP id 0A8791F116E for <jgarriss@mitre.org>;
 Tue, 4 Jun 2013 14:03:24 -0400 (EDT)
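<!-- A CR LF not followed by SP ends the header (terminator);
     CR LF SP marks a folded line and separates the data items. -->
<xs:element name="Received_Header"
    dfdl:initiator="Received:%WSP*;" dfdl:terminator="%CR;%LF;">
  <xs:complexType>
    <xs:sequence dfdl:separator="%CR;%LF;%SP;"
        dfdl:separatorPosition="infix">
      <!-- One item per physical line of the folded header -->
      <xs:element name="data" type="xs:string" maxOccurs="unbounded"
          dfdl:lengthKind="delimited" />
    </xs:sequence>
  </xs:complexType>
</xs:element>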
DFDL consumes the initiator, then starts processing the content of the
header as an array of records. A CR+LF+SP is consumed as the separator,
because that is the longest match. A CR+LF with no SP after it is consumed
as the terminator of the header. Clearly that only works if the first line
of the next header does not start with SP, which IMF guarantees because a
header field name cannot begin with whitespace. So you don't need a
discriminator.
You will have to stitch the data together
post-parse. I guess you could make the sequence hidden and get DFDL to
stitch together the data lines into one long string via an element with
dfdl:inputValueCalc.
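To make the delimiter behaviour concrete, here is a minimal Python sketch that imitates the schema above, assuming (as described) that the longest matching delimiter wins when the separator CR LF SP and the terminator CR LF both match at the same position. It is not a DFDL processor; `parse_received`, `SEPARATOR`, `TERMINATOR` and the sample text are hypothetical names and data chosen for illustration. The last step shows the post-parse stitching, rejoining the items with the single SP that each separator consumed.

```python
SEPARATOR = "\r\n "   # mirrors dfdl:separator="%CR;%LF;%SP;"
TERMINATOR = "\r\n"   # mirrors dfdl:terminator="%CR;%LF;"
INITIATOR = "Received:"


def parse_received(text: str) -> tuple[list[str], str]:
    """Return (data items, remaining text) for one Received header."""
    if not text.startswith(INITIATOR):
        raise ValueError("initiator 'Received:' not found")
    pos = len(INITIATOR)
    # Approximate %WSP*; by skipping spaces and tabs only.
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    items: list[str] = []
    start = pos
    while pos < len(text):
        # Longest match first: the 3-char separator beats the 2-char terminator.
        if text.startswith(SEPARATOR, pos):
            items.append(text[start:pos])
            pos += len(SEPARATOR)
            start = pos
        elif text.startswith(TERMINATOR, pos):
            items.append(text[start:pos])
            return items, text[pos + len(TERMINATOR):]
        else:
            pos += 1
    raise ValueError("terminator CR LF not found")


header = (
    "Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1])\r\n"
    " by localhost (Postfix)\r\n"
    " Tue, 4 Jun 2013 14:03:24 -0400 (EDT)\r\n"
    "Subject: next header\r\n"
)
items, rest = parse_received(header)
# Post-parse stitching: put back the SP that each separator consumed.
value = " ".join(items)
print(value)
print(repr(rest))
```

Note that the parse stops at the CR LF before "Subject:", because no SP follows it, so the next header is left untouched in `rest`.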
Ah - I think I see where Mike's earlier
append to the mailing list was coming from?
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 05/06/2013 16:25
Subject: Re: [DFDL-WG] Ignore extraneous CRLF w/ space?
Sent by: dfdl-wg-bounces@ogf.org
> Is the problem that the dfdl:terminator '%CR;%LF;' for the end of the header
> record is firing prematurely when it encounters the CRLF in the data?
Exactly.
> I would model the data as unbounded repeating records, and use a discriminator
> to distinguish the repeats from the next header.
Uh, could you repeat that
in English? Maybe with a small example? I freely admit that
I don’t understand what you just said. Thanks!
From: Steve Hanson [mailto:smh@uk.ibm.com]
Sent: Wednesday, June 05, 2013 5:21 AM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org
Subject: Re: [DFDL-WG] Ignore extraneous CRLF w/ space?
James
Is the problem that the dfdl:terminator '%CR;%LF;' for the end of the header
record is firing prematurely when it encounters the CRLF in the data?
If so, then I'm not sure that DFDL can ignore the extra %CR;%LF; without
using an escape scheme, but there isn't an escape scheme to use.
I would model the data as unbounded repeating records, and use a discriminator
to distinguish the repeats from the next header.
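In plain terms, "unbounded repeating records with a discriminator" means treating each physical line as a record and, for each line, checking its first character to decide whether it continues the current header or starts a new one. The minimal Python sketch below shows only that decision logic, not DFDL syntax. It assumes folded lines begin with a space or a tab (RFC 5322 allows either as folding whitespace) and that an empty line ends the header section; `group_header_lines` and the sample text are hypothetical.

```python
def group_header_lines(raw: str) -> list[list[str]]:
    """Group CRLF-separated lines into headers.

    A line that starts with a space or tab is a continuation record of the
    current header; any other line starts a new header. This first-character
    test plays the role of the discriminator.
    """
    headers: list[list[str]] = []
    for line in raw.split("\r\n"):
        if not line:
            break  # an empty line ends the header section
        if line[0] in " \t" and headers:
            headers[-1].append(line)  # continuation of the previous header
        else:
            headers.append([line])    # start of a new header
    return headers


raw = (
    "Received: from smtpksrv1.mitre.org\r\n"
    " by localhost (Postfix)\r\n"
    " Tue, 4 Jun 2013 14:03:24 -0400 (EDT)\r\n"
    "Subject: Ignore extraneous CRLF w/ space?\r\n"
    "\r\n"
)
for record in group_header_lines(raw):
    print(len(record), record[0])
```

Here the three physical lines of the Received header end up in one group, and the Subject line starts a second group.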
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 04/06/2013 19:56
Subject: [DFDL-WG] Ignore extraneous CRLF w/ space?
Sent by: dfdl-wg-bounces@ogf.org
Long IMF headers, such as Received, can be wrapped onto the next line by
using a CRLF and then a space. This example has 3 such wrappings:
Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1])
 by localhost (Postfix) via Exchange Front-End Server webmail.afmc.af.mil
 ([131.28.34.85]) with SMTP id 0A8791F116E for <jgarriss@mitre.org>;
 Tue, 4 Jun 2013 14:03:24 -0400 (EDT)
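For comparison, RFC 5322 (section 2.2.3) defines unfolding as simply removing any CRLF that is immediately followed by whitespace (space or tab). If pre-processing outside DFDL were an option, a minimal Python sketch of that step could look like this. `unfold` and the sample text are hypothetical, and this does not change how a DFDL parser treats the raw data.

```python
import re

# A CRLF immediately followed by a space or horizontal tab (WSP in RFC 5322).
# The lookahead keeps the whitespace itself; only the CRLF is removed.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_block: str) -> str:
    """Remove folding CRLFs from a block of header text."""
    return _FOLD.sub("", header_block)


folded = (
    "Received: from smtpksrv1.mitre.org (localhost.localdomain [127.0.0.1])\r\n"
    " by localhost (Postfix)\r\n"
    " Tue, 4 Jun 2013 14:03:24 -0400 (EDT)\r\n"
)
print(repr(unfold(folded)))
```

The final CRLF survives because no whitespace follows it, so it still marks the end of the header.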
How do I get DFDL to ignore these wrappings? For most of the header,
it’s not an issue, because I can use a lengthPattern to look ahead to the
';' before the date starts. But once the date starts, I have no way
of knowing when it ends, so I need to ignore any CRLF followed by a space.
TIA
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU