Mike
You are interpreting the spec correctly.
I would model this with the quotes as escapeBlockStart/End and generateEscapeBlock="always".
The format is parseable precisely because the quotes
are being used to escape the content.
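A rough Python sketch of that idea, using the syslogd-style pairs quoted further down the thread. It is not DFDL and not taken from any DFDL implementation: the function names scan_with_escape_block, parse_pairs and unparse_pairs are hypothetical, the escape block is only recognised at the start of the content, and values that themselves contain a quote (which would need an escapeEscapeCharacter) are not handled.

```python
def scan_with_escape_block(data, start, delimiters, block_start='"', block_end='"'):
    """Scan one delimited value starting at `start`.

    If the content opens with `block_start`, everything up to `block_end`
    is taken literally, so in-scope delimiters inside it are ignored.
    Returns (content, matched_delimiter_or_None, next_position).
    """
    if data.startswith(block_start, start):
        body_start = start + len(block_start)
        end = data.find(block_end, body_start)
        if end == -1:
            raise ValueError("unterminated escape block")
        content = data[body_start:end]
        pos = end + len(block_end)
        # After the block we expect a delimiter or the end of the data.
        for delim in sorted(delimiters, key=len, reverse=True):
            if data.startswith(delim, pos):
                return content, delim, pos + len(delim)
        if pos == len(data):
            return content, None, pos
        raise ValueError("data found after escape block end")
    # No escape block: the earliest in-scope delimiter ends the content.
    hits = [(data.find(d, start), d) for d in delimiters if data.find(d, start) != -1]
    if not hits:
        return data[start:], None, len(data)
    pos, delim = min(hits, key=lambda hit: (hit[0], -len(hit[1])))
    return data[start:pos], delim, pos + len(delim)


def parse_pairs(line):
    """Parse space-separated name="value" pairs into a list of tuples."""
    in_scope = ["=", " "]  # separators of the pair and of the enclosing sequence
    pairs, pos = [], 0
    while pos < len(line):
        name, delim, pos = scan_with_escape_block(line, pos, in_scope)
        if delim != "=":
            raise ValueError("expected '=' after name %r" % name)
        value, delim, pos = scan_with_escape_block(line, pos, in_scope)
        if delim not in (" ", None):
            raise ValueError("expected a space or end of data after value")
        pairs.append((name, value))
    return pairs


def unparse_pairs(pairs):
    """Mimic generateEscapeBlock="always": every value is quoted on output."""
    return " ".join('%s="%s"' % (name, value) for name, value in pairs)


line = 'foo="stuff with spaces" bar="more stuff with spaces and equal = signs"'
print(parse_pairs(line))
```

Because the quotes are always written on output, the data round-trips unchanged, which is the effect generateEscapeBlock="always" is meant to give.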
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 16/08/2017 16:55
Subject: Re: [DFDL-WG] clarification question on terminators vs. enclosing group separators/terminators
So the use case that drives the question is syslogd format.
Part of the syntax is a whitespace separated list of pairs
like so:
foo="stuff with spaces" bar="more stuff with spaces and equal = signs"
The spaces separate the pairs, the quotation marks are
required, not optional, so they're not escapeBlockStart/End, they're initiator
and terminator.
There's a sequence with space separator here.
Inside that is recurring "pairs" containing
name and content separated by "=". Zero or more pairs.
Content has an initiator and terminator which are double
quotes.
The spaces inside the string content are *not* escaped.
Nor equal signs.
emptyValueDelimiterPolicy is 'both', and the element is non-nillable, so
nilValueDelimiterPolicy is not relevant.
Seems to me a parser for this does not need escaping of
the spaces or = that appear inside the content, but the DFDL spec can only
express parsing these if those escapes are provided.
Am I interpreting the spec correctly in this case? That is,
because the surrounding groups have space and = separators, the content
must escape these if they appear?
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Wed, Aug 16, 2017 at 11:28 AM, Steve Hanson <smh@uk.ibm.com> wrote:
In general, enclosing construct's delimiters
are also relevant. When scanning for the value of an element with a terminator,
there are some circumstances where there might not be a terminator:
- nil value delimiter policy says there is no terminator
- empty value delimiter policy says there is no terminator
- element is optional, so if you find an enclosing construct's delimiter as the
first character, the element is missing
So you *could* design a wholly delimited format where enclosing construct
delimiters never needed escaping but it would be a bit restrictive in practice.
Formats that I have seen where enclosing construct delimiters are not escaped
usually have fixed length fields.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 16/08/2017 15:48
Subject: [DFDL-WG] clarification question on terminators vs. enclosing group separators/terminators
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
The DFDL Spec says:
12.3.2 dfdl:lengthKind 'delimited'
On parsing, the length of an element with dfdl:lengthKind
'delimited' is determined by scanning the datastream for the delimiter.
The data stream is scanned for any of
· the element's terminator (if specified)
· an enclosing construct's separator or terminator
· the end of an enclosing element designated by its known length
· the end of the data stream
So if an element has a terminator, are the enclosing constructs' separators
or terminators also relevant? Or is ONLY the element's own terminator relevant
for scanning, and hence only the element's own terminator must be escaped
if it appears in the content?
For example, in a space-separated group, an enclosed element has a terminator
";". When parsing that element, do spaces have to be escaped
if they appear in the content, or does only the terminator ";"
have to be escaped?
Strictly speaking it seems enclosing delimiters shouldn't have to be escaped,
because the data must have the ";", and spaces are only significant
as separators after finding the ";" terminator.
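To make the two readings concrete, here is a minimal Python sketch of the scanning rule quoted above from 12.3.2, under the assumption that "in scope" simply means a list of delimiter strings handed to the scanner. The name scan_delimited is hypothetical, and escape schemes, known lengths and the spec's full delimiter-matching rules are left out.

```python
def scan_delimited(data, start, delimiters):
    """Find the earliest in-scope delimiter at or after `start`.

    Returns (content, matched_delimiter, next_position). If no delimiter
    is found, the content runs to the end of the data and the matched
    delimiter is None.
    """
    best_pos, best_delim = len(data), ""
    for delim in delimiters:
        pos = data.find(delim, start)
        if pos == -1:
            continue
        # Prefer the earliest match; on a tie prefer the longer delimiter.
        if pos < best_pos or (pos == best_pos and len(delim) > len(best_delim)):
            best_pos, best_delim = pos, delim
    return data[start:best_pos], best_delim or None, best_pos + len(best_delim)


data = "hello world; next"
# Reading 1: the enclosing space separator is in scope as well as ";".
print(scan_delimited(data, 0, [";", " "]))
# Reading 2: only the element's own terminator ";" is in scope.
print(scan_delimited(data, 0, [";"]))
```

Under the first reading the scan stops at the unescaped space and the content is cut short; under the second the content runs up to the ";".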
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU