An alternative is to specify the terminator
as a single %SP; and then use DFDL's justification and trimming properties
to remove the excess %SP;s before adding to the infoset.
<xs:element name="Data" type="xs:string"
            dfdl:terminator="%SP;"
            dfdl:textStringJustification="left"
            dfdl:textTrimKind="padChar"
            dfdl:textStringPadCharacter="%SP;"/>
Allowing %SP*; is a slightly slippery slope.
It can be argued that the * and + could be offered for any DFDL entity,
or even bracketed group of entities. The danger is that DFDL entities become
their own matching language. The WG has discussed in the past allowing
a regular expression for delimiters, and this would be a candidate feature
for a future DFDL 2.0.
(In case you are now wondering how come
%WSP*; is allowed, it is because an existing IBM modelling language that
DFDL supersedes had such a facility, and this enables a smooth migration
to DFDL. Hence it is an exception, albeit a very useful one, rather than
the norm.)
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: "Garriss Jr., James P." <jgarriss@mitre.org>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 01/03/2013 13:40
Subject: Re: [DFDL-WG] Representing multiple spaces
Sent by: dfdl-wg-bounces@ogf.org
It seems like %SP*; and %SP+; would be useful additions to the spec. Can that be considered?
(And yes, I meant %SP*; not %ES*;. Good catch, thank you.)
From: Mike Beckerle [mailto:mbeckerle.dfdl@gmail.com]
Sent: Thursday, February 28, 2013 7:48 PM
To: Garriss Jr., James P.
Cc: dfdl-wg@ogf.org
Subject: Re: [DFDL-WG] Representing multiple spaces
%WSP+; is one or more whitespace characters.
That might be what you want.
The only way to match one or more %SP; and nothing else (I think you meant
%SP*; not %ES*;, since ES is the empty string) is like this:
dfdl:terminator="%SP; %SP;%SP; %SP;%SP;%SP; ..."
i.e., a whitespace-separated list of one space, two spaces, three spaces,
etc., up to as high as you would like to go.
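As a minimal sketch of the enumerated list described above, here it is written out in full for up to three spaces. The element name "Data" is borrowed from elsewhere in this thread, the cap of three is arbitrary, and the wrapper schema and namespace URI for tns are hypothetical. A real schema would also need a dfdl:format annotation supplying the remaining required properties. My understanding is that a DFDL processor prefers the longest matching delimiter, so the order of the list entries should not matter, but treat that as an assumption to check against the spec.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:tns="http://example.com/spaces"
           targetNamespace="http://example.com/spaces">
  <!-- Terminator is a whitespace-separated list of three literals:
       one space, two spaces, three spaces. A run of four or more
       spaces is NOT fully consumed by this list. -->
  <xs:element name="Data" type="xs:string"
              dfdl:terminator="%SP; %SP;%SP; %SP;%SP;%SP;"/>
</xs:schema>
```

Note that a list like this can only express "one or more" spaces; the zero-spaces case from the original question is not covered by it.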
If that just won't cut it, then you have to go to something I call modeling
syntax as data:
You create a group
<xs:group name="spaces">
  <xs:sequence>
    <xs:element name="spaces" type="xs:string"
                dfdl:lengthKind="pattern" dfdl:lengthPattern="\s*"/>
  </xs:sequence>
</xs:group>
This spaces group is a model for something that is just a syntactic feature
of your data.
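One caveat on the pattern in that group: \s matches any whitespace character (tabs, newlines, and so on), whereas the original question asked for spaces only. A possible space-only variant is sketched below. The group name "spacesOnly" and the namespace URI for tns are hypothetical, and \x20 is assumed to be accepted as a hex escape for U+0020 by the regular expression dialect your DFDL processor uses. As before, a real schema would also need a dfdl:format annotation for the other required properties.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:tns="http://example.com/spaces"
           targetNamespace="http://example.com/spaces">
  <!-- Hypothetical variant of the "spaces" group: the pattern matches
       zero or more U+0020 characters and no other whitespace. -->
  <xs:group name="spacesOnly">
    <xs:sequence>
      <xs:element name="spaces" type="xs:string"
                  dfdl:lengthKind="pattern" dfdl:lengthPattern="\x20*"/>
    </xs:sequence>
  </xs:group>
</xs:schema>
```

It would be referenced in the same way as the original group, via dfdl:hiddenGroupRef="tns:spacesOnly".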
Then you keep the group out of your logical infoset by using a hidden group
ref, like so:
<xs:sequence>
  <xs:element name="beforeSpaces" type="xs:string" dfdl:terminator="%SP;"/>
  <xs:sequence dfdl:hiddenGroupRef="tns:spaces"/>
  <xs:element name="afterSpaces" type="xs:string" dfdl:terminator="%SP;"/>
</xs:sequence>
Here's what I'm not sure of....
In XSD, it would be OK to have multiple groups like this between elements:
because the intervening elements aren't named "spaces", the various instances
of the "spaces" element can't be confused. (There is no UPA problem.)
In DFDL, I'm not sure if we allow this:
<xs:sequence dfdl:sequenceKind="ordered">
  <xs:element name="foo" .../>
  <xs:element name="spaces" .../>
  <xs:element name="bar" .../>
  <xs:element name="spaces" .../>
  ....
</xs:sequence>
I.e., more than one child element named "spaces" in the same
sequence, but it's not an array using minOccurs/maxOccurs and dfdl:occursCountKind,
etc.
XML Schema would not have a problem with this, so long as those elements
are all required. (minOccurs >= 1).
I've sent a separate email to the dfdl-wg to see what others' opinions
are on this.
On Thu, Feb 28, 2013 at 3:03 PM, Garriss Jr., James P. <jgarriss@mitre.org> wrote:
Suppose I have a terminator that can be multiple spaces, whether 0 spaces,
1 space, 2 spaces, or more spaces.
No other types of whitespace allowed, just spaces.
Because there’s this entity: %WSP*;
I assumed there would also be this entity: %ES*;
But there’s not. Why not?
How would I represent this terminator?
TIA
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com