All,
Using those rules, the SWIFT 52A motivating use case would be:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xsd:element name="Account#"
               minOccurs="0" maxOccurs="1" dfdl:lengthKind="delimited"
               dfdl:terminator="&crlf;">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
        <!-- Optional one-character indicator: C or D -->
        <xsd:element name="ID" minOccurs="0" maxOccurs="1"
                     dfdl:lengthKind="delimited">
          <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:discriminator test="{ length(.) = '1' }" />
            </xsd:appinfo>
          </xsd:annotation>
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="1"/>
              <xsd:minLength value="1"/>
              <xsd:enumeration value="C"/>
              <xsd:enumeration value="D"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <!-- Account number, 1 to 34 characters -->
        <xsd:element name="Number" minOccurs="0" maxOccurs="1"
                     dfdl:lengthKind="delimited">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="34"/>
              <xsd:minLength value="1"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
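To make the intent of the schema above concrete, here is a minimal Python sketch of how an account line might be split under those properties. The function name `parse_account` is hypothetical. The sketch also assumes that a one-character first field counts as the ID only when it is C or D; the schema's discriminator tests only the length, so this is a simplification, not the DFDL semantics.

```python
def parse_account(line: str) -> dict:
    """Split an account line such as '/C/12345678\\r\\n' into ID and Number.

    Hypothetical helper: models a '/' prefix separator and a CRLF terminator.
    """
    if not line.endswith("\r\n"):
        raise ValueError("missing CRLF terminator")
    body = line[:-2]
    if not body.startswith("/"):
        raise ValueError("expected prefix separator '/'")
    rest = body[1:]

    result = {}
    first, sep, remainder = rest.partition("/")
    if len(first) == 1 and first in ("C", "D"):
        # Assumption: a single-character C/D field is the optional ID.
        result["ID"] = first
        number = remainder if sep else ""
    else:
        # No ID present, so everything after the prefix separator is the Number.
        number = rest

    if number:
        if len(number) > 34:
            raise ValueError("Number longer than 34 characters")
        result["Number"] = number
    return result
```

For example, `parse_account("/C/12345678\r\n")` yields both fields, while `parse_account("/12345678\r\n")` yields only the Number.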
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Date: 25/11/2009 10:03
Subject: Re: [DFDL-WG] How to determine the length of an element which has text representation
I'd also much rather do this without an extra property.
I agree with Tim's analysis below, except for "endOfParent".
The parent could be lengthKind = "delimited", in which
case you need to be scanning to find the end of your parent. That's
easy enough to explain so I don't think it breaks Tim's principle.
Regards
Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh@uk.ibm.com,
Phone (+44)/(0) 1962-815848
From: Tim Kimber/UK/IBM@IBMGB
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 24/11/2009 23:05
Subject: Re: [DFDL-WG] How to determine the length of an element which has text representation
Sent by: dfdl-wg-bounces@ogf.org
I disagree with the direction this conversation is taking. I don't think
we need another property - the dfdl:lengthKind property provides all of
the control which users require.
Let's examine how the parser might behave for *all* of the lengthKind enumerations:
explicit
    The parser extracts a fixed number of characters/bytes from the input document
    as directed by dfdl:length (which may be a DFDL expression, and may resolve
    to the value of the previous sibling).
prefixed
    The parser extracts a fixed number of characters/bytes from the input document
    as directed by the prefix length. Note the similarity with the DFDL expression
    scenario above.
implicit
    The parser extracts a fixed number of characters/bytes from the input document
    as directed by the implicit length of the element.
delimited
    The parser extracts from the input document all characters between the
    current buffer position and the next unescaped item of in-scope terminating
    markup.
pattern
    The parser extracts from the input document all characters which match
    the specified pattern.
endOfParent
    The parser extracts from the input document all remaining characters/bytes
    allowed by the representation properties of its parent groups/elements.
I think there is a consistency issue here. We either make the in-scope
markup apply to *all* lengthKinds (including prefixed lengths and cases
where dfdl:length is an expression, which can amount to the same thing),
or we limit it to lengthKind="delimited". Any in-between position
needs a very good justification.
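The per-lengthKind behaviour described above can be sketched as a single dispatch function. This is only an illustration under stated assumptions: the name `extract` and its parameters are hypothetical, the length prefix is assumed to be two decimal text digits, escape handling is omitted, and the implicit length is simply passed in by the caller.

```python
import re


def extract(kind, data, pos=0, *, length=None, terminators=(),
            pattern=None, parent_end=None):
    """Return (text, new_pos) for one element, chosen only by lengthKind."""
    if kind in ("explicit", "implicit"):
        # Length is already known: no scanning for markup at all.
        end = pos + length
    elif kind == "prefixed":
        prefix_width = 2  # assumption: two-digit decimal text prefix
        count = int(data[pos:pos + prefix_width])
        pos += prefix_width
        end = pos + count
    elif kind == "delimited":
        # Scan for the nearest in-scope terminating markup (escapes ignored here).
        hits = [i for i in (data.find(t, pos) for t in terminators) if i != -1]
        end = min(hits) if hits else len(data)
    elif kind == "pattern":
        match = re.compile(pattern).match(data, pos)
        if match is None:
            raise ValueError("pattern did not match at current position")
        end = match.end()
    elif kind == "endOfParent":
        # Take everything the parent allows; default to the end of the data.
        end = len(data) if parent_end is None else parent_end
    else:
        raise ValueError(f"unknown lengthKind: {kind}")

    if end > len(data):
        raise ValueError("not enough data")
    return data[pos:end], end
```

With `data = "05A,B,C;rest"`, only the "delimited" branch stops at the first comma; the explicit and prefixed branches return content that contains commas, which is the distinction the consistency argument turns on.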
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Alan Powell/UK/IBM@IBMGB
Cc: Stephanie Fetzer <sfetzer@us.ibm.com>, "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, Tim Kimber/UK/IBM@IBMGB
Date: 24/11/2009 20:52
Subject: Re: [DFDL-WG] How to determine the length of an element which has text representation
I like to use enums instead of booleans, so I suggest this property is
dfdl:textScanningMode as an enum with current values "scanned"
and "notScanned", but as an enum we have the ability to add some
intelligent mixed mode in the future (like "scanExceptFixedLength"
- if that proves useful)
One thought: we might try to think up terminology that is more declarative
and less parse-centric. These properties about "scanning" would also affect
the output direction, instructing the unparser not to bother inserting
escape characters if the logical element contains, say, the parent delimiter.
I currently proceed under the assumption that not scanning turns off the
whole lexical analyzer, so escape sequences detected would also be considered
to be raw string content. You would still convert code points to logical
characters but characters would not be interpreted as delimiters, escapes,
quotation marks....
There's lots of potential for schema definition errors here of course.
E.g., lengthKind='delimited', but textScanningMode="notScanned"
clearly does not work.
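As a rough illustration of the proposal above, the following Python sketch models the suggested enum, the schema definition error for the contradictory combination, and the effect on the unparser. All names here (`TextScanningMode`, `check_properties`, `unparse_text`) are hypothetical, as are the comma delimiter and backslash escape used as defaults.

```python
from enum import Enum


class SchemaDefinitionError(Exception):
    """Raised when property values contradict each other."""


class TextScanningMode(Enum):
    SCANNED = "scanned"
    NOT_SCANNED = "notScanned"


def check_properties(length_kind: str, mode: TextScanningMode) -> None:
    # A delimited element can only be found by scanning, so this pair is an error.
    if length_kind == "delimited" and mode is TextScanningMode.NOT_SCANNED:
        raise SchemaDefinitionError(
            "lengthKind='delimited' requires textScanningMode='scanned'")


def unparse_text(value: str, mode: TextScanningMode,
                 delimiter: str = ",", escape: str = "\\") -> str:
    """Write a logical string, escaping markup only when scanning applies."""
    if mode is TextScanningMode.NOT_SCANNED:
        # No lexical analysis on the way in, so no escapes on the way out.
        return value
    escaped = value.replace(escape, escape + escape)
    return escaped.replace(delimiter, escape + delimiter)
```

An enum rather than a boolean leaves room for a later mixed mode without changing the signatures.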
...mike
On Tue, Nov 24, 2009 at 11:48 AM, Alan Powell <alan_powell@uk.ibm.com>
wrote:
Stephanie
4. Have a separate property to 'turn off scanning' for dfdl:representation='text'
5. Introduce a new lengthKind. 'fixedLengthDelimited'
Alan Powell
Yes - agreed. It makes sense that, when parsing with delimiters in scope,
we 'turn off scanning' if we hit an element with a non-delimited length. If
everyone is agreed on that, then...
The decision to be made here is how we handle elements with length
requirements when parsing with delimiters in scope:
1. We can allow dfdl:length on components with lengthKind="delimited" and use
it in a check that occurs after the element is initially parsed (via delimiter).
2. We can disallow dfdl:length on components with lengthKind="delimited" and
require that any length constraints be placed on such components via an
assert. An error or a warning will be generated if dfdl:length is
defined explicitly on a component with lengthKind="delimited".
3. We can ignore dfdl:length on components with lengthKind="delimited" and
require that any length constraints be placed on such components via an
assert.
Any other options? Which way are we leaning on this?
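The three options listed above differ only in what happens to dfdl:length once the delimiter scan has run. A minimal Python sketch, in which the function name `parse_delimited`, the `policy` values and both exception classes are hypothetical:

```python
class SchemaDefinitionError(Exception):
    """The schema itself is invalid."""


class ProcessingError(Exception):
    """The data does not match the schema."""


def parse_delimited(data, terminator, length=None, policy="check"):
    """Extract a delimited element, treating dfdl:length per the chosen option.

    policy="check"    -> option 1: verify the length after scanning
    policy="disallow" -> option 2: reject the schema if a length is given
    policy="ignore"   -> option 3: scan and never look at the length
    """
    if policy not in ("check", "disallow", "ignore"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "disallow" and length is not None:
        raise SchemaDefinitionError(
            "dfdl:length is not allowed with lengthKind='delimited'")

    end = data.find(terminator)
    if end == -1:
        raise ProcessingError("terminator not found")
    text = data[:end]

    if policy == "check" and length is not None and len(text) != length:
        raise ProcessingError(
            f"expected {length} characters, found {len(text)}")
    return text
```

Under options 2 and 3 the same constraint would have to be expressed separately as an assert on the element.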
Cheers,
-Steph
WebSphere Transformation Extender
Industry Packs - Software Engineer
I support Tim's view here. There needs to be an idiomatic way to shut off
scanning. Rep='binary' is much too obscure.
Question: which other length kinds should switch off scanning? Prefix?
Implicit? None of these?
...mikeb
On Nov 18, 2009, at 12:05 PM, Tim Kimber <KIMBERT@uk.ibm.com>
wrote:
I'd like to record what was discussed and raise another point which Alan
pointed out after the meeting.
Discussions in the meeting
- dfdl:lengthKind applies only to the element on which it is specified.
It has no effect whatever on the parsing of child elements/groups.
- there may be some value in tolerating simple elements of type xs:string
with dfdl:representation="binary". Might be useful for schemas
where dfdl:representation="binary" throughout.
- Currently, the position of the WG is that parsers should *always* scan
to extract the text representation if there is any terminating markup in
scope. Even if lengthKind='explicit'.
- TK proposed the scheme outlined in his previous email, in which dfdl:lengthKind
alone specifies how the parser should extract the text representation.
If lengthKind="explicit", scanning is switched off and dfdl:length
is used. If lengthKind="delimited" the text rep is extracted
by scanning and length is ignored.
- A refinement was discussed whereby dfdl:length would be checked after
a scan has been performed if dfdl:lengthKind="delimited". This
would make the modeling of some common formats simpler, and avoid the need
for a dfdl:assert to enforce the length constraint.
- MB raised the possibility that we could actually disallow dfdl:length
if lengthKind='delimited'. This is the most conservative position, but
general opinion was that it would be too restrictive. There still might
be some value in disallowing dfdl:length for other lengthKinds.
Discussions after the meeting
- Alan pointed out that lengthKind="explicit" does not necessarily
mean that the length of the field is fixed. dfdl:length might be specified
as a DFDL expression. A common reason for doing that would be to obtain
the element's length from an earlier integer field. As currently specified,
if there was any markup in scope, the text rep would be extracted by scanning.
Restatement of my position after today's meeting:
I'm now even more convinced that dfdl:lengthKind="explicit" should
switch off scanning. Here's why:
a) The enumerations of lengthKind are explicit, implicit, prefixed,
delimited, pattern, endOfParent. The presence of 'delimited'
in that list means that in some users' minds, the other enumerations are
going to be interpreted as *alternatives* to 'delimited'.
b) If there's markup in scope, scanning cannot be switched off by any means.
Not even by setting lengthKind='explicit' AND obtaining dfdl:length from
a previous integer field. I think that's very counter-intuitive.
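Point (b) can be illustrated with a small Python sketch of a record whose payload length comes from an earlier integer field. The function name `parse_counted`, the `always_scan` flag and the comma separator are hypothetical; the `always_scan=True` branch is only one reading of the current "always scan when markup is in scope" position.

```python
def parse_counted(data, separator=",", always_scan=True):
    """Parse '<length><sep><payload>...' where <length> is a decimal integer."""
    length_text, _, rest = data.partition(separator)
    length = int(length_text)
    if always_scan:
        # Markup is in scope, so the payload ends at the next separator,
        # regardless of what the length field said.
        payload = rest.split(separator, 1)[0]
    else:
        # lengthKind='explicit' switches scanning off: take exactly `length` characters.
        payload = rest[:length]
    return length, payload
```

For the input `"5,a,b,c,tail"`, scanning returns the payload `"a"`, while the explicit-length reading returns `"a,b,c"`, which is what the length field asked for.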
regards,
Tim Kimber, Common Transformation Team,
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg