Good questions.
The intent of this DFDL feature is as follows:
- a point of uncertainty (POI) cannot be resolved using an initiator (the
simplest option). The data format just doesn't work that way.
- it could be resolved using an assert or a discriminator, but that would
be too heavyweight.
- a simple inspection of the data format reveals that the discrimination
can be done by testing the first few characters of each branch of the POI.
Example: SWIFT 50K (multi-line address field):
:32A:060929EUR25,36<CR><LF>
:33B:EUR56,78<CR><LF>
:50K:/IT60X0542811101000000123456
ABC Corporation Times Square 7 NY 1
LINE 2
LINE 3
:52A:/<etc>
Note that field 50K contains lines of
address data, but the actual number of lines is not known. So how will
the DFDL parser know when the 50K field has completed? Answer: it
encounters a line that starts with a colon.
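To make the termination rule concrete, here is a minimal Python sketch of it. It is only an illustration of the idea, not how a DFDL processor works internally; the function name `read_address_lines`, the pattern `[^:]` and the sample list are assumptions made for this example.

```python
import re

# Assumed pattern: a line is part of the address only if its first
# character is something other than a colon.
ADDRESS_LINE = re.compile(r"[^:]")

def read_address_lines(lines, start):
    """Collect lines from index `start` until one begins with a colon."""
    collected = []
    for line in lines[start:]:
        # A cheap look at the first character decides the branch;
        # the line itself is not parsed before the decision is made.
        if not ADDRESS_LINE.match(line):
            break
        collected.append(line)
    return collected

message = [
    ":50K:/IT60X0542811101000000123456",
    "ABC Corporation Times Square 7 NY 1",
    "LINE 2",
    "LINE 3",
    ":52A:/<etc>",
]
address = read_address_lines(message, 1)
print(address)
```

The loop stops at ":52A:/<etc>" without consuming it, which is the behaviour wanted from the test on the first few characters.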
Now, the most natural way to model SWIFT
field 50K is as a series of lines. The SWIFT XML format defines it this
way.
If you work through the possibilities,
it turns out that the only way to achieve this using discriminators is:
- cause the parser to parse each line
and put it into the info set
- add a discriminator to the repeating
'addressLine' element. The DFDL expression would be something like this:
{ if (fn:exists(./NameAddress_Line))
  then fn:not(fn:starts-with(./NameAddress_Line, ':'))
  else xs:boolean("true") }
That's a very expensive way to achieve the intended goal, which is 'treat
the data as another addressLine if the next character is not a colon'.
So that was the motivation for the feature.
To answer the questions:
- not intended to be limited to xs:string only
- not intended to be limited to elements with text representation (because
dfdl:representation only applies to simple elements, and the POI might be
a group or an element).
- is intended to be matched against text or binary data, starting at the
POI's byte offset. If the element's representation is binary then the
'encoding' property will be required.
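The last point can be sketched in Python as follows. This is an assumption-laden illustration, not the behaviour mandated by the spec: the helper `pattern_matches_at` is hypothetical, and it simply decodes everything from the given byte offset with the supplied encoding before applying the regular expression, whereas a real processor would presumably decode only as much as the pattern needs.

```python
import re

def pattern_matches_at(data: bytes, offset: int, pattern: str, encoding: str) -> bool:
    """Decode the bytes from `offset` onward and test the regex at that position."""
    # The encoding is needed even when the element itself is binary,
    # because the regex is applied to characters, not to raw bytes.
    text = data[offset:].decode(encoding, errors="replace")
    return re.match(pattern, text) is not None

raw = b":50K:/IT60X\r\nLINE 2\r\n:52A:/"
line2 = raw.index(b"LINE 2")       # byte offset of an address line
next_field = raw.index(b":52A:")   # byte offset of the next field tag

print(pattern_matches_at(raw, line2, r"[^:]", "ascii"))       # True
print(pattern_matches_at(raw, next_field, r"[^:]", "ascii"))  # False
```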
Sounds as if the spec needs to be clarified
in this area.
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert@uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 15/10/2012 19:26
Subject: [DFDL-WG] question/clarification - asserts using test patterns
Sent by: dfdl-wg-bounces@ogf.org
Question: Is an assertion using a regular expression pattern allowed on
(a) xs:string type elements
(b) any data with text representation
(c) any data with text or binary representation
and, does the regular expression apply to the representation or the logical
data value?
(a) is the only case that is not ambiguous, because the representation
and the logical value are the same thing.
For everything else, there’s the question of whether the test is on the
representation or the logical value. If it's the logical value, then how
is a regex made meaningful on a logical value that is, for example, a number,
without defining a canonical representation to which the logical value
is converted?
If it's to apply to the representation, then exactly what data (e.g., what
grammar region) is subject to the regex?
...mikeb
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412