IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 01/09/2020 14:54
Subject: [EXTERNAL] [DFDL-WG] clarification needed? dfdl:textNumberCheckPolicy 'strict' - language suggests more strict than ICU libraries
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
Say we have this schema snippet:
<xs:element name="SimpleDataFormat">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="NumStudents"
                  type="xs:nonNegativeInteger"
                  dfdl:textNumberCheckPolicy="strict"
                  dfdl:textNumberPattern="#,###"
                  dfdl:textStandardGroupingSeparator=","
                  dfdl:textStandardDecimalSeparator="."
      />
    </xs:sequence>
  </xs:complexType>
</xs:element>
This successfully parses the data:
1234
Even though textNumberCheckPolicy="strict" and the pattern contains a grouping separator, it still allows data that does not contain grouping separators.
That said, we have generally tried to make DFDL's spec match the behavior of the ICU library for parsing numbers based on the textNumberPattern. This library has this to say about strict parsing of numbers:
The following conditions cause a parse failure relative to [lax] mode (examples use the pattern "#,##0.#"):
So, based on ICU's description of strict mode, this is the expected behavior. ICU says nothing about missing grouping separators causing an error, only that grouping separators, if present, must be in the right positions.
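As a rough illustration of the ICU reading, here is a small Python sketch that models only the grouping-separator rule for the pattern "#,###". Grouping separators are optional, but when present the first group must have one to three digits and every later group exactly three. The function name icu_style_grouping_ok is hypothetical, and this is a simplified regex model under that assumption, not ICU's actual parser. It ignores signs, decimal separators and exponents.

```python
import re

# Hypothetical, simplified model of the ICU-style strict rule for "#,###"
# (grouping size 3, grouping separator ","), digits only.
# Accepted forms: no separators at all, OR a 1-3 digit leading group
# followed by one or more ",ddd" groups.
_ICU_STYLE = re.compile(r"(?:\d+|\d{1,3}(?:,\d{3})+)")


def icu_style_grouping_ok(text: str) -> bool:
    """Return True if text passes this simplified ICU-style strict check."""
    return _ICU_STYLE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1234", "1,234", "1,23", ",123", "1,,234", "1234,567"]:
        print(sample, icu_style_grouping_ok(sample))
```

Under this model "1234" and "1,234" are accepted. The misplaced or doubled separators in "1,23", ",123", "1,,234" and "1234,567" are rejected, which matches the "if they exist, they must be in the right spot" behavior described above.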
The only thing the DFDL specification mentions regarding strict numbers is this:
If 'strict' and dfdl:textNumberRep is 'standard' then the data must follow the pattern with the exceptions that digits 0-9, decimal separator and exponent separator are always recognised and parsed
To me, that reads as if the grouping separator should always be required in strict mode, because it is not among the listed exceptions. If so, the ICU behavior and the behavior described in the DFDL specification do not match. I believe the DFDL behavior was intended to match ICU, so the DFDL specification may need to be updated.
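For contrast, here is a sketch of the literal reading of the specification text quoted above. Only digits, the decimal separator and the exponent separator are exempted, so for the pattern "#,###" any value of four or more digits would have to carry "," separators every three digits from the right. The function name spec_literal_grouping_ok is hypothetical, and this is an assumed interpretation for illustration only, not normative DFDL behavior. Decimal and exponent handling are left out.

```python
import re

# Hypothetical model of a literal reading of the DFDL 'strict' text for "#,###":
# grouping separators are mandatory wherever the pattern places them,
# i.e. a 1-3 digit leading group followed by zero or more ",ddd" groups.
_SPEC_LITERAL = re.compile(r"\d{1,3}(?:,\d{3})*")


def spec_literal_grouping_ok(text: str) -> bool:
    """Return True if text follows "#,###" with separators where required."""
    return _SPEC_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["123", "1,234", "1234", "12,345,678", "1,23"]:
        print(sample, spec_literal_grouping_ok(sample))
```

Under this reading "1234" is rejected, while "123", "1,234" and "12,345,678" are accepted. This is the stricter outcome that the quoted wording appears to call for.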
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense | www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU