Hi all,
I've hit an interesting case involving trimming and number patterns
that doesn't seem quite sane to me.
Consider an element with the following properties:
textTrimKind='padChar'
textNumberPadCharacter='0'
textNumberPattern='0000+;0000-'
So we have the sign character at the end of the representation. Now,
imagine that the data being parsed is "0000+". The relevant rules from
the DFDL specification are:
- Section 13.2 on textTrimKind
  When 'padChar', the element is trimmed of the dfdl:textStringPadCharacter,
  dfdl:textNumberPadCharacter, dfdl:textBooleanPadCharacter or
  dfdl:textCalendarPadCharacter depending on the type of the element.
- Section 13.6 on textNumberPadCharacter
  When parsing, if the pad character is '0' and the SimpleContent region
  consists entirely of '0' characters, then the last remaining '0' is not
  trimmed and a single '0' is the result of the trimming. This rule also
  applies when the pad character is a DFDL character entity equivalent to
  '0'. This rule does not apply when the pad character is any other
  character nor when a pad byte is specified.
Describes
all of the pattern syntax.
In our hypothetical case, the content
region is not all zeros, as it ends in '+'. This means that the rule
in section 13.6 does not apply and we only apply the rule in 13.2. This
results in us trimming away all of the zeros and ending up with '+'. This
then doesn't parse as a number.
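
To make the case concrete, here is a rough Python sketch of how I read the two rules together. The function name trim_pad_current is purely illustrative, and it assumes the number is right-justified so that the padding is trimmed from the left; it is not taken from any DFDL implementation.

```python
def trim_pad_current(content: str, pad: str = "0") -> str:
    """Trim leading pad characters, as I read Sections 13.2 and 13.6.

    Assumption: the number is right-justified, so padding is on the left.
    """
    trimmed = content.lstrip(pad)
    # Section 13.6 exception: content made up entirely of '0' keeps one '0'.
    if pad == "0" and content and trimmed == "":
        return "0"
    return trimmed


for data in ("0000", "0012+", "0000+"):
    print(repr(data), "->", repr(trim_pad_current(data)))
```

With the data "0000+" the exception does not fire, so only the sign survives the trim.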
The problem seems to be that the rule in Section 13.6 doesn't take into
account that the suffix of the pattern can put text in the content
region that isn't part of the digits of the number. Should the rule
under Section 13.6 be something more like this...
When parsing, if the pad character is '0' and the SimpleContent region
consists entirely of '0' characters, or the SimpleContent region
consists of a string of '0' characters followed by non-digit characters,
then the last remaining '0' is not trimmed and a single '0' is the
result of the trimming. This rule also applies when the pad character is
a DFDL character entity equivalent to '0'. This rule does not apply when
the pad character is any other character nor when a pad byte is
specified.
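
As a sketch only, the amended wording could behave something like the Python below. The name trim_pad_proposed is hypothetical, left-hand padding (a right-justified number) is assumed, and "non-digit" is interpreted with the regex class \D, which is my reading rather than anything the specification says.

```python
import re


def trim_pad_proposed(content: str, pad: str = "0") -> str:
    """Trim leading pad characters under the amended rule suggested above.

    Assumption: the number is right-justified, so padding is on the left.
    """
    if pad == "0":
        # One or more '0' characters, optionally followed by non-digits
        # (for example a trailing sign from the pattern suffix).
        match = re.fullmatch(r"0+(\D*)", content)
        if match:
            # Keep a single '0' and whatever non-digit text follows it.
            return "0" + match.group(1)
    return content.lstrip(pad)


for data in ("0000", "0000+", "0000-", "0012+"):
    print(repr(data), "->", repr(trim_pad_proposed(data)))
```

Under this reading "0000+" trims to "0+", which the pattern can still parse, while "0012+" trims to "12+" exactly as it does today.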
Thoughts?
Andy

Andy Edwards - IBM Integration Bus - DFDL
Email: andy.edwards@uk.ibm.com
Snail Mail: MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int: 247222
Tel ext: +44 (0)1962 817222
Desk: DE3 V17