Hi Steve
Yep - I agree with your new definition of the rule, which takes into account
justification-independence and quoted text. I'm not sure I like the nested
if-else in the new text though, as it took a couple of reads to understand.
How about the text below? It splits out the second 'if/else', which I find
easier to understand.
When parsing, if the pad character is
'0' and dfdl:textTrimKind is 'padChar' then the SimpleContent region is
trimmed of the '0' characters as defined by the trimming rules. If
this trimming results in the next character in the SimpleContent region
being a character other than a digit, the last '0' character is re-instated
and not trimmed. This rule also applies when the pad character is
a DFDL character entity equivalent to '0'. This rule does not apply when
the pad character is any other character nor when a pad byte is specified.
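For illustration only, the wording above can be sketched in Python. This is a minimal sketch, not spec text or any implementation's code. The function name trim_zero_pad is hypothetical. It assumes a right-justified number, so pad characters are trimmed from the left only, and it assumes that reaching the end of the SimpleContent region counts as "a character other than a digit", so an all-zeros region still yields a single '0'.

```python
import string


def trim_zero_pad(content: str) -> str:
    """Trim leading '0' pad characters, re-instating the last one if needed.

    Hypothetical helper: assumes pad character '0', textTrimKind 'padChar'
    and a right-justified number (leading pad characters only).
    """
    # Step 1: ordinary pad-character trimming.
    trimmed = content.lstrip("0")
    removed_any = len(trimmed) < len(content)

    # Step 2: look at the next character left in the region.
    # Assumption: an empty remainder is treated like a non-digit.
    next_is_digit = bool(trimmed) and trimmed[0] in string.digits

    # Step 3: re-instate the last '0' if trimming exposed a non-digit.
    if removed_any and not next_is_digit:
        return "0" + trimmed
    return trimmed
```

Under those assumptions, "0000+" becomes "0+", and the two examples from Steve's note, "000,000,123" and "0000.025", become "0,000,123" and "0.025".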
Cheers,
Andy
Andy Edwards - IBM Integration Bus - DFDL
Email: andy.edwards@uk.ibm.com
Snail Mail: MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int: 247222
Tel ext: +44 (0)1962 817222
Desk: DE3 V17
The Feynman problem solving Algorithm
1) Write down the problem
2) Think real hard
3) Write down the answer
-- Murray Gell-mann in the NY Times
From: Steve Hanson/UK/IBM
To: Andrew Edwards/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 03/11/2014 15:39
Subject: Re: [DFDL-WG] Trimming of a text number that's all zeros when the number pattern has a sign char at the end
Andy
I agree that the existing words do not
cover all the scenarios. Your proposed words are on the right track but
only cover left trimming a right justified text number. We need something
that is independent of justification and can handle patterns where there
is quoted text as well as signs.
One can envisage some bizarre scenarios.
E.g., the text number pattern is "#0'000'", an attempt to divide by
1000 using the pattern. A DFDL parser would trim everything except one zero,
which would not match the pattern, which expects at least three zeros. Trimming
happens before the pattern is looked at, so I don't think we could cater for
this (if we even wanted to).
Perhaps we should say:
When parsing, if the pad character is
'0' and dfdl:textTrimKind is 'padChar' then if the SimpleContent region
is trimmed so that the removal of a '0' character leaves the next character
other than a digit, the last '0' character is re-instated and not trimmed.
This rule also applies when the pad character is a DFDL character
entity equivalent to '0'. This rule does not apply when the pad character
is any other character nor when a pad byte is specified.
That means that "000,000,123" would end up as "0,000,123"
instead of ",000,123" today and "0000.025" would end
up as "0.025" instead of ".025" today but I think that
is good.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Andrew Edwards/UK/IBM@IBMGB
To: DFDL-WG <dfdl-wg@ogf.org>
Date: 30/10/2014 16:43
Subject: [DFDL-WG] Trimming of a text number that's all zeros when the number pattern has a sign char at the end
Sent by: dfdl-wg-bounces@ogf.org
Hi all,
I've hit an interesting case revolving around trimming and number patterns
that doesn't seem quite sane to me.
Consider an element with the following properties:
textTrimKind='padChar'
textNumberPadCharacter='0'
textNumberPattern='0000+;0000-'
So we have the sign character at the end of the representation. Now,
imagine that the data being parsed is "0000+". The relevant
rules from the DFDL specification are:
- Section 13.2 on textTrimKind
When 'padChar', the element is trimmed of the dfdl:textStringPadCharacter,
dfdl:textNumberPadCharacter, dfdl:textBooleanPadCharacter or
dfdl:textCalendarPadCharacter depending on the type of the element.
- Section 13.6 on textNumberPadCharacter
When parsing, if the pad character is '0' and the SimpleContent region consists
entirely of '0' characters, then the last remaining '0' is not trimmed
and a single '0' is the result of the trimming. This rule also applies
when the pad character is a DFDL character entity equivalent to '0'. This
rule does not apply when the pad character is any other character nor when
a pad byte is specified.
Describes
all of the pattern syntax.
In our hypothetical case, the content region is not all zeros, as it ends
in '+'. This means that the rule in section 13.6 does not apply and
we only apply the rule in 13.2. This results in us trimming away
all of the zeros and ending up with '+'. This then doesn't parse
as a number.
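To make the problem concrete, here is a minimal Python sketch of the two existing rules as quoted above. It is illustrative only: the function name trim_existing_rules is hypothetical, and it assumes a right-justified number so that only leading pad characters are trimmed.

```python
def trim_existing_rules(content: str) -> str:
    """Apply the quoted 13.6 all-zeros rule, then plain 13.2 trimming.

    Hypothetical helper: assumes pad character '0', textTrimKind 'padChar'
    and leading pad characters only.
    """
    # Section 13.6: a region made up entirely of '0' keeps a single '0'.
    if content and set(content) == {"0"}:
        return "0"
    # Section 13.2: otherwise every leading pad character is trimmed.
    return content.lstrip("0")


# "0000+" is not all zeros, so only the 13.2 rule applies.
print(trim_existing_rules("0000+"))  # prints "+"
print(trim_existing_rules("0000"))   # prints "0"
```

The first call shows the case in question: the trailing '+' stops the all-zeros rule from applying, so every digit is trimmed away.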
The problem seems to be that the rule in Section 13.6 doesn't take into
account that the suffix of the pattern can result in text in the content
region that isn't part of the digits of the number. Should the rule
under section 13.6 be something more like this...
When parsing, if the pad character is '0'
and the SimpleContent region consists entirely of '0' characters, or
the SimpleContent region consists of a string of '0' characters followed
by non-digit characters, then the last remaining '0' is not
trimmed and a single '0' is the result of the trimming. This rule
also applies when the pad character is a DFDL character entity equivalent
to '0'. This rule does not apply when the pad character is any other character
nor when a pad byte is specified.
Thoughts?
Andy
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg