In certain operations that reference a data item whose PICTURE character-string contains the symbol P, the algebraic value of the data item is used rather than the actual character representation of the data item. This algebraic value assumes the decimal point in the prescribed location and zero in place of the digit position specified by the symbol P. The size of the value is the number of digit positions represented by the PICTURE character-string. These operations are any of the following:
This implies that the scaling should be applied as a lexical operation on the data. In other words, two COBOL fields, one with PIC PP9 and value '2' and one with PIC PP999 and value '002', do not result in the same logical number (0.002 and 0.00002 respectively).
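To make the lexical scaling concrete, here is a minimal Python sketch. The function name logical_value and its restriction to pictures made only of 9s with P symbols at one end are my own assumptions for illustration; it is not taken from the COBOL or DFDL specifications.

```python
from decimal import Decimal


def logical_value(picture: str, digits: str) -> Decimal:
    """Apply COBOL-style P scaling to a string of stored digits.

    Hypothetical helper: handles only pictures made of 9s with
    P symbols at one end, such as PP999 or 999PP.
    """
    nines = picture.count("9")
    if len(digits) != nines or not digits.isdigit():
        raise ValueError("digits must match the number of 9 positions")
    leading_p = len(picture) - len(picture.lstrip("P"))
    trailing_p = len(picture) - len(picture.rstrip("P"))
    if leading_p:
        # Implied decimal point is to the left of the P positions,
        # so every P and every 9 is a fractional digit position.
        return Decimal(int(digits)).scaleb(-(leading_p + nines))
    # Implied decimal point is to the right of the P positions,
    # so each trailing P multiplies the stored digits by ten.
    return Decimal(int(digits)).scaleb(trailing_p)
```

With this sketch, logical_value("PP9", "2") and logical_value("PP999", "002") give different results, because the scaling depends on how many digit positions the picture declares and not only on the numeric value of the stored digits.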
There is an equivalence between V and P. PP999 == V99999 and 999PP == 99999V == 99999. If we consider things in these terms the reasoning is simpler. To prevent # symbol zero suppression from changing the value, rule a) must apply and there must be no # to the right of the V. That restates our rules as:
a) A pattern with a V symbol must not have # symbols to the right of the V symbol.
b) A pattern with P symbols at the left end must have no # symbols in the pattern.
c) A pattern with P symbols at the right end has no restrictions.
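The three rules can be sketched as a simple check. This is only an illustration under stated assumptions: the function name violations is hypothetical, and the pattern is assumed to contain just the number part (no prefix, suffix, quoted literals or positive/negative subpatterns).

```python
def violations(pattern: str) -> list[str]:
    """Return the letters of the rules (a, b) that a pattern breaks.

    Hypothetical helper: assumes the pattern holds only the number
    part, with no prefix, suffix or quoted characters.
    """
    found = []
    # Rule a): no # symbols to the right of a V symbol.
    if "V" in pattern and "#" in pattern.split("V", 1)[1]:
        found.append("a")
    # Rule b): P symbols at the left end forbid # anywhere.
    if pattern.startswith("P") and "#" in pattern:
        found.append("b")
    # Rule c): P symbols at the right end add no restrictions.
    return found
```

For example, "##9V99" and "#99PP" pass, while "9V9#" breaks rule a) and "PP#99" breaks rule b).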
There is another problem though. The number can be trimmed using the pad character from either or both ends, depending on justification, before the number pattern is applied. If the pad character is 0 then this can also cause zeros to be lost and result in mis-application of V and P symbols. I'm not sure there is much we can do about this. Modelers need to be careful when padding/trimming that they get the justification correct. For example, we typically think of numbers as being right justified, but a number with Ps on the left is effectively left justified and should be modeled as such. We added erratum 2.25, which prevented trimming from leaving an empty string. I am thinking that this erratum should actually say that trimming must leave at least the minimum number of digits implied by the pattern, as an extra safeguard? We mustn't disallow trimming/padding altogether as it is used to remove spaces.
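To illustrate the trimming concern and the safeguard suggested above, here is a minimal Python sketch. The function trim_left and its min_digits parameter are hypothetical names of mine; the safeguard is only the idea floated in this note, not current DFDL behaviour.

```python
def trim_left(text: str, pad: str, min_digits: int) -> str:
    """Trim a pad character from the left of a right-justified field.

    Hypothetical sketch: min_digits is the minimum number of digits
    implied by the number pattern. Passing 0 gives plain trimming.
    """
    trimmed = text.lstrip(pad)
    if len(trimmed) < min_digits:
        # Trimming removed significant zeros, so keep the last
        # min_digits characters of the original field instead.
        trimmed = text[-min_digits:]
    return trimmed
```

With a pad character of 0 and a pattern such as PP999, plain trimming of "0000002" leaves "2", which no longer has the three digits the pattern implies. With min_digits set to 3 the result is "002". Trimming spaces is unaffected.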
The ICU pad character symbol * is used to provide a pad character when the data is shorter than the pattern. This is only used to pad when unparsing; it is not used to trim. But it might be safer to disallow P and V symbols when * is used?
According to the ICU description of the significant digit symbol @, explicit decimal points are disallowed when it is used. I think we should disallow P and V symbols when the @ symbol is used. Errata 2.28 should be updated.
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 22/03/2012 09:22 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Date: 21/03/2012 10:34
Subject: Action 167: textNumberPatterns with P,V, # - allowable combinations
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU