When testing, I found that the data was corrupted once the pattern had
more than 9 'S' symbols, due to ICU's use of an int32 to store the value.
I raised a ticket and it has been accepted as a defect, but it suggests
that normal use does not go beyond 9.
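As a rough illustration of why 9 digits is the natural limit for a signed 32-bit value, here is a minimal Python sketch. It only demonstrates the range arithmetic; the thread does not describe ICU's internal mechanism, so the helper names and the two's-complement wraparound shown here are assumptions made for illustration.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit value


def fits_in_int32(digits: str) -> bool:
    """Return True if the digit string can be held in a signed 32-bit integer."""
    return int(digits) <= INT32_MAX


def wrap_int32(value: int) -> int:
    """Simulate two's-complement wraparound of a value stored in an int32 (assumed behaviour)."""
    return (value + 2**31) % 2**32 - 2**31


# Every 9-digit value fits, but a 10-digit value may not.
print(fits_in_int32("999999999"))   # 9 digits
print(fits_in_int32("9999999999"))  # 10 digits
print(wrap_int32(9999999999))       # what a wrapped 10-digit value could look like
```

Some 10-digit values (for example 1000000000) still fit, so corruption would depend on the data as well as on the pattern length.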
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: Andrew Edwards/UK/IBM@IBMGB, DFDL-WG <dfdl-wg@ogf.org>
Date: 25/08/2015 14:16
Subject: Re: [DFDL-WG] Action 284 - agenda item on ICU 'S' symbol
Do we really want to allow "any number of S"?
According to quantum mechanics, based on Planck's constant and the speed
of light, the smallest unit of time is about 3.3x10^-44 seconds, so there's
never going to be a need for more than 45 S's in this universe, at least
until time-travel is discovered. (http://www.physlink.com/Education/AskExperts/ae598.cfm)
This is a place where an "implementation specific
maximum" makes sense to me, though I'd be happy to put a floor under
it like not less than 6 "S".
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Tue, Aug 25, 2015 at 6:54 AM, Steve Hanson <smh@uk.ibm.com>
wrote:
Section 13.11.1 to be updated as follows:
S | fractional second (see note 1) | Number
  | S   | 2
  | SS  | 23
  | SSS | 235
The count of pattern letters determines the format as indicated in the
table. <---- moved earlier
When numeric fields abut one another directly, with no intervening delimiter
characters, they constitute a run of abutting numeric fields. Such runs
are parsed specially as described at [ICUDateTime].
Unlike other fields, fractional seconds "S" are padded on the
right with zero. <---- moved earlier
Any number of "S" symbols may be specified in the pattern.
Implementations must accept any number of "S" symbols and must support
at least millisecond accuracy. When the number of "S" symbols exceeds the
supported accuracy, excess fractional seconds are truncated from the right
(not rounded) when parsing, and zeros are added to the right when unparsing.
For example, for xs:time with dfdl:calendarPattern "ss.SSSS" and millisecond
accuracy, parsing data "12.3456" creates infoset value "00:00:12.345",
which when unparsing creates data "12.3450".
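The truncate-on-parse and pad-on-unparse rule proposed above can be sketched in a few lines of Python. This is only an illustration of the wording, not how any DFDL processor implements it; the function names and the `accuracy` parameter (number of fractional digits kept internally) are hypothetical.

```python
def parse_fraction(digits: str, accuracy: int = 3) -> str:
    """Keep at most `accuracy` fractional digits, truncating from the right (no rounding)."""
    return digits[:accuracy]


def unparse_fraction(stored: str, s_count: int) -> str:
    """Write stored fractional digits for a pattern containing `s_count` 'S' symbols."""
    # Truncate if the pattern is narrower than the stored value,
    # otherwise pad on the right with zeros.
    return stored[:s_count].ljust(s_count, "0")


# Pattern "ss.SSSS" with millisecond accuracy, data "12.3456":
stored = parse_fraction("3456")          # fractional part kept in the infoset
written = unparse_fraction(stored, 4)    # fractional part written back out
print(stored, written)
```

Rounding "3456" to three digits would give "346", so the example also shows where truncation and rounding differ.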
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Andrew Edwards/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 12/08/2015 09:06
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda item on ICU 'S' symbol
Andy - yes that is the behaviour I am seeing.
Regards
Steve Hanson
From: Andrew Edwards/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 11/08/2015 17:28
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda item on ICU 'S' symbol
Following on from today's call, the relevant piece of documentation is
in http://icu-project.org/apiref/icu4c/classicu_1_1SimpleDateFormat.html
When numeric fields abut one another directly, with no intervening delimiter
characters, they constitute a run of abutting numeric fields. Such runs
are parsed specially. For example, the format "HHmmss" parses
the input text "123456" to 12:34:56, parses the input text "12345"
to 1:23:45, and fails to parse "1234". In other words, the leftmost
field of the run is flexible, while the others keep a fixed width. If the
parse fails anywhere in the run, then the leftmost field is shortened by
one character, and the entire run is parsed again. This is repeated until
either the parse succeeds or the leftmost field is one character in length.
If the parse still fails at that point, the parse of the run fails.
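The retry procedure quoted above can be sketched roughly as follows. This is a simplified Python model of the documented description, not ICU's code: the function name is hypothetical, the field widths are passed in directly instead of being derived from a pattern, and any input left over after the run is simply ignored.

```python
from typing import List, Optional


def parse_abutting_run(text: str, widths: List[int]) -> Optional[List[int]]:
    """Parse a run of abutting numeric fields, e.g. widths [2, 2, 2] for "HHmmss".

    The leftmost field is flexible; all other fields keep their fixed width.
    Returns the field values, or None if the run cannot be parsed.
    """
    first_width = widths[0]
    while first_width >= 1:
        values = []
        pos = 0
        ok = True
        for width in [first_width] + widths[1:]:
            chunk = text[pos:pos + width]
            # Each field must supply exactly `width` ASCII digits.
            if len(chunk) != width or not (chunk.isascii() and chunk.isdigit()):
                ok = False
                break
            values.append(int(chunk))
            pos += width
        if ok:
            return values
        first_width -= 1  # shorten the leftmost field and parse the whole run again
    return None


print(parse_abutting_run("123456", [2, 2, 2]))
print(parse_abutting_run("12345", [2, 2, 2]))
print(parse_abutting_run("1234", [2, 2, 2]))
```

The three calls correspond to the three "HHmmss" inputs in the quoted documentation.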
So it seems that when the 'S' is next to other numeric units in the pattern,
it will be subject to the above behaviour. Therefore:
- A pattern of HHmmssSSS with the input 112233123 will become 11:22:33.123
but the input 1122331234 will trigger an error.
- If the pattern includes a '.' to become HHmmss.SSS, I think the
input 1122331234 will become 11:22:33.123 but I'll try and confirm.
Steve - Does that description match what you were seeing?
HTH,
Andy
Andy Edwards - IBM Integration Bus - DFDL
Email: andy.edwards@uk.ibm.com
Snail Mail: MP211, Hursley Park, Hursley, WINCHESTER, Hants, SO21 2JN
Tel int: 247222
Tel ext: +44 (0)1962 817222
Desk: DE3 V17
The Feynman problem solving Algorithm
1) Write down the problem
2) Think real hard
3) Write down the answer
-- Murray Gell-mann in the NY Times
From: Steve Hanson/UK/IBM
To: Andrew Edwards/UK/IBM@IBMGB
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 11/08/2015 12:53
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11 - agenda item on ICU 'S' symbol
Hi Andy
Your internal ticket #630 gave rise to external ticket http://bugs.icu-project.org/trac/ticket/10962,
which claims to have fixed the API docs to clarify the behaviour.
S | fractional second - truncates (like other time fields) to the count
    of letters when formatting. Appends zeros if more than 3 letters
    specified. Truncates at three significant digits when parsing.
  | S    | 2
  | SS   | 23
  | SSS  | 235
  | SSSS | 2350
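The formatting half of that ICU description can be reproduced with a small Python sketch. It is a model of the documented behaviour only; the function name is hypothetical, and the sketch assumes the value is a whole number of milliseconds in the range 0 to 999.

```python
def format_fractional_second(millis: int, letter_count: int) -> str:
    """Format a millisecond value for a pattern with `letter_count` 'S' letters."""
    digits = f"{millis:03d}"  # millisecond value as exactly three digits
    if letter_count <= 3:
        # Truncate to the count of letters.
        return digits[:letter_count]
    # More than three letters: append zeros on the right.
    return digits + "0" * (letter_count - 3)


for pattern in ("S", "SS", "SSS", "SSSS"):
    print(pattern, format_fractional_second(235, len(pattern)))
```

For the value 235 this gives the same four results as the table rows.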
I can't see anywhere that addresses your point about abutting versus
non-abutting numeric symbols though?
As far as DFDL spec is concerned, this is what we say today:
S | fractional second (see note 1) | Number
  | S   | 2
  | SS  | 24
  | SSS | 235
There is no 'note 1'; I think the note was made into a normal paragraph,
which reads:
Any number of fractional seconds "S" may be specified in the
pattern and accepted by implementations, but an implementation is free
to represent a limited number of fractional seconds internally. Excess
fractional seconds are truncated, not rounded up. At least millisecond
accuracy must be implemented. Unlike other fields, fractional seconds are
padded on the right with zero.
Regards
Steve Hanson
From: Andrew Edwards/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Date: 11/08/2015 11:56
Subject: Re: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11
Hi Steve
Re agenda item 2 and calendar patterns with 'S', this ICU ticket from last
year might be relevant - https://icu.sanjose.ibm.com/gcoctrac/ticket/630.
It seems that the error reporting may also depend on whether the
pattern has 'S' on its own or next to other numeric pattern entities,
i.e. 'HHmmssS' is subject to length checking, but 'HHmmss S' is not,
due to the space before the 'S'.
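To make the distinction concrete, here is a minimal Python sketch that splits a pattern into runs of abutting numeric fields. It is an illustration only: the function name is hypothetical, the set of numeric letters is limited to the four used in this thread (an assumption), and quoted literal text in patterns is not handled.

```python
from itertools import groupby
from typing import List

# Assumption: only these pattern letters are treated as numeric in this sketch.
NUMERIC_LETTERS = set("HmsS")


def abutting_runs(pattern: str) -> List[List[str]]:
    """Return the runs of two or more numeric fields that directly abut one another."""
    runs: List[List[str]] = []
    current: List[str] = []
    for ch, group in groupby(pattern):
        field = "".join(group)  # e.g. "HH", "mm", " "
        if ch in NUMERIC_LETTERS:
            current.append(field)
        else:
            # Any other character acts as a delimiter and ends the current run.
            if len(current) > 1:
                runs.append(current)
            current = []
    if len(current) > 1:
        runs.append(current)
    return runs


print(abutting_runs("HHmmssS"))   # 'S' is part of the run
print(abutting_runs("HHmmss S"))  # the space takes 'S' out of the run
```

In the first pattern the 'S' field belongs to the run and so keeps a fixed width; in the second it stands alone after the space.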
HTH,
Andy
From: Steve Hanson/UK/IBM@IBMGB
To: dfdl-wg@ogf.org
Cc: Mike Beckerle <mbeckerle@tresys.com>, jorge.marizan@gmail.com
Date: 10/08/2015 18:29
Subject: [DFDL-WG] OGF DFDL WG Call Agenda 2015-08-11
Sent by: dfdl-wg-bounces@ogf.org
Please find agenda for call on Redmine at https://redmine.ogf.org/dmsf_files/13489?download=
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU