"Mike Beckerle"
<mbeckerle.dfdl@gmail.com>
Sent by: dfdl-wg-bounces@ogf.org 05/05/2009 14:09
I wanted to comment on this.
There are three choices here:
1. Unicode codepoints - we may need to preserve the mapping table (from
representation encoding to Unicode) as part of the infoset.
2. "As encoded" codepoints - we must add the encoding to the infoset.
3. Both
In favor of Unicode codepoints - simplicity. A minor issue is that some
mappings lose information, making perfect round-tripping of string
contents impossible. E.g., EBCDIC has two different line-ending characters
(NL and LF), both of which are normally translated to ASCII/Unicode
linefeed. Hence, translating back is ambiguous.
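
To make the ambiguity concrete, here is a minimal Python sketch. The
four-entry table is a hypothetical subset written for illustration, not a
real codec, and it assumes the common convention described above where both
EBCDIC line-ending bytes map to U+000A.

```python
# Toy decode table: a hypothetical subset of an EBCDIC-to-Unicode mapping.
# Assumption: both EBCDIC line-ending bytes are translated to U+000A.
DECODE = {
    0xC1: "A",
    0xC2: "B",
    0x15: "\n",  # EBCDIC NL
    0x25: "\n",  # EBCDIC LF
}

# Inverse table. Two bytes compete for "\n", so the later entry (0x25) wins.
ENCODE = {char: byte for byte, char in DECODE.items()}


def decode(data: bytes) -> str:
    return "".join(DECODE[b] for b in data)


def encode(text: str) -> bytes:
    return bytes(ENCODE[c] for c in text)


with_nl = bytes([0xC1, 0x15, 0xC2])
with_lf = bytes([0xC1, 0x25, 0xC2])

same_text = decode(with_nl) == decode(with_lf)        # both become "A\nB"
nl_round_trips = encode(decode(with_nl)) == with_nl   # NL comes back as LF
lf_round_trips = encode(decode(with_lf)) == with_lf
print(same_text, nl_round_trips, lf_round_trips)
```

Whichever byte the inverse table picks, one of the two inputs cannot be
reproduced unless the infoset carries something beyond the Unicode string.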
In favor of "as encoded" - simplicity. We just add an encoding attribute
to the string infoset object, which returns the information that the
dfdl:encoding representation property contained. Note that the encoding
information is already available via the schema component associated with
the string, so there is some redundancy here. There is also the question
of whether one wants codepoints or raw access to the bytes. E.g., if the
encoding is UTF-8 or Shift JIS, then each character takes up one or more
bytes. Do you want the bytes, the interpreted codepoints, or both?
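
The difference between the two views can be seen with Python's standard
codecs. This is only an illustration of bytes versus codepoints; the sample
string is an arbitrary choice and nothing here is DFDL-specific.

```python
# "A" followed by two kanji (U+65E5, U+672C); an arbitrary sample string.
text = "A\u65e5\u672c"

# Interpreted view: one integer per character, independent of the encoding.
codepoints = [ord(c) for c in text]

# Raw view: the byte sequence depends on the representation encoding.
utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")

print([hex(cp) for cp in codepoints])
print(len(utf8_bytes), utf8_bytes.hex(" "))
print(len(sjis_bytes), sjis_bytes.hex(" "))
```

The same three characters occupy 7 bytes in UTF-8 and 5 bytes in Shift JIS,
so a byte-level accessor and a codepoint-level accessor give different
answers for length and indexing.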
In favor of "both" - it eliminates all the ambiguity, at the cost of
added complexity.
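
As a rough sketch of what "both" could look like, assume an infoset string
item that stores the raw bytes plus the encoding name and derives the
Unicode view on demand. The class and its field names are hypothetical and
are not taken from the DFDL specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringItem:
    """Hypothetical infoset string item for option 3 ("both")."""

    raw: bytes     # representation bytes exactly as they appeared in the data
    encoding: str  # the value the dfdl:encoding property would carry

    @property
    def value(self) -> str:
        # Unicode view (option 1), derived from the raw bytes.
        return self.raw.decode(self.encoding)

    @property
    def codepoints(self) -> list[int]:
        return [ord(c) for c in self.value]


item = StringItem(raw="na\u00efve".encode("latin-1"), encoding="latin-1")
print(item.value, item.codepoints, list(item.raw))
```

Keeping the raw bytes as the stored form means unparsing can write them
back unchanged, which is where the extra complexity buys exact
round-tripping.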
My suggestion: keep it simple for v1.0 - choose number 1 - because we can
always expand the capabilities later by providing access to the unencoded
representation one way or another.
If you badly need infoset-level contents which expose the actual
representation character codes, you can always model this as an array of
bytes instead of a character string.
...mike
Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2125 | 100 Fifth Ave., 4th Floor, Waltham MA 02451
| mbeckerle.dfdl@gmail.com
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg