Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF
DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 12/11/2013 13:55 -----
From: Steve Hanson/UK/IBM
To: Alex Wood1/UK/IBM@IBMGB
Date: 12/11/2013 12:19
Subject: Re: decoding UTF-16 sequence with an unpaired surrogate in ICU.
Thanks Alex.
So we can control what ICU does in this scenario using dfdl:encodingErrorPolicy in the expected way, as the DFDL spec says.
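The following is a minimal sketch of how a DFDL processor might turn dfdl:encodingErrorPolicy into a decoder setting. It assumes the two DFDL 1.0 values 'error' (a decoding error is a processing error) and 'replace' (bad input becomes U+FFFD). The class name EncodingErrorPolicyDemo and the helper decoderFor are hypothetical, not taken from any IBM or ICU code. To run without the ICU jar, it uses the JDK's built-in UTF-16 charset; the Charset obtained from CharsetProviderICU could be passed in instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingErrorPolicyDemo {

    /**
     * Hypothetical helper: configures a decoder according to a
     * dfdl:encodingErrorPolicy value ("error" or "replace").
     */
    static CharsetDecoder decoderFor(Charset charset, String encodingErrorPolicy) {
        CodingErrorAction action;
        switch (encodingErrorPolicy) {
            case "error":
                action = CodingErrorAction.REPORT;   // decoding fails on bad input
                break;
            case "replace":
                action = CodingErrorAction.REPLACE;  // bad input becomes U+FFFD
                break;
            default:
                throw new IllegalArgumentException("unknown policy: " + encodingErrorPolicy);
        }
        return charset.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .replaceWith("\uFFFD");
    }

    public static void main(String[] args) {
        // Same bytes as the ICU4J test: a valid pair followed by a lone high surrogate.
        byte[] bytes = { (byte) 0xD8, 0x34, (byte) 0xDD, 0x1E, (byte) 0xD8, 0x34 };
        for (String policy : new String[] { "error", "replace" }) {
            try {
                String s = decoderFor(StandardCharsets.UTF_16, policy)
                        .decode(ByteBuffer.wrap(bytes)).toString();
                System.out.println(policy + ": " + s.length() + " chars");
            } catch (CharacterCodingException e) {
                System.out.println(policy + ": " + e);
            }
        }
    }
}
```

Mapping both malformed and unmappable input to the same action mirrors the fact that the DFDL property is a single policy rather than one setting per error kind.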
Regards
Steve Hanson
From: Alex Wood1/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB
Date: 12/11/2013 12:12
Subject: decoding UTF-16 sequence with an unpaired surrogate in ICU.
So I coded a Java program to test this in ICU4J.
When decoding, ICU seems to classify an unpaired UTF-16 surrogate as malformed input. The ICU API allows the programmer to specify the behaviour for malformed input: ignore, replace or report the offending input. The default is to report it, so the decode would fail with an error. The ICU4C API has similar options available.
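Test program:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.spi.CharsetProvider;

import com.ibm.icu.charset.CharsetProviderICU;

public class test1 {

    /**
     * Decodes a UTF-16 byte sequence (big-endian, no BOM) that ends with an
     * unpaired high surrogate, using ICU4J's UTF-16 decoder.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        // D834 DD1E is a valid surrogate pair (U+1D11E);
        // the trailing D834 is an unpaired high surrogate.
        final byte[] byteArray = { (byte) 0xD8, 0x34, (byte) 0xDD, 0x1E, (byte) 0xD8, 0x34 };

        CharsetProvider cp = new CharsetProviderICU();
        CharsetDecoder decoder = cp.charsetForName("UTF-16").newDecoder();
        // Skip malformed input instead of reporting it (the default).
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        decoder.reset();

        ByteBuffer bb = ByteBuffer.wrap(byteArray, 0, byteArray.length);
        CharBuffer cb = CharBuffer.allocate(byteArray.length);
        // endOfInput = true: no more bytes will follow.
        CoderResult decodeResult = decoder.decode(bb, cb, true);
        if (decodeResult.isMalformed() || decodeResult.isUnmappable()) {
            System.out.println("Error at " + bb.position());
        }
        decoder.flush(cb);

        // Flip so that toString() returns the decoded characters,
        // not the unused remainder of the buffer.
        cb.flip();
        System.out.println("Result: " + cb.toString());
    }
}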
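The three malformed-input options described above correspond to the standard java.nio.charset.CodingErrorAction values IGNORE, REPLACE and REPORT, which ICU4J decoders share because CharsetProviderICU hands back ordinary java.nio.charset.Charset objects. The sketch below compares the three actions on the same bytes as the test program and checks the default. As an assumption, it uses the JDK's built-in UTF-16 decoder rather than ICU4J so it runs without the ICU jar; ICU's decoder may differ in details such as how many bytes it counts as malformed. The class name MalformedInputActions is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MalformedInputActions {

    // UTF-16BE: D834 DD1E is a valid pair (U+1D11E); the final D834 is unpaired.
    static final byte[] INPUT = { (byte) 0xD8, 0x34, (byte) 0xDD, 0x1E, (byte) 0xD8, 0x34 };

    /** Decodes INPUT with the given malformed-input action; null means an error was reported. */
    static String decodeWith(CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder().onMalformedInput(action);
        try {
            // decode(ByteBuffer) resets the decoder, decodes all input, and flushes.
            return decoder.decode(ByteBuffer.wrap(INPUT)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A freshly created decoder reports malformed input unless told otherwise.
        System.out.println("default: " + StandardCharsets.UTF_16.newDecoder().malformedInputAction());
        CodingErrorAction[] actions = {
            CodingErrorAction.IGNORE, CodingErrorAction.REPLACE, CodingErrorAction.REPORT
        };
        for (CodingErrorAction action : actions) {
            String s = decodeWith(action);
            System.out.println(action + ": " + (s == null ? "error" : s.length() + " chars"));
        }
    }
}
```

With IGNORE only the valid pair survives, REPLACE appends U+FFFD for the lone surrogate, and REPORT makes the whole decode fail, which matches the default behaviour described in the thread.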
Kind Regards,
- Alex
Alex Wood -
Software Engineer -
WebSphere Message Broker Development
DFDL Development
MP 211, IBM UK Labs, Hursley Park, Winchester, Hants. SO21 2JN.
Tel: Internal 246272, External 01962 816272
Notes: Alex Wood1/UK/IBM@IBMGB
e-mail: wooda@uk.ibm.com
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU