Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair,
OGF DFDL Working Group
IBM SWG, Hursley, UK

smh@uk.ibm.com
tel:+44-1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 12/11/2013 13:55 -----

From:        Steve Hanson/UK/IBM
To:        Alex Wood1/UK/IBM@IBMGB,
Date:        12/11/2013 12:19
Subject:        Re: decoding UTF-16 sequence with an unpaired surrogate in ICU.



Thanks Alex.

So we can control what ICU does in this scenario using dfdl:encodingErrorPolicy in the expected way, as the DFDL spec says.
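
To make the mapping concrete, here is a minimal sketch of how a processor might translate the two dfdl:encodingErrorPolicy values ('error' and 'replace') into decoder settings. The class and the helper names (EncodingErrorPolicyDemo, actionFor, decoderFor) are hypothetical, and it uses the JDK's built-in UTF-16BE decoder rather than ICU so that it runs without extra libraries; it is an illustration, not how any particular DFDL implementation does it.

```java
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingErrorPolicyDemo {

    // Hypothetical helper: map a dfdl:encodingErrorPolicy value to the
    // java.nio action applied when the decoder meets malformed input.
    static CodingErrorAction actionFor(String policy) {
        switch (policy) {
            case "error":
                return CodingErrorAction.REPORT;   // decode fails with an error
            case "replace":
                return CodingErrorAction.REPLACE;  // substitute the replacement character
            default:
                throw new IllegalArgumentException(
                        "Unknown encodingErrorPolicy: " + policy);
        }
    }

    // Hypothetical helper: build a decoder configured for the given policy.
    // Assumption: the JDK UTF-16BE charset stands in for an ICU charset here.
    static CharsetDecoder decoderFor(String policy) {
        CodingErrorAction action = actionFor(policy);
        return StandardCharsets.UTF_16BE.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
    }

    public static void main(String[] args) {
        System.out.println("error   -> " + decoderFor("error").malformedInputAction());
        System.out.println("replace -> " + decoderFor("replace").malformedInputAction());
    }
}
```

The IGNORE action has no counterpart in this mapping, since the policy only offers 'error' and 'replace'.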

From:        Alex Wood1/UK/IBM
To:        Steve Hanson/UK/IBM@IBMGB,
Date:        12/11/2013 12:12
Subject:        decoding UTF-16 sequence with an unpaired surrogate in ICU.



I wrote a Java program to test this in ICU4J.

When decoding, ICU appears to classify an unpaired UTF-16 surrogate as malformed input.

The ICU API lets the programmer specify the behaviour for malformed input: ignore, replace, or report the offending code point.

The default is to report it, so the decode would fail with an error.

The ICU4C API has similar options available.

Test program:


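import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.spi.CharsetProvider;

import com.ibm.icu.charset.CharsetProviderICU;

public class test1 {

        public static void main(String[] args) {
                // A valid surrogate pair (D834 DD1E) followed by an unpaired
                // high surrogate (D834) at the end of the input.
                final byte[] byteArray = { (byte) 0xD8, 0x34, (byte) 0xDD, 0x1E, (byte) 0xD8, 0x34};

                CharsetProvider cp = new CharsetProviderICU();

                CharsetDecoder decoder = cp.charsetForName("UTF-16").newDecoder();
                // IGNORE is set here; switch to REPLACE or REPORT to try the
                // other behaviours.
                decoder.onMalformedInput(CodingErrorAction.IGNORE);
                decoder.reset();
                ByteBuffer bb = ByteBuffer.wrap(byteArray, 0, 6);
                CharBuffer cb = CharBuffer.allocate(6);
                CoderResult decodeResult = decoder.decode(bb, cb, true);

                if (decodeResult.isMalformed() || decodeResult.isUnmappable()) {
                        System.out.println("Error at " + bb.position());
                }
                // flip() so that toString() returns the decoded characters
                // rather than the unused remainder of the buffer.
                cb.flip();
                System.out.println("Result: " + cb.toString());
        }
}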
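
For comparison, the three malformed-input actions can be tried side by side. The following is a minimal sketch that assumes the JDK's built-in UTF-16BE decoder in place of the ICU provider, so it runs without ICU4J on the classpath; ICU's decoder may not behave identically in every detail. The class name MalformedInputDemo and the decode helper are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MalformedInputDemo {

    // Same bytes as the test program: a valid surrogate pair (D834 DD1E)
    // followed by an unpaired high surrogate (D834), read as big-endian.
    static final byte[] INPUT = {
        (byte) 0xD8, 0x34, (byte) 0xDD, 0x1E, (byte) 0xD8, 0x34 };

    // Decode INPUT with the given action. Returns null when the decoder
    // reports an error instead of producing a string.
    static String decode(CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_16BE.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            // This convenience overload decodes the whole buffer, flushes,
            // and returns a CharBuffer that is already positioned for reading.
            CharBuffer out = decoder.decode(ByteBuffer.wrap(INPUT));
            return out.toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        CodingErrorAction[] actions = {
            CodingErrorAction.IGNORE,
            CodingErrorAction.REPLACE,
            CodingErrorAction.REPORT };
        for (CodingErrorAction action : actions) {
            String s = decode(action);
            // Print lengths rather than the characters themselves to avoid
            // depending on the console encoding.
            System.out.println(action + ": "
                    + (s == null ? "decode failed" : s.length() + " chars"));
        }
    }
}
```

With the JDK decoder, IGNORE drops the trailing surrogate, REPLACE appends U+FFFD in its place, and REPORT makes the decode fail with an exception.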



Kind Regards,

- Alex

Alex Wood -
Software Engineer -
WebSphere Message Broker Development
DFDL Development

MP 211, IBM UK Labs, Hursley Park, Winchester, Hants. SO21 2JN.
Tel: Internal 246272, External 01962 816272
Notes: Alex Wood1/UK/IBM@IBMGB
e-mail: wooda@uk.ibm.com

