Assuming that the prefix contains the length in characters, I think this works OK even when the encodings differ.
The parser first parses the prefix according to prefixLengthType to get the prefix value; the prefix itself is always of known length. If prefixIncludesPrefixLength is 'yes', the parser then subtracts this known length from the prefix value, giving the length of the data, which might be in a different encoding.
I think we should continue to allow
this. In the past we have talked about a DFDL 2.0 feature that allowed
the initiator and terminator to be specified using a simple type, precisely
to cover the (rare) cases where the characteristics of these delimiters
are different to the data itself. Doing it this way prevents a property
explosion on the element itself. I view prefixLengthType as the first
example of this principle.
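The parsing steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not how any DFDL processor actually works: the names `PrefixType` and `parse_prefixed` are hypothetical, the prefix is assumed to be a fixed-length text number measured in characters, and both encodings are treated as fixed-width (UTF-16 without surrogate pairs, and UTF-32).

```python
from dataclasses import dataclass

# Bytes per character for the fixed-width encodings used in this sketch.
# Assumption: the UTF-16 text contains no surrogate pairs.
FIXED_WIDTH = {"utf-16-be": 2, "utf-32-be": 4}


@dataclass
class PrefixType:
    """Hypothetical stand-in for the type named by prefixLengthType:
    an explicit-length text number whose length is in characters."""
    length_chars: int
    encoding: str


def parse_prefixed(data: bytes, prefix: PrefixType, content_encoding: str,
                   prefix_includes_prefix_length: bool) -> str:
    # 1. Parse the prefix; its own length is always known.
    prefix_size = prefix.length_chars * FIXED_WIDTH[prefix.encoding]
    prefix_value = int(data[:prefix_size].decode(prefix.encoding))

    # 2. If the prefix value counts the prefix too, subtract the prefix's
    #    known length (in characters) to get the content length.
    content_chars = prefix_value
    if prefix_includes_prefix_length:
        content_chars -= prefix.length_chars
    if content_chars < 0:
        raise ValueError("prefix value is smaller than the prefix length")

    # 3. Read that many characters in the content's own encoding.
    content_size = content_chars * FIXED_WIDTH[content_encoding]
    body = data[prefix_size:prefix_size + content_size]
    if len(body) != content_size:
        raise ValueError("not enough data for the prefixed length")
    return body.decode(content_encoding)


# Prefix "04" in UTF-16BE (2 chars) counts itself; content is 2 UTF-32BE chars.
data = "04".encode("utf-16-be") + "AB".encode("utf-32-be")
print(parse_prefixed(data, PrefixType(2, "utf-16-be"), "utf-32-be", True))
# prints: AB
```

Because the subtraction happens in characters before the content encoding is consulted, the prefix and content encodings never need to agree.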
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 23/08/2013 16:20 -----
From: Alex Wood1/UK/IBM
To: Steve Hanson/UK/IBM@IBMGB, Mark Frost/UK/IBM@IBMGB, dfdl-wg@ogf.org
Date: 23/08/2013 14:58
Subject: dfdl:lengthKind='prefixed', different encodings for prefix and content where prefixIncludesPrefixLength is 'yes'
Hi All,
I am considering a case similar to the one excluded by erratum 2.76: an element with lengthKind 'prefixed' and prefixIncludesPrefixLength 'yes', where the prefix type and the element both have lengthUnits 'characters' but have different encodings (specifically, encodings whose characters have different lengths).
I believe the problem that 2.76 is trying to avoid is determining the length value in, say, characters when the prefix contains no characters.
I am wondering whether there is also a subtler issue when we calculate a length in characters but one part of that length is in a different encoding from the other.
For example, the prefix contains 2 UTF-16 (2-byte) characters and the content contains 2 UTF-32 (4-byte) characters.
Do we just quote a length in characters regardless of encoding, e.g. 4 characters? Or is this confusing?
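To make the arithmetic in this example concrete, here is a small Python sketch. The prefix text "04" and content text "AB" are hypothetical values chosen to match the example (a 2-character prefix whose value counts both itself and the 2 content characters); big-endian encodings are used so that no byte order mark is added.

```python
# A 2-character UTF-16 prefix and 2 characters of UTF-32 content.
prefix_text, content_text = "04", "AB"  # hypothetical values
prefix_bytes = prefix_text.encode("utf-16-be")   # 2 bytes per character
content_bytes = content_text.encode("utf-32-be") # 4 bytes per character

total_chars = len(prefix_text) + len(content_text)
total_bytes = len(prefix_bytes) + len(content_bytes)

print(total_chars, total_bytes)
# prints: 4 12
```

The quoted length of 4 characters mixes two character widths: it corresponds to 12 bytes here, and that byte count can only be recovered by someone who also knows how many of the 4 characters belong to the prefix.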
2.76. Section 12.3.4. When property prefixIncludesPrefixLength is 'yes' there are some restrictions that need to be added to enable reliable lengths to be calculated:
- If the prefix type is lengthKind 'implicit' or 'explicit', then the lengthUnits properties of both the prefix type and the element must be the same.
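As a rough illustration, the erratum 2.76 restriction could be expressed as a schema check. This is a minimal sketch only: the function name and its string-valued parameters are hypothetical and not part of any DFDL processor's API. Note that the case raised in this mail (both lengthUnits 'characters', different encodings) passes the check, which is why it needs separate consideration.

```python
def check_erratum_2_76(prefix_includes_prefix_length: str,
                       prefix_length_kind: str,
                       prefix_length_units: str,
                       element_length_units: str) -> None:
    """Raise ValueError if the erratum 2.76 restriction is violated.

    Hypothetical helper: all arguments are DFDL property values as strings.
    """
    # The restriction only applies when the prefix counts itself.
    if prefix_includes_prefix_length != "yes":
        return
    # For implicit/explicit prefix types, the units must match.
    if (prefix_length_kind in ("implicit", "explicit")
            and prefix_length_units != element_length_units):
        raise ValueError(
            "prefixIncludesPrefixLength='yes' requires the prefix type and "
            "the element to have the same lengthUnits")


# Same units, different encodings: allowed by 2.76 as written.
check_erratum_2_76("yes", "explicit", "characters", "characters")
```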
Kind Regards,
- Alex
Alex Wood -
Software Engineer -
WebSphere Message Broker Development
IBM DFDL Development
MP 211, IBM UK Labs, Hursley Park, Winchester, Hants. SO21 2JN.
Tel: Internal 246272, External 01962 816272
Notes: Alex Wood1/UK/IBM@IBMGB
e-mail: wooda@uk.ibm.com