Minutes for OGF DFDL Working Group Call, April-29-2009

Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, April-29-2009

Meeting opened, 14:00 UK

Attendees
Steve Hanson (IBM)
Mike Beckerle (Oco)
Suman Kalia (IBM)
Alan Powell (IBM)

Apologies
Dave Glick (drac)

Agenda:

1. Go through actions.
Actions updated below.

2. LengthKind on sequences and choices.
LengthKind on sequences and choices and their parent element has proved confusing to new users of DFDL. It is proposed that lengthKind is removed from groups and allowed only on the parent element. See email from SH.
Not discussed. Action raised. Please review SH email and make comments before next call.

3. Discuss UnorderedInitiated email from SH
Not discussed. Action raised. Please review SH email and make comments before next call.

4. Infoset codepage and encoding
The spec does not say what codepage and encoding is used for string fields.

5. AOB
Next version (034)

6. Next call
6 May 14:00 UK

Meeting closed, 15:10 UK

Actions raised at this meeting

No   Action
040  SH: LengthKind on complex objects.
     29/04: All send comment before next call
041  AP: UnorderedInitiated
     29/04: All: Review for next call
042  MB: Complete variable specification. To include how properties such as encoding can be set externally. Must be a known variable name.

Current Actions:

No   Action
012  AP/SH: Update decimalCalendarScheme
     10/9: Not allocated yet
     17/9: No update
     24/9: Add calendar binary formats to actions
     22/10: No progress
     16/1: proposal distributed and discussed. Will be redistributed
     21/1: add locale
     04/02: changed from locale to specific properties
     18/2: Need more investigation of ICU strict/lax behaviour.
     08/04: Not discussed
     22/04: AP to complete asap once the ICU strict/lax behaviour is understood.
     29/04: No progress
020  SH: Resolve packedDecimalSignCodes behaviour depends on NumberCheckPolicy
     22/10: No progress
     10/12: added how to decide to overpunch and sign position
     11/02: proposal largely agreed. SH to make minor changes
     18/02: AP to document unsigned type behaviour
     25/02: no progress
     08/04: Not discussed
     22/04: SH to complete last remaining issue, which is the behaviour when logical type is signed/unsigned and the physical type is unsigned/signed.
     29/04: SH had identified a problem with definition values and types in the infoset and will email proposal. DG to be asked to accelerate action 032 to see if that helps
024  <No owner> String XML type
     08/04: Not discussed
     22/04: Need to allocate owner. Work is to describe the semantics of using dfdl:representation="xml" to model a well-formed XML fragment in an overall non-XML document described by a DFDL schema.
     29/04: As no resource is available to progress this action, agreed to defer from V1. Will close next week if no objections
026  SH: Envelopes and Payloads
     08/04: Not discussed explicitly, but recursive use of DFDL is tied up with this
     22/04: Two aspects. Firstly compositional - do sufficient mechanisms exist to model an envelope with a payload that varies. Secondly markup syntax - this might be defined in the envelope. The second of these is very much tied up with the variable markup action 028, so will be considered there. SH to verify the composition aspect.
     29/04: SH and AP working on proposal. Related to Action 028
027  SH: Property precedence tables
     08/04: Not discussed
     22/04: Two things missing from the existing precedence trees. Firstly, does not show alternates (eg, initiator v initiatorkind). Secondly, need a tree per concrete DFDL object (eg, element). SH to update.
     29/04: No progress
028  SH: Variable markup
     08/04: Discussed briefly at end of call, IBM to see whether there are any use cases that require recursive use of DFDL.
     15/04: Use case was distributed and will be discussed on next call.
     22/04: The use case in question is EDI where the terminating markup for the payload segments is defined in the ISA envelope segment. The markup is modelled as an element of simple type where the allowable markup values are defined as enums on the type. But we need to handle two cases - firstly where the envelope is present, so the value used by the payload is taken from the envelope. Secondly where only the payload is present. Here we need a way of scanning for all the enum values, and adopting the one we actually find, when parsing. And using a default when unparsing. SH to explore use of a DFDL variable, where the variable has a default, but also has a type that is the same as the markup element - that way we get to use the enums without defining everything twice.
     29/04: SH and AP working on proposal.
029  MB: valueCalc (output length calculation)
     08/04: Not discussed
     22/04: Action allocated to MB, this is to complete the work started at the Hursley WG F2F meeting.
     29/04: No progress
032  DG: Investigate compatibility between DFDL infoset and XDM
     08/04: No update
     22/04: No update
     29/04: No update
033  AP/TK: Assert/Discriminator semantics. AP to document. TK to check uses of discriminator besides choice.
     08/04: In progress within IBM
     22/04: Waiting for TK to return from leave to complete.
     29/04: TK has sent examples showing the need for discriminators beyond choice. Agreed. MB to respond to TK
036  SH: Provide use case for floating component in a sequence
     08/04: Raised
     15/04: Use case sent and discussed. SH to do further investigation
     22/04: IBM feedback from WTX team is that alternate suggested ways of modelling the EDI floating NTE segment have significant usability issues. The DFDL principle is that for a problem that can be expressed as two-layered, then two DFDL models are needed. The EDI NTE segment does not fall into this though, as its use is on a per sequence basis. Ongoing.
     29/04: Agreed that this needs to be in V1. SH to make a proposal
037  All: Approach for XML Schema 1.0 UPA checks.
     22/04: Several non-XML models, when expressed in their most obvious DFDL Schema form, would fail XML Schema 1.0 Unique Particle Attribution checks that police model ambiguity. And even re-jigging the model sometimes fails to fix this. Note this is equally applicable to XML Schema 1.1 and 1.0. While the DFDL parser/unparser can happily resolve the ambiguities, the issue is one of definition. If an XSD editor that implements UPA checks is used to create DFDL Schema, then errors will be flagged. DFDL may have to adopt the position that:
       a) DFDL parser/unparser will not implement some/all UPA checks (exact checks tbd)
       b) XML Schema editors that implement UPA checks will not be suitable for all DFDL models
       c) If DFDL annotations are removed, the resulting pure XSD will not always be valid (ie, the equivalent XML is ambiguous and can't be modelled by XML Schema 1.0)
     Ongoing in case another solution can be found.
     29/04: Will ask DG and S Gao for opinion before closing
038  MB: Submit response to OMG RFI for non-XML standardization
     22/04: First step is for MB to mail the OGF Data Area chair to say that we want to submit
     29/04: MB has been in contact with OMG and will submit DFDL.
039  SKK: Approach for creating Schema-For-DFDL xsds.
     22/04: Resolve issue around multiple declarations needed for DFDL properties, perhaps using MB's meta approach
     29/04: Don't like qualified attributes in long form. SKK to check there are no code gen implications, eg EMF.

Closed actions:

No   Action
025  AP: Escape schemes
     21/1: discussed requirements
     04/02: AP/SH to describe behaviour for known length text fields. Need to discuss if comment escapes should be supported.
     11/02: new draft distributed
     18/02: SH up document concerns
     25/02: SH and AP have refined proposal ready for approval.
     04/03: SH and AP have further refined proposal.
     11/03: discussed. Suggested a simplified proposal be evaluated.
     18/03: SH and AP had further discussions on simplified proposal
     08/04: See minutes, review in detail for next call
     15/04: See minutes, review for next call
     22/04: MB mailed answers to the mailing list in response to AP's last few questions. Following agreed:
       1. Should data containing the escapeEscapeCharacter cause escaping to be used, and if so how should it be escaped?
          EEC alone isn't an active character. It has to be followed by the EC to be interpreted at all. That said, if the pair EEC EC appears in the data, then yes, we must escape the EC, with another EEC.
       2. Should we only look for escapeStartString at the beginning of the data?
          Yes, we will be restrictive/conservative for v1.0
       3. Property names (everyone has their own favourite so let's just pick one.)
          Only changes are escapeBlockStart and escapeBlockEnd.
       AP to incorporate the agreed scheme into draft 0.34.
     29/04: closed. Moved to work items
034  AP: Remove redundant properties, correct old examples
     08/04: No update
     22/04: In progress as part of draft 0.34.
     29/04: closed. Moved to work item
035  AP: Add validation ranges to spec, update specialized annotations in spec.
     08/04: Raised. For draft 0.34
     22/04: In progress as part of draft 0.34.
     29/04: closed. Moved to work item

Work items:

No  | Item | Target version | Status
001 | String XML type (Ian P) - Apr 30, 2008 | |
002 | Escape schemes (Ian P) - Apr 30, 2008 | 034 |
003 | Variables - ??, 2008 (Mike) | |
005 | Improvements on property descriptions - ??, 2008 (All - split TBD) | |
006 | Envelopes and Payloads (Steve) - Apr 30, 2008 | |
007 | (from draft 32) valueCalc (Mike) - ??, 2008 | | mostly complete
008 | (from draft 32) Property precedence for writing (Steve) | | under review
009 | (from draft 32) Variable markup (Steve) - Mar 31, 2008 | | proposal needs writing up
010 | (from draft 32) Assertions, discriminators and choice, including discussion of timing option (Suman) - Mar 31, 2008 (A033) | 034 | in progress
011 | (from draft 32) How speculative parsing works (combining choice and variable-occurrence - currently these are separate) ??, 2008 (IBM) | | in progress
012 | (from draft 32) Reordering the properties discussion: move representation earlier, improve flow of topics ??, 2008 (Alan) | | not started
025 | Augmented infoset and unparsing (Alan) | 034 | added but needs work
026 | Remove duration | 034 |
027 | Calendar schemes | 034 |
028 | Validation ranges (A035) | 034 |
029 | Decimals (A020) - document unsigned type behaviour - packedDecimalSignCodes behaviour depends on NumberCheckPolicy | 034 |
030 | Remove redundant properties, fix examples. (A036) | 034 |
031 | Specialized annotations | 034 |
032 | Floating components | |
033 | Specialized annotations | 034 |

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM
email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

4. Infoset codepage and encoding
The spec does not say what codepage and encoding is used for string fields.

I wanted to comment on this.

There are three choices here:
1. Unicode codepoints - we may need to preserve the mapping table (from representation encoding to Unicode) as part of the infoset.
2. "As Encoded" codepoints - we must add the encoding to the infoset.
3. Both

In favor of Unicode codepoints - simplicity. Minor issue is that some mappings will lose information, making perfect round-tripping of string contents impossible. E.g., EBCDIC has two different line-endings, both of which normally are translated to ASCII/Unicode linefeed. Hence, translating back is ambiguous.

In favor of "as encoded" - simplicity. We just add an encoding attribute to the string infoset object which returns the information that the dfdl:encoding representation property contained. Note that the encoding information really is already available via the schema component associated with the string, so there is some redundancy here. Also, there's the issue when dealing with this of whether one wants codepoints, or raw access to the bytes. E.g., if the encoding is UTF-8 or Shift-JIS, then the characters take up 1 or more bytes. Do you want the bytes, or the interpreted code points, or both?

In favor of "both" - complexity, but eliminates all the ambiguity.

My suggestion: keep it simple for v1.0 - choose number 1 - because we can always expand the capabilities later by providing access to the unencoded representation one way or another.

If you badly need infoset-level contents which expose the actual representation character codes, you can always model this as an array of bytes instead of a character string.

...mike

Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2125 | 100 Fifth Ave., 4th Floor, Waltham MA 02451 | mbeckerle.dfdl@gmail.com
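The round-tripping concern in the message above can be pictured with a small sketch. The decode and encode tables below are hypothetical and deliberately tiny; they only assume, as the message says, that both EBCDIC line-end bytes (NL 0x15 and LF 0x25) are translated to the single Unicode linefeed. Real conversion tables vary by code page and product.

```python
# Hypothetical, deliberately tiny conversion tables (illustration only).
EBCDIC_NL = 0x15  # EBCDIC "new line"
EBCDIC_LF = 0x25  # EBCDIC "line feed"

# Assumption: both line-end bytes decode to U+000A, as described in the message.
DECODE = {EBCDIC_NL: "\n", EBCDIC_LF: "\n", 0xC1: "A", 0xC2: "B"}

# Going back, each character can map to only one byte, so one line end must win.
ENCODE = {"\n": EBCDIC_LF, "A": 0xC1, "B": 0xC2}


def decode(data: bytes) -> str:
    """Translate representation bytes into Unicode codepoints (choice 1)."""
    return "".join(DECODE[b] for b in data)


def encode(text: str) -> bytes:
    """Translate Unicode codepoints back into representation bytes."""
    return bytes(ENCODE[ch] for ch in text)


original = bytes([0xC1, EBCDIC_NL, 0xC2, EBCDIC_LF])
text = decode(original)
round_tripped = encode(text)

print(original.hex(), "->", repr(text), "->", round_tripped.hex())
print("round trip preserved the bytes:", round_tripped == original)
```

Under these assumed tables the infoset string alone cannot tell the unparser which of the two line-end bytes was in the original data, which is the information loss the message refers to.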

There is a 4th option - remain silent and leave it up to the implementation.

Reason: Within IBM we have different products that will embed DFDL parser/unparser. WMB requires strings in UTF-16, that is not always the case for others.

Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

"Mike Beckerle" <mbeckerle.dfdl@gmail.com>
Sent by: dfdl-wg-bounces@ogf.org
05/05/2009 14:09
Please respond to mbeckerle.dfdl@gmail.com
To: Alan Powell/UK/IBM@IBMGB, <dfdl-wg@ogf.org>
cc:
Subject: [DFDL-WG] Infoset codepage

--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg

How about we specify Unicode codepoints but implementations can have limitations on the numeric range of codepoints.

Reason: keeps us out of the codepoints vs. encodings morass.

...mikeb

On May 5, 2009, at 10:20 AM, Steve Hanson <smh@uk.ibm.com> wrote:
There is a 4th option - remain silent and leave it up to the implementation.
Reason: Within IBM we have different products that will embed DFDL parser/unparser. WMB requires strings in UTF-16, that is not always the case for others.
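One way to picture the suggestion of Unicode codepoints with an implementation limit on their numeric range is the sketch below. The limit chosen here (0xFFFF, the Basic Multilingual Plane) and the function names are assumptions for illustration; the thread does not name a specific limit. The sketch also shows why such a limit relates to the UTF-16 constraint mentioned earlier: codepoints above 0xFFFF need two UTF-16 code units.

```python
# Hypothetical implementation limit: only codepoints in the Basic Multilingual Plane.
BMP_MAX = 0xFFFF


def within_limit(text: str, max_codepoint: int = BMP_MAX) -> bool:
    """Return True if every codepoint in the string is inside the supported range."""
    return all(ord(ch) <= max_codepoint for ch in text)


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to hold the string in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


for sample in ("DFDL", "\u65e5\u672c\u8a9e", "\U0001F600"):
    print(
        [hex(ord(ch)) for ch in sample],
        "within limit:", within_limit(sample),
        "UTF-16 code units:", utf16_code_units(sample),
    )
```

An implementation with this assumed limit could report strings containing codepoints such as U+1F600 as unsupported, rather than exposing surrogate pairs through the infoset.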

Isn't choice 2 the most flexible? The caller can convert to what they need.

Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM
email: alan_powell@uk.ibm.com
Tel: +44 (0)1962 815073
Fax: +44 (0)1962 816898

From: DFDL <mbeckerle.dfdl@gmail.com>
To: Steve Hanson/UK/IBM@IBMGB
Cc: Alan Powell/UK/IBM@IBMGB, "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>, "dfdl-wg-bounces@ogf.org" <dfdl-wg-bounces@ogf.org>
Date: 05/05/2009 15:35
Subject: Re: [DFDL-WG] Infoset codepage

> How about we specify unicode codepoints but implementations can have limitations on the numeric range of codepoints.
> Reason: keeps us out of the codepoints vs. encodings morass.
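Choice 2 with caller-side conversion, as raised in the reply above, might look roughly like the following sketch. The `EncodedString` class and its method names are hypothetical and not part of any DFDL draft; the sketch only assumes that the infoset item carries the raw bytes together with the value of the dfdl:encoding property, here using Python's `cp037` codec as a stand-in for an EBCDIC encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical infoset string item for choice 2: raw bytes plus their encoding."""

    raw: bytes
    encoding: str  # assumed to mirror the dfdl:encoding property value

    def to_unicode(self) -> str:
        """Decode the as-encoded bytes into Unicode codepoints."""
        return self.raw.decode(self.encoding)

    def transcode(self, target: str) -> bytes:
        """Convert to whatever encoding the caller needs, e.g. UTF-16."""
        return self.to_unicode().encode(target)


# Sample data built with Python's cp037 (EBCDIC) codec for illustration.
item = EncodedString(raw="Hursley".encode("cp037"), encoding="cp037")

print(item.raw.hex())                       # bytes exactly as they appear in the data
print(item.to_unicode())                    # caller asks for codepoints
print(item.transcode("utf-16-le").hex())    # caller asks for UTF-16
```

The design leaves all interpretation to the caller, which is the flexibility argued for here; it does not by itself answer what indexing into such a string should mean.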

The problem with choice 2 is that when you have a string with an encoding, then there's the issue of what do you encounter when you index into the string at say position 3. Do you get the 3rd byte of the encoding? or is the encoding somehow decoded into individual character codepoints ... but for many encodings that's not crisply defined.

If we go with choice 2 we should flat out say that the string is an array of bytes representing a string by way of the encoding.

There's a variation we didn't explore which is that implementations can supply the strings in whatever form they want. But they make the encoding available. This allows an implementation to provide say, UTF-16 always, if it chooses.

I'm in favor of the simplest possible thing here. So, for example, if you guys have a UTF-16 constraint, then I'd be happy just picking that as the encoding that is always used by the infoset.

...mike

Mike Beckerle | OGF DFDL WG Co-Chair | CTO | Oco, Inc.
Tel: 781-810-2125 | 100 Fifth Ave., 4th Floor, Waltham MA 02451 | mbeckerle.dfdl@gmail.com

From: Alan Powell [mailto:alan_powell@uk.ibm.com]
Sent: Tuesday, May 05, 2009 11:14 AM
To: DFDL
Cc: dfdl-wg@ogf.org; dfdl-wg-bounces@ogf.org; Steve Hanson
Subject: Re: [DFDL-WG] Infoset codepage

> Isn't choice 2 the most flexible? The caller can convert to what they need.
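The indexing question raised in this reply (what does position 3 refer to?) can be made concrete with a short sketch. The sample string and the `describe` helper are assumptions for illustration; the encodings are ones named in the thread, accessed through Python's standard codecs.

```python
def describe(text: str, encoding: str) -> dict:
    """Compare codepoint indexing with byte indexing for one encoding."""
    raw = text.encode(encoding)
    return {
        "encoding": encoding,
        "codepoints": len(text),       # length when viewed as Unicode codepoints
        "bytes": len(raw),             # length when viewed as encoded bytes
        "codepoint_at_3": text[3],     # a whole character
        "byte_at_3": hex(raw[3]),      # possibly a fragment of a character
    }


# One ASCII letter followed by three kanji (hypothetical sample data).
sample = "a\u65e5\u672c\u8a9e"

for enc in ("utf-8", "shift_jis", "utf-16-le"):
    print(describe(sample, enc))
```

The codepoint view gives the same answer for every encoding, while the byte view differs per encoding and may land in the middle of a character, which is why the reply argues that choice 2 would have to be stated as an array of bytes.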