Proposed feature: lookup tables via simple type unions

A write-up of the proposal, which we are prototyping in Daffodil, is here:
https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Enumerations+an...

This is needed by a number of data formats we are working with that have large enumerations, some with as many as 2000 members. Often these enumerations are a mixture, where single values correspond to some enumerated strings and ranges of values correspond to others.

Using expressions to translate representation integers into strings is infeasible, as no constant-time case-statement-like construct is available in DFDL.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

Please note: Contributions to the DFDL Workgroup's email discussions are subject to the OGF Intellectual Property Policy <http://www.ogf.org/About/abt_policies.php>
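For readers who want a concrete picture of the mixed single-value and range enumerations described above, here is a minimal sketch in Python. It is not DFDL and not the mechanism in the proposal; every code, range, and name in the table is invented for illustration, and `lookup` is a hypothetical helper. Single values use a dictionary (constant time on average), and ranges use a binary search over sorted range starts (logarithmic in the number of ranges).

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical enumeration: all codes and names are invented for illustration.
SINGLE_VALUES = {0: "NONE", 1: "AIRCRAFT", 2: "SHIP"}

# Inclusive (low, high, name) ranges, sorted by low and non-overlapping.
RANGES = [
    (10, 19, "RESERVED_A"),
    (100, 199, "VENDOR_SPECIFIC"),
    (1000, 1999, "EXPERIMENTAL"),
]
RANGE_STARTS = [low for low, _, _ in RANGES]


def lookup(code: int) -> Optional[str]:
    """Translate a representation integer into its enumerated string."""
    # Exact single-value matches take priority over ranges.
    name = SINGLE_VALUES.get(code)
    if name is not None:
        return name
    # Find the last range whose start is <= code, then check its upper bound.
    i = bisect_right(RANGE_STARTS, code) - 1
    if i >= 0:
        _, high, range_name = RANGES[i]
        if code <= high:
            return range_name
    return None  # No enumerated string is defined for this code.
```

With a table of this shape, `lookup(1)` finds a single-value entry, `lookup(150)` falls in a range, and `lookup(20)` matches nothing and yields `None`.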

Although signed up to Confluence, I couldn't see a way to comment on the proposal there.

I'm not comfortable with the proposal.

1) It relies upon validation being enabled, which is an optional feature. There is a long-held principle that switching on validation does not change the behaviour of a parse, but this will do exactly that.

    [unionMemberSchema] String. For simple element information items, this member contains an SCD reference to the member of the union that matched the value of the element. Empty if validation is not enabled. Empty if the element's type is not a union.

2) DFDL does not allow the concept of an annotation on a simple type that is a union member. We would be allowing that, but with a completely disjoint property set.

    Unions; the memberTypes must be derived from the same simple type. DFDL annotations are not permitted on union members

3) The proposal does not help the case where I am not using unions but still would like enums translated into meaningful strings, a far more common situation.

I think this processing is best left to either an independent post-parse step or a parser extension via a set of non-DFDL annotations.

Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: "dfdl-wg@ogf.org" <dfdl-wg@ogf.org>
Date: 22/02/2017 17:38
Subject: [DFDL-WG] Proposed feature: lookup tables via simple type unions
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>