Re: Enums - Re: split into multiple topics - Re: [dfdl-wg] Issues: additional data types

6 Sep 2005

      The MRM model COBOL importer gives the user an option to create
xs:enumeration values from level 88 values, and that is all.  This is so
the MRM parser can validate a byte stream against the model. This is in
keeping with the rule that values supplied with metadata are part of the
logical model, not the physical model. What's the motivation for doing
anything more than this?

Using your example that gives:

<xs:element  name="AS-COM-TRAN-ID">
   <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="CP80"/>
            <xs:enumeration value="IC40"/>
...
        </xs:restriction>
    </xs:simpleType>
</xs:element>
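
As a rough illustration of what validating against such an enumeration amounts to, here is a minimal Python sketch. It is not the MRM parser; the function name is hypothetical, and the value set is assumed to be the union of the level 88 VALUE clauses in the COBOL record quoted further down this thread.

```python
# Assumed value set: every literal that appears in a level 88 VALUE clause
# for AS-COM-TRAN-ID in the COBOL example quoted in this thread.
ALLOWED_TRAN_IDS = {"CP80", "IC40", "RA40", "IC44", "RA42", "RA41"}


def validate_tran_id(raw):
    """Return raw unchanged if it is an enumerated value, else raise ValueError.

    Hypothetical helper: it mirrors what an xs:enumeration facet checks,
    and nothing more (no mapping to symbolic names).
    """
    if raw not in ALLOWED_TRAN_IDS:
        raise ValueError(f"AS-COM-TRAN-ID value {raw!r} is not enumerated")
    return raw
```

The logical value stays the 4-character string itself; the enumeration only constrains which strings are acceptable.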

Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh@uk.ibm.com
Phone (+44)/(0) 1962-815848

Mike Beckerle <beckerle@us.ibm.com>
Sent by: owner-dfdl-wg@ggf.org
06/09/2005 17:40

To: dfdl-wg@gridforum.org
cc:
Subject: Enums - Re: split into multiple topics - Re: [dfdl-wg] Issues: additional data types

About enums. Here are some starting thoughts:

Here's a real-world example from COBOL:

01  AS-CPST-REC.
06  AS-CPCOM.
        09  AS-COM-STORE-TYPE           PIC X.
        09  AS-COM-STORE-NO             PIC 9(05).
        09  AS-COM-TRAN-ID              PIC X(04).
                88  TRAN-COUPON          VALUE 'CP80'.
                88  TRAN-REVENUE         VALUE 'IC40'
                                               'RA40'.
                88  TRAN-SALES           VALUE 'IC40'.
                88  TRAN-DELIVER         VALUE 'IC44'.
                88  TRAN-RENTS           VALUE 'RA40'
                                               'RA42'.
                88  TRAN-RENT-RETURN         VALUE 'RA41'.
        09  AS-COM-QUANTITY             PIC S9(05).
        09  AS-COM-PART-NO              PIC 9(06).
        .... more fields elided ....

Those "88" entries are enumerated constants. Note that for the
TRAN-REVENUE and TRAN-RENTS constants, multiple values are associated with
the same name. On reference, this means that the constant matches either
value. When written, this means the first value is used. COBOL doesn't
strongly associate these enumerated values with the field to which they can
be assigned, but usually it's obvious. In this case it is the
AS-COM-TRAN-ID field, a 4-character string, that has the string constants
associated with it.
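
The reference and write behaviour described above can be sketched in Python. This is only a model of the semantics as stated in this message; the table and function names are hypothetical, and the values are copied from the level 88 entries in the record above.

```python
# Level 88 names mapped to their VALUE literals, in declaration order.
CONDITIONS = {
    "TRAN-COUPON": ["CP80"],
    "TRAN-REVENUE": ["IC40", "RA40"],
    "TRAN-SALES": ["IC40"],
    "TRAN-DELIVER": ["IC44"],
    "TRAN-RENTS": ["RA40", "RA42"],
    "TRAN-RENT-RETURN": ["RA41"],
}


def condition_holds(name, field_value):
    """On reference: the name matches if the field equals any listed value."""
    return field_value in CONDITIONS[name]


def value_when_set(name):
    """On write: the first listed value is the one stored in the field."""
    return CONDITIONS[name][0]
```

For instance, condition_holds("TRAN-REVENUE", "RA40") is true because 'RA40' is the second value listed for TRAN-REVENUE, while value_when_set("TRAN-REVENUE") gives 'IC40', the first.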

This particular example is a common one. The record has variant structure
(not shown above) depending on the tag field which is this AS-COM-TRAN-ID
field.

Working out how we want this example to work in XSD is the first step.

I like using a hidden field here. Here's one possible way to represent
this in XSD:

<xs:element  name="AS-COM-TRAN-ID"> <!-- logical tag element -->
   <xs:simpleType>
        <xs:restriction base="xs:NCName">
            <xs:enumeration value="TRAN-COUPON"/>
            <xs:enumeration value="TRAN-REVENUE"/>
            <xs:enumeration value="TRAN-SALES"/>
            <xs:enumeration value="TRAN-DELIVER"/>
            <xs:enumeration value="TRAN-RENTS"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
<xs:sequence>
   <xs:annotation>
     <xs:appinfo source="http://dataformat.org/">
         <!-- hidden physical tag rep -->
         <xs:layer name="rep" type="AS-COM-TRAN-ID-repType"/>
     </xs:appinfo>
   </xs:annotation>
</xs:sequence>

<xs:simpleType name="AS-COM-TRAN-ID-repType">
     <xs:restriction base="xs:string">
         <xs:enumeration value="CP80"/>
         <xs:enumeration value="IC40"/>
         <!-- rest of the tag values elided here -->
         ....
     </xs:restriction>
</xs:simpleType>

Now in an annotation (not shown above) on the element AS-COM-TRAN-ID there
would be a dfdl:valueCalc property which would compute the value of
AS-COM-TRAN-ID based on the value of the hidden field. Symmetrically, the
'rep' hidden field would have a dfdl:repCalc property which would give the
inverse formula for output.
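
To make the two directions concrete, here is a minimal Python sketch of what such a pair of calculations might compute. The function names value_calc and rep_calc are hypothetical stand-ins for the dfdl:valueCalc and dfdl:repCalc formulas, not proposed syntax. The tie-break for a physical value listed under more than one name (first name in declaration order wins) is an assumption of this sketch; the message does not specify one.

```python
# Level 88 names mapped to their physical values, in declaration order
# (copied from the COBOL record earlier in this message).
CONDITIONS = {
    "TRAN-COUPON": ["CP80"],
    "TRAN-REVENUE": ["IC40", "RA40"],
    "TRAN-SALES": ["IC40"],
    "TRAN-DELIVER": ["IC44"],
    "TRAN-RENTS": ["RA40", "RA42"],
    "TRAN-RENT-RETURN": ["RA41"],
}


def value_calc(physical_tag):
    """Input direction: hidden physical tag -> logical symbolic name.

    Assumption: if several names list the same value, the first name in
    declaration order is returned.
    """
    for name, values in CONDITIONS.items():
        if physical_tag in values:
            return name
    raise ValueError(f"unknown tag value: {physical_tag!r}")


def rep_calc(logical_name):
    """Output direction: logical name -> physical tag (first listed value)."""
    return CONDITIONS[logical_name][0]
```

Under this particular tie-break the two functions are not exact inverses: rep_calc("TRAN-SALES") gives 'IC40', and value_calc('IC40') gives TRAN-REVENUE, because 'IC40' is listed under both names. Any real formula would have to decide how to treat such overlapping values.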

One difficulty I have with this is that we're projecting into the string
type. That is, these symbolic constants aren't names for integers; instead
we're expressing operations on strings. In the above example the
enumerated constants actually are strings, but in other examples they would
be integers. The next tier of interpretation, i.e., where we're deciding the
variant based on the value of AS-COM-TRAN-ID, would be expressed as string
comparisons, which is potentially inefficient. This is part of the problem
with using XSD as our type system basis: XSD doesn't have a notion of a
symbolic named constant.

Alternative: There is the DTD named entity stuff. Does anybody want to
propose that?

Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA

Mike Beckerle/Worcester/IBM@IBMUS
Sent by: owner-dfdl-wg@ggf.org
09/02/2005 04:34 PM

To: "Robert E. McGrath" <mcgrath@ncsa.uiuc.edu>
cc: dfdl-wg@gridforum.org, owner-dfdl-wg@ggf.org
Subject: split into multiple topics - Re: [dfdl-wg] Issues: additional data types

I'd like to split this topic into several distinct ones:

Arrays - I have a placeholder for this in the doc.

Opaque and "code" types are separate. This is related also to the concept
of "open content".

Enums

Bitfields

Pointers

Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA

"Robert E. McGrath" <mcgrath@ncsa.uiuc.edu>
Sent by: owner-dfdl-wg@ggf.org
09/02/2005 03:13 PM

To: dfdl-wg@gridforum.org
cc:
Subject: [dfdl-wg] Issues: additional data types

Greetings,

Here is an "issue" for the DFDL: additional data types that should
be considered.

Please see attached.

---
Robert E. McGrath
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
Champaign, Illinois 61820
(217)-333-6549

mcgrath@ncsa.uiuc.edu (See attached file: DT.htm)