Re: [DFDL-WG] Fw: Hidden elements - summary of approaches

All,

Proposed section on hidden sequences (the current dfdl:hidden annotation section is removed):

14.5 Hidden Sequence Groups

Some fields in the physical stream provide information about other fields in the stream and are not really part of the data. For example, a field could give the number of repeats in a following array. These fields may not be of interest to an application, so they may be removed from the Infoset on parsing by marking them as hidden.

A hidden sequence group allows fields to be defined that will not be added to the Infoset on parsing and will not be expected in the Infoset on unparsing.

    <xs:element name="root">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstElement" type="xs:int" />
          <xs:sequence>
            <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:sequence hiddenGroupRef="tns:hiddenRepeatCount" />
            </xs:appinfo></xs:annotation>
          </xs:sequence>
          <xs:element name="arrayElement" type="xs:int"
                      minOccurs="0" maxOccurs="unbounded"
                      dfdl:occursCountKind="expression"
                      dfdl:occursCount="{./repeatCount}" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:group name="hiddenRepeatCount">
      <xs:sequence>
        <xs:element name="repeatCount" type="xs:int"
                    dfdl:outputValueCalc="{count(./arrayElement)}"
                    dfdl:representation="binary"
                    dfdl:lengthKind="implicit" />
      </xs:sequence>
    </xs:group>

Hidden elements within a hidden sequence can be referenced via path expressions, using the same DFDL expression that would be used if they were not hidden.

Hidden elements can (and typically will) contain the regular DFDL annotations to define their physical properties and, on unparsing, to set their value. They are processed using the same behavior as non-hidden elements.

When the dfdl:hiddenGroupRef property is specified, all other DFDL properties are ignored. It is a schema definition error if the sequence is not empty.

A hidden sequence may appear within another hidden sequence.

    Property Name    Description
    ---------------  ----------------------------------------------
    hiddenGroupRef   QName

                     Reference to a global model group definition
                     that defines the hidden element or elements.
                     The model group within the model group
                     definition must be a sequence.

                     Annotation: dfdl:sequence

    Table 11 Hidden sequence properties

Regards

Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell@uk.ibm.com

From: Steve Hanson/UK/IBM
To: remcgrat@illinois.edu
Cc: alejandr@ncsa.illinois.edu, Suman Kalia/Toronto/IBM@IBMCA, Alan Powell/UK/IBM@IBMGB, Stephanie Fetzer/Charlotte/IBM@IBMUS, Tim Kimber/UK/IBM@IBMGB, Sandy Gao/Toronto/IBM@IBMCA
Date: 08/09/2010 17:13
Subject: Fw: Hidden elements - summary of approaches

Hi Bob,

Alejandro included two extensions to the DFDL hidden syntax. Here's the current spec syntax for making a local element called 'repeatCount' hidden (exactly the same syntax applies for an element ref, sequence, choice, or group ref):

    <xs:element name="root">
      <xs:complexType>
        <xs:sequence>
          <xs:sequence>
            <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:hidden groupref="tns:hiddenRepeatCount" />
            </xs:appinfo></xs:annotation>
          </xs:sequence>
          <xs:element name="array" type="xs:string" maxOccurs="unbounded"
                      dfdl:occursCountKind="expression"
                      dfdl:occursCount="{./repeatCount}" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:group name="hiddenRepeatCount">
      <xs:sequence>
        <xs:element name="repeatCount" type="int"
                    dfdl:representation="binary"
                    dfdl:lengthKind="implicit" />
      </xs:sequence>
    </xs:group>

1) Hiding a local element

Here's what I think Alejandro has added in Daffodil, as an optimised syntax for hiding a local element.
    <xs:element name="root">
      <xs:complexType>
        <xs:sequence>
          <xs:sequence>
            <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:hidden elementref="tns:repeatCount" />
            </xs:appinfo></xs:annotation>
          </xs:sequence>
          <xs:element name="array" type="xs:string" maxOccurs="unbounded"
                      dfdl:occursCountKind="expression"
                      dfdl:occursCount="{./repeatCount}" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="repeatCount" type="int"
                dfdl:representation="binary"
                dfdl:lengthKind="implicit" />

There's a restriction though: it can only be used when minOccurs=maxOccurs=1, because we have actually lost the XML Schema particle. There's no loss of DFDL semantics as far as I am aware, because we do not have particle-specific properties.

Applying my proposed simplified syntax to Alejandro's optimisation gives the syntax below.

    <xs:element name="root">
      <xs:complexType>
        <xs:sequence>
          <xs:sequence dfdl:hiddenElementRef="tns:repeatCount" />
          <xs:element name="array" type="xs:string" maxOccurs="unbounded"
                      dfdl:occursCountKind="expression"
                      dfdl:occursCount="{./repeatCount}" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="repeatCount" type="int"
                dfdl:representation="binary"
                dfdl:lengthKind="implicit" />

This is quite compact, and does make the act of hiding easier, but it can only be used under specific circumstances. It requires a new property dfdl:hiddenElementRef, or (better) we rename dfdl:hiddenGroupRef to dfdl:hiddenRef and allow it to point at groups or elements.

It also allows an element reference with minOccurs=maxOccurs=1 and no explicit DFDL properties to be hidden without creating a global group. (I can hide a group reference with no explicit DFDL properties in this manner today.)

2) Hiding a local choice

If the object to be hidden is a choice, I think Alejandro is allowing the xs:choice to be the content of the global group, instead of requiring a sequence to wrap it.
    <xs:group name="hiddenRepeatCount">
      <xs:choice>
        <xs:element name="repeatCount" type="int"
                    dfdl:representation="binary"
                    dfdl:lengthKind="implicit" />
        <xs:element name="repeatString" type="int"
                    dfdl:representation="text"
                    dfdl:lengthKind="explicit" dfdl:length="10" />
      </xs:choice>
    </xs:group>

Makes sense: if I were hiding a local sequence, I wouldn't bother to wrap the sequence in yet another sequence in the global group, so why do so with a choice?

Thoughts welcome. Personally I like both.

Regards

Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK, smh@uk.ibm.com, tel +44-(0)1962-815848

----- Forwarded by Steve Hanson/UK/IBM on 08/09/2010 15:54 -----

From: Steve Hanson/UK/IBM
To: Sandy Gao/Toronto/IBM@IBMCA
Cc: Alan Powell/UK/IBM@IBMGB, Michael Hudson/Boca Raton/IBM@IBMUS, Richard Schofield/UK/IBM@IBMGB, Stephanie Fetzer/Charlotte/IBM@IBMUS, Suman Kalia/Toronto/IBM@IBMCA, Tim Kimber/UK/IBM@IBMGB, dfdl-wg@ogf.org
Date: 08/09/2010 10:59
Subject: Hidden elements - summary of approaches

Let's state the two options being considered, as I said I'd do this for the wider DFDL WG for the call today:

1) Global group approach

Summary:
  - Particle to hide can be a local element, element ref, local sequence, local choice or group ref
  - Particle is moved from its parent into a dedicated global group whose composition is a sequence, and is replaced in the parent by a new empty local sequence
  - The new empty local sequence carries a dfdl:hidden annotation that has a property dfdl:groupRef; other DFDL properties are not allowed
  - Alternatively, the new empty local sequence carries a dfdl:hiddenGroupRef property; other DFDL properties are not allowed

Pros:
  - Removing all DFDL annotations and using the resulting pure XSD gives the same infoset
  - Global group can be reused

Cons:
  - Making something hidden is a refactor operation
  - Global group sequence needs its DFDL properties set correctly

2) Hidden flag approach

Summary:
  - Particle to hide can be a local element or element ref
  - Particle takes a dfdl:hidden property
  - xs:minOccurs MUST be 0
  - A dfdl:minOccurs property takes the place of xs:minOccurs

Pros:
  - Easy to make something hidden

Cons:
  - Removing all DFDL annotations and using pure XSD does not guarantee the same infoset
  - Breaks validation
  - Duplication of the minOccurs property
  - Have to wrap a local sequence, choice or group ref in a complex element in order to hide it (they can't take minOccurs = 0)

Regards

Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK, smh@uk.ibm.com, tel +44-(0)1962-815848

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
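
For readers less familiar with what hiding means at run time, the behaviour described in the proposed section 14.5 can be sketched in Python. This is only an illustration, not a DFDL processor and not Daffodil or IBM code: the function names parse_root and unparse_root are hypothetical, and the layout (firstElement, repeatCount and each arrayElement as 4-byte big-endian signed integers) is an assumption made for the example. The point it shows is that repeatCount is read and used on parsing but never placed in the infoset, and is recomputed on unparsing in the way dfdl:outputValueCalc="{count(./arrayElement)}" describes.

```python
import struct


def parse_root(data: bytes) -> dict:
    """Parse firstElement, a hidden repeatCount, then repeatCount array items.

    Assumed layout (not from the thread): every field is a 4-byte
    big-endian signed integer.
    """
    first_element, repeat_count = struct.unpack_from(">ii", data, 0)
    # repeat_count is the hidden field: it drives the array length
    # but is deliberately left out of the returned infoset.
    values = struct.unpack_from(f">{repeat_count}i", data, 8)
    return {"root": {"firstElement": first_element,
                     "arrayElement": list(values)}}


def unparse_root(infoset: dict) -> bytes:
    """Serialise the infoset, recomputing the hidden repeatCount."""
    root = infoset["root"]
    values = root["arrayElement"]
    # Plays the role of dfdl:outputValueCalc="{count(./arrayElement)}":
    # the caller does not supply repeatCount, it is derived here.
    repeat_count = len(values)
    return struct.pack(f">ii{repeat_count}i",
                       root["firstElement"], repeat_count, *values)
```

A round trip through parse_root and unparse_root returns the original bytes even though the infoset in between contains no repeatCount entry, which is the property both the global group approach and the hidden flag approach are trying to provide.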