Steve Hanson <smh@uk.ibm.com> · 24.03.2012 02:27

The whole point of this thing is to be faster, not more general, so my
reaction is that there is too much XPath expression complexity here.

Consider this. Suppose the tag is some 2-character code that cannot be
the same as the element names. For example, the codes might be digits,
and an element name cannot begin with a digit; digits also aren't
useful as names from a readability perspective. Then you'll need a big
lookup table in the choiceBranchRef expression that translates from the
codes to the QNames. We don't have a case statement in the expression
language, so you've just moved the big linear evaluation chain out of
evaluating choice discriminators one after another and into a big
if-then-else nest in the choiceBranchRef expression. I don't see a
performance gain here.

I suggest dropping the QName stuff and requiring a dfdl:choiceID
property, which is an NCName, on the elements. (Well, we might want
those QName functions in the expression language anyway, but I wouldn't
use them for this rapid choice dispatch feature. You could certainly
use them in discriminators.)

The expression would then have to evaluate to a value that is matched
against this choiceID. I suggest an exact match that, for example, does
not respect ignoreCase.

That eliminates all the QName complexity and is amenable to a
high-speed, compact lookup table implementation.

I tend to think the element names want to be a little more descriptive
than these tag values, so using the element names as the tags feels
undesirable to me.

That is particularly so because we want the tags to be conveniently
computed, for example by just grabbing a fixed-length string out of a
data field.

You end up with something like this:

    <element name="tag" type="string" dfdl:length="{ 2 }" ..../>

    <choice dfdl:choiceBranchRef="{ ../tag }">
        <element name="someName" dfdl:choiceID="02" .../>
        <element name="anotherName" dfdl:choiceID="73" .../>
        ....
    </choice>
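
To make the performance argument concrete, here is a rough sketch in
Python (not DFDL, and not taken from any real implementation) of the two
strategies for the schema above. The names CHOICE_IDS, dispatch_linear
and dispatch_table are made up for illustration; the only things taken
from this message are the two choiceID values and element names.

```python
# Hypothetical illustration: choiceID -> element name, as in the schema above.
CHOICE_IDS = {"02": "someName", "73": "anotherName"}


def dispatch_linear(tag):
    """Mimic testing the branches one after another (discriminators or an
    if-then-else nest). Returns (element name or None, number of tests)."""
    tests = 0
    for choice_id, element_name in CHOICE_IDS.items():
        tests += 1
        if tag == choice_id:  # exact match, no ignoreCase handling
            return element_name, tests
    return None, tests


def dispatch_table(tag):
    """Mimic the suggested fast dispatch: a single exact-match lookup."""
    return CHOICE_IDS.get(tag)


print(dispatch_linear("73"))
print(dispatch_table("73"))
```

The linear version does work proportional to the number of branches
before it reaches the right one, while the table version does one
lookup no matter how many branches the choice has.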

As for the wildcard issue, I think we can finesse this. Consider this
model:

    <element name="tag" type="string" dfdl:length="{ 2 }" ..../>

    <choice>
        <!-- fast dispatch for known record types -->
        <choice dfdl:choiceBranchRef="{ ../tag }">
            <element name="someName" dfdl:choiceID="02" .../>
            <element name="anotherName" dfdl:choiceID="73" .../>
            ....
        </choice>

        <!-- wildcard -->
        <element name="extensionRecord">
            <complexType>
                <sequence>
                    <!-- keep tag copy in the extension -->
                    <element name="extensionType" type="string"
                             dfdl:inputValueCalc="{ ../../tag }" .../>
                    ....
                </sequence>
            </complexType>
        </element>
    </choice>

The inner choice uses the fast dispatch. The outer choice lets me also
have an alternative that absorbs a more general syntax to provide some
way for a user to model their extensions to the choice set. The
extensionType field captures the tag and stores it inside the
extension record where it won't get disassociated.
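
As a rough sketch of that control flow, again in Python rather than
DFDL: this assumes that a tag with no matching choiceID simply makes the
inner choice fail, so the outer choice moves on to its next branch. The
function name parse_record and the returned tuple are hypothetical.

```python
# Hypothetical illustration: known record types from the model above.
KNOWN_BRANCHES = {"02": "someName", "73": "anotherName"}


def parse_record(tag):
    """Return (element name, extensionType value or None) for a tag."""
    # Inner choice: fast exact-match dispatch on the tag.
    element_name = KNOWN_BRANCHES.get(tag)
    if element_name is not None:
        return element_name, None
    # Outer choice: no known branch matched, so fall back to the
    # user-modelled extension record and keep a copy of the tag in it.
    return "extensionRecord", tag


print(parse_record("02"))
print(parse_record("ZZ"))
```

An unknown tag such as "ZZ" ends up in extensionRecord with the tag
preserved as extensionType, which is the point of the outer choice.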

The user's "extensionRecord" would not be a special DFDL wildcard
construct, just an element format they create which is general enough
to parse their extensions. Or this extension could be predefined as
part of a standard, to accept any standard-defined acceptable
extension record a user might need so long as there is some set of
rules all extension records must obey.

Given this, do we really need special wildcard constructs?

...mikeb

From:	Steve Hanson <smh@uk.ibm.com>
To:	dfdl-wg@ogf.org,
Date:	24.03.2012 02:27
Subject:	[DFDL-WG] Action 145: 'dispatch' way of discriminating a choice for better performance