Re: [DFDL-WG] Expressions too restrictive? Example from IPFIX format

13 Nov 2012

Mike,

I think more complex predicates are something for the next release of DFDL. 

For this particular format, you at least have the length of the record 
set, so you can parse the record fields as xs:hexBinary and then apply a 
schema generated from the template in a post-parse step. 
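
A minimal sketch of that post-parse step, in Python rather than DFDL. It 
assumes the record set payload arrives as one xs:hexBinary string and that 
every field has a fixed length taken from the template. The function name 
split_records and the decision to ignore trailing bytes that are too short 
to form a record are assumptions made for illustration, not anything 
defined by DFDL or by the message above.

```python
from typing import List


def split_records(payload_hex: str, field_lengths: List[int]) -> List[List[bytes]]:
    """Slice an opaque record-set payload into records of fixed-length fields.

    payload_hex   -- the record fields as parsed into xs:hexBinary (hex string)
    field_lengths -- per-field lengths in octets, taken from the template
    """
    payload = bytes.fromhex(payload_hex)
    record_len = sum(field_lengths)
    if record_len == 0:
        raise ValueError("template describes a zero-length record")

    records: List[List[bytes]] = []
    offset = 0
    # Assumption: leftover bytes shorter than one record are ignored.
    while offset + record_len <= len(payload):
        fields = []
        for n in field_lengths:
            fields.append(payload[offset:offset + n])
            offset += n
        records.append(fields)
    return records
```

For example, a template with field lengths [1, 2] applied to the payload 
"0a0b0c0d0e0f" yields two records of two fields each.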

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From:   Mike Beckerle <mbeckerle.dfdl@gmail.com>
To:     dfdl-wg@ogf.org
Date:   12/11/2012 21:07
Subject:        [DFDL-WG] Expressions too restrictive? Example from IPFIX 
format
Sent by:        dfdl-wg-bounces@ogf.org

I'm looking at RFC 5101 and RFC 5102.

These describe a dense binary file format for observing network flows. The 
application is related to network security.

This format has a 'meta' structure to it that I'm not sure how to deal 
with in DFDL currently.

Here's the problem as succinctly as I can make it.

One configures some network information capture tools to capture some 
information. This is flexible, so what is captured can be quite variable.

The resulting data stream contains first sets of templates, then sets of 
data records, then more templates, then more records, and so on. 

Each template is identified by an ID number, which is an integer from 
256 to 65534. A template then contains a count of how many fields are in 
the template, followed by a field descriptor for each field, which 
includes the length of the field in octets.

Each data record set begins with the ID of its template, then a total 
length in octets for the set, then the records, which are just field, 
field, field, each as described by the associated template with no record 
separators or anything. 
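
To make the two structures concrete, here is a hedged Python sketch of the 
template side. The byte layout is an assumption chosen for illustration: a 
16-bit template ID, a 16-bit field count, then one (16-bit element ID, 
16-bit length) pair per field, all big-endian. The names FieldDescriptor, 
Template, and parse_template are hypothetical, and details of the real 
format that the description above does not mention are left out.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FieldDescriptor:
    element_id: int  # assumed: identifies what the field means
    length: int      # field length in octets


@dataclass
class Template:
    template_id: int
    fields: List[FieldDescriptor]


def parse_template(buf: bytes, offset: int = 0) -> Tuple[Template, int]:
    """Parse one template; return it and the offset just past it."""
    # Assumed header: template ID and field count, 16 bits each, big-endian.
    template_id, field_count = struct.unpack_from("!HH", buf, offset)
    offset += 4
    fields = []
    for _ in range(field_count):
        element_id, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        fields.append(FieldDescriptor(element_id, length))
    return Template(template_id, fields), offset
```

The sum of the field lengths in a Template gives the size of one data 
record, which is what lets a reader walk a record set that has no 
separators.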

Now, I can model a template in DFDL as an array of field descriptors.

I can also model a data record set as a templateID and an array of data 
records, and each data record as an array of fields, where the number of 
field occurrences is given in the template, and each field is a byte array 
whose occurs count is given by the corresponding template's field 
descriptor for that occursIndex. 

The one thing I can't figure out how to do is to create an XPath to the 
right template given the template ID in the data record set header. 

The problem is that the template ID is not an array index. The templates 
might have arbitrary ID numbers. They might not be, say, 256, 257, etc. in 
order; they could be scattered. The standard only requires that the IDs 
are unique. So the set of templates is truly a set keyed by these 
identifiers. 

So I need a way to write an XPath that would choose the template from the 
set, whose ID matches a particular integer value from the data record 
set's header.

Right now I think our XPath subset doesn't allow this. We can only index 
arrays with integers, and we have no searching capability that processes a 
set of nodes to identify a node having any particular value or 
characteristic.

So: are we being too strict here? Should we allow somewhat more complex 
predicates, such as { ...../template[idField eq $templateID]  }?
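
For readers less familiar with XPath predicates, the intended semantics can 
be sketched in Python as a value-based search over the set of templates. 
The dict representation and the function name select_template are 
assumptions for illustration; idField and the template ID are the names 
used in the expression above.

```python
from typing import Dict, Iterable, Optional


def select_template(templates: Iterable[Dict], template_id: int) -> Optional[Dict]:
    """Return the template whose idField equals template_id, or None.

    Mirrors the predicate template[idField eq $templateID]: the match is
    on a field value, not on the position of the template in the sequence.
    """
    matches = [t for t in templates if t["idField"] == template_id]
    if len(matches) > 1:
        # IDs are required to be unique, so this indicates bad input.
        raise ValueError(f"duplicate template ID {template_id}")
    return matches[0] if matches else None
```

The search is linear in the number of templates, which is the cost the 
next paragraph is concerned with.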

(For speed reasons, I might not do that lookup each time I parse a data 
record. I might pre-fill an array of 65535 structures by way of 
inputValueCalc so that I can actually use the template IDs as indexes into 
that array. But either way I need the select-from-matching-set 
capability.)
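
The pre-filled array idea can be sketched the same way. This is Python, 
not DFDL, and says nothing about how inputValueCalc would be used to build 
such an array. The table here has 65536 slots so that a template ID can be 
used directly as a zero-based index; that sizing and the name 
build_template_table are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional

TABLE_SIZE = 65536  # assumed: one slot per possible 16-bit ID value


def build_template_table(templates: Iterable[Dict]) -> List[Optional[Dict]]:
    """Place each template in the slot numbered by its idField."""
    table: List[Optional[Dict]] = [None] * TABLE_SIZE
    for t in templates:
        table[t["idField"]] = t
    return table
```

After the table is built, each data record set resolves its template with 
a single index operation such as table[300], and unused slots hold None.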

In general, templates can be interleaved in IPFIX: the only requirement 
is that they are transmitted before the data record sets that reference 
them. So an application that reads IPFIX data cannot, say, first scrape 
off all the templates, generate a schema from them, and then use that 
schema to parse. The late availability of the templates is inherent in 
the format.
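
A short Python sketch of what that ordering constraint means for a reader: 
templates have to be collected while the stream is consumed, and each data 
record set is resolved against whatever has been seen so far. The input 
shape (a sequence of already-separated "template" and "data" items, with a 
template reduced to its list of field lengths) and the name decode_stream 
are assumptions made to keep the example small.

```python
from typing import Dict, Iterable, List, Tuple, Union

Item = Tuple[str, int, Union[List[int], bytes]]


def decode_stream(items: Iterable[Item]) -> List[Tuple[int, List[List[bytes]]]]:
    """Decode data sets in one pass, learning templates as they arrive."""
    templates: Dict[int, List[int]] = {}  # template ID -> field lengths
    decoded = []
    for kind, ident, body in items:
        if kind == "template":
            templates[ident] = list(body)
            continue
        if ident not in templates:
            # A data set may only reference a template sent earlier.
            raise KeyError(f"data set references unknown template {ident}")
        lengths = templates[ident]
        record_len = sum(lengths)
        records = []
        pos = 0
        while record_len > 0 and pos + record_len <= len(body):
            fields = []
            for n in lengths:
                fields.append(body[pos:pos + n])
                pos += n
            records.append(fields)
        decoded.append((ident, records))
    return decoded
```

Because the template table grows during the parse, no schema generated 
before reading the stream can cover every data record set in it.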

...mikeb

-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412
--
  dfdl-wg mailing list
  dfdl-wg@ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg
