Mike
I think more complex predicates are something
for the next release of DFDL.
For this particular format, you at least
have the length of the record set, so you can parse the record fields as
xs:hexBinary and then apply a schema generated from the template in a post-parse
step.
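Here is a minimal Python sketch of the post-parse step described above. The record set body arrives as xs:hexBinary text, and a second step splits it into records using the field lengths from the matching template. The function name and the sample template are hypothetical. The sketch assumes every field has a fixed length and that any trailing bytes too short for a whole record are padding.

```python
from typing import List


def split_records(hex_body: str, field_lengths: List[int]) -> List[List[bytes]]:
    """Split a record set body (hexBinary text) into records of field byte strings.

    Assumes fixed-length fields only; trailing bytes shorter than one
    record are treated as padding and ignored.
    """
    body = bytes.fromhex(hex_body)
    record_len = sum(field_lengths)
    if record_len == 0:
        raise ValueError("template has no fixed-length fields")
    records = []
    offset = 0
    while offset + record_len <= len(body):
        fields = []
        for length in field_lengths:
            fields.append(body[offset:offset + length])
            offset += length
        records.append(fields)
    return records


# Two records of a hypothetical template whose fields are 4 and 2 octets long.
print(split_records("C0A80001" "0050" "0A000001" "01BB", [4, 2]))
```

In a real pipeline, the field lengths would come from the template whose ID matches the record set header. That lookup is the part discussed below.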
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 12/11/2012 21:07
Subject: [DFDL-WG] Expressions too restrictive? Example from IPFIX format
Sent by: dfdl-wg-bounces@ogf.org
I'm looking at RFC 5101 and RFC 5102.
These describe a dense binary file format for observing network flows.
The application is related to network security.
This format has a 'meta' structure to it that I'm not sure how to deal
with in DFDL currently.
Here's the problem as succinctly as I can make it.
One configures network information capture tools to capture some chosen set of information.
This is flexible, so what is captured can vary widely.
The resulting data stream contains first sets of templates, then sets of
data records, then more templates, then more records, and so on.
Each template is identified by an ID number, which is an integer from
256 to 65534. A template then contains a count of how many fields are in
the template, and then a field descriptor for each field, which includes
the length of the field in octets.
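To make the template structure concrete, here is a minimal Python sketch that decodes one template record. The byte layout is my reading of RFC 5101, all big-endian:

- a 16-bit template ID and a 16-bit field count;
- then, per field, a 16-bit element ID whose high bit flags an enterprise-specific element;
- a 16-bit length in octets;
- a 32-bit enterprise number, present only when that high bit is set.

The class and function names are illustrative.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldDescriptor:
    element_id: int            # Information Element ID (low 15 bits)
    length: int                # field length in octets
    enterprise: Optional[int]  # enterprise number, only when the high bit is set


@dataclass
class Template:
    template_id: int
    fields: List[FieldDescriptor]


def parse_template(buf: bytes, offset: int = 0) -> Tuple[Template, int]:
    """Decode one template record at `offset`; return it and the offset after it."""
    template_id, field_count = struct.unpack_from("!HH", buf, offset)
    offset += 4
    fields = []
    for _ in range(field_count):
        raw_id, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        enterprise = None
        if raw_id & 0x8000:  # enterprise-specific element: 4 more octets follow
            (enterprise,) = struct.unpack_from("!I", buf, offset)
            offset += 4
        fields.append(FieldDescriptor(raw_id & 0x7FFF, length, enterprise))
    return Template(template_id, fields), offset


# Template 256: sourceIPv4Address (IE 8, 4 octets), sourceTransportPort (IE 7, 2 octets).
tmpl, end = parse_template(struct.pack("!HHHHHH", 256, 2, 8, 4, 7, 2))
print(tmpl.template_id, [f.length for f in tmpl.fields], end)  # 256 [4, 2] 12
```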
Each data record set begins with the ID of its template, then a total length
in octets for the set, then the records, which are just field after field,
each as described by the associated template, with no record separators
or anything.
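Here is a similar sketch for the set layer. It assumes the 4-octet set header I read from RFC 5101: a 16-bit set ID, which for a data set is the template ID, then a 16-bit length that includes the header itself. The IPFIX message header that precedes the sets is not handled, and `iter_sets` is a hypothetical helper.

```python
import struct
from typing import Iterator, Tuple


def iter_sets(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (set_id, body) for each set in a run of back-to-back sets."""
    offset = 0
    while offset + 4 <= len(buf):
        set_id, length = struct.unpack_from("!HH", buf, offset)
        # The length counts the 4-octet header, so anything below 4 is corrupt.
        if length < 4 or offset + length > len(buf):
            raise ValueError(f"bad set length {length} at offset {offset}")
        yield set_id, buf[offset + 4:offset + length]
        offset += length


# One data set for template 256 holding a single 6-octet record.
data_set = struct.pack("!HH", 256, 10) + bytes.fromhex("C0A800010050")
for set_id, body in iter_sets(data_set):
    print(set_id, body.hex())  # 256 c0a800010050
```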
Now, I can model a template in DFDL as an array of field descriptors.
I can also model a data record set as a templateID and an array of data
records, and each data record as an array of fields, where the number of
field occurrences is given in the template, and each field is a byte array
whose occurs count is given by the corresponding template's field descriptor
for that occursIndex.
The one thing I can't figure out how to do is to create an XPath to the
right template given the template ID in the data record set header.
The problem is that the TemplateID is not an array index. The templates
might have arbitrary ID numbers. They might not be, say, 256, 257, etc.,
in order. They could be scattered. The standard only requires that
they are unique IDs. So the set of templates is truly a set, keyed by these
identifiers.
So I need a way to write an XPath that would choose the template from the
set, whose ID matches a particular integer value from the data record set's
header.
Right now I think our XPath subset doesn't allow this. We can only index
arrays with integers, and we have no searching capability that processes
a set of nodes to identify a node having any particular value or characteristic.
So, are we being too strict here? Should we allow somewhat more complex
predicates, such as { ...../template[idField eq $templateID] }?
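The sketch below shows, for illustration only, what the proposed predicate would do. It writes the equivalent query with Python's `xml.etree.ElementTree`, whose limited XPath support includes `[child='text']` predicates. The infoset element names (`ipfix`, `template`, `idField`, `field`, `length`) are made up. ElementTree compares text as strings, whereas `eq` in a DFDL expression would compare typed integer values.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# A hypothetical infoset holding two templates whose IDs are not 256, 257, ...
infoset = ET.fromstring("""
<ipfix>
  <template><idField>300</idField><field><length>4</length></field></template>
  <template><idField>4096</idField><field><length>2</length></field><field><length>8</length></field></template>
</ipfix>
""")


def find_template(root: ET.Element, template_id: int) -> Optional[ET.Element]:
    # [idField='...'] selects a template whose idField child has that exact
    # text, playing the role of template[idField eq $templateID].
    return root.find(f"./template[idField='{template_id}']")


match = find_template(infoset, 4096)
print([int(f.findtext("length")) for f in match.findall("field")])  # [2, 8]
```

`find` scans the templates in document order and returns the first match, or `None`. That is the kind of search by value that the current XPath subset does not provide.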
(For speed reasons, I might not do that lookup each time I parse a data
record. I might pre-fill an array of 65535 structures by way of inputValueCalc
so that I can actually use the template IDs as indexes into that array.
But either way, I need the capability to select the matching element from a set.)
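The pre-filled array idea can be sketched in Python as a 65,536-slot table indexed directly by template ID. The names and the (ID, field lengths) input shape are assumptions. Python can write each template straight into slot `template_id`. An inputValueCalc filling slot i of a DFDL array would instead have to find the template whose ID equals i, which is the same select-by-value capability.

```python
from typing import List, Optional, Sequence, Tuple

# One slot per possible 16-bit ID, so a template ID can be used directly
# as an index; slots for IDs never defined stay None.
TABLE_SIZE = 1 << 16


def build_template_table(
    templates: Sequence[Tuple[int, List[int]]]
) -> List[Optional[List[int]]]:
    """Pre-fill a table mapping template ID -> field lengths."""
    table: List[Optional[List[int]]] = [None] * TABLE_SIZE
    for template_id, field_lengths in templates:
        # If the same ID appears twice, the later entry overwrites the earlier one.
        table[template_id] = field_lengths
    return table


table = build_template_table([(300, [4]), (4096, [2, 8])])
print(table[4096], table[257])  # [2, 8] None
```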
In general, templates can be interleaved in IPFIX, in that the requirement
is just that they are transmitted before the data record sets that reference
them. So, in general, an application that reads IPFIX data cannot, say, first
scrape off all the templates, generate a schema for them, and then use
that schema to parse. The late availability of the templates is inherent
in the format.
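Here is a minimal single-pass sketch of what this implies for a consumer. Templates are remembered as they arrive, and each data set is decoded with the templates seen so far. The sketch makes three assumptions:

- The input is already tokenized into hypothetical `("template", id, field_lengths)` and `("data", template_id, body)` tuples, in transmission order.
- Only fixed-length fields are handled.
- A data set that references a template not yet seen is treated as an error.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def decode_stream(items: Iterable[tuple]) -> Iterator[Tuple[int, List[bytes]]]:
    """Yield (template_id, fields) per record, learning templates as they arrive."""
    templates: Dict[int, List[int]] = {}
    for kind, template_id, payload in items:
        if kind == "template":
            templates[template_id] = payload
            continue
        lengths = templates.get(template_id)
        if lengths is None:
            # Templates only have to precede the data sets that use them.
            raise ValueError(f"data set uses template {template_id} before it was sent")
        record_len = sum(lengths)
        if record_len == 0:
            raise ValueError(f"template {template_id} has no fixed-length fields")
        offset = 0
        while offset + record_len <= len(payload):
            fields = []
            for n in lengths:
                fields.append(payload[offset:offset + n])
                offset += n
            yield template_id, fields


stream = [
    ("template", 300, [2]),
    ("data", 300, bytes.fromhex("005001BB")),
    ("template", 4096, [1, 1]),  # arrives after data has already flowed
    ("data", 4096, bytes.fromhex("0102")),
]
print([(t, [f.hex() for f in fs]) for t, fs in decode_stream(stream)])
# [(300, ['0050']), (300, ['01bb']), (4096, ['01', '02'])]
```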
...mikeb
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg