
Mike,

I think more complex predicates are something for the next release of DFDL. For this particular format, you at least have the length of the record set, so you can parse the record fields as xs:hexBinary and then apply a schema generated from the template in a post-parse step.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848

From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: dfdl-wg@ogf.org
Date: 12/11/2012 21:07
Subject: [DFDL-WG] Expressions too restrictive? Example from IPFIX format
Sent by: dfdl-wg-bounces@ogf.org

I'm looking at RFC 5101 and RFC 5102. These describe a dense binary file format for observing network flows; the application is related to network security.

This format has a "meta" structure that I'm not sure how to deal with in DFDL currently. Here's the problem as succinctly as I can make it.

One configures network information capture tools to capture some information. This is flexible, so what is captured can be quite variable. The resulting data stream contains first sets of templates, then sets of data records, then more templates, then more records, and so on.

Each template is identified by an ID number, an integer from 256 to 65534. A template then contains a count of how many fields are in the template, followed by a field descriptor for each field, which includes the length of the field in octets.

Each data record set begins with the ID of its template, then the total length of the set in octets, then the records. The records are just field, field, field, each as described by the associated template, with no record separators or anything else between them.

Now, I can model a template in DFDL as an array of field descriptors.
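For readers unfamiliar with the wire layout, here is a minimal Python sketch of decoding one template record as described above (an ID, a field count, then one descriptor per field). It assumes the RFC 5101 encoding as I understand it: a 16-bit big-endian template ID and field count, then per field a 16-bit Information Element ID and a 16-bit field length, followed by a 4-octet enterprise number when the high bit of the element ID is set. FieldDescriptor, Template and parse_template_record are hypothetical names, and variable-length fields (length 65535) are not handled.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldDescriptor:
    element_id: int                   # Information Element ID, enterprise bit cleared
    length: int                       # field length in octets
    enterprise: Optional[int] = None  # enterprise number, present only if the E bit was set


@dataclass
class Template:
    template_id: int
    fields: List[FieldDescriptor]


def parse_template_record(buf: bytes, offset: int = 0) -> Tuple[Template, int]:
    """Decode one template record starting at offset; return it and the next offset.

    Simplified sketch: variable-length fields (length 65535) are not special-cased.
    """
    template_id, field_count = struct.unpack_from(">HH", buf, offset)
    offset += 4
    fields = []
    for _ in range(field_count):
        element_id, length = struct.unpack_from(">HH", buf, offset)
        offset += 4
        enterprise = None
        if element_id & 0x8000:  # E bit set: a 4-octet enterprise number follows
            (enterprise,) = struct.unpack_from(">I", buf, offset)
            offset += 4
            element_id &= 0x7FFF
        fields.append(FieldDescriptor(element_id, length, enterprise))
    return Template(template_id, fields), offset
```

The returned offset lets a caller step through consecutive template records inside one template set; the list of FieldDescriptor objects is the "array of field descriptors" Mike describes.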
I can also model a data record set as a template ID and an array of data records, and each data record as an array of fields, where the number of field occurrences is given in the template and each field is a byte array whose occurs count is given in the corresponding template's field descriptor for that occursIndex.

The one thing I can't figure out is how to create an XPath to the right template given the template ID in the data record set header. The problem is that the template ID is not an array index. Templates can have arbitrary ID numbers; they need not be, say, 256, 257, etc., in order, and they can be scattered. The standard only requires that the IDs are unique. So the templates truly form a set keyed by these identifiers, and I need a way to write an XPath that chooses the template from the set whose ID matches a particular integer value from the data record set's header.

Right now I think our XPath subset doesn't allow this. We can only index arrays with integers, and we have no searching capability that processes a set of nodes to identify a node having a particular value or characteristic.

So: are we being too strict here? Should we allow somewhat more complex predicates, such as

{ ...../template[idField eq $templateID] }

(For speed reasons, I might not do that lookup each time I parse a data record. I might instead pre-fill an array of 65535 structures by way of inputValueCalc so that I can use the template IDs directly as indexes into that array. But either way I need the capability to select from a set by matching.)

In general, templates can be interleaved in IPFIX; the only requirement is that they are transmitted before the data record sets that reference them. So, in general, an application that reads IPFIX data cannot first scrape off all the templates, generate a schema for them, and then use that schema to parse. The late availability of the templates is inherent in the format.
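Outside DFDL, the lookup Mike wants (the equivalent of template[idField eq $templateID]) amounts to a keyed map that grows as templates arrive. The minimal Python sketch below assumes a simplified stream of sets: each set starts with a 16-bit big-endian Set ID and a 16-bit length that includes the 4-octet set header, Set ID 2 marks a template set, and a data set's Set ID is its template ID (as in RFC 5101). It omits the 16-octet message header, enterprise-specific elements, variable-length fields, options templates and template withdrawal. The function name parse_sets is hypothetical. Note that each set body is first sliced out using only the set length, before any template is consulted, which is the same idea as Steve's suggestion of reading the record set as xs:hexBinary and interpreting it afterwards.

```python
import struct
from typing import Dict, Iterator, List, Tuple

TEMPLATE_SET_ID = 2  # Set ID for Template Sets in RFC 5101


def parse_sets(buf: bytes) -> Iterator[Tuple[int, List[bytes]]]:
    """Yield (template_id, fields) for every data record in a run of sets.

    Simplified sketch: no message header, no enterprise-specific elements,
    no variable-length fields, no options templates, no template withdrawal.
    """
    templates: Dict[int, List[int]] = {}  # template ID -> field lengths in octets
    offset = 0
    while offset + 4 <= len(buf):
        set_id, set_len = struct.unpack_from(">HH", buf, offset)
        if set_len < 4:
            raise ValueError(f"bad set length {set_len} at offset {offset}")
        # Slice the whole set body by its length before interpreting it
        # (the analogue of reading it as opaque xs:hexBinary).
        body = buf[offset + 4 : offset + set_len]
        offset += set_len
        if set_id == TEMPLATE_SET_ID:
            pos = 0
            while pos + 4 <= len(body):  # fewer than 4 octets left can only be padding
                tid, count = struct.unpack_from(">HH", body, pos)
                if tid < 256:  # template IDs start at 256, so this is zero padding
                    break
                pos += 4
                lengths = []
                for _ in range(count):
                    _element_id, flen = struct.unpack_from(">HH", body, pos)
                    pos += 4
                    lengths.append(flen)
                templates[tid] = lengths  # a later template with the same ID replaces it
        elif set_id >= 256:
            # Keyed lookup: the counterpart of template[idField eq $templateID].
            if set_id not in templates:
                raise ValueError(f"data set refers to unknown template {set_id}")
            lengths = templates[set_id]
            rec_len = sum(lengths)
            if rec_len == 0:
                continue
            for r in range(len(body) // rec_len):  # leftover octets are padding
                start = r * rec_len
                fields = []
                for flen in lengths:
                    fields.append(body[start : start + flen])
                    start += flen
                yield set_id, fields
        # Other Set IDs (e.g. 3, Options Template Sets) are ignored in this sketch.
```

Feeding it a template set for ID 300 followed by a data set with Set ID 300 yields one (300, [field bytes, ...]) pair per record, and templates that arrive later in the stream are picked up as they appear; a data set whose template has not yet been seen raises ValueError rather than guessing. A dict keyed by template ID plays the role of the pre-filled 65535-entry array Mike mentions; a list with 65536 slots indexed by ID would behave the same at the cost of more memory.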
...mikeb

--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412

--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg