
I'm just letting the working group know that we are submitting an abstract today for a presentation on DFDL at the NIST Data Science Symposium. Below is what we plan to submit.

- Steve

----------------------------------------------

Title: Stop Writing Custom Data Parsers -- Write DFDL Instead!

This talk gives an introduction to the Data Format Description Language (DFDL), how it can be used to parse both textual and binary data in a standardized way, and how this leads to less time spent on custom data parser development and, consequently, more time spent on data processing and analysis. The talk will then describe the current DFDL implementations, with a focus on the open-source Daffodil project and its design. It will conclude with a brief walkthrough of real DFDL examples, including commercial and scientific formats, and a demonstration of the parsing capabilities of Daffodil.

The DFDL specification, which has completed a second round of public comments as part of the Open Grid Forum (OGF), is a modeling language for describing general text and binary data using a subset of XML Schema augmented with data format annotations. DFDL allows data to be read from its native format and presented as an instance of an information set or an XML document. DFDL also allows the reverse: converting an information set back to its native format. Because data is exposed as an information set, DFDL integrates cleanly with common XML utilities (e.g., XProc, XSLT, XQuery) for data processing and analysis, regardless of the format of the native data.

Two implementations of DFDL exist, as the OGF requires for a specification to become a standard. The first, created by IBM as part of IBM WebSphere V8, is written in both Java and C and includes graphical tools for modeling, running, and debugging DFDL schemas. The second implementation, Daffodil, is an open-source project written in Scala, with a design focused on speed and correctness.

With the two implementations making great strides and the DFDL specification nearing standardization, DFDL is becoming a promising tool that will ease data parsing, processing, and analysis.

Biography: Stephen Lawrence has worked as a software engineer at Tresys Technology since 2007 and has contributed to the open-source Daffodil project as a core maintainer for almost two years. He works alongside Michael Beckerle, the co-chair of the DFDL Working Group, to develop Daffodil and improve the DFDL specification. Outside of Daffodil, he focuses on computer security applications, including file inspection and sanitization, Security-Enhanced Linux (SELinux), and cross-domain solutions.