Expressions and Variables: Rationale language

Some rationale around the expression language and single-assignment variables. Not sure if this helps, but I'm trying not to rewrite a thesis on the topic here.

DFDL is intended to be a description language. That is, the capture of a data format should be as descriptive/declarative as possible. An additional, and quite critical, goal for DFDL is that it allow very high-performance implementations, including the use of parallel processing wherever possible.

DFDL contains an expression language with variables for use in creating parameterized DFDL schemas. However, the way variables can be used in DFDL is quite constrained: specifically, the variables are single-assignment.

Single-assignment variables solve a number of problems. First, they keep the schema more declarative, because the name of a variable represents a value, not a location. Before assignment the value is not yet known; after assignment it is known. The consumer of the value need only know the name; it need not be aware of the mechanism by which the variable gets its value, or when.

Second, single-assignment variables avoid over-constraining the implementation, thereby preserving the potential for high-performance and parallel processing.

Some digression is useful here. Any variable creates a data dependency in the order of processing: the part of the schema reading/using the variable's value depends upon the value coming from the part of the schema providing it. This kind of data dependency is inherent and inescapable; values must be created before they can be used. However, if you consider a variable to be a location that can be assigned repeatedly, then things are more complex. You not only have the data dependency on the value (one part of the schema writes the location, another reads it), but you also have a dependency in the other direction: every read of the current value must happen before the location can be overwritten with the *next* value. This is usually called anti-dependency.
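To make the two kinds of dependency concrete, here is a small Python sketch. It is not DFDL syntax and not the API of any DFDL processor; the `Step` class, the step labels (`hdrA`, `bodyA`, ...) and the variable names (`len`, `lenA`, `lenB`) are all hypothetical. Each step is described only by the variable names it reads and writes. The sketch classifies orderings into data (read-after-write) dependencies and anti- (write-after-read) dependencies, then computes the earliest round in which each step could run if only those orderings were enforced. Write-after-write ordering is left out to keep it small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A hypothetical processing step: the variable names it reads and writes."""
    label: str
    reads: tuple = ()
    writes: tuple = ()


def dependencies(steps):
    """Classify orderings between steps taken in program order.

    Returns (data_deps, anti_deps), each a set of (earlier, later) label pairs.
    Data dependency (read-after-write): a step reads the value an earlier step wrote.
    Anti-dependency (write-after-read): a step overwrites a location whose current
    value an earlier step had to read first. Write-after-write is ignored here.
    """
    data_deps, anti_deps = set(), set()
    last_writer = {}  # name -> label of the step that wrote the current value
    readers = {}      # name -> labels of steps that read the current value
    for step in steps:
        for name in step.reads:
            if name in last_writer:
                data_deps.add((last_writer[name], step.label))
            readers.setdefault(name, []).append(step.label)
        for name in step.writes:
            # Every reader of the old value must finish before it is overwritten.
            for reader in readers.get(name, []):
                if reader != step.label:
                    anti_deps.add((reader, step.label))
            last_writer[name] = step.label
            readers[name] = []
    return data_deps, anti_deps


def earliest_rounds(steps, edges):
    """Earliest round for each step, where an edge (a, b) means 'a before b'.

    All edges point forward in program order, so a single pass suffices.
    """
    rounds = {}
    for step in steps:
        preds = [a for (a, b) in edges if b == step.label]
        rounds[step.label] = 1 + max((rounds[a] for a in preds), default=-1)
    return rounds


# One location reused for every record's length: a header sets it, a body uses it.
reused = [
    Step("hdrA", writes=("len",)),
    Step("bodyA", reads=("len",)),
    Step("hdrB", writes=("len",)),
    Step("bodyB", reads=("len",)),
]

# Single-assignment: each record's length gets its own name.
single = [
    Step("hdrA", writes=("lenA",)),
    Step("bodyA", reads=("lenA",)),
    Step("hdrB", writes=("lenB",)),
    Step("bodyB", reads=("lenB",)),
]

for kind, steps in (("reused", reused), ("single", single)):
    data, anti = dependencies(steps)
    print(kind, "anti:", sorted(anti))
    print(kind, "rounds:", earliest_rounds(steps, data | anti))
```

With the reused location, the sketch finds one anti-dependency (`bodyA` must finish reading `len` before `hdrB` overwrites it), so the four steps serialize into rounds 0 through 3. With single-assignment names only the data dependencies remain, and the two header/body pairs fall into rounds 0 and 1 side by side. The model tracks only dependencies through variables; in a real data format the position of the second record may itself depend on the first.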
Anti-dependency is the enemy of high-performance and parallel execution. It forces a specific, artificial sequential ordering on things, an ordering that arises only from the way variable names are allocated to storage locations. If variables are single-assignment only, then only data dependencies exist. Anti-dependencies don't exist, and implementations are free to work in any way consistent with the (inescapable) data dependencies.

--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412