Mike Beckerle, Alan Powell, Steve Hanson,
Suman Kalia attended.
We discussed these questions from Alan
about the expression language:
1. Accessing hidden values: it seems
inconsistent to allow access to hidden values when XPath is used within
the DFDL domain but not when it is used outside.
2. Where XPath is allowed in the schema:
it is currently allowed in an arbitrary set of properties (initiator,
terminator, separator, occurseparator, null, etc.). Why not allow it everywhere?
W.r.t. (1), we decided this is correct:
path expressions in DFDL properties can see hidden elements; path expressions
in other places (e.g., Schematron assertions) cannot.
W.r.t. (2), we decided that expressions
should in principle be allowed as the value of any property;
however, there may be exceptions for certain properties. In particular,
some enum-valued properties seem unlikely ever to need to be expressions.
Example: dfdl:representation.
However, it was also pointed out that
once we put selectors back into the language, you can interleave multiple
formats in the same schema, and for any enumerated property you could just
have one selector-chosen format for each possible value of the enumerated
property.
The reason we don't want a blanket statement
that you can have expressions anywhere you need a property value is that
the excess flexibility could make implementations unnecessarily complex.
Digression (added by MikeB; this was
not part of today's call):
Consider:
   dfdl:byteOrder="{ if (../../x = 'B') then 'bigEndian'
                     else if (../../x = 'L') then 'littleEndian'
                     else 'I don''t know' }"
DFDL implementations must be prepared
to cope with receiving "I don't know" as the proposed value of
byteOrder. This is really a schema definition error, but because it is
only detected at run time it becomes a processing error. The only way to
rule this out statically is to treat enumerated property values not as
strings but as an enum type, and to force the expressions that compute
them to return that enum type, not a string.
This is a kind of type inference I had
hoped implementations would not need.
Selectors have the advantage of being
statically verifiable: each selected format can be checked to use a valid
value of the enum, and if it does not, the DFDL processor can issue a
diagnostic. If we allow an arbitrary expression to return the value of an
enumerated property, then it could presumably also return a nonsense value,
such as 'I don't know' in the example above.
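To make the static-versus-run-time contrast concrete, here is a small
Python sketch. Python is only an illustration language, and every name in
it (ByteOrder, ProcessingError, byte_order_expression, resolve_byte_order,
invalid_selector_formats) is hypothetical, not part of DFDL or of any
implementation. byte_order_expression stands in for the expression above;
resolve_byte_order shows the run-time coercion to an enum type;
invalid_selector_formats shows the check that selectors allow at
schema-load time, with each format simplified to a dictionary of properties.

```python
from enum import Enum
from typing import Dict, List


class ByteOrder(Enum):
    BIG_ENDIAN = "bigEndian"
    LITTLE_ENDIAN = "littleEndian"


class ProcessingError(Exception):
    """Raised when a value computed at run time is not a legal property value."""


def byte_order_expression(x: str) -> str:
    # Stand-in for the dfdl:byteOrder expression: the final 'else'
    # branch yields a string that is not a legal byteOrder value.
    if x == "B":
        return "bigEndian"
    if x == "L":
        return "littleEndian"
    return "I don't know"


def resolve_byte_order(x: str) -> ByteOrder:
    """Run-time path: coerce the expression's string result to the enum type."""
    value = byte_order_expression(x)
    try:
        return ByteOrder(value)
    except ValueError:
        # Only detectable while parsing data, so it surfaces as a processing error.
        raise ProcessingError(f"invalid dfdl:byteOrder value {value!r}") from None


def invalid_selector_formats(formats: Dict[str, Dict[str, str]]) -> List[str]:
    """Static path: every selector-chosen format is checked once, at schema-load time."""
    legal = {member.value for member in ByteOrder}
    return [name for name, props in formats.items()
            if props.get("byteOrder") not in legal]
```

With selectors, a format such as {"byteOrder": "I don't know"} is reported
by invalid_selector_formats before any data is read; with an arbitrary
expression, the same mistake only shows up when resolve_byte_order runs on
data where x is neither 'B' nor 'L'.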
We discussed proposals circulated by
MikeB:
Here's an update to the first one. We
decided sequences shouldn't be another way to carry opaque data. An easy and
conservative way to ensure this is to require the length of an empty sequence
to be zero.
The second proposal, to eliminate hexBinary
and base64Binary, was discussed briefly. It was suggested that we could
keep both, which would make it easy to explain what the hexBinary type
is: a shorthand for a string with encoding="hex",
and similarly for base64Binary. We did not resolve this issue on the call.
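One way to read the "shorthand" suggestion, sketched in Python under
assumptions: "hex" and "base64" are treated as if they were character
encodings, so decoding data bytes yields the element's string value and
encoding a string yields bytes again. The function names are hypothetical,
and the choice of upper-case hex digits is an assumption, not something
decided on the call.

```python
import base64


def decode_as_string(data: bytes, encoding: str) -> str:
    """Parse direction: turn data bytes into the element's string value."""
    if encoding == "hex":
        # Each byte becomes two hex digits (upper case is an assumption here).
        return data.hex().upper()
    if encoding == "base64":
        return base64.b64encode(data).decode("ascii")
    # Any other name is treated as an ordinary character encoding, e.g. "utf-8".
    return data.decode(encoding)


def encode_from_string(text: str, encoding: str) -> bytes:
    """Unparse direction: the inverse of decode_as_string."""
    if encoding == "hex":
        return bytes.fromhex(text)
    if encoding == "base64":
        return base64.b64decode(text, validate=True)
    return text.encode(encoding)
```

Under this reading, hexBinary and base64Binary need no decoding machinery
of their own; each just fixes the encoding name used for a string.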
Finally, we discussed regular expression
features for DFDL.
There does appear to be a need for regexp
features to support parsing data that is delimited by a change in the kind
of content. For example, consider "12345Mike Beckerle" and a two-element
sequence. The first element is a number that continues until the first
non-digit character; the second is a string that begins with a non-digit
character. Regexp length appears to be a good way to handle this kind of thing.
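A minimal Python sketch of what "regexp length" means for this example:
each element's length is the length of a regular-expression match at the
current position. The patterns and the function name are hypothetical
illustrations; no DFDL property syntax for regexp lengths is implied.

```python
import re
from typing import Tuple

# Hypothetical patterns standing in for regexp-based element lengths.
NUMBER = re.compile(r"\d+")            # digits, up to the first non-digit
NAME = re.compile(r"\D.*", re.DOTALL)  # starts with a non-digit, runs to the end


def parse_number_then_name(data: str) -> Tuple[int, str]:
    """Each element's length is the length of its regexp match at the current position."""
    m = NUMBER.match(data, 0)
    if m is None:
        raise ValueError("expected a number")
    number = int(m.group())
    m = NAME.match(data, m.end())
    if m is None:
        raise ValueError("expected a string starting with a non-digit")
    return number, m.group()
```

For "12345Mike Beckerle" this yields the number 12345 and the string
"Mike Beckerle"; the boundary between them is found from the content
itself, with no delimiter in the data.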
Alan Powell has an action item to talk
with IBM's internal TX product group. They have a speculative parser
and so have fewer regular-expression features in their language. We want
to understand how they deal with the header, body[], trailer use case.
In this case the data consists of lines of text: the header is the first
line, the trailer is the last line, the body records are everything in
between, and there is no content that can be used to distinguish the record
types. Some format-description systems handle this with regexp features.
TX handles it by speculative parsing, and we want to understand how that
works out and whether it is preferable to adding regexp features.
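How TX actually does this is exactly what the action item is meant to find
out, so the following is only a generic Python sketch of speculative
parsing with backtracking, not a description of TX. All names are
hypothetical. At each position the parser guesses "trailer, then end of
data"; if that guess fails, it backtracks and reads the line as a body
record instead.

```python
from typing import List, Tuple


class ParseFailure(Exception):
    """Signals that a speculative attempt did not match."""


def parse_line(lines: List[str], pos: int) -> Tuple[str, int]:
    # A record is one line; no content tells the record types apart.
    if pos >= len(lines):
        raise ParseFailure("no more lines")
    return lines[pos], pos + 1


def parse_end(lines: List[str], pos: int) -> None:
    if pos != len(lines):
        raise ParseFailure("data remains")


def parse_header_body_trailer(lines: List[str]) -> Tuple[str, List[str], str]:
    header, pos = parse_line(lines, 0)
    body: List[str] = []
    while True:
        # Speculate: "trailer, then end of data" starting at pos.
        try:
            trailer, after = parse_line(lines, pos)
            parse_end(lines, after)
            return header, body, trailer
        except ParseFailure:
            pass  # guess failed: backtrack to pos and read a body record instead
        record, pos = parse_line(lines, pos)
        body.append(record)
```

Here each failed guess costs only one line of re-reading; a general
speculative parser has to bound how much data it may re-read, which is one
of the trade-offs against adding regexp features.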
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046