This is not a meeting summary. I'll work on that on the plane today and send it
subsequently, but people asked about having these notes. This is truly random
note taking.
...mikeb
---------------------------------------------------------------
Starting Points:
* xml xsd - what I already have, but not what I want
          - what I want
* Cobol/C-struct + system and compiler details
* data dictionary (ad-hoc spreadsheet or text description)
* example data files only
Some degree of structural similarity required.
Reasonably compatible.
You could express any transformation at all, but the intent is to make easy
those where structural similarity is present.
Q: is this just a taste and style issue?
------------------------------------------------
- desirable to have symmetric read/write capability given a DFDL descriptor
- all built-in types and reps should implement both directions
- choices and use of runtimeValue expressions can create non-invertible parsers
- mechanisms can allow explicit introduction of the output formatting
  properties needed, symmetric with the input parsing properties
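
A minimal Python sketch of the symmetry point above (not from the meeting;
all function names and the "Y"/"N" spellings are hypothetical). The integer
reader/writer pair is invertible in both directions. The flag reader accepts
several input spellings, so it is not invertible on its own, and the writer
needs explicit output formatting properties to pick a spelling.

```python
import struct


def read_int32(data: bytes, byte_order: str) -> int:
    """Reader: 4 bytes -> int. byte_order is 'big' or 'little'."""
    fmt = ">i" if byte_order == "big" else "<i"
    return struct.unpack(fmt, data)[0]


def write_int32(value: int, byte_order: str) -> bytes:
    """Writer: exact inverse of read_int32 for the same byte_order."""
    fmt = ">i" if byte_order == "big" else "<i"
    return struct.pack(fmt, value)


def read_flag(text: str) -> bool:
    """Choice-like reader: several input spellings map to one logical value."""
    return text in ("1", "Y", "yes")


def write_flag(value: bool, true_text: str = "Y", false_text: str = "N") -> str:
    """Writer with explicit output properties (true_text, false_text).

    The reader discarded which spelling was used, so the writer cannot
    recover it; the output spelling has to be stated separately.
    """
    return true_text if value else false_text
```

Reading "yes" and writing with the defaults gives "Y", not "yes"; passing
true_text="yes" restores the round trip.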
-----------------------------------------------
Ways to refer to one element from the dfdl annotations of another:
1) use value of other element in runtimeValue expression.
2) use value of other element as source for dfdl read conversion
3) use value of other element as a parameter for dfdl read conversion
(note 2 and 3 are the same if the source is just another parameter)
-----------------------------------------------
Hypothesis: no XSD syntax is needed inside
DFDL rep annotations. Can instead reference a type name, and name elements
within it.
---------------------------------------------
Choice groups imply the need to have an
additional element to provide a name for the choice. This is required only when
the alternatives of the choice contain a single value.
--------------------------------------------
Key topic: binding and passing mechanisms for property values, aka parameters
to read/write conversions
Parameterization and binding examples
1) Mime type - image format
   logical model looks like bmp
   black box read conversion
2) complex number with 2 possible component orders, realFirst or imaginaryFirst
   white box
   complexComponentOrder is the parameter
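
The complex number example can be sketched in Python roughly as follows.
This is only an illustration: the function names and the choice of two
big-endian 8-byte doubles as the physical layout are assumptions, and only
complexComponentOrder, realFirst and imaginaryFirst come from the notes.

```python
import struct


def read_complex(data: bytes, complexComponentOrder: str = "realFirst") -> complex:
    """Reader: two big-endian doubles (assumed layout) -> complex value."""
    first, second = struct.unpack(">dd", data)
    if complexComponentOrder == "realFirst":
        return complex(first, second)
    if complexComponentOrder == "imaginaryFirst":
        return complex(second, first)
    raise ValueError("unknown complexComponentOrder: " + complexComponentOrder)


def write_complex(value: complex, complexComponentOrder: str = "realFirst") -> bytes:
    """Writer: same parameter as the reader, so the pair stays symmetric."""
    if complexComponentOrder == "realFirst":
        return struct.pack(">dd", value.real, value.imag)
    if complexComponentOrder == "imaginaryFirst":
        return struct.pack(">dd", value.imag, value.real)
    raise ValueError("unknown complexComponentOrder: " + complexComponentOrder)
```

Because the parameter is visible and shared by reader and writer (white box),
the same binding serves both directions.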
-------------------------------------------------------
Agreed: "transforms" will be called readers
and writers, collectively converters and conversions
--------------------------------------------------------
Proposal for parking lot: discontiguous
representations. E.g., file full of variable length strings where the length
fields are all first, then all the contents.
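
To make the discontiguous case concrete, here is a small Python sketch. The
exact layout is an assumption made up for illustration (one count byte, then
that many one-byte length fields, then the ASCII contents back to back); the
notes only say that all the lengths come first, then all the contents.

```python
from typing import List


def read_discontiguous_strings(data: bytes) -> List[str]:
    """Read strings whose length fields are all stored ahead of the contents."""
    count = data[0]                      # assumed: first byte is the string count
    lengths = list(data[1:1 + count])    # assumed: one length byte per string
    offset = 1 + count                   # contents start after the last length
    strings = []
    for n in lengths:
        strings.append(data[offset:offset + n].decode("ascii"))
        offset += n
    return strings


def write_discontiguous_strings(strings: List[str]) -> bytes:
    """Write the same layout: count, all lengths, then all contents."""
    encoded = [s.encode("ascii") for s in strings]
    header = bytes([len(encoded)]) + bytes(len(e) for e in encoded)
    return header + b"".join(encoded)
```

The point for DFDL is that the length of string i and its contents are not
adjacent, so a purely sequential element-by-element description does not fit.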
--------------------------------------------------------
XSLT - has variables and things we can use as constructs. E.g., they use this
idiom
<xsl:variable name="variable" select="...."/>
and equivalently
<xsl:variable name="variable">....</xsl:variable>
---------------------------------------------------------
Issue: when annotations are added on to an element, can we validate that only
relevant properties are asserted for that element?
Is it desirable to ensure that only relevant properties are asserted, or should
irrelevant properties simply be ignored?
Position - rule out irrelevant attributes
improves validity checking, catches errors earlier.
E.g., I keep changing the byteOrder setting, but nothing is changing in the
data I'm reading (turns out it's because byte order is irrelevant, but if
nothing was checking that, nothing would help you find that out.)
Position - tolerate irrelevant attributes
improves flexibility (e.g., if you change the overall representation, you
don't have to edit all the other properties that no longer apply). A single
file of DFDL can capture characteristics of more than one representation (at
least one text and one binary flavor, though this doesn't generalize.)
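
The two positions can be compared with a small Python sketch. The relevance
table and the function name are hypothetical; only the property names
(charset, terminator, separator, byteOrder, numberOfBits) appear in these
notes, and which of them apply to which representation is an assumption.

```python
from typing import Dict, List, Tuple

# Assumed relevance table: which properties matter for which representation.
RELEVANT = {
    "text": {"charset", "terminator", "separator"},
    "binary": {"byteOrder", "numberOfBits"},
}


def check_properties(rep_type: str, props: Dict[str, str],
                     strict: bool) -> Tuple[Dict[str, str], List[str]]:
    """Return (properties that apply, names of properties that do not).

    strict=True is the 'rule out' position: irrelevant properties are errors.
    strict=False is the 'tolerate' position: they are silently set aside.
    """
    irrelevant = sorted(set(props) - RELEVANT[rep_type])
    if strict and irrelevant:
        raise ValueError("irrelevant for " + rep_type + ": " + ", ".join(irrelevant))
    effective = {k: v for k, v in props.items() if k in RELEVANT[rep_type]}
    return effective, irrelevant
```

With strict checking the byteOrder puzzle above is reported immediately; with
lenient checking one property set can serve both a text and a binary flavor.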
-----------------------------------------------------------
Issue: parameterization of transforms
seems like the OMG DT model and the transform descriptions (Alan's proposal)
are very very close conceptually, but exactly how they relate isn't entirely
clear.
-----------------------------------------------------------
Preprocessing
an attribute called source (and presumably another called target)
----------------------------------------------------------
5 kinds of operations
reader
writer
filter
change filter
function
known signatures
we can chain them together
conceptually think of this as pull model, or perhaps the DFDL expressions
don't take any position on whether the implementation is pull or push.
should be a way to create pull-model code in a programming language and use
it as an augmentation of the DFDL system.
could be ways to also adapt push model code, or other schemes like stateful
threads.
Where can these go in DFDL?
- readers and writers go on elements
- filters go on a special construct
for creating sources or targets from other sources or targets
I/O asymmetries - using filters you are discarding information, so it
affects the ability to exactly reproduce output.
Box and arrow diagrams using these function types can be used to provide a
semantics for DFDL.
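
As one way to picture chained filters under a pull model, here is a Python
sketch using generators: each stage pulls from the stage before it only when
asked. The function names echo bytesToChars and replaceRegexp from these
notes, but the signatures and the chunk size are assumptions, not an agreed
design.

```python
import codecs
import re
from typing import Iterable, Iterator


def byte_source(data: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Source: hands out the underlying bytes in small chunks on demand."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def bytes_to_chars(source: Iterable[bytes], charset: str = "utf-8") -> Iterator[str]:
    """Filter: decode bytes to characters, even if a character spans chunks."""
    decoder = codecs.getincrementaldecoder(charset)()
    for chunk in source:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def replace_regexp(source: Iterable[str], pattern: str, replacement: str) -> Iterator[str]:
    """Filter: regexp replacement over the character stream.

    Simplification: the whole stream is buffered first, because a match
    may straddle chunk boundaries.
    """
    yield re.sub(pattern, replacement, "".join(source), flags=re.DOTALL)


def read_int(source: Iterable[str]) -> int:
    """Reader: pulls characters through the chain and produces a value."""
    return int("".join(source).strip())
```

For example, read_int(replace_regexp(bytes_to_chars(byte_source(b"12 /* n */\n")),
r"/\*.*?\*/", "")) pulls the bytes through both filters (the pattern here is
an illustrative stand-in). The removed comment text is gone for good, which
is the I/O asymmetry noted above.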
-----------------------------------------------------
<element name="charstream" type="dfdl:sourceStream">
<annotation><appinfo source="...">
<dfdl:sourceStreamTD>
<charset>utf-8</charset>
<source>byteStream</source>
<filter>bytesToChars</filter>
</dfdl:sourceStreamTD>
</appinfo></annotation>
</element>
<element name="s" type="dfdl:sourceStream">
<annotation><appinfo source="...">
<dfdl:sourceStreamTD>
<filter>replaceRegexp("...regexp for C-comments...",
"")</filter>
<source>charstream</source>
</dfdl:sourceStreamTD>
</appinfo></annotation>
</element>
<element name="t" type="dfdl:targetStream">
<annotation><appinfo source="...">
<dfdl:targetStreamTD>
<charset>utf-8</charset>
<target>outbyteStream</target>
<filter>charsToBytes</filter>
</dfdl:targetStreamTD>
</appinfo></annotation>
</element>
<element name="toplevel">
<annotation><appinfo
source="...">
<dfdl:instanceTD>
<source>s</source>
<target>t</target>
<repType>text</repType>
</dfdl:instanceTD>
</appinfo></annotation>
<sequence>
<element name="len"
type="int">
<annotation><appinfo
source="...">
<intTD>
<terminator>\p{newline}</terminator>
</intTD>
</appinfo></annotation>
</element>
<element name="val"
type="int" minOccurs="0"
maxOccurs="unbounded">
<annotation><appinfo
source="...">
<intTD>
<arrayTD>
<storedLength>../len</storedLength>
<terminator>\p{newline}</terminator>
<separator>\p{space}</separator>
</arrayTD>
<numBase>10</numBase>
<reader
name="myIntReader">
<numberOfBits>13</numberOfBits>
</reader>
</intTD>
</appinfo></annotation>
</element>
</sequence>
</element>
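
For reference, the text representation described by the "toplevel" element
above could be read and written roughly like this in Python. This is a sketch
under assumptions: \p{newline} is taken to be "\n", \p{space} a single space,
the function names are invented, and the myIntReader/numberOfBits part of the
annotation is left out.

```python
from typing import List, Tuple


def read_toplevel(text: str) -> Tuple[int, List[int]]:
    """Read 'len' (newline-terminated), then 'val' (space-separated array,
    newline-terminated) whose element count is stored in 'len'."""
    len_text, terminator, rest = text.partition("\n")
    if not terminator:
        raise ValueError("missing terminator after len")
    length = int(len_text, 10)           # numBase 10
    vals_text, terminator, _ = rest.partition("\n")
    if not terminator:
        raise ValueError("missing terminator after val array")
    tokens = vals_text.split(" ") if vals_text else []
    if len(tokens) != length:            # storedLength ../len must agree
        raise ValueError("len does not match number of val elements")
    return length, [int(t, 10) for t in tokens]


def write_toplevel(vals: List[int]) -> str:
    """Write the same layout; 'len' is derived from the array on output."""
    return str(len(vals)) + "\n" + " ".join(str(v) for v in vals) + "\n"
```

For instance, read_toplevel("3\n10 20 30\n") gives (3, [10, 20, 30]), and
write_toplevel([10, 20, 30]) produces the same text back.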
--------------------------------------------------------
Still open issues:
1) scoping of property definitions. Useful or source of bad interactions?
2) how to organize model of the properties for the types - Suman and Mike
in rough agreement.
3)
----------------------------------------------