I forgot to clarify Simon's question on sp165.
This was the 'finalTerminatorCanBeMissing' property.
We considered the comment that this might be unnecessary.
Use case: a text-format file. Each "record" in the file is terminated by
a CRLF, so says the user. At the top level this file contains an array of
these records.
The file might or might not have a CRLF at the end of the file because
human beings might have edited the file with a text editor, and either
inserted or neglected to insert this final CRLF.
We want the file format to be legal with or without the final CRLF;
however, all prior CRLFs in the file must be present.
So how to express this:
1) CRLF is a terminator of the record
2) CRLF is an occursSeparator of the enclosing array; records have no
terminator. We enclose the array in a sequence group where the array is
followed by a hidden "optional" (minOccurs=0, maxOccurs=1) element of
fixed="CRLF" string value.
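To make the two choices concrete, here is a rough Python sketch of what each model accepts. This is not DFDL and not any processor's implementation; the function names and the final_terminator_can_be_missing parameter are hypothetical, and it assumes records are non-empty and never contain a CRLF themselves.

```python
CRLF = "\r\n"


def parse_terminated(data: str, final_terminator_can_be_missing: bool = True) -> list[str]:
    """Choice (1): each record is terminated by CRLF.

    Only the terminator of the last record may be absent, and only when
    the (hypothetical) flag allows it.
    """
    records = []
    pos = 0
    while pos < len(data):
        end = data.find(CRLF, pos)
        if end == -1:
            # Hit end-of-data without seeing a terminator.
            if not final_terminator_can_be_missing:
                raise ValueError("final record is missing its terminator")
            records.append(data[pos:])
            break
        records.append(data[pos:end])
        pos = end + len(CRLF)
    return records


def parse_separated(data: str) -> list[str]:
    """Choice (2): CRLF separates records; the records have no terminator.

    The array is followed by one optional trailing CRLF, standing in for
    the hidden minOccurs=0/maxOccurs=1 element.
    """
    if data.endswith(CRLF):
        data = data[: -len(CRLF)]  # consume the hidden optional element
    return data.split(CRLF) if data else []
```

Under those assumptions both functions return the same records for a file with or without the final CRLF, which is the sense in which choice (2) models the behavior directly.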
Choice (1) requires that we have finalTerminatorCanBeMissing.
Choice (2) is just modeling the behavior that is required directly via
hidden elements. This is tantamount to saying that this keyword is not
worth having because there is a way to model it already. This is true of
many keywords. If we deem this one too obscure, then we need to revisit
many others. (Leading/trailing skip bytes is a good example: it is trivially
represented by a hidden element.) What are our criteria for inclusion? Up
until now our criteria have been to include things that existing systems
already have found a need for. However, existing systems don't have hidden
field capability.
Note that this same missing-final-terminator issue can come up not only
with end-of-data, but with any bounded-size structure.
E.g., suppose we say that an array has occursUnits="bytes" and
occursPath="874". Then it is 874 bytes long. The array elements can be
terminated by a particular delimiter, e.g., a semicolon. For the same reasons as
the CRLF example above, we want to be able to tolerate a missing final
semicolon before the end of the 874 bytes. In effect the
byte-length limit creates an implicit "end-of-data" for a sub-stream
consisting of just those bytes.
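The bounded case can be sketched the same way. Again this is only an illustration in Python under my own assumptions: parse_bounded_array and its parameters are hypothetical names, and the byte length is simply passed in rather than taken from any DFDL property.

```python
def parse_bounded_array(
    stream: bytes, offset: int, byte_length: int, terminator: bytes = b";"
) -> tuple[list[bytes], int]:
    """Parse terminated elements from a fixed-size window of `stream`.

    The end of the window acts as an implicit end-of-data, so the last
    element's terminator may be missing. Returns (elements, next_offset).
    """
    window = stream[offset : offset + byte_length]
    if len(window) < byte_length:
        raise ValueError("stream is shorter than the declared byte length")
    elements = []
    pos = 0
    while pos < len(window):
        end = window.find(terminator, pos)
        if end == -1:
            # No terminator before the end of the window: tolerate it.
            elements.append(window[pos:])
            break
        elements.append(window[pos:end])
        pos = end + len(terminator)
    # Parsing of whatever follows resumes right after the window.
    return elements, offset + byte_length
```

For the example above one would call it with byte_length=874; anything after those bytes is untouched, and the caller continues from the returned offset.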
Conclusion: finalTerminatorCanBeMissing seems to be useful enough and
comes up often enough that I think the keyword is worthwhile.
Implication: we should create a list of keywords or enumerated values for
properties that we think are in the grey area where perhaps we want to
drop them. Here are some candidates: byteOrderMarkPolicy,
leading/trailingSkipBytes. Both of these can be modeled readily as hidden
elements. There are probably others.
Mike Beckerle
STSM, Architect, Scalable Computing
IBM Software Group
Information Platform and Solutions
Westborough, MA 01581
direct: voice and FAX 508-599-7148
assistant: Pam Riordan
priordan@us.ibm.com
508-599-7046
Mike Beckerle/Worcester/IBM
08/14/2007 08:40 AM
To
"Simon Parker"
cc
dfdl-wg@ogf.org
Subject
Re: [DFDL-WG] Minutes from 2007-08-08 Call
In conjunction with the annotated document these notes are clear, except
for 'sp165'. Perhaps someone will recapitulate the discussion briefly at
Wednesday's conference. I think only three annotations remain:
sp167 Absent and missing (expanded discussion on the wiki already)
This will be a major topic on a call.
sp172 separatorType="infix"
I'm happy to drop this strange stuff about separatorType=prefix or postfix
and just say separator means infix. However, I would note that at least
two major integration products (IBM WebSphere Transformation Extender,
formerly Mercator, and Microsoft BizTalk) have this concept, so we may end
up putting it back in. Presumably MS copied the earlier Mercator style, or
both got it from common requirements in some EDI standard.
sp173 defaultWhenMissing (expanded discussion on the wiki already)
Same topic as sp167 above. Will have a call topic to discuss.
I've added another contribution to the wiki discussion on 'require'.
I think this has reached resolution, which is that we can express this
using assertions. The general style of using DFDL to describe what
fixed-data syntactic constructs look like is a good one.
However, I've amended the Wiki thread on this with a further issue for
group consideration. See bottom of page:
https://forge.gridforum.org/sf/wiki/do/viewPage/projects.dfdl-wg/wiki/Requir...
The 'length and occurs' proposal is an improvement, though I still have
reservations to discuss; likewise the 'opaque data' proposal.
These are topics for a call, this week or soon. I will send out an agenda.
Mike Beckerle
"Simon Parker"
Sent by: dfdl-wg-bounces@ogf.org
08/13/2007 10:56 AM
To
cc
Subject
Re: [DFDL-WG] Minutes from 2007-08-08 Call
In conjunction with the annotated document these notes are clear, except
for 'sp165'. Perhaps someone will recapitulate the discussion briefly at
Wednesday's conference. I think only three annotations remain:
sp167 Absent and missing (expanded discussion on the wiki already)
sp172 separatorType="infix"
sp173 defaultWhenMissing (expanded discussion on the wiki already)
I've added another contribution to the wiki discussion on 'require'.
The 'length and occurs' proposal is an improvement, though I still have
reservations to discuss; likewise the 'opaque data' proposal.
Regards,
Simon
From: dfdl-wg-bounces@ogf.org [mailto:dfdl-wg-bounces@ogf.org] On Behalf
Of Mike Beckerle
Sent: 08 August 2007 18:00
To: dfdl-wg@ogf.org
Subject: [DFDL-WG] Minutes from 2007-08-08 Call
MikeB, Geoff Judd, Alan Powell attended.
Continued through SP's comments.
sp37 - got it.
sp45 - agree. This whole part to be rewritten.
sp115 - ok. "strict" and "lax" as enums. No built-in default: we never use
defaults in the processor itself, only in the predefined formats.
sp118 - ok
sp123 - A proposal is needed to simplify length, lengthKind, lengthUnits,
and also occursKind, occursPath, occursPathUnits (along the lines of
byteCount, itemCount, a length='delimited' enum, etc.).
sp154 - Need a specific proposal to eliminate hexBinary and to say what to
use for opaque data instead (consider also string with encoding='bytes'),
or else introduce a dfdl:byteString type or dfdl:opaque type (a derived
type, just a standard name).
sp158 - see sp123
sp165 - a composition property is needed for enclosing groups and/or
end-of-data. Regexp doesn't fix this.
Mike Beckerle
--
dfdl-wg mailing list
dfdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg