dfdl-wg
February 2010
- 5 participants
- 24 discussions

Re: [DFDL-WG] Action Item 049: Built-in specification description and schemas
by Steve Hanson 02 Feb '10
Thanks for highlighting this Suman.
The reason for hiving off the properties for text numbers into a separate
named annotation was reuse. It was considered that a given data format
might have a large number of text number fields, but that they could be
described by a far smaller number of annotations, because a limited set of
'number patterns' was used. In Suman's example that's clearly not the
case, but it is an artificial one. We need to consider real-world formats.
I've had a look through example COBOL copybooks, and while there is a
large variation in text number fields, reuse of 'number patterns' would be
a benefit. For example, a set of related values might be declared the
same:
15 ORIGINAL-PRICE PIC 9(013)V99.
15 DISCOUNTED-PRICE PIC 9(013)V99.
15 SALE-PRICE PIC 9(013)V99.
15 STAFF-PRICE PIC 9(013)V99.
15 TOTAL-PRICE PIC 9(013)V99.
The question then becomes what is the best way to achieve this reuse. If
you look at a dfdl:textNumberFormat annotation, it is the number pattern
that varies. Everything else would be defined once in a dfdl:format
annotation and scoped. So it does seem overkill to have a
dfdl:textNumberFormat for every number pattern, because the contained
properties cannot be scoped and must be redeclared each time.
I suggest the best reuse mechanism for this scenario is the simple type.
In the above example I could declare a PRICE simple type and put the
number pattern on that.
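To make the reuse argument concrete, here is a minimal Python sketch; it is not part of the DFDL specification. It assumes a simplified mapping in which each 9 of an unsigned PIC clause becomes a '0' pattern character and V stays as the implied decimal point. The names pic_to_pattern and FIELDS are hypothetical. Grouping the five price fields by the derived pattern shows that one pattern, which could sit on a single PRICE simple type, covers all of them.

```python
import re
from collections import defaultdict


def pic_to_pattern(pic: str) -> str:
    """Expand an unsigned numeric PIC string such as '9(013)V99'.

    Assumption: only the symbols 9 and V are handled; each 9 maps to '0'
    and V is kept as the implied decimal point.
    """
    if not re.fullmatch(r"(?:[9V](?:\(\d+\))?)+", pic):
        raise ValueError(f"unsupported PIC string: {pic!r}")
    parts = []
    for symbol, count in re.findall(r"([9V])(?:\((\d+)\))?", pic):
        repeat = int(count) if count else 1
        parts.append(("0" if symbol == "9" else "V") * repeat)
    return "".join(parts)


# The five fields from the copybook fragment above.
FIELDS = {
    "ORIGINAL-PRICE": "9(013)V99",
    "DISCOUNTED-PRICE": "9(013)V99",
    "SALE-PRICE": "9(013)V99",
    "STAFF-PRICE": "9(013)V99",
    "TOTAL-PRICE": "9(013)V99",
}

# Group field names by the pattern they would need.
by_pattern = defaultdict(list)
for name, pic in FIELDS.items():
    by_pattern[pic_to_pattern(pic)].append(name)

for pattern, names in by_pattern.items():
    print(pattern, "->", ", ".join(names))
```

All five fields land under the same key, so the pattern would only need to be declared once.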
I therefore agree with Suman. Remove dfdl:textNumberFormat and
dfdl:defineTextNumberFormat, add all the properties to dfdl:element and
dfdl:simpleType. In practice most will be set in a dfdl:format and scoped,
only the number pattern will vary per element or simple type.
We should also consider whether the same issue applies to
dfdl:calendarFormat and dfdl:escapeScheme. For both these the reuse
opportunity is high. There is likely to be just one escape scheme per data
format. There is likely to be a small number of calendar formats per data
format (e.g., one for a date, one for a time, one for a timestamp). But in
the latter case, it is typically just the calendarPattern that would vary,
the rest of the properties would be set once.
I suggest that whatever we adopt for text numbers we also adopt for
calendars, for consistency.
Regards
Steve Hanson
Programming Model Architect, WebSphere Message Broker,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh(a)uk.ibm.com,
Phone (+44)/(0) 1962-815848
From: Suman Kalia/Toronto/IBM@IBMCA
To: Alan Powell/UK/IBM@IBMGB, Steve Hanson/UK/IBM@IBMGB, Mike Beckerle <mbeckerle.dfdl(a)gmail.com>
Cc: dfdl-wg(a)ogf.org
Date: 02/02/2010 00:21
Subject: Action Item 049: Built-in specification description and schemas
I am trying to create a DFDL definition for a COBOL copybook and have hit a
usability issue with TextNumberFormat, which has to be named and referenced
from dfdl:element and dfdl:simpleType annotations. Consider the sample COBOL
copybook below, where I have 3 elements with a PIC 9999 DISPLAY clause
(a.k.a. zoned decimal) and 2 external (standard) decimals. They all have the
same length; the main difference between them is the sign, which can be
leading or trailing. As per the V.38 spec, I would have to create a named
textNumberFormat for each picture clause. The key difference between these
named textNumberFormats would be the numberPattern; the rest of the
attributes for standard decimal and zoned decimal are going to be the same
for a particular platform or data definition format. The generated DFDL
schema will contain many occurrences of TextNumberFormat, in the worst case
one for each element defined in the COBOL copybook. This is not very usable.
The user would also have to choose the names of these formats carefully, so
that they can be identified and distinguished for reuse, something like
TextNumberStandardLength5SignLeading.
01 CobolTypes.
* External decimal (zoned decimal)
   05 elem9                        PIC 9999 DISPLAY.
   05 elem9Signed                  PIC S9999 DISPLAY.
   05 elem9SignedLeading           PIC S9999 DISPLAY
                                   SIGN LEADING.
* in DFDL - modeled as standard decimal
   05 elem9SignedLeadingSeparate   PIC S9999 DISPLAY
                                   SIGN LEADING SEPARATE.
   05 elem9SignedTrailingSeparate  PIC S9999 DISPLAY
                                   SIGN TRAILING SEPARATE.
Number Format
When textNumberRepresentation is ‘zoned’ only the pattern for positive
numbers is used. Only the following pattern characters may be used: '+' to
indicate whether the leading or trailing digit carries the overpunched
sign, 'V' to indicate the location of an implied decimal point and '0' to
indicate the number of digits (including overpunched). The number of '0'
characters must match the number of digits in the representation; otherwise
it is a schema definition error.
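The rules quoted above can be sketched as a small check in Python. This is an illustration only, not how any DFDL processor implements it: the function name check_zoned_pattern is hypothetical, and the limits of one '+' (first or last character) and one 'V' are assumptions read into the quoted text.

```python
def check_zoned_pattern(pattern: str, digit_count: int) -> None:
    """Raise ValueError if `pattern` breaks the zoned-pattern rules quoted above."""
    # Only '+', 'V' and '0' may appear in a zoned pattern.
    illegal = set(pattern) - set("+V0")
    if illegal:
        raise ValueError(f"illegal pattern characters: {sorted(illegal)}")
    # Assumption: the sign marker appears at most once, leading or trailing.
    if pattern.count("+") > 1 or "+" in pattern[1:-1]:
        raise ValueError("'+' must be the leading or trailing character")
    # Assumption: at most one implied decimal point.
    if pattern.count("V") > 1:
        raise ValueError("at most one 'V' is allowed")
    # The number of '0' characters must match the number of digits.
    zeros = pattern.count("0")
    if zeros != digit_count:
        raise ValueError(f"pattern has {zeros} digits, representation has {digit_count}")


# The three zoned elements from the copybook all carry four digits.
for pattern in ("0000", "0000+", "+0000"):
    check_zoned_pattern(pattern, 4)
```

A pattern such as "000" for a four-digit field would raise ValueError, which stands in here for the schema definition error.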
A better approach would be to add numberPattern to the dfdl:element and
dfdl:simpleType annotations, and the rest of the attributes from the
TextNumberFormat block to either (a) dfdl:format only or (b) dfdl:format,
dfdl:element and dfdl:simpleType.
Let's discuss this in the DFDL workgroup call tomorrow.
Attached below is a schema coded with assumption (a) listed above.
<xsd:complexType name="CobolTypes">
  <xsd:sequence>
    <!-- External Decimal -->
    <xsd:element name="elem9" dfdl:ref="dfdlCobolFmt:CobolZonedDecimalFormat"
        dfdl:length="4" dfdl:representation="text"
        dfdl:numberPattern="0000">
      <xsd:simpleType>
        <xsd:restriction base="xsd:short">
          <xsd:minInclusive value="0" />
          <xsd:maxInclusive value="9999" />
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="elem9Signed" dfdl:ref="dfdlCobolFmt:CobolZonedDecimalFormat"
        dfdl:length="4" dfdl:representation="text"
        dfdl:numberPattern="0000+">
      <xsd:simpleType>
        <xsd:restriction base="xsd:short">
          <xsd:minInclusive value="-9999" />
          <xsd:maxInclusive value="9999" />
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="elem9SignedLeading" dfdl:ref="dfdlCobolFmt:CobolZonedDecimalFormat"
        dfdl:length="4" dfdl:representation="text"
        dfdl:numberPattern="+0000">
      <xsd:simpleType>
        <xsd:restriction base="xsd:short">
          <xsd:minInclusive value="-9999" />
          <xsd:maxInclusive value="9999" />
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="elem9SignedLeadingSeparate"
        dfdl:ref="dfdlCobolFmt:CobolStandardDecimalFormat"
        dfdl:length="5" dfdl:representation="text"
        dfdl:numberPattern="+0000;-00000">
      <xsd:simpleType>
        <xsd:restriction base="xsd:short">
          <xsd:minInclusive value="-9999" />
          <xsd:maxInclusive value="9999" />
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="elem9SignedTrailingSeparate"
        dfdl:ref="dfdlCobolFmt:CobolStandardDecimalFormat"
        dfdl:length="5" dfdl:representation="text"
        dfdl:numberPattern="0000+;00000-">
      <xsd:simpleType>
        <xsd:restriction base="xsd:short">
          <xsd:minInclusive value="-9999" />
          <xsd:maxInclusive value="9999" />
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>

----- Data format Definitions
<xsd:defineFormat name="CobolStandardDecimalFormat">
  <xsd:format ref="tns:BaseTextNumberStandardDecimal" dfdl:lengthKind="explicit"
      dfdl:lengthUnits="bytes"
      dfdl:alignment="1" dfdl:alignmentUnits="bytes"
      dfdl:leadingSkipBytes="0"
      dfdl:trailingSkipBytes="0" />
</xsd:defineFormat>
<xsd:defineFormat name="CobolZonedDecimalFormat">
  <xsd:format ref="tns:BaseTextNumberZonedDecimal" dfdl:lengthKind="explicit"
      dfdl:lengthUnits="bytes"
      dfdl:alignment="1" dfdl:alignmentUnits="bytes"
      dfdl:leadingSkipBytes="0"
      dfdl:trailingSkipBytes="0" />
</xsd:defineFormat>

-- Text number Formats (added here for reference to identify applicable
attributes for standard and zoned decimal)
<xsd:defineTextNumberFormat name="ZonedDecimalNumberFormat">
  <xsd:textNumberFormat numberCheckPolicy="lax" numberRoundingMode="roundUp"
      numberZonedSignStyle="asciiStandard" />
</xsd:defineTextNumberFormat>
<xsd:defineTextNumberFormat name="StandardDecimalFormat">
  <xsd:textNumberFormat
      numberGroupingSeparator=","
      numberDecimalSeparator="."
      numberExponentCharacter="E" numberCheckPolicy="lax"
      numberInfinityRep="\u221E"
      numberNanRep="\uFFFD" numberRoundingMode="roundUp"
      numberZeroRep="" "" />
</xsd:defineTextNumberFormat>
Suman Kalia
IBM Toronto Lab
WMB Toolkit Architect and Development Lead
WebSphere Business Integration Application Connectivity Tools
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.h…
Tel : 905-413-3923 T/L 969-3923
Fax : 905-413-4850 T/L 969-4850
Internet ID : kalia(a)ca.ibm.com
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

01 Feb '10
1. Discriminators
Review two options (attached)
2. Remaining 037 review issues
See below
3. Go through Actions
Current Actions:
No
Action
045
20/05 AP: Speculative Parsing
27/05: Pseudo code has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented,
need to find a better way.
29/07: No Progress
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion AP
to incorporate update and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into
the next version.
14/10: Alan to update proposal to include array scenario where minOccurs >
0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce
examples
11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds
are needed after all. MB and SF to continue with examples.
18/11: Went through WTX implementation of example. SF to gather more
documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to
confirm that no changes are needed to the Resolving Uncertainty doc.
04/11: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B
Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephanie took us through a description of WTX identifiers. Mike
agreed to write up in DFDL terms.
20/01: Mike will write up
27/01: further discussion of discriminators
29/01: Alan had emailed both proposals but not enough time to discuss
049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It
seemed that the main value is that it defines a schema location for
downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can
be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress. The predefined formats do not need to be available
when the spec is published.
Suman said that he had been mapping COBOL structures to DFDL and it didn't
look as though the way text numbers are defined is very usable. He will
document for next call
066
Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
077
SKK: mapping of COBOL numbers to textNumberFormats.
A few comments in-line below
On Wed, Jan 20, 2010 at 7:01 AM, Alan Powell <alan_powell(a)uk.ibm.com>
wrote:
I have answered most of the issues and comments raised by Steve and Mike
but some need further discussion.
Issues from Steve H
General. Although dfdl:encoding enums are case insensitive, we should
stick to UC throughout in examples.
2. I agree with the existing comment that the RFC2119 key words should be
upper case.
14.3.4. There are type/rep combinations where lengthKind="implicit" is not
allowed - so saying that 'pattern' is replaced by 'implicit' on unparsing
does not work.
TBD
We covered this on the most recent wg call.
16.2. I'm not sure that scannability in this constant encoding sense is
necessary for patterns. I can create a regular expression that extracts
all characters up to hex value xXX or all characters up to xYY, thereby
treating the content as an encoding-insensitive black box.
If your byte pattern happens to be a legal part of a multi-byte character
sequence, then you'll get a false recognition, or you won't get what you
expect.
Example: You are searching for byte 0xAA, but that can legally appear as
byte 3 of a 3-byte UTF-8 encoded character. When you say you are looking
for hex AA in a string, DFDL is currently defined to mean you are looking
for the character represented by that raw byte. If the encoding is UTF-8,
that isn't even a legal character encoding sequence, so the decoder should
cause an error or something.
Even for a fixed length single byte character set, you have to have no
unused codes that have no mapping to ISO 10646, because our infoset is
defined in terms of translations into that.
I think we need encoding="none" or encoding="bytes" or something if you
really want to scan bytes without encoding causing problems.
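The false-recognition case described for byte 0xAA can be reproduced in a few lines of Python. This is only an illustration using the standard library codecs; the sample string and the names SEPARATOR, false_hit and is_valid_utf8 are made up for the sketch. U+20AA is chosen because its UTF-8 encoding is E2 82 AA, so its third byte is 0xAA.

```python
SEPARATOR = b"\xaa"  # the raw byte a naive byte-level scanner looks for

# U+20AA (NEW SHEQEL SIGN) encodes in UTF-8 as the three bytes E2 82 AA.
text = "price \u20aa10"
data = text.encode("utf-8")

# A byte-level scan reports a hit in the middle of the multi-byte character,
# even though the data contains no separator at all.
false_hit = data.find(SEPARATOR)
print("naive scan found 0xAA at byte offset", false_hit)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# On its own, 0xAA is a continuation byte and not a legal UTF-8 sequence.
print("lone 0xAA is valid UTF-8:", is_valid_utf8(SEPARATOR))
```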
Issues from Mike B
· Tracker issue: codepoints outside BMP, as literals and in data.
· If I put in a value that requires use of a high/low surrogate
pair, is that an error, does it require me to put in two separate %#...;
thingys, one for each of the surrogates (in which case these are not
really code points in ISO10646). If I put in a codepoint for one of the
supplemental characters and the schema itself is written in UTF-16 then
that has to translate into literal surrogate pair. Ok, but I'm very
uncertain about all this stuff
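For reference, the surrogate-pair translation mentioned in the bullet above can be checked with Python's standard codecs. This only illustrates the UTF-16 arithmetic; it says nothing about how DFDL entities should be written. The sample code point U+1D11E and the helper name to_surrogates are chosen for the sketch.

```python
CLEF = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a code point outside the BMP


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Encoding the single code point as UTF-16 yields two 16-bit code units.
utf16 = CLEF.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print([hex(u) for u in units])
print([hex(s) for s in to_surrogates(ord(CLEF))])
```

Both lines print the same pair of code units, so one supplementary code point in the schema corresponds to two surrogates once the text is held as UTF-16.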
The above item had two issues glommed together. There really are two
separate issues. The above is about these crazy codepoints that use
surrogate pairs. That's a minor corner case given the amount of use those
get.
The bigger issue is the one below, which is about things that are
in strings and are broken character encodings, but we still need to be
able to process the data. There's also the matter of recovery from errors
in decoding, and what we put out when the infoset contains a character
code where there is no valid encoding, or just a character code which
isn't even in ISO 10646 (e.g., character code 0xFFFFFFFF, which is not a
valid character at all).
Tracker Issue: illegal character encodings for parsing and unparsing. TBD:
how do these make it into the infoset or are they replaced, and if so how
TBD: can one represent these in the infoset for output? Ideally not, but...
· Tracker Issue: Processing-time Schema Definition Errors
This section (2.3.1 in this draft) is problematic as we're trying to
allow simple DFDL implementations to not do a bunch of static checking,
yet if implementations differ on when Schema Definition errors are
detected, then the second paragraph says they are converted to processing
errors. This lets different implementations do very different things in
terms of how the speculative parsing back-tracks around.
Grammar ambiguity is a very tricky case. Unless a DFDL implementation can
prove a grammar to be unambiguous, then it is very hard to say that any
particular combination of delimiters makes up a legal DFDL schema
definition. If the parser simply fails because the grammar was ambiguous,
there's no way to tell the difference between this and just broken data
without proving the grammar is unambiguous. In general it is formally
undecidable whether a grammar is ambiguous or unambiguous. (
http://books.google.com/books?id=lIuu53IcKWoC&pg=PT217&lpg=PT217&dq=proving…
)
Since DFDL v1.0 doesn't allow recursive declarations/definitions, it may
be possible to prove the ambiguity or unambiguity of a DFDL schema (or
rather, the data syntax grammar described by it, if you want to bother to
distinguish the two), but recursion isn't something we want to rule out
for the future, so
Type checking is decidable in DFDL's expression language, so we could
always detect type safety before run time; however, if we allow a
simplistic DFDL implementation to just check types at run time then this
would, by the definition in this section (2.3.1), issue processing errors
when it detects these at run time, thereby allowing backtracking of the
speculative parser to be driven off of type-checks in the expression
language. It seems to me that we need to find a way to put this problem
back into the hands of the user, and say that a schema where this actually
matters (one where a type error causes a backtrack, which ultimately
causes a successful parse) is illegal, but implementations are allowed to
not detect this particular illegality.
It seems to me we need to put this problem back into the hands of the
user.
· Tracker Issue: "round trip" for infoset. Should we omit the
whole point?
· Tracker Issue: [schema] is an absolute or relative SCD. Why
bother allowing absolute?
· Tracker Issue: Glossary as the place for centralized
definitions, or should they be repeated there, but also introduced at
point of first use, or should we put the definitions only at the places
where they are discussed, and xref from the glossary?
· TBD: Issue - semantics of expressions containing relative paths
that are inherited via ref to a dfdl:defineFormat. (also section 10.3)
· TBD: Issue - XPath term - we are not consistent about using the
term XPath, or "expression" when referring to our expression language. I
prefer to call it our expression language, and then in the section that
defines it state that it is a strict subset of XPath 2.0.
· TBD: Issue - fn:position is unclear given that we've just said
we don't support sequences in the expression language.
· TBD: Issue - order of sections. Scoping rules section should
come before variables section, which uses these concepts.
TBD: Issue: Case sensitivity of enum names - did we say whether this is
case sensitive or not? I believe it should be case sensitive.
· Issue: dfdl:representation - Strings in binary rep. I see no
reason why elements of type xs:string should examine dfdl:representation.
They shouldn't care what it is; they are always "text". I should be able
to specify a bunch of intermixed binary number and string elements
without having to specify dfdl:representation="text" just to avoid an
error on the string type elements. I believe xs:string type ignores
dfdl:representation (always behaves as if dfdl:representation is
'text'). (If we change this then the property precedence section for
simple types changes slightly as representation="text" is implied if type
is string.)
That will make it impossible to introduce a binary representation of text
later
What is "a binary representation of text"? Is there a real issue here.
This is a primary convenience and clarity issue for me. I do not want to
have to change to representation="text" for every string inside a cobol
structure, which is ultimately a binary representation object. To me
type="string" is enough. I want to put in the file scope level of the
schema a representation="binary", and then decorate the elements with the
specifics of their types, but I do not expect to have to put
representation="text" on anything.
I do not understand what you are trying to achieve by requiring
representation="text" for things that are already textual based on the
type.
The rest of the issues below I think we need to discuss on calls.
textStringPadCharacter textNumberPadCharacter - did we agree that this
character must be a "minimum width" character if the char set encoding is
variable width? (i.e., the pad char must be 1 byte if the encoding is
UTF-8.)
numberInfinityRep numberNanRep - Is this applicable only to xs:double and
xs:float? Also, what I've seen requires a distinction of sign. I.e., there
are positive and negative infinities often printing as -inf and +inf.
· TBD: Issue - \n in regular expressions - clarify relationship of
this to entities like NL entity. Also, if I include an entity like WSP* in
a regular expression (can I?) does it then match accordingly?
It appears that some of our multi-valued entities like WSP+ create
conditional "matching" behavior without having to use regular expressions,
e.g., when WSP+ is used as a separator. But can you use entities like WSP+
in a regular expression? It seems you should be able to use regular
"single valued" entities in a regular expression, its these multi-valued
ones that have tricky semantics.
Added Unicode values to \n, \t, \r. Disallow DFDL entities in regular
expressions.
14.1 Alignment - TBD: Issue - zero-based thinking here. But all the bits
stuff and everything else in DFDL uses 1-based reasoning. Need to revisit
to make this sensible for 1 based world.
Added implicit alignment table. TBD zero-based
finalTerminatorCanBeMissing - spec is not clear. Also is there a
finalSeparatorCanBeMissing
Changed to finalDocumentTerminatorCanBeMissing and
finalDocumentSeparatorCanBeMissing. Not sure where
finalDocumentSeparatorCanBeMissing should be specified. Looks odd on
'distinguished root'. These properties operate differently from other
properties as they are defined on the 'distinguished root' but affect some
lower down element. Effectively they are put in scope by a different
mechanism
Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell(a)uk.ibm.com
Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, January-29-2010
Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Suman Kalia (IBM)
Peter Lambros (IBM)
Tim Kimber (IBM)
Apologies
Stephanie Fetzer (IBM)
Steve Marting (Progeny)
1 Review Schedule
The WG acknowledged that we are not going to meet the schedule to be
available for public comment by OGF 28. Agreed to continue to complete the
specification as soon as possible.
OGF prereview is confirmed to take about 4 weeks assuming no document
updates are required. We are behind schedule to be available for public
review by March.
Activity                                | Schedule             | Who
Complete Action items                   | - 18 Dec 2009        | WG
Complete Spec
Write up work items                     | - 23 Dec 2009        | AP
Restructure and complete specification  | - 23 Dec 2009        | AP
Issue Draft 038                         | 23 Dec 2009          |
WG review
WG review                               | 7 Dec - 08 Jan 2010  | WG
Incorporate review comments             | 4 Jan - 29 Jan 2010  | AP +
Issue Draft 039                         | 15 Jan 2010          |
Incorporate review comments             | 4 Jan - 29 Jan 2010  | AP +
Issue Draft 040                         | 29 Jan 2010          |
Initial OGF Editor Review
Initial Editor review                   | 1 Feb - 1 Mar 2010   | OGF
Initial GFSG review                     | 1 Feb - 1 Mar 2010   |
Issue Draft 041                         | 1 Mar 2010           |
OGF Public Comment period (60 days)     | 1 Mar - 30 Apr 2010  | OGF
OGF 28 Munich                           | 15-19 March 2010     |
Incorporate comments
Incorporate comments                    | 28 May 2010          |
Issue Draft 042                         | 28 May 2010          |
Final OGF Editor Review
Final Editor review                     | June 2010            | OGF
Final GFSG review                       | June 2010            |
Issue Final specification               | 30 June 2010         |
Publish proposed recommendation         | 1 July 2010          |
Grid recommendation process             | 1 Jan - 1 April 2011 |
2. Go through Actions
Updated below
Action 077: Suman said that he had been mapping COBOL structures to DFDL
and it didn't look as though the way text numbers are defined is very
usable. He will document for next call.
3. TLog
TLOG
The individual fields are a mixture of ASCII strings, proprietary packed
decimals, and the occasional pure binary data. All fields are delimited by
a separator. Fields of all types can be fixed length or variable length
with a maximum. Pure binary data is preceded by a field giving the actual
length. All lengths in bytes.
Packed decimals. Like a packed decimal in the IBM sense. These can carry
negative numbers but use a leading xD sign nibble. No sign nibble if
positive or unsigned. Odd number of digits (including sign if present) are
padded with xF nibble. This is best illustrated using examples.
1234 => x12x34
123 => xF1x23
-1234 => xFDx12x34
-123 => xD1x23
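The nibble layout in the four examples above can be sketched in code. The Python below is only an illustration of those examples, not part of the proposal: the function names encode_tlog_packed and decode_tlog_packed are hypothetical, and the sketch assumes integer values only.

```python
def encode_tlog_packed(value: int) -> bytes:
    """Encode an integer using the TLOG packed layout from the examples above."""
    # Leading D sign nibble for negatives; no sign nibble otherwise.
    nibbles = ("D" if value < 0 else "") + str(abs(value))
    # An odd nibble count (sign included) is padded with a leading F nibble.
    if len(nibbles) % 2:
        nibbles = "F" + nibbles
    return bytes.fromhex(nibbles)


def decode_tlog_packed(data: bytes) -> int:
    """Decode bytes produced by encode_tlog_packed back to an integer."""
    nibbles = data.hex().upper()
    if nibbles.startswith("F"):   # drop the pad nibble, if any
        nibbles = nibbles[1:]
    negative = nibbles.startswith("D")
    if negative:                  # drop the sign nibble, if any
        nibbles = nibbles[1:]
    if not nibbles.isdigit():
        raise ValueError("not a TLOG packed decimal: " + data.hex())
    return -int(nibbles) if negative else int(nibbles)


for n in (1234, 123, -1234, -123):
    print(n, "=>", encode_tlog_packed(n).hex().upper())
```

The loop prints 1234, F123, FD1234 and D123 as the hex forms, matching the examples.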
Proposal
1) The 'variable length with a maximum' will be handled using a
post-timing assertion. Note this only applies on parsing. **
2) dfdl:lengthKind 'delimited' is permitted for numbers when
dfdl:representation is 'binary' and dfdl:binaryNumberRep is 'packed' or
'bcd' because it is possible to know in advance the range of bytes being
used, and therefore to choose suitable delimiters.
3) Core DFDL 1.0 will not be enhanced to handle the TLOG packed decimal
type. A future version of DFDL will provide an extensibility mechanism
that allows user-defined types to be handled. In the 1.0 timeframe IBM
may implement its own proprietary extension to handle this type.
** While this can result in output from a DFDL unparser that can not be
re-parsed, that is a problem general to the use of assertions, and a
future version of DFDL may choose to change this by enhancements to the
assertion annotation.
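Point 2 rests on the byte range of the data being known in advance. As a rough illustration for the plain BCD case only, where every byte holds two decimal digit nibbles, the sketch below lists the byte values that can never occur in the data and are therefore candidate separators. The helper names are hypothetical, and packed data with sign nibbles would need a different set of allowed nibbles.

```python
def bcd_data_bytes() -> set:
    """All byte values whose high and low nibbles are both decimal digits."""
    return {(high << 4) | low for high in range(10) for low in range(10)}


def bcd_safe_delimiters() -> list:
    """Byte values that cannot appear inside BCD data."""
    used = bcd_data_bytes()
    return [b for b in range(256) if b not in used]


safe = bcd_safe_delimiters()
print(len(safe))      # 156 of the 256 byte values are never used by BCD
print(0x2C in safe)   # True: ',' has low nibble C, so it is a safe separator
print(0x09 in safe)   # False: tab is also the BCD encoding of digits 0 and 9
```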
4. Action 071: Semantics of length=0, nil handling and defaults.
Changed unparsing behaviour - we must honour the property - the existing
behaviour of always writing the initiator means we can not successfully
re-parse if writing empty content and enum is 'suppress'. When reading,
assume that section 15.13 has been updated to include complex as well as
simple elements.
No change to enums.
missingValueInitiatorPolicy
Enum
Valid values 'required', 'prohibited'
Specifies whether to expect an initiator when an element is missing.
Ignored unless dfdl:initiator is specified and is not "" (empty string).
'required' - Indicates that the dfdl:initiator followed by empty content
is the required syntax to indicate that the element is missing.
'prohibited' - Indicates that empty content is the required syntax to
indicate that the element is missing. The presence of an initiator implies
that real content must follow.
Use of 'prohibited' implies an ordered sequence. If used on an initiated
element of an unordered group it is a schema definition error.
If the element is required, defaulting occurs as defined above.
This property also applies on unparsing, when the data to be written
(after nil value and default value processing) is empty content.
Annotation: dfdl:element
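As a reading aid, the two enum values can be sketched as a small parse-side check. This is only an illustration of the description above, not spec text: the function name classify and the "ID:" initiator are hypothetical, the case where the property is ignored (no initiator) is not modelled, and raising an error when a required initiator is absent is an assumption.

```python
def classify(field: str, initiator: str, policy: str) -> str:
    """Return 'missing' or 'present' for the raw text of one element."""
    if not initiator:
        # The property is ignored without an initiator; not modelled here.
        raise ValueError("sketch only covers elements with an initiator")
    if policy == "required":
        # Initiator followed by empty content is the syntax for 'missing'.
        if not field.startswith(initiator):
            raise ValueError("initiator expected")  # assumption
        return "missing" if field == initiator else "present"
    if policy == "prohibited":
        # Empty content alone is the syntax for 'missing'.
        if field == "":
            return "missing"
        if not field.startswith(initiator):
            raise ValueError("initiator expected")  # assumption
        if field == initiator:
            # An initiator implies that real content must follow.
            raise ValueError("initiator present but no content follows")
        return "present"
    raise ValueError("unknown policy: " + policy)


print(classify("ID:", "ID:", "required"))    # missing
print(classify("", "ID:", "prohibited"))     # missing
print(classify("ID:42", "ID:", "prohibited"))  # present
```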
Unparsing. The branch of a choice output when a complex element is
required but missing from the infoset is the first branch of the choice
that does not result in a processing error.
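The unparsing rule for a required but missing choice amounts to a first-success search over the branches in schema order. A minimal sketch, assuming each branch is modelled as a callable that either returns its output or raises; the names ProcessingError and unparse_missing_choice are hypothetical, and the final raise when every branch fails is an assumption the minutes do not cover.

```python
class ProcessingError(Exception):
    """Stand-in for a DFDL processing error (hypothetical name)."""


def unparse_missing_choice(branches):
    """Return (name, output) for the first branch that unparses cleanly."""
    for name, unparse_branch in branches:
        try:
            return name, unparse_branch()
        except ProcessingError:
            continue  # this branch cannot be output; try the next one
    # Assumption: no usable branch is itself a processing error.
    raise ProcessingError("no branch of the choice could be output")


def branch_a():
    raise ProcessingError("required child has no default")


def branch_b():
    return b"B:"


def branch_c():
    return b"C:"


print(unparse_missing_choice([("a", branch_a), ("b", branch_b), ("c", branch_c)]))
```

Branch a fails, so branch b is selected even though branch c would also succeed.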
5. Go through Actions
6. Discriminators
Not discussed
7. Draft 037 review issues
Not discussed
- Case of enumerations. We should follow the XSDL convention which is that
enumerations are case sensitive
- dfdl:lengthKind='pattern' scannability: A complex element with
lengthKind='pattern' will use its dfdl:encoding property as the encoding
when scanning its children, irrespective of each child's encoding property.
Go through unanswered issues in Mike's comments document
Meeting closed, 14:10
Next call Tuesday 02 February 2010 13:00 UK
Next action: 078
Actions raised at this meeting
No
Action
077
SKK: mapping of COBOL numbers to textNumberFormats.
Current Actions:
No
Action
045
20/05 AP: Speculative Parsing
27/05: Pseudocode has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented,
need to find a better way.
29/07: No Progress
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion AP
to incorporate update and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into
the next version.
14/10: Alan to update proposal to include array scenario where minOccurs >
0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce
examples
11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds
are needed after all. MB and SF to continue with examples.
18/11: Went through WTX implementation of example. SF to gather more
documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to
confirm that no changes are needed to the Resolving Uncertainty doc.
04/12: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B
Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephanie took us through a description of WTX identifiers. Mike
agreed to write up in DFDL terms.
20/01: Mike will write up
27/01: further discussion of discriminators
29/01: Alan had emailed both proposals but there was not enough time to discuss
049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It
seemed that the main value is that it defines a schema location for
downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can
be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress. The predefined formats do not need to be available
when the spec is published.
Suman said that he had been mapping COBOL structures to DFDL and it didn't
look as though the way text numbers are defined is very usable. He will
document for next call.
066
Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
077
SKK: mapping of COBOL numbers to textNumberFormats.
Closed actions
No
Action
064
MB/SH Request WG presentation at OGF 28
25/11: Session requested
04/12: no update
09/12: no update
16/12: SH has changed request to a general session rather than WG in the
hope of attracting more people.
23/12: no update
06/01: not heard anything yet
13/01: no update
20/01: no update
27/01: Session confirmed
Closed
071
Semantics of length=0, nil handling and defaults.
23/12: SH no update
06/01: SH has started
13/01: SH proposal review. Minor updates to be made
20/01: Reviewed updated proposal. Need to agree on unparsing empty
choices.
27/01: Steve H had sent update but not discussed due to lack of time
29/01: See minutes. Update 15.3 for complex. Document
missingValueInitiatorPolicy. Unparsing. The branch of a choice output when
a complex element is required but missing from the infoset is the first
branch of the choice that does not result in a processing error.
Closed
074
SH: Proposal for parsing TLog
27/01: Proposal discussed and agreed to allow delimited for binary
packed/bcd fields
29/01: See minutes. Confirmed delimited for packed/bcd
Closed
075
SH: rewrite empty sequences section
27/01: Steve provided written section
29/01: section rewritten
Closed
076
SH: semantics of minOccurs=0 on choice branches
29/01: Steve confirmed that XSDL allows minOccurs=0 for branches of a
choice, which means that the empty sequence is a valid result. WG decided
that DFDL will not allow minOccurs=0 on branches of a choice.
Closed
Work items:
No | Item | Target version | Status
005 | Improvements on property descriptions | | not started
012 | Reordering the properties discussion: move representation earlier, improve flow of topics | | not started
036 | Update dfdl schema with changed properties | | ongoing
042 | Mapping of the DFDL infoset to XDM | none | not required for V1 specification
069 | ICU fractional seconds | 039 |
070 | Write DFDL primer | |
071 | Write test cases. | |
072 | It is a processing error if the number of occurrences in the data does not match the value of the expression or prefix | 039 |
073 | Rename dfdl:separatorPolicy="required" to "always". | 039 | Deferred until action 071 agreed
078 | Document UPA checks | 039 |
079 | Semantics of length=0, nil handling and defaults. (A071) | 039 |
080 | Tlog: Allow LengthKind delimited for packed/bcd (A074) | 039 |
081 | Update empty sequence section (A075) | 039 |
082 | Semantics of minOccurs=0 on choice branches (A076) | 039 |
Regards
Alan Powell