dfdl-wg
07 Sep '10
1. Current Actions
2. xs:minLength
The spec currently states:
When an element declaration specifies a default value, and has type
xs:string, then xs:minLength must be specified and must be 1 or greater.
It is a schema definition error otherwise.
The process for defaults and nils means this restriction is no longer
needed.
3. Is UTF-16 a fixed width or variable width encoding
Appendix A: About UTF-16 and Unicode Character Codes above 0xFFFF
When we define UTF-16 to be a fixed-width double-byte wide character set
we say that each UTF-16 codepoint is represented by 2 bytes. Notice the
careful use of the term 'codepoint' here. Unicode/ISO10646 characters can
have character codes as large as 0x10FFFF which requires 3 bytes to store
(21 bits actually); however in UTF-16 characters with more than 2 bytes of
code are encoded as two codepoints, called a surrogate pair; hence, UTF-16
is fixed-width, 2 bytes per codepoint. It is not 2 bytes per Unicode
character. UTF-16 is really a variable-width encoding, but the characters
that require the surrogate-pair treatment are so infrequently used that
UTF-16 is most often treated like a 16-bit fixed-width character set. It
is the acknowledgement of the existence of surrogate pairs that leads to
the 'codepoint' vs. 'character code' distinction.
UTF-32 is a fixed width encoding with a full 4-bytes per character code.
It represents all of Unicode with the same width per character.
Hence, when we refer to lengths in character strings we will often refer
to length in characters, but we qualify that it means 2-byte codepoints
when the character set encoding is UTF-16. Hence, when the property
lengthUnitKind is 'characters' and the charset is 'UTF-16', then the units
are actually 16-bit codepoints, not Unicode characters.
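The distinction above can be checked with a small sketch. It is illustrative only: it uses Python's built-in UTF-16 codec, and the helper name utf16_code_units is an assumption, not a DFDL term. It counts the 2-byte units that the appendix calls 'codepoints' and compares that count with the number of Unicode characters.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit units needed to encode text in UTF-16.

    The big-endian codec is used so that no byte order mark is added.
    """
    return len(text.encode("utf-16-be")) // 2


# Characters at or below 0xFFFF: one 16-bit unit per character.
bmp_text = "A\u00e9\u4e2d"

# A character above 0xFFFF (U+1D11E): encoded as a surrogate pair.
supplementary_text = "\U0001D11E"

print(len(bmp_text), utf16_code_units(bmp_text))
print(len(supplementary_text), utf16_code_units(supplementary_text))
```

Under the rule in the appendix, a length of 2 with lengthUnitKind 'characters' and charset 'UTF-16' would cover exactly one character such as U+1D11E, not two.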
Current Actions:
No
Action
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will
be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases
will be made public
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
25/08: Will chase to allow Daffodil access to test cases. The WG should
define how implementations confirm that they 'conform to DFDL v1'
01/09: IBM still progressing the legal aspect. Intends to publish 100 or
so tests as soon as it can, ahead of a full compliance suite.
085
ALL: publicise Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask
them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan
had posted changes made in draft 041. Steve suggested sending a note to the
WG highlighting these changes. Steve also suggested requesting an
extension as other IBM groups may review. We discussed whether this was
necessary as changes will need to be made during the implementation phase
anyway. Alan to ask OGF what the process is for changes post public
comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of
the process.
30/06: Alan has emailed Joel asking what the process is now that the public
comment period is over and whether we can update the published version
with WG updates. No response yet.
07/07: No response. Alan will chase up
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes
are significant enough to need additional review. Alan to contact David
Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a repeat
public review is necessary before becoming a 'proposed recommendation'.
Alan responded that the WG agreed that a re-review was not necessary. The
next stage is for OGF review committee to approve publication.
11/08: Specification is now 'awaiting author changes' before being
submitted to the OGF technical committee for approval as a 'proposed
specification'.
Alan would like to have the updated specification complete by Sept 10th.
The WG needs to complete all actions by then or decide that they do not
need to be included in this phase of the process.
01/09: Alan and Steve have discussed and propose Sept 30th for completion
of draft 43 and closure of all actions.
099
Splitting the specification in simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a
separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments.
Need to add choice, validation, facets.
Also, how does an implementation declare which subsets it supports?
Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL
schema from an implementation of just the core functions is moved to a
full DFDL implementation, what should happen about the missing properties?
Does the full implementation need to be aware of subsets of functions?
Should it raise a schema definition error for use of a function not in the
subset?
21/07: no progress
04/08: Steve had updated proposed groups of function.
(Subset_proposal_v2.ppt). We discussed whether it is better to have
discrete sets of functions or expanding levels of function.
Purpose of subsetting is:
1. Allow simpler implementations. (main purpose)
2. Simplify tooling
3. Simplify specification.
Steve to contact previous members of WG to check if we have the correct
subsets
11/08: Steve sent an email to previous members of the WG asking for
opinions on splitting the specification. Bob McGrath from the National
Center for Supercomputing Applications (NCSA) responded that they had
implemented about 80% of the function. Alejandro will send a description
of the function they have implemented.
Action will be raised to track the Daffodil implementation
11/08: not discussed
01/09: NCSA implementation description received. Making the unparser
optional is a good idea (NCSA do not need one). Work will progress on the
subsets.
101
Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a
validation error or processing error. Decided that for consistency it
should be a validation error.
It would be useful, however, to avoid having to duplicate facet
information in an assert, which could become unwieldy for, say, a large
enumeration.
Suggestions
- a parser option that 'converted all validation errors to processing
errors'
- a dfdl expression function that 'applied all facets' or 'applied
specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to
lack of time
11/08: We started to discuss Stephanie's HIPAA example but ran out of
time.
25/08: Not discussed
01/09: Discuss next week
107
teston/testoff dfdl expression functions.
Are these functions still needed? They were introduced to allow individual
bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that
use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed
01/09: Steve to progress by Sept 30th
108
dfdl:hidden
There has been some discussion on whether the 'hidden' global group should
be indicated in some way.
04/08: A lively discussion. The specification works as currently defined,
so the question is whether changes need to be made to make tooling easier.
There shouldn't be 'conventions' in particular tooling, as tools must be
able to properly deal with schemas from other tools that would not obey
those conventions. Steve stated that it is often dangerous to hide too much
from users when they can see the underlying schema. To be continued.
25/08: There have been some offline discussions about simplifying how
hidden elements are implemented. The proposal is:
- dfdl:hidden property on xs:element only
- xs:minOccurs and xs:maxOccurs MUST be 0 when hidden
- dfdl:minOccurs and dfdl:maxOccurs for hidden elements only.
An element is 'required' when dfdl:minOccurs > 0 and normal default
processing occurs.
The schema, without DFDL annotations, must match the infoset, so the
assumption is that non-DFDL tools, such as mappers, will ignore/not show
elements with xs:minOccurs and xs:maxOccurs = '0'
01/09: The above proposal is flawed due to use of maxOccurs = 0 (this was
identified back in 2008 hence current spec).
Bob confirmed that NCSA models use hidden in a big way, so punting hidden
beyond 1.0 is not an option.
Two candidates:
- As per spec but with syntactic improvements to make it clear that the
two xs:sequences do not take any dfdl:sequence properties
- Place a flag directly on a local element and force minOccurs to be 0.
Simpler syntax but the semantics change, as the element *could* be legally
in the infoset, although a DFDL parser would never put it there.
Steve will circulate the two proposals for next week.
Bob to talk to Alejandro as the NCSA implementation is currently more
flexible than the spec, allowing the groupref to point to a choice, and an
elementref. Are these really needed?
111
Daffodil DFDL parser
11/08: Bob and Alejandro described the new implementation that they have
developed. It is a new code base and is not based on the Defuddle
prototype. It is written in Scala and implements approximately 80% of the
features in the public comments draft of DFDL V1. Alejandro will send a
list of the features not implemented.
We discussed the scenarios that motivated the development which was to
extract data from various sources and transform into canonical formats.
Bob offered to make Daffodil available for the WG to assess the
functionality. IBM WG members will get approval from the company to allow
them to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation
of DFDL then we will need to work out how that would be funded and
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will
investigate
25/08: Alejandro had sent a list of the functions that he has implemented
and Steve had responded, indicating the extra functions he thought were
essential.
Since then Alejandro has implemented some of the missing functions, such
as escape schemes, pre-defined variables, binary decimal numbers, etc, and
will update his list.
Bob is planning to make the parser available on the internet to allow
testing.
His organisation is being reorganised and he doesn't know what the
priority of Daffodil will be, so it is essential that we move quickly. It
would help if IBM could indicate its support for Daffodil in some
semi-formal way.
01/09: Alejandro updating Daffodil to include escape schemes, unordered
sequences and ignoreCase.
Daffodil being placed under formal source control in anticipation of
external release.
Bob has a start October deadline to create a report on what has been done
for his sponsors.
It would be great if we could get Daffodil on the web and have run some
IBM tests so it could be highlighted at OGF 30 at end October.
112
DFDL certification process
25/08: Discussed how to certify DFDL implementations. Alan to investigate
if OGF have a defined process.
01/09: In progress, spec needs to state what conformance means, as part of
this work
113
2. Regular Expressions.
25/08: The DFDL regular expressions should provide lookahead and
backreferences. Is the current regular expression language sufficient?
Discussed two aspects:
a. Is the XML regular expression language the correct one to use? Tim
asked if DFDL needs to specify a language at all or should leave it to
implementers to pick one. That would inhibit portability of schemas.
b. A regular expression property on an assert/discriminator as an
alternative to the test expression. Either a DFDL expression or a regular
expression could be specified but not both.
01/09: There are many variations of regexp language, it seems wise to
specify one that we know contains functions like lookaround, which makes
it easy to say things like 'give me everything up to but not including x'.
This rules out XML Schema and POSIX, it needs Perl 5 or Java.
Tim to convince Steve (via example) that use of regexp in asserts is
needed in 1.0.
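As an illustration of the lookaround point, the sketch below uses Python's re module as a stand-in for a Perl 5 or Java style regular expression engine; the minutes do not name Python, and the helper name up_to_but_not_including is hypothetical. A lookahead assertion matches 'everything up to but not including x' without consuming x.

```python
import re


def up_to_but_not_including(data: str, stop: str):
    """Return the text before the first occurrence of stop, or None.

    The lookahead (?=...) checks that stop follows the match
    but leaves it unconsumed in the data.
    """
    pattern = re.compile(r".*?(?=" + re.escape(stop) + r")", re.DOTALL)
    match = pattern.match(data)
    return match.group(0) if match else None


print(up_to_but_not_including("abc|def", "|"))
print(up_to_but_not_including("abcdef", "|"))
```

XML Schema regular expressions have no equivalent of the (?=...) construct, which is the gap the 01/09 note refers to.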
114
3. OGF 30
25/08: OGF30 takes place on October 25-29 in Brussels. Should we have a
WG session?
01/09: Given emergence of NCSA implementation and spec completion target
of 30th Sept it makes sense to host a session at OGF 30.
Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell(a)uk.ibm.com
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
I'm going to send this and then duck - we've discussed the subject of
missing-ness and defaulting at considerable length already. However, I
genuinely do have some new information for your consideration so please
hear me out.
I'm seeking the opinion of the working group on the following questions:
a) can an element reliably be categorised as 'missing' when
separatorPolicy='suppressed'?
b) is it possible for an element to be 'missing' if it has
lengthKind='explicit' and its length is a static, non-zero value?
c) is it possible for an element to be 'missing' if it has a discriminator
that has already evaluated to 'true'.
For reference, the specification ( v0.42 ) says this concerning missing
elements:
Definition 'missing element'
On parsing, an element is missing if its content region in the data stream
is empty. The initiator and terminator regions of a missing element may,
or may not, also be empty as controlled by the
dfdl:emptyValueDelimiterPolicy property (simple and complex element), or
dfdl:nilValueDelimiterPolicy property (simple element).
Question a),
Compare the following data streams. In both cases, assume that
- separator is comma and separatorPosition is 'infix'
- missingValueDelimiterPolicy is set to 'none' so a 'missing' value should
not have an initiator.
- the initiators are A:, B: and C:
- values are a,b,c.
separatorPolicy='required' : A:a,,C:c
separatorPolicy='suppressed' : A:a,C:c
In the 'required' case, the parser detects that the initiator is missing,
then looks to see whether the content region is zero-length. It is, so the
element is 'missing'.
In the 'suppressed' case, the parser detects that the initiator is
missing, then looks to see whether the content region is zero-length. It
looks for a delimiter at the current position and finds 'C'. 'C' is not a
delimiter, so the content region is not zero-length. So the parser throws
a processing error - "initiator for element B was not found in the data".
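The two cases can be reproduced with a toy parser. This is a minimal sketch of the rule as described above, not of any real DFDL implementation: the names ELEMENTS, SEPARATOR and parse are assumptions, and the only delimiter considered is the comma separator.

```python
SEPARATOR = ","
# (element name, initiator) pairs for the example sequence.
ELEMENTS = [("A", "A:"), ("B", "B:"), ("C", "C:")]


def parse(data: str) -> dict:
    """Parse the sequence; None marks an element treated as 'missing'."""
    pos = 0
    infoset = {}
    for index, (name, initiator) in enumerate(ELEMENTS):
        if data.startswith(initiator, pos):
            # Initiator found: content runs to the next separator or end of data.
            pos += len(initiator)
            end = data.find(SEPARATOR, pos)
            end = len(data) if end == -1 else end
            infoset[name] = data[pos:end]
            pos = end
        elif pos == len(data) or data.startswith(SEPARATOR, pos):
            # No initiator, and a delimiter (or end of data) is at the
            # current position, so the content region is zero-length.
            infoset[name] = None
        else:
            raise ValueError(
                f"initiator for element {name} was not found in the data"
            )
        # Consume the infix separator if one is present.
        if index < len(ELEMENTS) - 1 and data.startswith(SEPARATOR, pos):
            pos += len(SEPARATOR)
    return infoset


print(parse("A:a,,C:c"))  # the separatorPolicy='required' data stream
try:
    parse("A:a,C:c")      # the separatorPolicy='suppressed' data stream
except ValueError as error:
    print(error)
```

The same rule is applied to both streams; only the data differs. In the second stream the parser finds 'C' where it expects either the B: initiator or a delimiter, which is the processing error described above.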
I don't think the 'suppressed' behaviour is what a user will expect, nor
what the WG intended when these rules were drawn up. The problem is that
the parser cannot reliably determine the length of the content region when
separatorPolicy='suppressed'. It can, however, reliably detect whether
the element is present - the initiator gives a strong hint about that.
Somebody may say "well duh!. Of course the content region is empty if the
initiator is not present". That may be a reasonable rule, but it is not
the rule currently given in the specification. Note that the content
region has not been looked at, so that rule relies on the parser
speculatively parsing the element and then backtracking because the
initiator is not found. If we allow that, then why not allow default
values to be applied after other types of processing error ( even for
cases where no initiator was defined )? There are good reasons for not
applying defaults after normal backtracking ( hence the current rule ) so
any such 'missing initiator implies empty content' rule would have to be
made explicit in the specification.
Possible refinements of the rules:
a) IF the length of the content region cannot reliably be determined (
lengthKind='delimited' and separatorPolicy='suppressed' ) AND
emptyValueDelimiterPolicy does not include the initiator AND the element
has an initiator AND the initiator was not found THEN assume that the
content length is zero and treat the element as missing.
or
b) IF ( the element has an initiator AND the initiator was not found ) THEN
IF the parent group has initiatedContent='yes' THEN the element is missing
else apply the existing rules.
b) would provide a way to get defaults applied in situations where the
content region's length is either fixed or undefined. Quite a lot of users
might assume this behaviour anyway.
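The two candidate refinements can be written as predicates. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the emptyValueDelimiterPolicy values 'initiator' and 'both' are assumed to be the ones that 'include the initiator', and the 'existing rules' are reduced to the single test of whether the content region is empty.

```python
def missing_by_rule_a(length_kind, separator_policy,
                      empty_value_delimiter_policy,
                      has_initiator, initiator_found):
    """Refinement a): True means 'assume zero length, treat as missing'.

    False means the refinement does not apply and the existing
    rules would be used instead.
    """
    length_unreliable = (length_kind == "delimited"
                         and separator_policy == "suppressed")
    policy_includes_initiator = empty_value_delimiter_policy in (
        "initiator", "both")
    return (length_unreliable
            and not policy_includes_initiator
            and has_initiator
            and not initiator_found)


def missing_by_rule_b(has_initiator, initiator_found,
                      initiated_content, content_region_empty):
    """Refinement b): a missing initiator under initiatedContent='yes'."""
    if has_initiator and not initiator_found:
        if initiated_content == "yes":
            return True
    # Otherwise apply the existing rule: missing if the content is empty.
    return content_region_empty


# Element B in the 'suppressed' stream A:a,C:c
print(missing_by_rule_a("delimited", "suppressed", "none", True, False))
print(missing_by_rule_b(True, False, "yes", False))
```

Written this way, a) is keyed on the cases where the content length cannot be trusted, while b) is keyed on a property of the parent group and so also covers fixed-length content.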
Question b)
A similar situation can arise when lengthKind='explicit' and the length is
fixed ( i.e. is not a DFDL expression ). Unless the missing field occurs
at the end of a known-length structure the length of the content region
will never be zero. I think a similar rule is required for this case also:
- IF the length of the content region is fixed ( lengthKind='explicit' and
length is a static, non-zero value ) AND emptyValueDelimiterPolicy does
not include the initiator AND the element has an initiator AND the
initiator was not found THEN assume that the content length is zero and
treat the element as missing.
...or apply suggestion b) above.
Question c)
Suppose that an element has a discriminator, and it has already evaluated
to 'true' ( it must have been a backward reference to some
previously-parsed field ). The discriminator has unambiguously stated that
the element *is* present in the data. If it is subsequently found to have
a zero-length content region, should the parser treat it as 'missing' and
attempt to apply a default? I don't think so.
Please tell me that I'm missing something obvious here - it's starting to
sound complicated again.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert(a)uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Regards
Steve Hanson
Strategy, Common Transformation & DFDL
Co-Chair, OGF DFDL WG
IBM SWG, Hursley, UK,
smh(a)uk.ibm.com,
tel +44-(0)1962-815848
Hi,
I stumbled upon DFDL searching about data archaeology - very interesting
and relevant work! Is it already applied in practice for information
preservation in libraries and archives? Unfortunately DFDL is not
documented very well, compared to standards of W3C and similar
institutions. Do you plan to set up a website with a more readable
description of DFDL like other popular standards? json.org is one of the
good examples because it describes the JSON standard in a way that is easy
to understand and links to implementations.
My second question is about the notation of DFDL. Has anyone tried to
create a notation that is not based on XML? For instance Notation 3 is
much more readable than RDF/XML and Backus-Naur-Form is more readable
than a grammar formally defined in mathematical formulas. Especially if
you describe non-XML formats it is a barrier to set up the whole XML
framework stack in order to use DFDL.
I think that DFDL has strong potential but in the current form (both the
way it is documented and its notation) it does not encourage potential
users to adopt it.
Cheers
Jakob Voss
--
Verbundzentrale des GBV (VZG)
Digitale Bibliothek - Jakob Voß
Platz der Goettinger Sieben 1
37073 Goettingen - Germany
+49 (0)551 39-10242
http://www.gbv.de
jakob.voss(a)gbv.de
31 Aug '10
1. Current Actions
Current Actions:
No
Action
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will
be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases
will be made public
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
25/08: Will chase to allow Daffodil access to test cases.
The WG should define how implementations confirm that they 'conform to DFDL
v1'
085
ALL: publicise Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask
them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan
had posted changes made in draft 041. Steve suggested sending a note to the
WG highlighting these changes. Steve also suggested requesting an
extension as other IBM groups may review. We discussed whether this was
necessary as changes will need to be made during the implementation phase
anyway. Alan to ask OGF what the process is for changes post public
comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of
the process.
30/06: Alan has emailed Joel asking what the process is now that the public
comment period is over and whether we can update the published version
with WG updates. No response yet.
07/07: No response. Alan will chase up
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes
are significant enough to need additional review. Alan to contact David
Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a repeat
public review is necessary before becoming a 'proposed recommendation'.
Alan responded that the WG agreed that a re-review was not necessary. The
next stage is for OGF review committee to approve publication.
11/08: Specification is now 'awaiting author changes' before being
submitted to the OGF technical committee for approval as a 'proposed
specification'.
Alan would like to have the updated specification complete by Sept 10th.
The WG needs to complete all actions by then or decide that they do not
need to be included in this phase of the process.
099
Splitting the specification in simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a
separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments.
Need to add choice, validation, facets.
Also, how does an implementation declare which subsets it supports?
Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL
schema from an implementation of just the core functions is moved to a
full DFDL implementation, what should happen about the missing properties?
Does the full implementation need to be aware of subsets of functions?
Should it raise a schema definition error for use of a function not in the
subset?
21/07: no progress
04/08: Steve had updated proposed groups of function.
(Subset_proposal_v2.ppt). We discussed whether it is better to have
discrete sets of functions or expanding levels of function.
Purpose of subsetting is:
1. Allow simpler implementations. (main purpose)
2. Simplify tooling
3. Simplify specification.
Steve to contact previous members of WG to check if we have the correct
subsets
11/08: Steve sent an email to previous members of the WG asking for
opinions on splitting the specification. Bob McGrath from the National
Center for Supercomputing Applications (NCSA) responded that they had
implemented about 80% of the function. Alejandro will send a description
of the function they have implemented.
Action will be raised to track the Daffodil implementation
11/08: not discussed
101
Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a
validation error or processing error. Decided that for consistency it
should be a validation error.
It would be useful, however, to avoid having to duplicate facet
information in an assert, which could become unwieldy for, say, a large
enumeration.
Suggestions
- a parser option that 'converted all validation errors to processing
errors'
- a dfdl expression function that 'applied all facets' or 'applied
specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to
lack of time
11/08: We started to discuss Stephanie's HIPAA example but ran out of
time.
25/08: Not discussed
107
teston/testoff dfdl expression functions.
Are these functions still needed? They were introduced to allow individual
bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that
use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed
108
dfdl:hidden
There has been some discussion on whether the 'hidden' global group should
be indicated in some way.
04/08: A lively discussion. The specification works as currently defined,
so the question is whether changes need to be made to make tooling easier.
There shouldn't be 'conventions' in particular tooling, as tools must be
able to properly deal with schemas from other tools that would not obey
those conventions. Steve stated that it is often dangerous to hide too much
from users when they can see the underlying schema. To be continued.
25/08: There have been some offline discussions about simplifying how
hidden elements are implemented. The proposal is:
- dfdl:hidden property on xs:element only
- xs:minOccurs and xs:maxOccurs MUST be 0 when hidden
- dfdl:minOccurs and dfdl:maxOccurs for hidden elements only.
An element is 'required' when dfdl:minOccurs > 0 and normal default
processing occurs.
The schema, without DFDL annotations, must match the infoset, so the
assumption is that non-DFDL tools, such as mappers, will ignore/not show
elements with xs:minOccurs and xs:maxOccurs = '0'
109
dfdl:discriminator : the 'message' attribute
From Tim:
I remembered the reason why I thought this was a good idea.
Consider the situation where someone is generating their DFDL schema from
meta-data. The model is large, and consists of many references to global
structures. Each global structure ( e.g. an HL7 segment ) is identified in
a particular way. Sometimes the segment is required, sometimes it is not.
Sometimes it occurs as a child of a choice group, and sometimes not.
Regardless, it is highly likely that the segment will be identified in the
same way wherever it occurs. A natural decision for the modeler would be
to create a dfdl:discriminator on all references to the segment, even if
the ref is not under a point of uncertainty. It's harmless, and it carries
no performance penalty. If we disallow the "message" attribute, it will
force the modeler to put in extra logic to work out whether the ref is
under a POI, and generate an assert/discriminator as appropriate.
I'd be interested to know what Steph thinks about this - I think I've
heard her say that she sometimes uses discriminators where an assert would
have done the job, just to maintain consistency throughout the model.
04/08: not discussed.
11/08: Not discussed
25/08: Not discussed
110
Semantics of newVariableInstance and setVariable
What should a DFDL processor ( parser or serializer ) do when it cannot
evaluate the expression in a newVariableInstance or setVariable
annotation?
Moving the setting of variable values after the element has been parsed
just creates other problems. A new instance must be available to other
expressions on the same component, and to the children of a group/element.
So it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable /
newVariableInstance annotations which *cannot* be evaluated until the
element has been parsed.
For the parser, it might be OK to
- evaluate the expression when the component ( element or group ) is
started
- if it cannot be evaluated, add it to a list of annotations that must be
processed at the end of the component
- if in the mean time any other expressions attempt to access the variable
that was being set/created then throw a processing error ( because the
result will be undefined ). This will probably require the
variable/instance to be placed into a 'not available' state until its
expression is resolvable
25/08: There was a brief discussion as IBM needs a resolution soon. Is it
possible to restrict newVariableInstance to backward references only, so
as to remove the problem? setVariable must obviously be able to access the
current value.
111
Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have
developed. It is a new code base and is not based on the Defuddle
prototype. It is written in Scala and implements approximately 80% of the
features in the public comments draft of DFDL V1. Alejandro will send a
list of the features not implemented.
We discussed the scenarios that motivated the development which was to
extract data from various sources and transform into canonical formats.
Bob offered to make Daffodil available for the WG to assess the
functionality. IBM WG members will get approval from the company to allow them
to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation
of DFDL then we will need to work out how that would be funded and
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will
investigate.
25/08: Alejandro had sent a list of the functions that he has implemented
and Steve had responded indicating the extra functions he thought were
essential.
Since then Alejandro has implemented some of the missing functions, such
as escape schemes, pre-defined variables, binary decimal numbers, etc, and
will update his list.
Bob is planning to make the parser available on the internet to allow
testing.
His organisation is being reorganised and he doesn't know what the
priority of Daffodil will be, so it is essential that we move quickly. It
would help if IBM could indicate its support for Daffodil in some
semi-formal way.
Discussed how to certify DFDL implementations. Alan to investigate if OGF
have a defined process.
112
DFDL certification process
113
2. Regular Expressions.
The DFDL regular expressions should provide lookahead and backreferences.
Is the current regular expression language sufficient?
Discussed two aspects:
a. Is the XML regular expression language the correct one to use? Tim
asked if DFDL needs to specify a language at all or should leave it to
implementers to pick one. That would inhibit portability of schemas.
b. A regular expression property on an assert/discriminator as an
alternative to the test expression. Either a DFDL expression or a regular
expression could be specified but not both.
114
3. OGF 30
OGF30 takes place on October 25-29 in Brussels
Should we have a WG session?
Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell(a)uk.ibm.com
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, August 25-2010
Attendees
Alan Powell (IBM)
Suman Kalia (IBM)
Tim Kimber (IBM)
Bob McGrath (National Center for Supercomputing Applications)
Alejandro Rodriguez (National Center for Supercomputing Applications)
Apologies
Mike Beckerle (Oco)
Stephanie Fetzer (IBM)
Steve Hanson (IBM)
1. Current Actions
Updated Below
2. Regular Expressions.
The DFDL regular expressions should provide lookahead and backreferences.
Is the current regular expression language sufficient?
Discussed two aspects:
a. Is the XML regular expression language the correct one to use? Tim
asked if DFDL needs to specify a language at all or should leave it to
implementers to pick one. That would inhibit portability of schemas.
b. A regular expression property on an assert/discriminator as an
alternative to the test expression. Either a DFDL expression or a regular
expression could be specified but not both.
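The two aspects above can be illustrated with a small sketch. It uses
Python's re module only as a stand-in for a regular expression language that
offers lookahead and backreferences; the check_assert function and its
test / test_pattern parameters are hypothetical names chosen for this
illustration and are not DFDL syntax.

```python
import re

# Lookahead: match the digits only when a comma follows, without consuming it.
LOOKAHEAD = re.compile(r"\d+(?=,)")

# Backreference: the closing quote must be the same character as the opening one.
BACKREF = re.compile(r"(['\"])(.*?)\1")


def check_assert(data, test=None, test_pattern=None):
    """Evaluate a hypothetical assert that takes an expression or a pattern.

    Exactly one of `test` (a callable standing in for a DFDL expression)
    and `test_pattern` (a regular expression) must be supplied.
    """
    if (test is None) == (test_pattern is None):
        raise ValueError("specify exactly one of test or test_pattern")
    if test_pattern is not None:
        # The pattern is matched at the start of the data.
        return re.match(test_pattern, data) is not None
    return bool(test(data))
```

A schema that relies on either of the two compiled patterns is only portable
if every implementation supports the same regular expression language, which
is the concern raised under (a).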
3. OGF 30
OGF30 takes place on October 25-29 in Brussels
Should we have a WG session?
Meeting closed, 16:30
Next call Wednesday 1 September 2010 15:00 UK (10:00 ET)
Next action: 115
Actions raised at this meeting
No
Action
112
DFDL certification process
113
2. Regular Expressions.
The DFDL regular expressions should provide lookahead and backreferences.
Is the current regular expression language sufficient?
Discussed two aspects:
a. Is the XML regular expression language the correct one to use? Tim
asked if DFDL needs to specify a language at all or should leave it to
implementers to pick one. That would inhibit portability of schemas.
b. A regular expression property on an assert/discriminator as an
alternative to the test expression. Either a DFDL expression or a regular
expression could be specified but not both.
114
3. OGF 30
OGF30 takes place on October 25-29 in Brussels
Should we have a WG session?
Current Actions:
No
Action
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will
be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases
will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
25/08: Will chase to allow Daffodil access to test cases.
The WG should define how implementations confirm that they 'conform to DFDL
v1'.
085
ALL: publicise Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask
them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan
had posted changes made in draft 041. Steve suggested send a note to the
WG highlighting these changes. Steve also suggested requesting an
extension as other IBM groups may review. We discussed whether this was
necessary as changes will need to be made during the implementation phase
anyway. Alan to ask OGF what the process is for changes post public
comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of
the process.
30/06: Alan has emailed Joel asking what the process is now public comment
period is over and can we update the published version with WG updates. No
response yet.
07/07: No response. Alan will chase up
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes
are significant enough to need additional review. Alan to contact David
Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a public
re-review is necessary before becoming a 'proposed recommendation'.
Alan responded that the WG agreed that a re-review was not necessary. The
next stage is for OGF review committee to approve publication.
11/08: Specification is now 'awaiting author changes' before being
submitted to the OGF technical committee for approval as a 'proposed
specification'.
Alan would like to have the updated specification complete by Sept 10th.
The WG needs to complete all actions by then or decide that they do not
need to be included in this phase of the process.
099
Splitting the specification into simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a
separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments.
Need to add choice, validation, facets.
Also, how does an implementation declare which subsets it supports?
Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL
schema from an implementation of just the core functions is moved to a
full DFDL implementation, what should happen about the missing properties?
Does the full implementation need to be aware of subsets of functions?
Should it raise a schema definition error for use of a function not in the
subset?
21/07: no progress
04/08: Steve had updated the proposed groups of function
(Subset_proposal_v2.ppt). We discussed whether it is better to have
discrete sets of functions or expanding levels of function.
Purpose of subsetting is:
1. Allow simpler implementations. (main purpose)
2. Simplify tooling
3. Simplify specification.
Steve to contact previous members of WG to check if we have the correct
subsets
11/08: Steve sent an email to previous members of the WG asking for
opinions on splitting the specification. Bob McGrath from National Center
For Supercomputing responded that they had implemented about 80% of the
function. Alejandro will send a description of the function they have
implemented.
Action will be raised to track the Daffodil implementation
11/08: not discussed
101
Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a
validation error or processing error. Decided that for consistency it
should be a validation error.
It would be useful, however, to avoid having to duplicate facet
information in an assert, which could become unwieldy for, say, a large
enumeration.
Suggestions
- a parser option that 'converted all validation errors to processing
errors'
- a dfdl expression function that 'applied all facets' or 'applied
specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to
lack of time
11/08: We started to discuss Stephanie's HIPAA example but ran out of
time.
25/08: Not discussed
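As a rough illustration of the two suggestions recorded under action 101,
the sketch below shows a facet check that can be called from an assert-like
test, and a switch that reports facet failures as processing errors instead
of validation errors. All names (failed_facets, check_facets, validate and
the promote_to_processing_error flag) are hypothetical, and only three facets
are modelled; none of this is taken from the DFDL specification.

```python
class ValidationError(Exception):
    """The data was parsed but does not satisfy a facet."""


class ProcessingError(Exception):
    """The facet failure is treated as a failure of the parse itself."""


def failed_facets(value, facets):
    """Return the names of the facets in `facets` that `value` violates."""
    failed = []
    if "fixed" in facets and value != facets["fixed"]:
        failed.append("fixed")
    if "enumeration" in facets and value not in facets["enumeration"]:
        failed.append("enumeration")
    if "maxLength" in facets and len(value) > facets["maxLength"]:
        failed.append("maxLength")
    return failed


def check_facets(value, facets, only=None):
    """'Apply all facets', or a single named facet when `only` is given."""
    failed = failed_facets(value, facets)
    if only is not None:
        return only not in failed
    return not failed


def validate(value, facets, promote_to_processing_error=False):
    """Raise on facet failure; the flag selects which kind of error."""
    failed = failed_facets(value, facets)
    if failed:
        error = ProcessingError if promote_to_processing_error else ValidationError
        raise error(f"{value!r} violates facets: {', '.join(failed)}")
    return value
```

With a function like check_facets, the enumeration is written once as a facet
and does not have to be repeated inside the assert.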
107
teston/testoff dfdl expression functions.
Are these functions still needed? They were introduced to allow individual
bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that
use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
25/08: Not discussed
108
dfdl:hidden
There has been some discussion on whether the 'hidden' global group should
be indicated in some way.
04/08: A lively discussion. The specification works as currently
defined, so the question is whether changes need to be made to make tooling
easier. There shouldn't be 'conventions' in particular tooling, as tools must
be able to properly deal with schemas from other tools that would not obey
those conventions. Steve stated that it is often dangerous to hide too much
from users when they can see the underlying schema. To be continued.
25/08: There have been some offline discussions about simplifying how
hidden elements are implemented. The proposal is:
- dfdl:hidden property on xs:element only
- xs:minOccurs and xs:maxOccurs MUST be 0 when hidden
- dfdl:minOccurs and dfdl:maxOccurs for hidden elements only
An element is 'required' when dfdl:minOccurs > 0, and normal default
processing occurs.
The schema, without DFDL annotations, must match the infoset, so the
assumption is that non-DFDL tools, such as mappers, will ignore/not show
elements with xs:minOccurs and xs:maxOccurs = '0'.
109
dfdl:discriminator : the 'message' attribute
From Tim:
I remembered the reason why I thought this was a good idea.
Consider the situation where someone is generating their DFDL schema from
meta-data. The model is large, and consists of many references to global
structures. Each global structure ( e.g. an HL7 segment ) is identified in
a particular way. Sometimes the segment is required, sometimes it is not.
Sometimes it occurs as a child of a choice group, and sometimes not.
Regardless, it is highly likely that the segment will be identified in the
same way wherever it occurs. A natural decision for the modeler would be
to create a dfdl:discriminator on all references to the segment, even if
the ref is not under a point of uncertainty. It's harmless, and it carries
no performance penalty. If we disallow the "message" attribute, it will
force the modeler to put in extra logic to work out whether the ref is
under a POI, and generate an assert/discriminator as appropriate.
I'd be interested to know what Steph thinks about this - I think I've
heard her say that she sometimes uses discriminators where an assert would
have done the job, just to maintain consistency throughout the model.
04/08: not discussed.
11/08: Not discussed
25/08: Not discussed
110
Semantics of newVariableInstance and setVariable
what should a DFDL processor ( parser or serializer ) do when it cannot
evaluate the expression in a newVariableInstance or setVariable
annotation?
Moving the setting of variable values after the element has been parsed
just creates other problems. A new instance must be available to other
expressions on the same component, and to the children of a group/element.
So it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable /
newVariableInstance annotations which *cannot* be evaluated until the
element has been parsed.
For the parser, it might be OK to
- evaluate the expression when the component ( element or group ) is
started
- if it cannot be evaluated, add it to a list of annotations that must be
processed at the end of the component
- if in the meantime any other expressions attempt to access the variable
that was being set/created then throw a processing error ( because the
result will be undefined ). This will probably require the
variable/instance to be placed into a 'not available' state until its
expression is resolvable
25/08: There was a brief discussion as IBM needs a resolution soon. Is it
possible to restrict newVariableInstance to backward references only, so
as to remove the problem? setVariable must obviously be able to access the
current value.
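The three parser steps listed under action 110 could be sketched roughly as
follows. This is only an illustration of the idea under stated assumptions:
the class and method names are hypothetical, an expression is modelled as a
Python callable over a dictionary standing in for the infoset, and the sketch
assumes that every deferred expression is resolvable by the end of the
component.

```python
class NotAvailable(Exception):
    """An expression referred to data that has not been parsed yet."""


class ProcessingError(Exception):
    """A variable was read while still in the 'not available' state."""


_PENDING = object()  # marker for the 'not available' state


class VariableScope:
    def __init__(self):
        self.values = {}
        self.deferred = []  # (name, expression) pairs to retry later

    def start_component(self, annotations, infoset):
        """Step 1: try each expression when the component is started."""
        for name, expression in annotations:
            try:
                self.values[name] = expression(infoset)
            except NotAvailable:
                # Step 2: remember it for the end of the component.
                self.values[name] = _PENDING
                self.deferred.append((name, expression))

    def read(self, name):
        """Step 3: reading a pending variable is a processing error."""
        value = self.values[name]
        if value is _PENDING:
            raise ProcessingError(f"variable {name!r} is not available yet")
        return value

    def end_component(self, infoset):
        """Evaluate the deferred expressions (assumed resolvable by now)."""
        for name, expression in self.deferred:
            self.values[name] = expression(infoset)
        self.deferred.clear()


def body_length(infoset):
    """Example expression that needs the 'body' element to be parsed."""
    if "body" not in infoset:
        raise NotAvailable("body")
    return len(infoset["body"])
```

Restricting newVariableInstance to backward references, as asked on 25/08,
would mean the deferred list is never needed for that annotation.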
111
Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have
developed. It is a new code base and is not based on the Defuddle
prototype. It is written in Scala and implements approximately 80% of the
features in the public comments draft of DFDL V1. Alejandro will send a
list of the features not implemented.
We discussed the scenarios that motivated the development which was to
extract data from various sources and transform into canonical formats.
Bob offered to make Daffodil available for the WG to assess the
functionality. IBM WG members will get approval from the company to allow them
to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation
of DFDL then we will need to work out how that would be funded and
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will
investigate.
25/08: Alejandro had sent a list of the functions that he has implemented
and Steve had responded indicating the extra functions he thought were
essential.
Since then Alejandro has implemented some of the missing functions, such
as escape schemes, pre-defined variables, binary decimal numbers, etc, and
will update his list.
Bob is planning to make the parser available on the internet to allow
testing.
His organisation is being reorganised and he doesn't know what the
priority of Daffodil will be, so it is essential that we move quickly. It
would help if IBM could indicate its support for Daffodil in some
semi-formal way.
Discussed how to certify DFDL implementations. Alan to investigate if OGF
have a defined process.
112
DFDL certification process
113
2. Regular Expressions.
The DFDL regular expressions should provide lookahead and backreferences.
Is the current regular expression language sufficient?
Discussed two aspects:
a. Is the XML regular expression language the correct one to use? Tim
asked if DFDL needs to specify a language at all or should leave it to
implementers to pick one. That would inhibit portability of schemas.
b. A regular expression property on an assert/discriminator as an
alternative to the test expression. Either a DFDL expression or a regular
expression could be specified but not both.
114
3. OGF 30
OGF30 takes place on October 25-29 in Brussels
Should we have a WG session?
Closed actions
No
Action
104
Expressions
Discuss error behaviour when evaluating an expression in various contexts
- All properties:
wrong type returned : schema definition error
exception when evaluating expression : schema definition error
referenced variables/paths not available : schema definition error
- Properties which allow a forward reference
referenced variables/paths not available : no error. DFDL processor
continues processing until the expression result is available, then acts
on the result.
21/07: Steve stated the current definition that returning the incorrect
type was a schema definition error and everything else was a processing
error.
04/08: Not discussed
25/08: Closed
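For illustration, the definition Steve stated on 21/07 (an incorrect result
type is a schema definition error, everything else is a processing error)
could be sketched as below. The evaluate function and the exception classes
are hypothetical names, and an expression is modelled as a Python callable
over a context dictionary.

```python
class SchemaDefinitionError(Exception):
    """The expression returned a value of the wrong type."""


class ProcessingError(Exception):
    """Any other failure while evaluating the expression."""


def evaluate(expression, expected_type, context):
    """Evaluate `expression` and classify failures as described above."""
    try:
        result = expression(context)
    except Exception as exc:
        # Missing variables/paths and evaluation exceptions land here.
        raise ProcessingError(str(exc)) from exc
    if not isinstance(result, expected_type):
        raise SchemaDefinitionError(
            f"expected {expected_type.__name__}, got {type(result).__name__}"
        )
    return result
```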
Work items:
No
Item
target version
status
005
Improvements on property descriptions
not started
012
Reordering the properties discussion: move representation earlier, improve
flow of topics
not started
036
Update dfdl schema with change properties
ongoing
042
Mapping of the DFDL infoset to XDM
none
not required for V1 specification
070
Write DFDL primer
071
Write test cases.
083
Implement RFC2116
Regards
Alan Powell
1. Current Actions
2. Regular Expressions.
The DFDL regular expressions should provide lookahead and backreferences.
Is the current regular expression language sufficient?
3. OGF 30
OGF30 takes place on October 25-29 in Brussels
Current Actions:
No
Action
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will
be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases
will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
11/08: work continues
085
ALL: publicise Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask
them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan
had posted changes made in draft 041. Steve suggested send a note to the
WG highlighting these changes. Steve also suggested requesting an
extension as other IBM groups may review. We discussed whether this was
necessary as changes will need to be made during the implementation phase
anyway. Alan to ask OGF what the process is for changes post public
comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of
the process.
30/06: Alan has emailed Joel asking what the process is now public comment
period is over and can we update the published version with WG updates. No
response yet.
07/07: No response. Alan will chase up
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes
are significant enough to need additional review. Alan to contact David
Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide if a public
re-review is necessary before becoming a 'proposed recommendation'.
Alan responded that the WG agreed that a re-review was not necessary. The
next stage is for OGF review committee to approve publication.
099
Splitting the specification into simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a
separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments.
Need to add choice, validation, facets.
Also, how does an implementation declare which subsets it supports?
Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL
schema from an implementation of just the core functions is moved to a
full DFDL implementation, what should happen about the missing properties?
Does the full implementation need to be aware of subsets of functions?
Should it raise a schema definition error for use of a function not in the
subset?
21/07: no progress
04/08: Steve had updated the proposed groups of function
(Subset_proposal_v2.ppt). We discussed whether it is better to have
discrete sets of functions or expanding levels of function.
Purpose of subsetting is:
1. Allow simpler implementations. (main purpose)
2. Simplify tooling
3. Simplify specification.
Steve to contact previous members of WG to check if we have the correct
subsets
11/08: Steve sent an email to previous members of the WG asking for
opinions on splitting the specification. Bob McGrath from National Center
For Supercomputing responded that they had implemented about 80% of the
function. Alejandro will send a description of the function they have
implemented.
Action will be raised to track the Daffodil implementation
101
Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a
validation error or processing error. Decided that for consistency it
should be a validation error.
It would be useful, however, to avoid having to duplicate facet
information in an assert, which could become unwieldy for, say, a large
enumeration.
Suggestions
- a parser option that 'converted all validation errors to processing
errors'
- a dfdl expression function that 'applied all facets' or 'applied
specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to
lack of time
11/08: We started to discuss Stephanie's HIPAA example but ran out of
time.
104
Expressions
Discuss error behaviour when evaluating an expression in various contexts
- All properties:
wrong type returned : schema definition error
exception when evaluating expression : schema definition error
referenced variables/paths not available : schema definition error
- Properties which allow a forward reference
referenced variables/paths not available : no error. DFDL processor
continues processing until the expression result is available, then acts
on the result.
21/07: Steve stated the current definition that returning the incorrect
type was a schema definition error and everything else was a processing
error.
04/08: Not discussed
11/08: Not discussed
107
teston/testoff dfdl expression functions.
Are these functions still needed? They were introduced to allow individual
bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that
use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
108
dfdl:hidden
There has been some discussion on whether the 'hidden' global group should
be indicated in some way.
04/08: A lively discussion. The specification works as currently
defined, so the question is whether changes need to be made to make tooling
easier. There shouldn't be 'conventions' in particular tooling, as tools must
be able to properly deal with schemas from other tools that would not obey
those conventions. Steve stated that it is often dangerous to hide too much
from users when they can see the underlying schema. To be continued.
109
dfdl:discriminator : the 'message' attribute
From Tim:
I remembered the reason why I thought this was a good idea.
Consider the situation where someone is generating their DFDL schema from
meta-data. The model is large, and consists of many references to global
structures. Each global structure ( e.g. an HL7 segment ) is identified in
a particular way. Sometimes the segment is required, sometimes it is not.
Sometimes it occurs as a child of a choice group, and sometimes not.
Regardless, it is highly likely that the segment will be identified in the
same way wherever it occurs. A natural decision for the modeler would be
to create a dfdl:discriminator on all references to the segment, even if
the ref is not under a point of uncertainty. It's harmless, and it carries
no performance penalty. If we disallow the "message" attribute, it will
force the modeler to put in extra logic to work out whether the ref is
under a POI, and generate an assert/discriminator as appropriate.
I'd be interested to know what Steph thinks about this - I think I've
heard her say that she sometimes uses discriminators where an assert would
have done the job, just to maintain consistency throughout the model.
04/08: not discussed.
11/08: Not discussed
110
Semantics of newVariableInstance and setVariable
what should a DFDL processor ( parser or serializer ) do when it cannot
evaluate the expression in a newVariableInstance or setVariable
annotation?
Moving the setting of variable values after the element has been parsed
just creates other problems. A new instance must be available to other
expressions on the same component, and to the children of a group/element.
So it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable /
newVariableInstance annotations which *cannot* be evaluated until the
element has been parsed.
For the parser, it might be OK to
- evaluate the expression when the component ( element or group ) is
started
- if it cannot be evaluated, add it to a list of annotations that must be
processed at the end of the component
- if in the meantime any other expressions attempt to access the variable
that was being set/created then throw a processing error ( because the
result will be undefined ). This will probably require the
variable/instance to be placed into a 'not available' state until its
expression is resolvable
111
Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have
developed. It is a new code base and is not based on the Defuddle
prototype. It is written in Scala and implements approximately 80% of the
features in the public comments draft of DFDL V1. Alejandro will send a
list of the features not implemented.
We discussed the scenarios that motivated the development which was to
extract data from various sources and transform into canonical formats.
Bob offered to make Daffodil available for the WG to assess the
functionality. IBM WG members will get approval from the company to allow them
to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation
of DFDL then we will need to work out how that would be funded and
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will
investigate.
Regards
Alan Powell
Open Grid Forum: Data Format Description Language Working Group
OGF DFDL Working Group Call, August 11-2010
Attendees
Steve Hanson (IBM)
Alan Powell (IBM)
Stephanie Fetzer (IBM)
Tim Kimber (IBM)
Bob McGrath (National Center for Supercomputing Applications)
Alejandro Rodriguez (National Center for Supercomputing Applications)
Apologies
Mike Beckerle (Oco)
Suman Kalia (IBM)
1. Current Actions
Updated Below
2 Semantics of newVariableInstance and setVariable
what should a DFDL processor ( parser or serializer ) do when it cannot
evaluate the expression in a newVariableInstance or setVariable
annotation?
Moving the setting of variable values after the element has been parsed
just creates other problems. A new instance must be available to other
expressions on the same component, and to the children of a group/element.
So it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable /
newVariableInstance annotations which *cannot* be evaluated until the
element has been parsed.
For the parser, it might be OK to
- evaluate the expression when the component ( element or group ) is
started
- if it cannot be evaluated, add it to a list of annotations that must be
processed at the end of the component
- if in the meantime any other expressions attempt to access the variable
that was being set/created then throw a processing error ( because the
result will be undefined ). This will probably require the
variable/instance to be placed into a 'not available' state until its
expression is resolvable
11/08: Not discussed. Action raised
3 Daffodil DFDL parser implementation at National Center for
Supercomputing Applications
Bob and Alejandro described the new implementation that they have
developed. It is a new code base and is not based on the Defuddle
prototype. It is written in Scala and implements approximately 80% of the
features in the public comments draft of DFDL V1. Alejandro will send a
list of the features not implemented.
We discussed the scenarios that motivated the development which was to
extract data from various sources and transform into canonical formats.
Bob offered to make Daffodil available for the WG to assess the
functionality. IBM WG members will get approval from the company to allow them
to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation
of DFDL then we will need to work out how that would be funded and
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will
investigate.
Meeting closed, 16:30
Next call Wednesday 25 August 2010 15:00 UK (10:00 ET)
Next action: 112
Actions raised at this meeting
No
Action
110
Semantics of newVariableInstance and setVariable
what should a DFDL processor ( parser or serializer ) do when it cannot
evaluate the expression in a newVariableInstance or setVariable
annotation?
Moving the setting of variable values after the element has been parsed
just creates other problems. A new instance must be available to other
expressions on the same component, and to the children of a group/element.
So it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable /
newVariableInstance annotations which *cannot* be evaluated until the
element has been parsed.
For the parser, it might be OK to
- evaluate the expression when the component ( element or group ) is
started
- if it cannot be evaluated, add it to a list of annotations that must be
processed at the end of the component
- if in the meantime any other expression attempts to access the variable
that was being set/created, throw a processing error ( because the
result would be undefined ). This will probably require the
variable/instance to be placed into a 'not available' state until its
expression is resolvable.
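The three parser steps proposed above could be sketched roughly as follows. This is only an illustration of the idea, not part of the specification or of any implementation: the names Component, NotYetResolvable and ProcessingError are hypothetical, and each annotation's expression is modelled as a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class ProcessingError(Exception):
    """Raised when a variable is read before its value is resolvable."""


class NotYetResolvable(Exception):
    """Raised by an expression that needs data which is not parsed yet."""


_NOT_AVAILABLE = object()  # sentinel for the 'not available' state


@dataclass
class Component:
    """Hypothetical parser state for one element or group."""
    variables: Dict[str, object] = field(default_factory=dict)
    deferred: List[Tuple[str, Callable[[], object]]] = field(default_factory=list)

    def start(self, annotations: Dict[str, Callable[[], object]]) -> None:
        # Try every setVariable/newVariableInstance expression at the start.
        for name, expr in annotations.items():
            try:
                self.variables[name] = expr()
            except NotYetResolvable:
                # Cannot be evaluated yet: defer it and mark 'not available'.
                self.variables[name] = _NOT_AVAILABLE
                self.deferred.append((name, expr))

    def read(self, name: str) -> object:
        # Any access before the value is resolved is a processing error.
        value = self.variables[name]
        if value is _NOT_AVAILABLE:
            raise ProcessingError(f"variable {name!r} is not available yet")
        return value

    def end(self) -> None:
        # Deferred expressions are evaluated at the end of the component.
        for name, expr in self.deferred:
            self.variables[name] = expr()
        self.deferred.clear()
```

In this sketch a caller would invoke start() when the component begins, read() whenever another expression references a variable, and end() once the component has been parsed.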
111
Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have
developed. It is a new code base and is not based on the Defuddle
prototype. It is written in Scala and implements approximately 80% of the
features in the public comments draft of DFDL V1. Alejandro will send a
list of the features not implemented.
We discussed the scenarios that motivated the development, which were to
extract data from various sources and transform it into canonical formats.
Bob offered to make Daffodil available for the WG to assess the
functionality. IBM WG members will get approval from the company to allow
them to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation
of DFDL then we will need to work out how that would be funded and
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will
investigate.
Current Actions:
No
Action
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will
be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases
will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
11/08: work continues
085
ALL: publicise Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask
them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan
had posted changes made in draft 041. Steve suggested send a note to the
WG highlighting these changes. Steve also suggested requesting an
extension as other IBM groups may review. We discussed whether this was
necessary as changes will need to be made during the implementation phase
anyway. Alan to ask OGF what the process is for changes post public
comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of
the process.
30/06: Alan has emailed Joel asking what the process is now that the public
comment period is over, and whether we can update the published version with
WG updates. No response yet.
07/07: No response. Alan will chase up
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes
are significant enough to need additional review. Alan to contact David
Martin and Erwin Laure for guidance if we split the specification.
11/08: Received a response from Joel that the WG can decide whether a repeat
public review is necessary before becoming a 'proposed recommendation'.
Alan responded that the WG agreed that a re-review was not necessary. The
next stage is for OGF review committee to approve publication.
099
Splitting the specification in simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a
separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments.
Need to add choice, validation, facets.
Also, how does an implementation declare which subsets it supports?
Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL
schema from an implementation of just the core functions is moved to a
full DFDL implementation, what should happen about the missing properties?
Does the full implementation need to be aware of subsets of functions?
Should it raise a schema definition error for use of a function not in the
subset?
21/07: no progress
04/08: Steve had updated proposed groups of function.
(Subset_proposal_v2.ppt). We discussed whether it is better to have
discrete sets of functions or expanding levels of function.
Purpose of subsetting is:
1. Allow simpler implementations. (main purpose)
2. Simplify tooling
3. Simplify specification.
Steve to contact previous members of WG to check if we have the correct
subsets
11/08: Steve sent an email to previous members of the WG asking for
opinions on splitting the specification. Bob McGrath from National Center
For Supercomputing responded that they had implemented about 80% of the
function. Alejandro will send a description of the function they have
implemented.
Action will be raised to track the Daffodil implementation
101
Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a
validation error or processing error. Decided that for consistency it
should be a validation error.
It would be useful, however, to avoid having to duplicate facet
information in an assert, which could become unwieldy for, say, a large
enumeration.
Suggestions
- a parser option that 'converted all validation errors to processing
errors'
- a dfdl expression function that 'applied all facets' or 'applied
specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to
lack of time
11/08: We started to discuss Stephanie's HIPAA example but ran out of
time.
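The first suggestion above, a parser option that converts validation errors to processing errors, might look something like the following. This is a minimal sketch under the assumption that a validation error is recorded while parsing continues, whereas a processing error stops the current parse attempt; check_fixed and the validation_as_processing flag are hypothetical names, not anything defined by the WG.

```python
from typing import List


class ProcessingError(Exception):
    """Stops the current parse attempt (assumed behaviour)."""


def check_fixed(value: str, fixed: str,
                validation_as_processing: bool = False) -> List[str]:
    """Compare a parsed value with the schema's 'fixed' value.

    Returns a list of validation error messages (empty when the value
    matches). With the hypothetical option switched on, a mismatch is
    raised as a processing error instead of being recorded.
    """
    if value == fixed:
        return []
    message = f"value {value!r} does not match fixed value {fixed!r}"
    if validation_as_processing:
        raise ProcessingError(message)
    return [message]
```

With the option off, a mismatch is only reported; with it on, the same mismatch could steer the parser, for example when resolving a choice, without repeating the facet in an assert.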
104
Expressions
Discuss error behaviour when evaluating an expression in various contexts
- All properties:
wrong type returned : schema definition error
exception when evaluating expression : schema definition error
referenced variables/paths not available : schema definition error
- Properties which allow a forward reference
referenced variables/paths not available : no error. DFDL processor
continues processing until the expression result is available, then acts
on the result.
21/07: Steve stated the current definition that returning the incorrect
type was a schema definition error and everything else was a processing
error.
04/08: Not discussed
11/08: Not discussed
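For comparison, the proposed behaviour and the current definition stated by Steve on 21/07 can be written down as two small lookup functions. This is only a restatement of the text above as code; the enum and function names are made up for the illustration.

```python
from enum import Enum


class Failure(Enum):
    WRONG_TYPE = "wrong type returned"
    EXCEPTION = "exception when evaluating expression"
    NOT_AVAILABLE = "referenced variables/paths not available"


class Outcome(Enum):
    SCHEMA_DEFINITION_ERROR = "schema definition error"
    PROCESSING_ERROR = "processing error"
    WAIT = "no error; continue until the result is available"


def proposed_outcome(failure: Failure, allows_forward_reference: bool) -> Outcome:
    """Behaviour proposed in this action."""
    if failure is Failure.NOT_AVAILABLE and allows_forward_reference:
        return Outcome.WAIT
    return Outcome.SCHEMA_DEFINITION_ERROR


def current_outcome(failure: Failure) -> Outcome:
    """Current definition as stated on 21/07."""
    if failure is Failure.WRONG_TYPE:
        return Outcome.SCHEMA_DEFINITION_ERROR
    return Outcome.PROCESSING_ERROR
```

The two functions differ for an evaluation exception and for unavailable references, which is the point still to be discussed.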
107
teston/testoff dfdl expression functions.
Are these functions still needed? They were introduced to allow individual
bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that
use existence flags to see if they are still required.
04/08: Not discussed
11/08: Not discussed
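As background for the existence-flag case, the kind of bit test involved can be sketched as below. The functions are hypothetical helpers, not the DFDL teston/testoff signatures, and the numbering assumes the ISO 8583 bitmap convention in which bit 1 is the most significant bit of the first byte.

```python
def bit_is_on(bitmap: bytes, position: int) -> bool:
    """Return True if the 1-based bit 'position' is set in 'bitmap'.

    Bit 1 is taken to be the most significant bit of the first byte.
    """
    if not 1 <= position <= len(bitmap) * 8:
        raise ValueError(f"bit position {position} is outside the bitmap")
    byte_index, bit_index = divmod(position - 1, 8)
    return bool(bitmap[byte_index] & (0x80 >> bit_index))


def bit_is_off(bitmap: bytes, position: int) -> bool:
    """Return True if the 1-based bit 'position' is clear in 'bitmap'."""
    return not bit_is_on(bitmap, position)
```

A schema using existence flags would evaluate a test of this kind for each optional field to decide whether that field is present in the data.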
108
dfdl:hidden
There has been some discussion on whether the 'hidden' global group should
be indicated in some way.
04/08: A lively discussion. The specification works as currently defined,
so the question is whether changes need to be made to make tooling easier.
There shouldn't be 'conventions' in particular tooling, as tools must be
able to properly deal with schemas from other tools that would not obey
those conventions. Steve stated that it is often dangerous to hide too much
from users when they can see the underlying schema. To be continued.
109
dfdl:discriminator : the 'message' attribute
From Tim:
I remembered the reason why I thought this was a good idea.
Consider the situation where someone is generating their DFDL schema from
meta-data. The model is large, and consists of many references to global
structures. Each global structure ( e.g. an HL7 segment ) is identified in
a particular way. Sometimes the segment is required, sometimes it is not.
Sometimes it occurs as a child of a choice group, and sometimes not.
Regardless, it is highly likely that the segment will be identified in the
same way wherever it occurs. A natural decision for the modeler would be
to create a dfdl:discriminator on all references to the segment, even if
the ref is not under a point of uncertainty. It's harmless, and it carries
no performance penalty. If we disallow the "message" attribute, it will
force the modeler to put in extra logic to work out whether the ref is
under a POI, and generate an assert/discriminator as appropriate.
I'd be interested to know what Steph thinks about this - I think I've
heard her say that she sometimes uses discriminators where an assert would
have done the job, just to maintain consistency throughout the model.
04/08: not discussed.
11/08: Not discussed
110
Semantics of newVariableInstance and setVariable
What should a DFDL processor ( parser or serializer ) do when it cannot
evaluate the expression in a newVariableInstance or setVariable
annotation?
Moving the setting of variable values after the element has been parsed
just creates other problems. A new instance must be available to other
expressions on the same component, and to the children of a group/element.
So it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable /
newVariableInstance annotations which *cannot* be evaluated until the
element has been parsed.
For the parser, it might be OK to
- evaluate the expression when the component ( element or group ) is
started
- if it cannot be evaluated, add it to a list of annotations that must be
processed at the end of the component
- if in the meantime any other expression attempts to access the variable
that was being set/created, throw a processing error ( because the
result would be undefined ). This will probably require the
variable/instance to be placed into a 'not available' state until its
expression is resolvable.
111
Daffodil DFDL parser
Bob and Alejandro described the new implementation that they have
developed. It is a new code base and is not based on the Defuddle
prototype. It is written in Scala and implements approximately 80% of the
features in the public comments draft of DFDL V1. Alejandro will send a
list of the features not implemented.
We discussed the scenarios that motivated the development, which were to
extract data from various sources and transform it into canonical formats.
Bob offered to make Daffodil available for the WG to assess the
functionality. IBM WG members will get approval from the company to allow
them to receive Daffodil.
Bob raised the question that if Daffodil becomes the public implementation
of DFDL then we will need to work out how that would be funded and
managed.
It would be helpful if IBM test cases were available to Daffodil. IBM will
investigate.
Closed actions
No
Action
Work items:
No
Item
target version
status
005
Improvements on property descriptions
not started
012
Reordering the properties discussion: move representation earlier, improve
flow of topics
not started
036
Update dfdl schema with change properties
ongoing
042
Mapping of the DFDL infoset to XDM
none
not required for V1 specification
070
Write DFDL primer
071
Write test cases.
083
Implement RFC2116
Regards
Alan Powell
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell(a)uk.ibm.com
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Alan and Steve both away
1. Current Actions
2 Semantics of newVariableInstance and setVariable
What should a DFDL processor ( parser or serializer ) do when it cannot
evaluate the expression in a newVariableInstance or setVariable
annotation?
Moving the setting of variable values into the END_ELEMENT state just
creates other problems. A new instance must be available to other
expressions on the same component, and to the children of a group/element.
So it cannot be left until the end of the element.
On the other hand, there are clearly some types of setVariable /
newVariableInstance annotations which *cannot* be evaluated until the
END_ELEMENT state.
For the parser, it might be OK to
- evaluate the expression when the component ( element or group ) is
started
- if it cannot be evaluated, add it to a list of annotations that must be
processed at the end of the component
- if in the meantime any other expression attempts to access the variable
that was being set/created, throw a processing error ( because the
result would be undefined ). This will probably require the
variable/instance to be placed into a 'not available' state until its
expression is resolvable.
Current Actions:
No
Action
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
...
17/02: IBM is willing in principle to publish the test case format and
some of the test cases. May need some time to build a 'compliance suite'
24/03: No progress
03/03: Discussions have been taking place on the subset of tests that will
be provided.
10/03: work is progressing
17/03: work is progressing
31/03: work is progressing
14/04: An XML test case format has been defined and is being tested.
21/04: Schema for TDML defined. Need to define how this and the test cases
will be made public.
05/05: Work still progressing
12/05: Work still progressing
02/06: Work still progressing on technical and legal considerations
...
21/07: work continues
04/08: work continues
085
ALL: publicize Public comments phase to ensure a good review.
14/04: see minutes
21/04: Press release, OMG and other standards bodies.
05/05: Alan and Steve H have contacted other standards bodies. Will ask
them to add comments on spec
15/05: still no public comments
02/06: No public comments
16/06: Public comments period has ended with no external comments. Alan
had posted changes made in draft 041. Steve suggested send a note to the
WG highlighting these changes. Steve also suggested requesting an
extension as other IBM groups may review. We discussed whether this was
necessary as changes will need to be made during the implementation phase
anyway. Alan to ask OGF what the process is for changes post public
comment.
23/06: Still no comments. Alan will contact OGF to understand the rest of
the process.
30/06: Alan has emailed Joel asking what the process is now that the public
comment period is over, and whether we can update the published version with
WG updates. No response yet.
07/07: No response. Alan will chase up
14/07: No response from Joel. Sent email to Greg Newby but no response.
21/07: Still no response.
04/08: Joel has responded that it is up to the WG to decide if the changes
are significant enough to need additional review. Alan to contact David
Martin and Erwin Laure for guidance if we split the specification.
099
Splitting the specification in simpler sections.
07/07: Steve sent a proposal but not discussed. Alan will arrange a
separate call.
14/07: Discussed Steve's proposal and Suman's and Alan's comments.
Need to add choice, validation, facets.
Also, how does an implementation declare which subsets it supports?
Suggested levels and/or profiles. Steve highlighted a problem: when a DFDL
schema from an implementation of just the core functions is moved to a
full DFDL implementation, what should happen about the missing properties?
Does the full implementation need to be aware of subsets of functions?
Should it raise a schema definition error for use of a function not in the
subset?
21/07: no progress
04/08: Steve had updated proposed groups of function.
(Subset_proposal_v2.ppt). We discussed whether it is better to have
discrete sets of functions or expanding levels of function.
Purpose of subsetting is:
1. Allow simpler implementations. (main purpose)
2. Simplify tooling
3. Simplify specification.
Steve to contact previous members of WG to check if we have the correct
subsets.
101
Semantics of 'fixed'
21/07: Discussed whether not matching the 'fixed' value should be a
validation error or processing error. Decided that for consistency it
should be a validation error.
It would be useful, however, to avoid having to duplicate facet
information in an assert, which could become unwieldy for, say, a large
enumeration.
Suggestions
- a parser option that 'converted all validation errors to processing
errors'
- a dfdl expression function that 'applied all facets' or 'applied
specific facet' to a particular element.
Stephanie will produce some examples of how this could be used.
04/08: Stephanie had produced examples but they were not discussed due to
lack of time
104
Expressions
Discuss error behaviour when evaluating an expression in various contexts
- All properties:
wrong type returned : schema definition error
exception when evaluating expression : schema definition error
referenced variables/paths not available : schema definition error
- Properties which allow a forward reference
referenced variables/paths not available : no error. DFDL processor
continues processing until the expression result is available, then acts
on the result.
21/07: Steve stated the current definition that returning the incorrect
type was a schema definition error and everything else was a processing
error.
04/08: Not discussed
107
teston/testoff dfdl expression functions.
Are these functions still needed? They were introduced to allow individual
bits to be set in a byte. Steve to look at TLog and ISO 8583 formats that
use existence flags to see if they are still required.
04/08: Not discussed
108
dfdl:hidden
There has been some discussion on whether the 'hidden' global group should
be indicated in some way.
04/08: A lively discussion. The specification works as currently defined,
so the question is whether changes need to be made to make tooling easier.
There shouldn't be 'conventions' in particular tooling, as tools must be
able to properly deal with schemas from other tools that would not obey
those conventions. Steve stated that it is often dangerous to hide too much
from users when they can see the underlying schema. To be continued.
109
dfdl:discriminator : the 'message' attribute
From Tim:
I remembered the reason why I thought this was a good idea.
Consider the situation where someone is generating their DFDL schema from
meta-data. The model is large, and consists of many references to global
structures. Each global structure ( e.g. an HL7 segment ) is identified in
a particular way. Sometimes the segment is required, sometimes it is not.
Sometimes it occurs as a child of a choice group, and sometimes not.
Regardless, it is highly likely that the segment will be identified in the
same way wherever it occurs. A natural decision for the modeler would be
to create a dfdl:discriminator on all references to the segment, even if
the ref is not under a point of uncertainty. It's harmless, and it carries
no performance penalty. If we disallow the "message" attribute, it will
force the modeler to put in extra logic to work out whether the ref is
under a POI, and generate an assert/discriminator as appropriate.
I'd be interested to know what Steph thinks about this - I think I've
heard her say that she sometimes uses discriminators where an assert would
have done the job, just to maintain consistency throughout the model.
04/08: not discussed.
Regards
Alan Powell