Ignore this, see other thread
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 27/03/2019 16:15
Subject: Re: Part 2 - Re: Action 307 - Demonstrate implementation interoperability - BOM
Hi Mike
The outstanding item to resolve is what to do about BOMs.
307 | Demonstrate implementation interoperability (Steve, Mike)
4/9: Need to make sure that DFDL spec section 21 lists a correct set of optional features, the implication being that Daffodil and IBM DFDL (and any other minimally conforming implementation) correctly implement the remaining required features. First step - see if there are any obvious omissions.
16/10: Steve sent an email listing IBM DFDL's missing core features and non-compliant behaviour, and Mike responded. Discussion continuing via two separate email threads: Part 1 for core features, Part 2 for optional features. For the core features, agreed that the following needs to happen:
1) IBM adds encodingErrorPolicy='replace'
2) Daffodil adds encodingErrorPolicy='error'
3) Daffodil ensures that, if not implementing default/fixed when parsing, it gives an SDE if a required occurrence has an empty representation and the element has default/fixed set.
4) A position is agreed on BOM handling - ongoing via email.
1/11: Just BOM to conclude on from the above list
15/11: Not discussed
29/11: No further progress.
10/1/19:
1) IBM have started the work to add encodingErrorPolicy='replace'.
2) Daffodil have a temp setting to tolerate encodingErrorPolicy='error' with a warning.
3) Daffodil to investigate whether this is feasible.
4) More discussion needed on BOM
7/2: Updates:
1) In progress
2) As above.
3) In progress
4) No progress |
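As a rough analogy for items 1) and 2), on parse the two dfdl:encodingErrorPolicy values behave much like Python's 'replace' and 'strict' codec error handlers: 'replace' substitutes U+FFFD for undecodable bytes, while 'error' stops processing. The sketch below assumes that mapping; decode_with_policy is a hypothetical helper, and exactly which bytes get replaced by each U+FFFD is defined by the DFDL spec, not by Python.

```python
def decode_with_policy(data: bytes, encoding: str, policy: str) -> str:
    """Decode bytes, loosely mimicking the two dfdl:encodingErrorPolicy values.

    Hypothetical helper for illustration only:
    - 'replace': each undecodable sequence becomes U+FFFD
    - 'error':   an undecodable sequence raises UnicodeDecodeError,
                 standing in for a DFDL processing error
    """
    if policy == "replace":
        return data.decode(encoding, errors="replace")
    if policy == "error":
        return data.decode(encoding, errors="strict")
    raise ValueError(f"unknown encodingErrorPolicy: {policy!r}")


# 0xFF is never valid in UTF-8, so the two policies diverge on this input.
print(decode_with_policy(b"a\xffb", "utf-8", "replace"))
```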
I can't find the email thread the action mentions, but my thoughts are as follows:
There are 3 options:
a) keep the spec as it is, which implies BOM processing is core - a problem, as neither Daffodil nor IBM DFDL implements it
b) make BOM processing optional - which means there would need to be a property to switch it on or off in case an implementation started to support it later
c) remove BOM processing altogether from 1.0 and add it to the 2.0 list
I am leaning towards c) on the following grounds:
- Only one customer that I know of ever requested BOM processing for non-XML data (in 2010, for MRM, before IBM DFDL was available)
- BOM processing only applies to the message as a whole, not to any embedded Unicode fragments, so support is selective anyway
- It is possible to model an optional BOM and use it to set a user-defined encoding variable which is then used by the rest of the schema
I have a schema that models the BOM and successfully parses and unparses the 3 variants (no BOM present, BOM for BE present, BOM for LE present).
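A minimal Python sketch of the same decision logic, for illustration only: look for an optional UTF-16 BOM, choose the encoding from it, and use that encoding for the rest of the data; on unparse, optionally write the BOM back out. The function names are hypothetical, the big-endian default when no BOM is present is an assumption (it matches the Unicode convention for unmarked UTF-16), and in the actual schema the choice would be made by setting a DFDL variable rather than in code.

```python
import codecs
from typing import Tuple

# Hypothetical BOM-to-encoding table covering the two BOM variants above.
_BOMS = (
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # bytes FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # bytes FF FE
)

# Assumption: data with no BOM is treated as big-endian UTF-16.
DEFAULT_ENCODING = "utf-16-be"


def detect_encoding(data: bytes) -> Tuple[str, int]:
    """Return (encoding, bom_length) for data that may start with a BOM."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return DEFAULT_ENCODING, 0


def parse(data: bytes) -> str:
    """Skip any BOM, then decode the rest with the encoding it selected."""
    encoding, skip = detect_encoding(data)
    return data[skip:].decode(encoding)


def unparse(text: str, encoding: str = DEFAULT_ENCODING, with_bom: bool = True) -> bytes:
    """Encode text, prefixing the matching BOM when requested."""
    bom = dict((enc, b) for b, enc in _BOMS)[encoding]
    return (bom if with_bom else b"") + text.encode(encoding)
```

Usage: parse(unparse("Hi", "utf-16-le")) round-trips to "Hi", and parse(b"\x00A") falls back to the assumed big-endian default.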
If you have the BOM email thread, please can you forward it, so I can see whether I have missed any part of the thought process?
I have found the original DFDL WG thread from 2011, when we added BOM support to the spec via erratum 3.7, which discusses the original motivation and design; I'll send it to you.
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 09/10/2018 12:16
Subject: Re: Part 2 - Re: Action 307 - Demonstrate implementation interoperability
Mike, responses in-line below.
Regards
Steve Hanson
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Steve Hanson <smh@uk.ibm.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 04/10/2018 00:50
Subject: Part 2 - Re: Action 307 - Demonstrate implementation interoperability
Based on the Daffodil JIRA ticket backlog and the documentation at https://daffodil.apache.org/unsupported/, below are DFDL non-core features that Daffodil does not support but that IBM DFDL seems to support (I found nothing saying they aren't implemented), and which are therefore possibly used in DFDL schemas we will need for interoperability testing.
Please advise if IBM DFDL does *not* implement any of these.
* default, fixed - for defaulting values at parse time - Daffodil support for this is partial at parse time, unsupported at unparse time. The fixed attribute isn't supported at all.
* unordered sequences.
IBM DFDL does not support default/fixed when parsing (see other thread).
* byte-value entities - in contexts other than fillByte
* ICU symbols 'u' and 'I' in calendarPattern
* binaryFloatRep 'ibm390Hex'
* documentFinalTerminatorCanBeMissing
* textStandardBase - with value not equal to 10
* lengthKind 'prefixed', and prefixIncludesPrefixLength, prefixLengthType - Note IBM restricts prefixLengthType to a type that itself cannot be prefixed. Correct.
* assert with failure type 'recoverableError'
* calendarObserveDST
* calendarCenturyStart
* textNumberPattern 'V' and 'P' symbols
* CCSID for specifying dfdl:encoding
* nilKind 'literalValue' for binary data
* choiceLengthKind 'explicit' and choiceLength
* separatorSuppressionPolicy - Daffodil's behaviors for this are currently known to be both non-standard and different from IBM DFDL. This needs correcting. IBM DFDL does not support 'trailingEmptyStrict'.
The above list (after review/correction) needs to be cross-checked against the DFDL schemas that IBM published on GitHub. Daffodil must implement the features required to run those DFDL schemas before the interoperability demonstration.
Below are features of DFDL that I believe neither IBM DFDL nor Daffodil implements; as they are non-core, neither needs to implement them for interoperability testing:
* lengthKind 'endOfParent'
* nilKind 'logicalValue'. IBM DFDL implements this.
* occursCountKind 'stopValue' (and occursStopValue)
* textBiDi - and other related biDi properties
* useNilForDefault. IBM DFDL implements this.
* floating
* fn:exactly-one function
* fn:namespace-uri() function
* dfdl:escapeCharacterPolicy 'delimiters' - Daffodil doesn't implement this property at all.
Below are features of Daffodil that are not implemented by IBM DFDL, and so cannot be used in schemas created with Daffodil that are intended to be interoperable. Some are easy to work around and some are impossible to work around; either way they are not a big deal, but they have to be kept in mind when considering a DFDL schema for use in interoperability testing. This includes the schemas published on GitHub for image formats, CSV, etc., and a number of the FOUO schemas published on DI2E.net/forge.mil. Some of those quite possibly can work with IBM DFDL, and if so, they should be modified so that they can be included in the interoperability testing.
The list below mostly comes from https://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/df00150_.htm
* calendarTimeZone specified as "" (empty string) - This is the most problematic one, so I've put it first. The predefined DFDL named format that is supplied with Daffodil, and used as a starting point by most schemas, has calendarTimeZone="". This is because customers didn't like having "+00:00" (the UTC time zone) appended to all their datetimes in the infoset when the data simply didn't specify a time zone. Schemas intended for interoperability testing should specify 'UTC' for this property.
* calendarTimeZone specified as an Olson format time zone
* inputValueCalc, outputValueCalc
* hiddenGroupRef
* Asserts and discriminators on simple type definitions
or global element definitions
* fn:concat with more than 4 arguments
* non-8-bit charset encodings
* bitOrder not mostSignificantBitFirst
* '@' in textNumberPattern (TBD: unsure if Daffodil has this)
* "_" in calendarLanguage
* calendarLanguage an expression
* assert & discriminator messages an expression
* binaryBooleanTrueRep as "" (empty string)
* checks for binary packed numbers with length units 'bits' and a length that is not a multiple of 4, and similarly for alignmentUnits 'bits' and an alignment that is not a multiple of 4 (relevant to negative tests only)
* lengthKind 'implicit' complex elements inside lengthKind 'delimited' complex elements.
Additionally, IBM DFDL contains these bugs in its expression processing:
1) Path locations are not correctly validated. Specifically, array elements without predicates and references into other choice branches are not flagged as errors.
2) In DFDL expression functions, the namespace prefixes for http://www.w3.org/2001/XMLSchema and http://www.w3.org/2005/xpath-functions must be 'xs' and 'fn' respectively, even if not declared.
3) In DFDL expressions, namespace prefixes in paths are ignored and matching is against the element name only.
For interoperability testing, therefore:
- For 1), avoid both of these constructs
- For 2), always declare xmlns:xs and xmlns:fn and always use those prefixes in expressions
- For 3), avoid sibling elements that have the same name but different namespaces; use elementFormDefault="unqualified" to avoid namespaces for local elements altogether
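To illustrate why 3) matters: if path steps are matched on local name only, two sibling elements with the same name in different namespaces can no longer be told apart. The sketch below uses Python's xml.etree.ElementTree on a made-up document; the helper names are hypothetical, and it shows the matching rule in general rather than IBM DFDL's actual code.

```python
import xml.etree.ElementTree as ET

# Made-up infoset: two siblings share the local name "id" but differ in namespace.
DOC = '<root xmlns:a="urn:a" xmlns:b="urn:b"><a:id>1</a:id><b:id>2</b:id></root>'


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def match_qualified(root: ET.Element, ns: str, name: str) -> list:
    """Namespace-aware match: only children whose full QName equals {ns}name."""
    return [e.text for e in root if e.tag == f"{{{ns}}}{name}"]


def match_local_only(root: ET.Element, name: str) -> list:
    """Prefix-ignoring match (the behaviour in 3): compares local names only."""
    return [e.text for e in root if local_name(e.tag) == name]


root = ET.fromstring(DOC)
print(match_qualified(root, "urn:a", "id"))  # only the a:id sibling
print(match_local_only(root, "id"))          # both siblings match
```

With elementFormDefault="unqualified" and distinct sibling names, both matching rules select the same element, which is why the workaround avoids the problem.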
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU