Unable to devote any time to DFDL WG activity until the second half of
June.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
Hursley, UK
smh(a)uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
We discussed lax number processing a while back. We have the same issue
with lax calendar parsing.
The DFDL spec has this language:
- Additional lenient parsing behaviour when in 'lax' mode:
1. Values outside valid ranges are normalized (eg, "March 32 1996" is
treated as "April 1 1996")
2. Ignoring a trailing dot after a non-numeric field
3. Leading and trailing whitespace in the data but not in the pattern is
accepted
4. Whitespace in the pattern can be missing in the data
5. Partial matching on literal strings. E.g., data "20130621d" allowed
for pattern "yyyyMMdd'date' "
I suggest adding the word "may" to the first line of that text, as in
"Additional lenient parsing behaviour when in 'lax' mode MAY include:"
This is because we've discovered that lax behavior in the ICU libraries we
rely on varies from one ICU release to the next. So I think we have to make
the spec consistent with the idea that "lax" parsing for numbers and
calendars is implementation-dependent, and that only "strict" behavior can
be relied upon to be durably meaningful, even across releases of the same
DFDL implementation.
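
To make the strict/lax distinction concrete, here is a minimal Python sketch of
item 1 in the list above (out-of-range values being normalized). It is only an
illustration: it is not ICU's algorithm, and the function names parse_strict
and normalize_lax are made up for this example.

```python
from datetime import date, timedelta


def parse_strict(year, month, day):
    """Strict: reject any field outside its valid range."""
    # datetime.date raises ValueError for day 32, month 13, and so on.
    return date(year, month, day)


def normalize_lax(year, month, day):
    """Lax (illustrative): roll an out-of-range day into the following months."""
    # Start at the first of the month and add the remaining days, so that
    # "March 32 1996" lands on April 1 1996. Only the day field is
    # normalized here; a real lax parser may do more, less, or something else.
    return date(year, month, 1) + timedelta(days=day - 1)


if __name__ == "__main__":
    print(normalize_lax(1996, 3, 32))
    try:
        parse_strict(1996, 3, 32)
    except ValueError as err:
        print("strict rejects it:", err)
```

The point of the sketch is that the strict result is fully determined by the
calendar, while the lax result depends on whichever normalization rule the
underlying library happens to apply.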
This doesn't make "lax" behavior entirely useless. Suppose you are just
doing a one-time conversion of some data from a native format to JSON or
XML, or getting it into your favorite data-integration tool. If you can get
it to work once using "lax", that's OK, because you intend to discard the
schema once the conversion is complete.
So it doesn't bother me to have lax behavior. I think we just want to say
that you can't rely on it to be consistent, and you can't rely on it to
actually be any different from 'strict' behavior.
I think the alternatives are:
1) Fork the ICU libraries, carefully characterize lax behavior in that
fork, and maintain it ourselves forever after. (I really don't like this
option. I'm just mentioning it to point out the difficulty.)
2) Deprecate and remove 'lax' behavior entirely, along with the properties
associated with specifying it.
3) Make 'lax' an optional DFDL feature, so implementations can choose not
to implement it.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>
Apologies but I have a meeting clash.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
Hursley, UK
smh(a)uk.ibm.com
tel:+44-1962-815848
FYI, about that namespace 'sub' thing. The history: at one time we were
trying to fool IDEs like Eclipse into validating DFDL schemas properly, so
we bound the XML Schema for DFDL schemas to the namespace prefix "sub" with
its own URI, which was NOT the regular XML Schema URI. That is, we tried to
have a separate namespace for the XML Schema for DFDL Schemas. This didn't
work out, but the 'sub' artifact is still there.
We concluded that XML schema validation is too strongly wired into IDEs. We
have to simply turn it off entirely in Eclipse and tell it that ".dfdl.xsd"
files are "regular XML".
Even then it doesn't work right. We kind of gave up on this. So the
namespace URI for the target namespace in the XML Schema for DFDL Schemas
is the regular XML Schema namespace.
Interestingly, our TDML files (extension ".tdml") can contain embedded
DFDL schemas inside the <tdml:defineSchema> element. Validation and IDE XML
support features work there and provide some editing and completion
support.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
www.owlcyberdefense.com
On Tue, May 12, 2020 at 4:32 AM Marcos Bento (external) <
Marcos.Bento(a)esa.int> wrote:
> Mike,
>
> Thank you very much for the feedback.
>
> I've been able to use the schemas you referred to in order to validate our
> DFDL4S schemas (or at least the general structure and the long-form
> annotations). I had to make a couple of adaptations to the schema to be able
> to load it with Xerces-C++ (there were some issues with the namespace
> xmlns:sub="http://www.w3.org/2001/XMLSchema"), but it worked out in the end,
> using a trivial resolver with a simple mapping between schema and location.
>
> Regarding the validation of short-form annotations, I'll try to leverage
> the Xerces-C++ Schema Content Model to implement the check.
>
> Again, thank you for the help.
>
> -Marcos
>
>
>
> On Thu, May 7, 2020 at 4:22 PM Mike Beckerle <mbeckerle.dfdl(a)gmail.com>
> wrote:
>
>>
>> Marcos,
>>
>> We use this approach in Apache Daffodil (incubating).
>>
>> We use the schemas donated by IBM. Slight modifications have happened
>> since then, with new properties and such. We also created an XML schema for
>> the DFDL subset of XSD.
>>
>> There are two things you need to make this work. One is the schemas
>> themselves. The other is a "resolver" which is a java class that finds and
>> provides the included/imported file names given namespace URIs, and
>> schemaLocation attributes. This resolver gets passed to Xerces. Your
>> application can get away with a very stripped down resolver. We do have a
>> full-featured one that you can use or learn from or adapt to your purposes.
>>
>> One must still validate that short-form annotations are properly used, or
>> transform them into long form before validation. The schemas will check
>> that the dfdl prefix on things like dfdl:byteOrder="bigEndian" is valid and
>> that byteOrder is an allowed attribute name, but they don't check that
>> dfdl:byteOrder="bigEndian" is placed on the right kind of XSD construct,
>> one where byteOrder is allowed.
>>
>> Similarly, for the property-element form, like:
>>
>> <dfdl:element><dfdl:property
>> name="byteOrder">bigEndian</dfdl:property></dfdl:element>
>>
>> it must also be verified that the named property is relevant to the
>> enclosing annotation. Or you can transform to the long form before
>> validation instead. This requires escaping the value content.
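
The transform-before-validation idea described above can be sketched in Python
roughly as follows. This is an illustration only, not Daffodil's code: the
function names short_to_long and fold_properties and the LONG_FORM table are
hypothetical, the table covers only a subset of XSD constructs, and the
namespace URIs are assumed to be the DFDL 1.0 ones.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"   # assumed DFDL 1.0 namespace
APPINFO_SOURCE = "http://www.ogf.org/dfdl/"  # assumed appinfo source value

# XSD constructs that can carry short-form properties, mapped to the
# long-form annotation element used for each (illustrative subset).
LONG_FORM = {"element": "element", "sequence": "sequence",
             "choice": "choice", "group": "group",
             "simpleType": "simpleType"}


def short_to_long(root):
    """Move dfdl:* attributes into xs:annotation/xs:appinfo/dfdl:<construct>."""
    for node in list(root.iter()):  # snapshot, since the tree is modified
        if not node.tag.startswith("{%s}" % XS):
            continue
        local = node.tag.split("}", 1)[1]
        short = {k: v for k, v in node.attrib.items()
                 if k.startswith("{%s}" % DFDL)}
        if local not in LONG_FORM or not short:
            continue
        for key in short:
            del node.attrib[key]
        # Reuse an existing xs:annotation if present, else create one first.
        annotation = node.find("{%s}annotation" % XS)
        if annotation is None:
            annotation = ET.Element("{%s}annotation" % XS)
            node.insert(0, annotation)
        appinfo = ET.SubElement(annotation, "{%s}appinfo" % XS,
                                {"source": APPINFO_SOURCE})
        # Long-form properties are unqualified attributes on dfdl:<construct>.
        ET.SubElement(appinfo, "{%s}%s" % (DFDL, LONG_FORM[local]),
                      {k.split("}", 1)[1]: v for k, v in short.items()})
    return root


def fold_properties(annotation_element):
    """Turn <dfdl:property name="p">v</dfdl:property> children into p="v"."""
    for prop in annotation_element.findall("{%s}property" % DFDL):
        # ElementTree escapes the attribute value when serializing.
        annotation_element.set(prop.get("name"), prop.text or "")
        annotation_element.remove(prop)
    return annotation_element
```

After such a transform, validating against the annotation schemas checks
placement as well, because each property now sits on a dfdl:element,
dfdl:sequence, etc. element whose allowed attributes the schema knows. The
sketch does not merge with a long-form annotation that is already present.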
>>
>> If you clone the apache daffodil git repo, you can find schemas in
>>
>> daffodil-propgen/src/main/resources/org/apache/daffodil/xsd -
>> schemas for the annotations (based on the ones originally created by IBM)
>>
>> daffodil-lib/src/main/resources/org/apache/daffodil/xsd -
>> XMLSchema_for_DFDL.xsd - Schema for the DFDL subset of XML Schema.
>>
>> The resolver we created searches class paths to find schema files in jars
>> on the class path, or looks for an XML catalog. This is written in Scala.
>> It searches - (1) relative to current file containing the include/import
>> (2) relative to the root of each jar on the class path in order (3) in an
>> XML Catalog, found by way of a Catalog.properties file which itself is
>> searched for on the class path. We have found (1) and (2) above very
>> helpful in building schemas up from other schemas packaged in jars.
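
The three-step search order described above could look roughly like this in
Python. This is a sketch under assumptions, not the Scala DFDLCatalogResolver:
the function name resolve_schema, its parameters, and the use of a plain dict
in place of an XML Catalog are all made up for illustration. Jars are treated
as zip files.

```python
import zipfile
from pathlib import Path


def resolve_schema(schema_location, including_file, class_path,
                   catalog, namespace=None):
    """Return a location for schema_location, or None if nothing matches.

    schema_location is assumed to be a relative path such as "a/b.xsd".
    """
    # (1) Relative to the file containing the include/import.
    if including_file is not None:
        candidate = Path(including_file).parent / schema_location
        if candidate.is_file():
            return str(candidate)

    # (2) Relative to the root of each class path entry, in order.
    for entry in map(Path, class_path):
        if entry.is_dir():
            candidate = entry / schema_location
            if candidate.is_file():
                return str(candidate)
        elif entry.is_file() and zipfile.is_zipfile(entry):
            with zipfile.ZipFile(entry) as jar:
                if schema_location in jar.namelist():
                    return "jar:%s!/%s" % (entry.resolve().as_uri(),
                                           schema_location)

    # (3) Fall back to a catalog; here a dict from namespace URI to location
    # stands in for a real XML Catalog lookup.
    if namespace is not None and namespace in catalog:
        return catalog[namespace]
    return None
```

A stripped-down resolver of the kind mentioned earlier in this thread would
amount to step (3) alone: a fixed mapping from schema to location.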
>>
>> It is in
>> daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilLoader.scala -
>> class DFDLCatalogResolver.
>>
>> If you are using C/C++ code, this resolver obviously will not apply, but
>> something like what it does may be needed.
>>
>> -mike
>>
>>
>> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Owl Cyber Defense |
>> www.owlcyberdefense.com
>>
>>
>>
>> On Thu, May 7, 2020 at 5:09 AM Steve Hanson <smh(a)uk.ibm.com> wrote:
>>
>>> Hi Marcos
>>>
>>> IBM donated its set of meta schemas to the WG a while back. Mike may
>>> know where they are located; I can't see them anywhere.
>>>
>>> Regards
>>>
>>> Steve Hanson
>>>
>>> IBM Hybrid Integration, Hursley, UK
>>> Architect, IBM DFDL
>>> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
>>> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
>>> smh(a)uk.ibm.com
>>> tel:+44-1962-815848
>>> mob:+44-7717-378890
>>> Note: I work Tuesday to Friday
>>>
>>>
>>>
>>> From: "Marcos Bento (external)" <Marcos.Bento(a)esa.int>
>>> To: dfdl-wg(a)ogf.org
>>> Date: 07/05/2020 08:38
>>> Subject: [EXTERNAL] [DFDL-WG] DFDL Meta-schema
>>> Sent by: "dfdl-wg" <dfdl-wg-bounces(a)ogf.org>
>>> ------------------------------
>>>
>>>
>>>
>>> Hi,
>>>
>>> I'm currently trying to find a way to automatically validate the schemas
>>> currently available/being developed in the scope of *DFDL4S*
>>> <http://eop-cfi.esa.int/index.php/applications/dfdl4s>.
>>> Is there an XSD that we could use as DFDL Meta-schema?
>>>
>>> The current proposed approach is to make a small standalone
>>> XercesC-based utility to make the validation.
>>> Do you have other/better suggestions?
>>>
>>> -Marcos
>>>
>>> --
>>> *HE Space for ESA - European Space Agency*
>>> Marcos Bento
>>> Mission Analysis Software Engineer
>>> System Support Division
>>> Earth Observation Projects Department
>>> Directorate of Earth Observation Programmes
>>>
>>> ESTEC
>>> Keplerlaan 1, PO Box 299
>>> NL-2200 AG Noordwijk, The Netherlands
>>> *marcos.bento(a)esa.int* <marcos.bento(a)esa.int> | *www.esa.int*
>>> <http://www.esa.int>
>>> T +31 71 565 3749
>>> This message is intended only for the recipient(s) named above. It may
>>> contain proprietary information and/or
>>> protected content. Any unauthorised disclosure, use, retention or
>>> dissemination is prohibited. If you have received
>>> this e-mail in error, please notify the sender immediately. ESA applies
>>> appropriate organisational measures to protect
>>> personal data, in case of data privacy queries, please contact the ESA
>>> Data Protection Officer (dpo(a)esa.int)
>>>
>>> --
>>> dfdl-wg mailing list
>>> dfdl-wg(a)ogf.org
>>> https://www.ogf.org/mailman/listinfo/dfdl-wg
>>>
>>>
Hi,
I'm currently trying to find a way to automatically validate the schemas
currently available/being developed in the scope of DFDL4S
<http://eop-cfi.esa.int/index.php/applications/dfdl4s>.
Is there an XSD that we could use as DFDL Meta-schema?
The current proposed approach is to make a small standalone XercesC-based
utility to make the validation.
Do you have other/better suggestions?
-Marcos
--
*HE Space for ESA - European Space Agency*
Marcos Bento
Mission Analysis Software Engineer
System Support Division
Earth Observation Projects Department
Directorate of Earth Observation Programmes
ESTEC
Keplerlaan 1, PO Box 299
NL-2200 AG Noordwijk, The Netherlands
marcos.bento(a)esa.int | www.esa.int
T +31 71 565 3749