309
| Create example scenarios to illustrate offset & pointer requirements (Bradd)
5/4/19: Daffodil have a draft proposal for offset support, TPF have
experimental implementation for pointer support. Need examples to show the
requirement, especially unparsing.
2/5: Bradd supplied an example of pointers. On parsing, the pointer is used
as an absolute address to a piece of accessible memory, and the element is
parsed from that location. On unparsing, memory is allocated, the element is
unparsed into that location, and the pointer is set to the location (memory
allocation is implementation-defined). Note the pointer value does *not*
appear in the infoset. Looks like a useful and workable addition to DFDL.
Could solve the parsing requirements for TIFF image files. Bradd also has an
extension for offset, which is like pointer but uses a relative location
instead of an absolute one. Both are examples of indirection. A further
example could be specifying a file to read. Contrast this with what DFDL has
used the term 'offset' for in the past, namely as an alternative property to
alignment/skip which allows the parser/unparser to jump directly to a point
in the current buffer. These are orthogonal concepts. Noted that parsing of
ZIP files may need both. Secure implementations may need to disallow use of
pointers and/or offsets unless they can guarantee to fill everywhere with
the fill byte. Implementations should also be deterministic. Agreed that
recursion is not needed to implement this. Bradd mentioned a further concept,
'overflows', an example being an array unparsed into a linked list. Pointers
proposal needs to be written up as an experimental feature.
31/5: Bradd to write up pointers proposal as an experimental feature.
...
11/7: No update
8/8: Bradd aiming to get this written up for next time. Also needs issue
tracker raising.
29/8: Bradd unable to make the call.
17/10: Written up for review and sent to WG but not as an experimental
feature document. Mike also noted http://www.binarydom.com/sdk/doc/bddl.shtml.
Mike has reviewed and commented on the write-up; Steve needs to do the same,
then send back to Bradd. Main discussion was around unparsing, e.g.
buffering implications, whether to try and format exactly or canonically.
In parallel, Bradd to create an experimental feature document (see table
below).
12/12: Bradd sent an updated document. WG will review for next meeting.
There was some discussion about the use in the document of the term 'empty'
and whether that really meant 'missing'. This led to an in-depth discussion
about the different use cases for default values; it is likely that DFDL 2.0
will introduce support for some of these, specifically:
- Item exists in Infoset with default value, so unparse empty rep (the
mirror of parsing, as practised by GPB)
- Item exists in data with default value, so remove from Infoset
post-validation (the mirror of unparsing, and a requirement from z/TPF who
have a post-parse option to do this)
9/1/20: No progress, still needs reviewing.
16/4: Steve & Mike to review latest document dated 2019-12-12 for next call.
30/4: Spent some time discussing Steve's review comments. Conclusion is that
the feature is useful and a serious candidate for DFDL 2.0. The properties
seem to be the minimum needed to handle the concepts and known use cases. As
this is currently an experimental feature we don't have to get it 100%
precise now, and can impose restrictions that z/TPF users would be ok with
(for example, no initiators or terminators allowed; binary indirection types
only). Important though that the properties and their application are driven
by the grammar, so next step is for Bradd to see how the grammar is affected.
It would be nice if all the behaviour could be handled at the same point in
the grammar as 'prefixLength' but that might not be possible. Property
'indirectionEmptyValue' probably needs a better name, e.g.
'indirectionUnusedRep'; other suggestions welcome. Or perhaps the
dfdl:fillByte of the indirectionType could be used? |
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM
DFDL
Co-Chair, OGF
DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Steve Hanson/UK/IBM
To: "Bradd Kadlecik" <braddk@us.ibm.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>, "dfdl-wg" <dfdl-wg-bounces@ogf.org>,
Mike Beckerle <mbeckerle.dfdl@gmail.com>
Date: 30/04/2020 15:46
Subject: Re: [EXTERNAL] Re: [DFDL-WG] DFDL pointer & offset proposal
Comments for call today (yellow)
Regards
Steve Hanson
From: "Bradd Kadlecik" <braddk@us.ibm.com>
To: Mike Beckerle <mbeckerle.dfdl@gmail.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 12/12/2019 15:42
Subject: [EXTERNAL] Re: [DFDL-WG] DFDL pointer & offset proposal
Sent by: "dfdl-wg" <dfdl-wg-bounces@ogf.org>
Updated with comments:
(See attached file: DFDL_Indirection_v2.docx)
Regards,
Bradd Kadlecik
z/TPF Development |
|
Phone:
1-845-433-1573
E-mail: braddk@us.ibm.com
|
2455 South Rd
Poughkeepsie, NY 12601-5400
United States |
From: Mike Beckerle <mbeckerle.dfdl@gmail.com>
To: Bradd Kadlecik <braddk@us.ibm.com>
Cc: DFDL-WG <dfdl-wg@ogf.org>
Date: 10/16/2019 05:43 PM
Subject: [EXTERNAL] Re: [DFDL-WG] DFDL pointer & offset proposal
I added some comments to your original document. Attached.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Wed, Sep 25, 2019 at 12:55 PM Bradd Kadlecik <braddk@us.ibm.com> wrote:
Here's what I've put together regarding the pointer &
offset proposal for the next meeting's review.
(See attached file: DFDL_Indirection.docx)
Regards,
Bradd Kadlecik
From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: "Mike Beckerle" <mbeckerle@tresys.com>,
"Michele Zundo" <michele.zundo@esa.int>
Date: 04/05/2019 12:42 PM
Subject: Re: Latest OGF DFDL WG Call Minutes
No | Action
309 | Create example scenarios to illustrate offset & pointer requirements (Bradd)
5/4/19: Daffodil have a draft proposal for offset support, TPF have experimental
implementation for pointer support. Need examples to show the requirement,
especially unparsing. |
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: "Mike Beckerle" <mbeckerle@tresys.com>,
"Michele Zundo" <michele.zundo@esa.int>
Date: 05/04/2019 10:29
Subject: Re: Latest OGF DFDL WG Call Minutes
I can see a difference between offsets and pointers. If I follow an offset
and parse an element x then I won't automatically jump back to where I
was - the next element y I parse will continue from location offset +
length(x) unless I use offset again to jump back to the original location.
If I follow a pointer and parse an element x, then the next element y I
parse will continue from original location + length(pointer).
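The difference in where parsing resumes can be sketched in a few lines of Python. This is only an illustration of the two behaviours described above, not any implementation's API: the function names, the 4-byte big-endian pointer, and the use of buffer positions as "addresses" are all assumptions made for the example.

```python
import struct

def parse_at_offset(buf, offset, length):
    """Offset: jump to `offset`, parse x there, and stay at the new location."""
    x = buf[offset:offset + length]
    return x, offset + length              # y continues at offset + length(x)

def parse_via_pointer(buf, pos, length):
    """Pointer: read a 4-byte pointer at `pos`, parse x at its target,
    then resume straight after the pointer field."""
    (target,) = struct.unpack_from(">I", buf, pos)
    x = buf[target:target + length]
    return x, pos + 4                      # y continues at pos + length(pointer)

# Hypothetical layout: 4-byte pointer (value 8), then "YYYY", then "XXXX".
data = struct.pack(">I", 8) + b"YYYY" + b"XXXX"

x1, next1 = parse_at_offset(data, 8, 4)    # offset supplied directly
x2, next2 = parse_via_pointer(data, 0, 4)  # pointer read from the data
print(x1, next1)
print(x2, next2, data[next2:next2 + 4])
```

Both calls parse the same x, but only the pointer variant leaves the cursor positioned at the data that physically follows the indirection value.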
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: "Mike Beckerle" <mbeckerle@tresys.com>,
"Michele Zundo" <michele.zundo@esa.int>
Date: 04/04/2019 09:42
Subject: Re: Latest OGF DFDL WG Call Minutes
This is a proposal from the Daffodil team for offset support, needed for
formats like TIFF and if we ever want to be able to handle zip files.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74687382
I think this proposal can implement your requirement - the dfdl:offset
property can be an expression that refers to another element (your pointer
element), which can be hidden so as not to appear in the infoset. I think
what you are proposing is a more convenient way of handling offsets that
are defined dynamically in the data, as opposed to defined statically with
fixed values (though as I said below I need unparsing explained). But I
may be misunderstanding your use cases.
Regards
Steve Hanson
From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: "Mike Beckerle" <mbeckerle@tresys.com>,
"Michele Zundo" <michele.zundo@esa.int>
Date: 03/04/2019 16:50
Subject: Re: Latest OGF DFDL WG Call Minutes
Ok, I'll work on putting a scenario together for various pointer setups
(array, complex element, string) and show how it looks for both the JSON
and the binary.
I'm currently at a conference this week but will be returning late Thursday
so expect to be available for the call Friday.
I presented the TPF specific pointer implementation at the conference this
week and there are some that will be trying to use it soon.
Regards,
Bradd Kadlecik
From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: dfdl-wg@ogf.org,
"Mike Beckerle" <mbeckerle@tresys.com>,
"Michele Zundo" <michele.zundo@esa.int>
Date: 04/03/2019 10:29 AM
Subject: Re: Latest OGF DFDL WG Call Minutes
Hi Bradd
I have a few questions please ... inline below.
I'm still going to need a real worked example, starting with some actual
data and its schema, what it looks like in the infoset after parsing,
and how unparsing lays it back out again.
Are you able to make the rescheduled call this Friday?
Regards
Steve Hanson
From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org,
"Mike Beckerle" <mbeckerle@tresys.com>,
"Michele Zundo" <michele.zundo@esa.int>
Date: 27/02/2019 22:22
Subject: Re: Latest OGF DFDL WG Call Minutes
Regarding proposal for offsets and pointers:
The following are the properties to be defined:
SMH: I was expecting to see these properties only on dfdl:element, especially
as you say '...the element contents...'?
indirectKind Enum
Valid values 'pointer', 'offset' (there is also a thought of objectId or
refId for handling BSON but not at this time)
Specifies the type of indirection used to access the element contents in
the data stream.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
dfdl:group
SMH: I am missing the distinction between offset and pointer. Is one relative
to current position and the other relative to start of bitstream?
SMH: In earlier DFDL proposals for offset support, we had used the term
to refer to a property to be used to establish position of the current
element instead of assuming the current element followed straight after
the previous one. It would allow sparse modelling of fixed structures.
The offset could be relative to start of bitstream or some other point.
I don't think that's what you mean when you say 'offset' so I will refer
to your new concept as 'pointer'.
SMH: Assuming that indirectKind is a normal DFDL property, it can be in
scope. It would therefore need to have an enum 'None' which would be the
default used in most schemas.
indirectLength Non-negative Integer or DFDL expression
Specifies the length of the indirection in units according to the indirectUnits
property.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
dfdl:group
indirectUnits Enum
Valid values 'bytes','bits'
Specifies the units to be used for reading or writing the indirection according
to indirectLength.
The default value is 'bytes'.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
dfdl:group
SMH: I think a better approach is to provide a property dfdl:indirectType,
instead of indirectLength/indirectUnits, which refers to a simple type
(not element) that carries its own lengthKind, length & lengthUnits
properties. Similar idea to dfdl:prefixLengthType. That allows a lot of
flexibility on how the pointer can appear.
offsetBase non-empty string containing an absolute or relative XPath expression
for the base element.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
dfdl:group
The proposal would be to have the contents of the indirection be after
the LeadingAlignment and before the TrailingAlignment. This would mean
the alignment and skip factors apply to the indirection values in the data
stream instead of the contents of the indirection.
SMH: Agree.
This also means that in an array element, each element has its own
indirection value (pointer or offset), and the alignment and skip factors
then apply to each of these indirection values.
SMH: Do you mean '...each occurrence...' ?
The intent is that the indirection values apply only to the data stream
and not the infoset. During parse, when the infoset is populated from the
data stream, the indirection values are replaced by the contents. During
unparse, the indirection values don't exist in the infoset and are created
while writing the data stream.
SMH: I agree that the indirection should be a purely physical thing, but
I am not clear how the value is filled in when unparsing. Where does the
value come from? outputValueCalc? Or maybe it's not needed when unparsing,
and the data is always contiguous?
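One possible answer to where the value comes from on unparse is that the unparser decides where each piece of content will live and writes that location as the indirection value. The Python sketch below illustrates that idea only. It assumes 4-byte big-endian pointers, treats buffer positions as addresses, and "allocates" by appending content to a region directly after the pointer table. All of these are choices made for the example, and the function names are hypothetical.

```python
import struct

def unparse_with_pointers(strings):
    """Lay out NUL-terminated strings behind a table of 4-byte pointers.

    Assumption: "allocation" means appending to a region that starts right
    after the pointer table; a real implementation may allocate differently.
    """
    table_size = 4 * len(strings)
    heap = bytearray()
    pointers = []
    for s in strings:
        pointers.append(table_size + len(heap))   # where this content will land
        heap += s.encode("utf-8") + b"\x00"
    table = b"".join(struct.pack(">I", p) for p in pointers)
    return table + bytes(heap)

def parse_with_pointers(buf, count):
    """Inverse of the layout above: follow each pointer, read up to NUL."""
    out = []
    for i in range(count):
        (target,) = struct.unpack_from(">I", buf, 4 * i)
        end = buf.index(b"\x00", target)
        out.append(buf[target:end].decode("utf-8"))
    return out

infoset = ["abc", "de"]          # the pointer values never appear here
data = unparse_with_pointers(infoset)
print(data.hex())
print(parse_with_pointers(data, len(infoset)))
```

Under this scheme the round trip preserves the infoset, but not necessarily the original byte layout, which is the "exact versus canonical" question for unparsing.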
For pointers, a null pointer creates the scenario of either nil representation
or empty representation depending on whether or not nillable is defined
as true. Unless default values (or 0 occurrences) are defined for all
underlying content, this is a processing error. During unparse, the only
scenario in which a null pointer would be created is for a nil representation.
SMH: This needs more thought. The nil & default properties apply to
the contents of the indirection, not to the pointer. If you want
to give a nil semantic to the pointer value itself, then that would require
a new enum for dfdl:nilKind. I don't see why a pointer value 0 can't be
treated like any other indirection value. A missing pointer is an error
- it must be present - there is no way to control optionality because minOccurs/maxOccurs
apply to the contents. (Alternatively, if you want the concepts of nil,
default, occurs to apply to the indirect value, then dfdl:indirectType
could point at an element instead of a simple type - but that seems way
too over-engineered).
Examples:
The following is the definition for the address of a null-terminated string
in which the string address may be NULL as indicated by a nillable value
of true:
<xs:element name="myString" type="xs:string" dfdl:lengthKind="delimited"
dfdl:encoding="UTF-8" dfdl:terminator="%NUL;" dfdl:indirectKind="pointer"
dfdl:indirectLength="8" dfdl:indirectUnits="bytes"
nillable="true" />
The following is the definition for an array of three 4-byte addresses
of a complex element defined by ns0:myStruct:
<xs:element name="myArray" type="ns0:myStruct" dfdl:lengthKind="implicit"
dfdl:indirectKind="pointer" dfdl:indirectLength="4"
dfdl:indirectUnits="bytes" minOccurs="3" maxOccurs="3"
dfdl:occursCountKind="fixed" />
The following is the definition for a 4-byte offset to a 100-byte hexBinary
value from the start of the parent element definition:
<xs:element name="myData" type="xs:hexBinary" dfdl:lengthKind="explicit"
dfdl:length="100" dfdl:lengthUnits="bytes" dfdl:indirectKind="offset"
dfdl:indirectLength="4" dfdl:indirectUnits="bytes"
dfdl:offsetBase=".." />
SMH: I don't see how unparsing works. What provides the value?
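For the parse direction, the myString and myData definitions above could behave roughly as in the following Python sketch. It is a reading of the proposal rather than a reference implementation. Big-endian byte order, buffer positions standing in for memory addresses, and the function names are assumptions. Treating a zero pointer as nil follows the proposal text and is exactly the point questioned in the SMH comments.

```python
import struct

def parse_my_string(buf, pos):
    """8-byte pointer to a NUL-terminated UTF-8 string; 0 is treated as nil."""
    (ptr,) = struct.unpack_from(">Q", buf, pos)
    next_pos = pos + 8                     # resume after the pointer field
    if ptr == 0:
        return None, next_pos              # nil, per the proposal's null-pointer rule
    end = buf.index(b"\x00", ptr)
    return buf[ptr:end].decode("utf-8"), next_pos

def parse_my_data(buf, pos, parent_start, length=100):
    """4-byte offset, relative to the start of the parent, to fixed-length data."""
    (off,) = struct.unpack_from(">I", buf, pos)
    start = parent_start + off             # offsetBase=".." resolved by the caller
    return buf[start:start + length], pos + 4

# Hypothetical parent record starting at position 0:
# [8-byte pointer][4-byte offset]["hi" + NUL][100 bytes of payload]
payload = bytes(range(100))
record = struct.pack(">QI", 12, 15) + b"hi\x00" + payload

my_string, pos = parse_my_string(record, 0)
my_data, pos = parse_my_data(record, pos, parent_start=0)
print(my_string, len(my_data), pos)
```

Only the string and the hexBinary value would reach the infoset; the values 12 and 15 are purely physical.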
The proposal would also allow for the following optional item but I don't
currently see a need for this:
dfdl:offsetKind with values "startToStart" or "endToStart"
- indicates if the offset is from the start of the base element or the
end of the base element.
I tried getting this out before my vacation so it might take me a little
while to respond to issues. Thank you for your time.
Regards,
Bradd Kadlecik
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Cc: "Mike Beckerle" <mbeckerle@tresys.com>,
"Michele Zundo" <michele.zundo@esa.int>,
Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Date: 02/07/2019 12:32 PM
Subject: Latest OGF DFDL WG Call Minutes
Please find minutes from the latest call at
https://redmine.ogf.org/projects/dfdl-wg/news
Regards
Steve Hanson
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
3AU
--
dfdl-wg mailing list
dfdl-wg@ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
[attachment "DFDL_Indirection-mikeb-comments.docx" deleted by Bradd Kadlecik/Poughkeepsie/IBM]