Hi Bradd,
I have a few questions, please see inline below.
I'm still going to need a real worked
example, starting with some actual data and its schema, what it looks
like in the infoset after parsing, and how unparsing lays it back out again.
Are you able to make the rescheduled
call this Friday?
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>
Date: 27/02/2019 22:22
Subject: Re: Latest OGF DFDL WG Call Minutes
Regarding proposal for offsets and pointers:
The following are the properties to
be defined:
SMH: I was expecting
to see these properties only on dfdl:element, especially as you say '...the
element contents...'?
indirectKind Enum
Valid values 'pointer', 'offset' (there
is also a thought of objectId or refId for handling BSON but not at this
time)
Specifies the type of indirection used to
access the element contents in the data stream.
Annotation: dfdl:element, dfdl:simpleType,
dfdl:choice, dfdl:sequence, dfdl:group
SMH: I am missing the distinction
between offset and pointer. Is one relative to current position and the
other relative to start of bitstream?
SMH: In earlier DFDL proposals
for offset support, we had used the term to refer to a property to be used
to establish position of the current element instead of assuming
the current element followed straight after the previous one. It
would allow sparse modelling of fixed structures. The offset could be relative
to start of bitstream or some other point. I don't think that's what
you mean when you say 'offset' so I will refer to your new concept as 'pointer'.
SMH: Assuming that indirectKind
is a normal DFDL property, it can be in scope. It would therefore need
to have an enum 'None' which would be the default used in most schemas.
indirectLength
Non-negative Integer or DFDL expression
Specifies the length of the indirection in
units according to the indirectUnits property.
Annotation: dfdl:element, dfdl:simpleType,
dfdl:choice, dfdl:sequence, dfdl:group
indirectUnits Enum
Valid values 'bytes','bits'
Specifies the units to be used for reading
or writing the indirection according to indirectLength.
The default value is 'bytes'.
Annotation: dfdl:element, dfdl:simpleType,
dfdl:choice, dfdl:sequence, dfdl:group
SMH: I think a better approach
is to provide a property dfdl:indirectType, instead of indirectLength/indirectUnits,
which refers to a simple type (not element) that carries its own lengthKind,
length & lengthUnits properties. Similar idea to dfdl:prefixLengthType.
That allows a lot of flexibility on how the pointer can appear.
offsetBase Non-empty string
An absolute or relative XPath expression for the base element.
Annotation: dfdl:element, dfdl:simpleType,
dfdl:choice, dfdl:sequence, dfdl:group
The proposal would be to have the contents
of the indirection be after the LeadingAlignment and before the TrailingAlignment.
This would mean the alignment and skip factors apply to the indirection
values in the data stream instead of the contents of the indirection.
SMH: Agree.
This also means that in an array element,
each element has its own indirection value (pointer or offset) and the
alignment and skip factors then apply to each of these indirection values.
SMH: Do you mean '...each
occurrence...' ?
The intent is that the indirection
values apply only to the data stream and not to the infoset. During
parse, when the infoset is populated from the data stream, the indirection
values are replaced by the contents. During unparse, the indirection
values don't exist in the infoset and are created when the data stream
is written.
SMH: I agree that the indirection
should be a purely physical thing, but I am not clear how the value is
filled in when unparsing. Where does the value come from? outputValueCalc?
Or maybe it's not needed when unparsing, and the data is always contiguous?
For pointers, a null pointer creates
the scenario of either nil representation or empty representation, depending
on whether or not nillable is defined as true. In the empty representation
case, this is a processing error unless default values (or 0 occurrences)
are defined for all underlying content. During unparse, the only scenario
in which a null pointer would be created is for a nil representation.
SMH: This needs more thought.
The nil & default properties apply to the contents of the indirection,
not to the pointer. If you want to give a nil semantic to the pointer value
itself, then that would require a new enum for dfdl:nilKind. I don't see
why a pointer value 0 can't be treated like any other indirection value.
A missing pointer is an error - it must be present - there is no way to
control optionality because minOccurs/maxOccurs apply to the contents.
(Alternatively, if you want the concepts of nil, default, occurs to apply
to the indirect value, then dfdl:indirectType could point at an element
instead of a simple type - but that seems way too over engineered).
Examples:
The following is the definition for the address
of a null-terminated string in which the string address may be NULL as
indicated by a nillable value of true:
<xs:element name="myString" type="xs:string"
            dfdl:lengthKind="delimited" dfdl:encoding="UTF-8"
            dfdl:terminator="%NUL;"
            dfdl:indirectKind="pointer"
            dfdl:indirectLength="8" dfdl:indirectUnits="bytes"
            nillable="true" />
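As a rough illustration only (not part of the proposal), the parse side of the myString definition above could behave like the Python sketch below. It rests on several assumptions that the thread does not settle: the 8-byte pointer is a big-endian unsigned integer, it holds an absolute byte position in the data stream, and a pointer value of 0 is the null pointer that yields nil because nillable is true. The function name parse_indirect_string is hypothetical.

```python
import struct
from typing import Optional, Tuple


def parse_indirect_string(data: bytes, pos: int) -> Tuple[Optional[str], int]:
    """Read an 8-byte pointer at pos; return (string or None, next position)."""
    # Assumption: big-endian unsigned pointer, absolute byte position in data.
    (ptr,) = struct.unpack_from(">Q", data, pos)
    next_pos = pos + 8  # parsing continues after the pointer, not after the string
    if ptr == 0:
        # Assumption: null pointer plus nillable="true" gives a nil element.
        return None, next_pos
    # dfdl:terminator="%NUL;" : the content runs up to the next NUL byte.
    # bytes.index raises ValueError if no terminator is found.
    end = data.index(b"\x00", ptr)
    return data[ptr:end].decode("utf-8"), next_pos


# Pointer value 8 refers to the bytes that follow the pointer itself.
stream = struct.pack(">Q", 8) + b"hello\x00"
print(parse_indirect_string(stream, 0))
```

Only the string (or nil) reaches the infoset; the pointer value itself is discarded after use.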
The following is the definition for an array
of three 4-byte addresses of a complex element defined by ns0:myStruct:
<xs:element name="myArray" type="ns0:myStruct"
            dfdl:lengthKind="implicit"
            dfdl:indirectKind="pointer"
            dfdl:indirectLength="4" dfdl:indirectUnits="bytes"
            minOccurs="3" maxOccurs="3" dfdl:occursCountKind="fixed" />
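To make the "one indirection value per occurrence" point concrete for myArray, here is a minimal Python sketch of the parse side. It assumes three contiguous big-endian 4-byte pointers with no alignment or skip bytes between them, each holding an absolute byte position in the stream. The layout of ns0:myStruct is not given in this thread, so the struct parser is passed in as a callback; parse_pointer_array and parse_pair are hypothetical names, and the two-byte struct is a made-up stand-in.

```python
import struct
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def parse_pointer_array(data: bytes, pos: int, count: int,
                        parse_struct: Callable[[bytes, int], T]
                        ) -> Tuple[List[T], int]:
    """Read count 4-byte pointers starting at pos and parse each target."""
    items: List[T] = []
    for _ in range(count):
        # Assumption: big-endian unsigned pointer, absolute byte position.
        (ptr,) = struct.unpack_from(">I", data, pos)
        pos += 4
        items.append(parse_struct(data, ptr))
    return items, pos


def parse_pair(data: bytes, at: int) -> Tuple[int, int]:
    """Made-up stand-in for ns0:myStruct: two unsigned bytes."""
    return struct.unpack_from(">BB", data, at)


# Three pointers (12 bytes), then three 2-byte structs stored out of order.
stream = struct.pack(">III", 16, 12, 14) + bytes([1, 2, 3, 4, 5, 6])
occurrences, next_pos = parse_pointer_array(stream, 0, 3, parse_pair)
print(occurrences, next_pos)
```

The infoset order follows the order of the pointers, not the physical order of the structs in the stream.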
The following is the definition for a 4-byte
offset to a 100-byte hexBinary value from the start of the parent element
definition:
<xs:element name="myData" type="xs:hexBinary"
            dfdl:lengthKind="explicit" dfdl:length="100" dfdl:lengthUnits="bytes"
            dfdl:indirectKind="offset"
            dfdl:indirectLength="4" dfdl:indirectUnits="bytes"
            dfdl:offsetBase=".." />
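A possible reading of the myData definition on parse, sketched in Python under stated assumptions: the 4-byte offset is a big-endian unsigned integer, it is counted in bytes, and dfdl:offsetBase=".." means the offset is measured from the first byte of the parent element. None of these points is fixed by the proposal text, and parse_offset_hexbinary is a hypothetical name.

```python
import struct
from typing import Tuple


def parse_offset_hexbinary(data: bytes, parent_start: int, pos: int,
                           length: int = 100) -> Tuple[str, int]:
    """Read a 4-byte offset at pos; return (hex string, next position)."""
    # Assumption: big-endian unsigned offset, in bytes, from the parent start.
    (offset,) = struct.unpack_from(">I", data, pos)
    start = parent_start + offset
    content = data[start:start + length]
    if len(content) != length:
        # Treated here as a processing error: the target runs past the stream.
        raise ValueError("offset points outside the data stream")
    # xs:hexBinary value as it might appear in the infoset.
    return content.hex().upper(), pos + 4


# Two unrelated bytes, then the parent: a 4-byte offset followed by 100 bytes.
payload = bytes(range(100))
stream = b"\xAA\xBB" + struct.pack(">I", 4) + payload
value, next_pos = parse_offset_hexbinary(stream, parent_start=2, pos=2)
print(value[:16], next_pos)
```

Here an offset of 4 from a parent starting at byte 2 lands on byte 6, immediately after the offset field. This covers parsing only; how the offset value is produced on unparse is left open.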
SMH: I don't see how unparsing
works. What provides the value?
The proposal would also allow for the following
optional item but I don't currently see a need for this:
dfdl:offsetKind with values
"startToStart" or "endToStart" - indicates if the offset
is from the start of the base element or the end of the base element.
I tried to get this out before my vacation,
so it may take me a little while to respond to any issues. Thank you for
your time.
Regards,
Bradd Kadlecik
z/TPF Development
Phone: 1-845-433-1573
E-mail: braddk@us.ibm.com
2455 South Rd
Poughkeepsie, NY 12601-5400
United States
From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Cc: "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>, Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Date: 02/07/2019 12:32 PM
Subject: Latest OGF DFDL WG Call Minutes
Please find minutes from the latest
call at https://redmine.ogf.org/projects/dfdl-wg/news
Regards
Steve Hanson