
Here's what I've put together regarding the pointer & offset proposal for the next meeting's review.

(See attached file: DFDL_Indirection.docx)

Regards,

Bradd Kadlecik
z/TPF Development
Phone: 1-845-433-1573
E-mail: braddk@us.ibm.com
2455 South Rd
Poughkeepsie, NY 12601-5400
United States

From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>
Date: 04/05/2019 12:42 PM
Subject: Re: Latest OGF DFDL WG Call Minutes

|------+------------------------------------------------------------------|
| No   |Action                                                            |
|------+------------------------------------------------------------------|
| 309  |Create example scenarios to illustrate offset & pointer           |
|      |requirements (Bradd)                                              |
|      |5/4/19: Daffodil have a draft proposal for offset support, TPF    |
|      |have experimental implementation for pointer support. Need        |
|      |examples to show the requirement, especially unparsing.           |
|------+------------------------------------------------------------------|

Regards

Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh@uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday

From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>
Date: 05/04/2019 10:29
Subject: Re: Latest OGF DFDL WG Call Minutes

I can see a difference between offsets and pointers.

If I follow an offset and parse an element x then I won't automatically jump back to where I was - the next element y I parse will continue from location offset + length x unless I use offset again to jump back to the original location.

If I follow a pointer and parse an element x, then the next element y I parse will continue from original location + length(pointer).
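To make the distinction above concrete, here is a minimal Python sketch of the two position rules as Steve describes them. The function names, the 4-byte big-endian pointer, and the reading of a pointer value as a byte index from the start of the buffer are all assumptions made for illustration; none of them come from the Daffodil proposal or the TPF implementation.

```python
def parse_via_offset(data: bytes, target: int, length: int):
    """Offset rule (hypothetical helper): jump to `target`, read x, stay there."""
    x = data[target:target + length]
    # The next element continues from offset + length(x).
    return x, target + length


def parse_via_pointer(data: bytes, pos: int, ptr_len: int, length: int):
    """Pointer rule (hypothetical helper): read x through a pointer stored at `pos`."""
    # Assumption: the pointer is an unsigned big-endian byte index into `data`.
    target = int.from_bytes(data[pos:pos + ptr_len], "big")
    x = data[target:target + length]
    # The next element continues from original location + length(pointer).
    return x, pos + ptr_len


# Layout: bytes 0-3 hold a pointer to 8, then "YYYY" at 4, "XXXX" at 8, "ZZZZ" at 12.
data = (8).to_bytes(4, "big") + b"YYYY" + b"XXXX" + b"ZZZZ"

x_off, next_off = parse_via_offset(data, 8, 4)
x_ptr, next_ptr = parse_via_pointer(data, 0, 4, 4)

y_after_offset = data[next_off:next_off + 4]   # element following x under the offset rule
y_after_pointer = data[next_ptr:next_ptr + 4]  # element following x under the pointer rule
```

Both calls read the same x, but the element parsed next differs: after the offset the parser carries on past x, while after the pointer it resumes immediately behind the pointer field.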

Regards

Steve Hanson

From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>
Date: 04/04/2019 09:42
Subject: Re: Latest OGF DFDL WG Call Minutes

This is a proposal from the Daffodil team for offset support, needed for formats like TIFF and if we ever want to be able to handle zip files.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74687382

I think this proposal can implement your requirement - the dfdl:offset property can be an expression that refers to another element (your pointer element), which can be hidden so as not to appear in the infoset.

I think what you are proposing is a more convenient way of handling offsets that are defined dynamically in the data, as opposed to defined statically with fixed values (though as I said below I need unparsing explained). But I may be misunderstanding your use cases.

Regards

Steve Hanson

From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>
Date: 03/04/2019 16:50
Subject: Re: Latest OGF DFDL WG Call Minutes

Ok, I'll work on putting a scenario together for various pointer setups (array, complex element, string) and show how it looks for both the JSON and the binary.

I'm currently at a conference this week but will be returning late Thursday so expect to be available for the call Friday. I presented the TPF-specific pointer implementation at the conference this week and there are some that will be trying to use it soon.

Regards,

Bradd Kadlecik

From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Cc: dfdl-wg@ogf.org, "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>
Date: 04/03/2019 10:29 AM
Subject: Re: Latest OGF DFDL WG Call Minutes

Hi Bradd

I have a few questions please ... inline below. I'm still going to need a real worked example, starting with some actual data and its schema, what it appears like after parsing in the infoset, and how unparsing lays it back out again.

Are you able to make the rescheduled call this Friday?

Regards

Steve Hanson

From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM@IBMGB
Cc: dfdl-wg@ogf.org, "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>
Date: 27/02/2019 22:22
Subject: Re: Latest OGF DFDL WG Call Minutes

Regarding proposal for offsets and pointers:

The following are the properties to be defined:

SMH: I was expecting to see these properties only on dfdl:element, especially as you say '...the element contents...'

indirectKind
    Enum
    Valid values 'pointer', 'offset' (there is also a thought of objectId or refId for handling BSON but not at this time)
    Specifies the type of indirection used to access the element contents in the data stream.
    Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, dfdl:group

SMH: I am missing the distinction between offset and pointer. Is one relative to current position and the other relative to start of bitstream?

SMH: In earlier DFDL proposals for offset support, we had used the term to refer to a property to be used to establish position of the current element instead of assuming the current element followed straight after the previous one. It would allow sparse modelling of fixed structures. The offset could be relative to start of bitstream or some other point. I don't think that's what you mean when you say 'offset' so I will refer to your new concept as 'pointer'.

SMH: Assuming that indirectKind is a normal DFDL property, it can be in scope. It would therefore need to have an enum 'None' which would be the default used in most schemas.

indirectLength
    Non-negative Integer or DFDL expression
    Specifies the length of the indirection in units according to the indirectUnits property.
    Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, dfdl:group

indirectUnits
    Enum
    Valid values 'bytes', 'bits'
    Specifies the units to be used for reading or writing the indirection according to indirectLength. The default value is 'bytes'.
    Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, dfdl:group

SMH: I think a better approach is to provide a property dfdl:indirectType, instead of indirectLength/indirectUnits, which refers to a simple type (not element) that carries its own lengthKind, length & lengthUnits properties. Similar idea to dfdl:prefixLengthType. That allows a lot of flexibility on how the pointer can appear.

offsetBase
    Non-empty string containing an absolute or relative XPath expression for the base element.
    Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, dfdl:group

The proposal would be to have the contents of the indirection be after the LeadingAlignment and before the TrailingAlignment. This would mean the alignment and skip factors apply to the indirection values in the data stream instead of the contents of the indirection.

SMH: Agree.

This also then means in an array element, each element has its own indirection value (pointer or offset) and the alignment and skip factors then apply to each of these indirection values.

SMH: Do you mean '...each occurrence...' ?

It would be thought that the indirection values apply only to the data stream and not the infoset. During parse when the infoset is populated from the data stream, the indirection values are replaced by the contents. During unparse, the indirection values don't exist in the infoset and are created during the writing to/creation of the data stream.

SMH: I agree that the indirection should be a purely physical thing, but I am not clear how the value is filled in when unparsing. Where does the value come from? outputValueCalc? Or maybe it's not needed when unparsing, and the data is always contiguous?

For pointers, a null pointer creates the scenario of either nil representation or empty representation depending on whether or not nillable is defined as true. Unless default values (or 0 occurrence) are defined for all underlying content, then this is a processing error. During unparse, the only scenario in which a null pointer would be created is for a nil representation.

SMH: This needs more thought. The nil & default properties apply to the contents of the indirection, not to the pointer. If you want to give a nil semantic to the pointer value itself, then that would require a new enum for dfdl:nilKind. I don't see why a pointer value 0 can't be treated like any other indirection value. A missing pointer is an error - it must be present - there is no way to control optionality because minOccurs/maxOccurs apply to the contents. (Alternatively, if you want the concepts of nil, default, occurs to apply to the indirect value, then dfdl:indirectType could point at an element instead of a simple type - but that seems way too over-engineered).

Examples:

The following is the definition for the address of a null-terminated string in which the string address may be NULL as indicated by a nillable value of true:

    <xs:element name="myString" type="xs:string"
                dfdl:lengthKind="delimited"
                dfdl:encoding="UTF-8"
                dfdl:terminator="%NUL;"
                dfdl:indirectKind="pointer"
                dfdl:indirectLength="8"
                dfdl:indirectUnits="bytes"
                nillable="true" />

The following is the definition for an array of three 4 byte addresses of a complex element defined by ns0:myStruct:

    <xs:element name="myArray" type="ns0:myStruct"
                dfdl:lengthKind="implicit"
                dfdl:indirectKind="pointer"
                dfdl:indirectLength="4"
                dfdl:indirectUnits="bytes"
                minOccurs="3" maxOccurs="3"
                dfdl:occursCountKind="fixed" />

The following is the definition for a 4 byte offset to a 100 byte hexBinary value from the start of the parent element definition:

    <xs:element name="myData" type="xs:hexBinary"
                dfdl:lengthKind="explicit"
                dfdl:length="100"
                dfdl:lengthUnits="bytes"
                dfdl:indirectKind="offset"
                dfdl:indirectLength="4"
                dfdl:indirectUnits="bytes"
                dfdl:offsetBase=".." />

SMH: I don't see how unparsing works. What provides the value?

The proposal would also allow for the following optional item but I don't currently see a need for this:

dfdl:offsetKind with values "startToStart" or "endToStart" - indicates if the offset is from the start of the base element or the end of the base element.

I tried getting this out before my vacation so it might take a little bit to respond for issues.

Thank you for your time.
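
As a rough illustration of the parse side of the myString example above, the following Python sketch reads an 8-byte pointer, follows it to a NUL-terminated UTF-8 string, and maps a null pointer to nil as the proposal text suggests. Several things here are assumptions and not part of the proposal: the helper name, big-endian byte order, and treating the pointer value as a byte index from the start of the buffer (a real z/TPF pointer would be a memory address). Whether a zero pointer should mean nil at all is questioned in the SMH comments, and unparsing is deliberately left out because the thread has not settled where the pointer value would come from.

```python
def parse_pointer_string(data: bytes, pos: int, ptr_len: int = 8):
    """Hypothetical helper: parse one pointer-to-string field starting at `pos`.

    Returns (value, next_pos); value is None when the pointer is null.
    """
    # Assumption: unsigned big-endian pointer, interpreted as an index into `data`.
    ptr = int.from_bytes(data[pos:pos + ptr_len], "big")
    next_pos = pos + ptr_len  # parsing resumes right after the pointer field

    if ptr == 0:
        # Null pointer: reported as nil, following the proposal's nillable="true" reading.
        return None, next_pos

    # Delimited length with a NUL terminator; bytes.index raises ValueError if none is found.
    end = data.index(b"\x00", ptr)
    return data[ptr:end].decode("utf-8"), next_pos


# Two consecutive pointer fields followed by the string storage:
# bytes 0-7 point at 16, bytes 8-15 are a null pointer, "hello\0" starts at 16.
buf = (16).to_bytes(8, "big") + (0).to_bytes(8, "big") + b"hello\x00"

first, pos_after_first = parse_pointer_string(buf, 0)
second, pos_after_second = parse_pointer_string(buf, pos_after_first)
```

Only the string value (or nil) would reach the infoset; the pointer bytes themselves are consumed and discarded, which matches the idea that the indirection is purely physical.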

Regards,

Bradd Kadlecik

From: Steve Hanson/UK/IBM
To: dfdl-wg@ogf.org
Cc: "Mike Beckerle" <mbeckerle@tresys.com>, "Michele Zundo" <michele.zundo@esa.int>, Bradd Kadlecik/Poughkeepsie/IBM@IBMUS
Date: 02/07/2019 12:32 PM
Subject: Latest OGF DFDL WG Call Minutes

Please find minutes from the latest call at https://redmine.ogf.org/projects/dfdl-wg/news

Regards

Steve Hanson

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU