Variable occurrence of variable/fields
From: Michele Zundo
To: dfdl-wg@ogf.org Date: 26/07/2011 12:45
Dear all,
my 2 cents on variable occurrence fields.
We have been using XML syntax (in the now defunct BinX format) to describe the structure of the ESA Earth Observation SMOS Mission Science binary data products (see http://www.smos.com.pt/).
We also developed a custom C/C++ library to allow read/write access to these binary data, as we felt this would let us modify the binary structure of a data product by changing only the XML "schema", without requiring application code changes.
One of the major limitations we found in BinX (and I'm not sure it has been solved in DFDL) is describing a binary structure with a variable number of records, where the number is given (dynamically) within the binary data itself. We had to write a pre-processor that reads each instance of the file, finds out how many records it contains, and creates at run time an XML schema with the correct number of replications of the XML code describing each record. This is very inconvenient and inelegant.
For example, a binary data product with a variable number of scientific observations would be (symbolically) structured as:
Number_of_Records
Data_Record_1 (blah, blah, etc etc)
Data_Record_2 (blah, blah, etc etc)
.....
Data_Record_N (with N = Number_of_Records)
It would be desirable to be able to specify such a data product like:
Number_of_Records
List (count=Number_of_Records) of Data_Record
The other feature we greatly missed is bit-level operation, since many of our space data structures are formatted at bit level, e.g. a 3-bit field + a 2-bit field + an 11-bit field, etc.
Regards
From: Ryan Farrell
Date: July 19, 2011 20:12:05 GMT+02:00 To: dfdl-wg@ogf.org Subject: [DFDL-WG] [wg-all] Repeating groups in DFDL For anyone that was at the 19/07 phone meeting, this is what Adam Fox and I were trying to find a solution to. If anyone knows of a solution, please contact me or adam.fox@nrl.navy.mil.
----------------------------------------------------------------------------------------------------
Please see the attached XML schema while reading this message.
The problem we want to solve is how to represent a structure that can have a variable number of occurrences (even zero), where the number of repeats is controlled by a one-bit field, which we will call "repeatBit". The repeating structure in this case is "A1". The element before it, "presentBit", determines whether there is at least one occurrence of A1. If presentBit is zero, then A1 has zero occurrences. If presentBit is one, then A1 has AT LEAST one occurrence.
The first element in A1 is "repeatBit". repeatBit tells us whether there will be another occurrence of A1 after the current one. When repeatBit is zero, we read through the rest of A1, and that is the last occurrence of A1. As long as repeatBit continues to be one, however, we read through the rest of A1 and then start a new occurrence.
I have included all the known required DFDL notation. Please let me know what is missing.
EXAMPLE INPUT (results are in base 10):
NOTE: Not all bits are used in these examples. Unused bits should be ignored for the purpose of the examples.
1) 0111 0101
   presentBit = 0
   aFieldAfterA1 = 3
2) 1011 1010
   presentBit = 1
   A1
     -repeatBit = 0
     -someField = 3
   aFieldAfterA1 = 2
3) 1101 0110 0000 0000
   presentBit = 1
   A1
     -repeatBit = 1
     -someField = 1
   A1
     -repeatBit = 0
     -someField = 3
   aFieldAfterA1 = 0
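The parsing rule described above can be sketched in plain Python (this is not DFDL) to check it against the three examples. The field widths are assumptions: the attached repeat_example.xsd is not available, so someField and aFieldAfterA1 are taken as 2-bit unsigned integers, the widths that reproduce all three examples, and bits are read most significant first. BitReader, Record and parse are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

# ASSUMPTION: widths inferred from the three worked examples; the original
# repeat_example.xsd attachment is not available.
SOME_FIELD_BITS = 2
A_FIELD_AFTER_A1_BITS = 2


class BitReader:
    """Reads unsigned integers from a '0'/'1' string, most significant bit first."""

    def __init__(self, bits: str) -> None:
        self.bits = bits.replace(" ", "")  # accept "1011 1010" style input
        self.pos = 0

    def read(self, width: int) -> int:
        if self.pos + width > len(self.bits):
            raise EOFError(f"need {width} bits at bit offset {self.pos}")
        value = int(self.bits[self.pos:self.pos + width], 2)
        self.pos += width
        return value


@dataclass
class A1:
    repeatBit: int
    someField: int


@dataclass
class Record:
    presentBit: int
    a1: list[A1] = field(default_factory=list)
    aFieldAfterA1: int = 0


def parse(bits: str) -> Record:
    reader = BitReader(bits)
    record = Record(presentBit=reader.read(1))
    if record.presentBit == 1:
        # presentBit = 1 guarantees at least one A1 occurrence.
        while True:
            occurrence = A1(repeatBit=reader.read(1),
                            someField=reader.read(SOME_FIELD_BITS))
            record.a1.append(occurrence)
            # repeatBit = 0 marks this occurrence as the last one.
            if occurrence.repeatBit == 0:
                break
    record.aFieldAfterA1 = reader.read(A_FIELD_AFTER_A1_BITS)
    return record  # remaining bits are ignored, as in the examples


for example in ("0111 0101", "1011 1010", "1101 0110 0000 0000"):
    print(parse(example))
```

Running it yields the presentBit, A1 and aFieldAfterA1 values listed in examples 1 to 3. The loop cannot know the occurrence count in advance: an A1 is only known to be the last one after its repeatBit has been read and the rest of it consumed.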
From: Steve Hanson
Date: July 20, 2011 10:54:27 GMT+02:00 To: Ryan Farrell Cc: dfdl-wg@ogf.org Subject: Re: [DFDL-WG] [wg-all] Repeating groups in DFDL A couple of clarifications if I may....
- In your example, you have someField as a simple type; can it be a complex type in your real format?
- You said "If presentBit is zero, then A1 has zero occurences". But in your xsd you have A1 as (implied) minOccurs="1". I think you meant minOccurs="0"?
Regards
Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh@uk.ibm.com tel:+44-1962-815848
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
From: Steve Hanson
Date: July 20, 2011 13:28:47 GMT+02:00 To: dfdl-wg@ogf.org Cc: DFDL-Technical-Core%IBMGB@uk.ibm.com Subject: [DFDL-WG] OGF DFDL WG Call Minutes 2011-07-19 Please find minutes of the above meeting on GridForge at:
http://forge.gridforum.org/sf/docman/do/downloadDocument/projects.dfdl-wg/do...
Regards
Steve Hanson Architect, DFDL Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh@uk.ibm.com tel:+44-1962-815848
From: "Mike Beckerle"
Date: July 23, 2011 00:43:49 GMT+02:00 To: "'Ryan Farrell'" , Subject: Re: [DFDL-WG] [wg-all] Repeating groups in DFDL Great example Ryan,
We've spent quite a bit of time over the last several months on textual formats with these kinds of characteristics, which I would describe as "occurrences determined by parsing" - that is, you look at the data to determine "is there another element or not" one by one as they are parsed.
We need to make sure the properties are there to make this work for the binary case as well.
I actually believe we cannot handle this format right now. We don't have a property which says to look at the current element to determine whether to expect a subsequent element, as a way of determining occurrence counts.
Next week we'll add resolving this example to the workgroup actions list.
...mikeb
Mike Beckerle Senior Technology Leader/Manager Deloitte Managed Analytics 100 Fifth Avenue, Waltham, MA 02451 Tel/Direct: +1 781 330 0412 | Fax: +1 866 253 3006 www.deloitte.com
From: Steve Hanson
Date: July 26, 2011 11:58:59 GMT+02:00 To: dfdl-wg@ogf.org Subject: [DFDL-WG] OGF DFDL WG Call Agenda 2011-07-26 Please find agenda for the above call on GridForge at:
http://forge.gridforum.org/sf/docman/do/downloadDocument/projects.dfdl-wg/do...
As per action 144 an errata to the spec has been created here: http://forge.gridforum.org/sf/go/doc16280?nav=1
Regards
Steve Hanson Architect, Data Format Description Language (DFDL) Co-Chair, OGF DFDL Working Group IBM SWG, Hursley, UK smh@uk.ibm.com tel:+44-1962-815848
----------------------------------------- Michele Zundo EOP-PEP Ground System Definition and Verification Office European Space Agency, ESTEC e-mail: michele.zundo@esa.int
From: Steve Hanson
Date: Jul 26, 2011, 14:52
Hello Michele
Thank you for your interest in DFDL.
DFDL is able to describe a binary format with a variable number of elements where the count is provided earlier in the data, as you describe in your message. Use dfdl:occursCountKind="expression" and set dfdl:occursCount to an XPath expression that refers to the count field. (Ryan's format does not contain a count; rather, it contains a 'last item' indicator in the repeating data itself, which is trickier to model.)
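As a language-neutral illustration of the count-driven case (plain Python, not DFDL), the sketch below reads a count field and then parses exactly that many records. The byte layout (a 4-byte big-endian count, then 6-byte records holding a 16-bit id and a 32-bit float) is hypothetical, chosen only to make the example runnable; the loop bound plays the role that dfdl:occursCount plays when dfdl:occursCountKind="expression".

```python
import struct

# HYPOTHETICAL layout, for illustration only:
#   Number_of_Records: 4-byte big-endian unsigned integer
#   Data_Record:       2-byte big-endian unsigned id + 4-byte big-endian float
COUNT = struct.Struct(">I")
DATA_RECORD = struct.Struct(">Hf")


def parse_product(buf: bytes) -> list[tuple[int, float]]:
    """Parse a count-prefixed list of Data_Records from buf."""
    (number_of_records,) = COUNT.unpack_from(buf, 0)
    offset = COUNT.size
    records = []
    # The count read from the data bounds the loop, the role dfdl:occursCount
    # plays when dfdl:occursCountKind="expression".
    for _ in range(number_of_records):
        # unpack_from raises struct.error if the buffer is truncated.
        records.append(DATA_RECORD.unpack_from(buf, offset))
        offset += DATA_RECORD.size
    return records


product = COUNT.pack(2) + DATA_RECORD.pack(7, 1.5) + DATA_RECORD.pack(8, -0.25)
print(parse_product(product))  # [(7, 1.5), (8, -0.25)]
```

No schema needs to be regenerated per file: the same description handles any value of Number_of_Records, which is the point of expressing the count as a reference to a field in the data.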
DFDL is able to model bit-oriented data. Use dfdl:lengthUnits="bits" and dfdl:alignmentUnits="bits".
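For the bit-level case, the following hypothetical Python helper splits a byte string into consecutive unsigned fields packed with no padding between them, using the 3-bit + 2-bit + 11-bit example from Michele's message. It assumes most-significant-bit-first order over a big-endian byte sequence; it only illustrates the packing that bit length and alignment units describe, not how any DFDL processor works internally.

```python
def read_bit_fields(data: bytes, widths: list[int]) -> list[int]:
    """Split data into consecutive unsigned bit fields, most significant bit first.

    Fields are packed back to back with no padding; bits left over after the
    last field are ignored.
    """
    available = len(data) * 8
    if sum(widths) > available:
        raise ValueError(f"need {sum(widths)} bits, only {available} available")
    value = int.from_bytes(data, "big")  # whole buffer as one big integer
    fields = []
    remaining = available
    for width in widths:
        remaining -= width
        # Shift the wanted field down to the low bits, then mask it off.
        fields.append((value >> remaining) & ((1 << width) - 1))
    return fields


# 3-bit, 2-bit and 11-bit fields: 101 | 10 | 00000000111 -> bytes 0xB0 0x07
print(read_bit_fields(bytes([0xB0, 0x07]), [3, 2, 11]))  # [5, 2, 7]
```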
Are you a colleague of Dario Romano? We were contacted last year by Dario, who provided us with some sample satellite bit data that he wanted DFDL to model. We used it as a proof point for the bit support in DFDL, and as a result we changed the behaviour of dfdl:leadingSkip and dfdl:trailingSkip to obey dfdl:alignmentUnits="bits".
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh@uk.ibm.com
tel:+44-1962-815848
From: Michele Zundo
Dear Steve,
thanks for the reply. We are no longer working with BinX for the mission I mentioned, but we have started developing an application for visualizing CCSDS satellite data, and having a syntax that can at least model bit-level fields and variable numbers of records is a good thing. The feedback I had from the developers was that BinX needed to be extended as I described in my previous e-mail. I will now point them to DFDL, since it addresses these two shortcomings, so that at least at the syntax level we have something "standard" (DFDL-based) and they can start coding the library for accessing the data.
Michele
PS I just found out that Dario does indeed work at ESA, but I never spoke with him (it is a big organisation...)
-- dfdl-wg mailing list dfdl-wg@ogf.org http://www.ogf.org/mailman/listinfo/dfdl-wg