RE: [ogsa-d-wg] Draft Charter for Data Movement Interface Standardization WG
Dave, this is a good way of posing the question. One can also relate it to the proposal a couple of years ago by Peter Kunszt for grid data handles as a generic concept for naming data.
I think the problem arises when the selected data is an arbitrary subset of the stored data, or a derivative of the stored data obtained via any of a wide range of languages: XQuery, XPath, SQL, LDAP, semi-structured QLs, statistical languages, FFT, datacutter, ....
These all make sense. It is just hard to understand how to compose them, and hard to make general rules. The derived data has to be evaluated to some point where it can be moved, such as a RAM buffer. Can the movement be part of the standard, with the derivation processes strictly corralled in some other specs?
Malcolm
-----Original Message-----
From: Dave Berry
Sent: 14 September 2005 21:00
To: Malcolm Atkinson; William E. Allcock; ogsa-d-wg@ggf.org; gsm-wg@ggf.org; byte-io-wg@ggf.org; Peter Kunszt; James Casey; Ravi Madduri
Subject: RE: [ogsa-d-wg] Draft Charter for Data Movement Interface Standardization WG
Hi Malcolm,
I'd rather ask: what are the characteristics of a file that make these file transfer mechanisms tractable? Then we can ask, to what extent can we generalise the mechanism?
For example, if the key characteristics are that a file can be named and supports random access, then we might generalise the mechanism to include data in RAM (which would avoid unnecessary copying to disk). This case would be analogous to some operating systems which allow entities in RAM to be addressed as part of the file system.
Conversely, if the mechanism can handle any named sequence of bytes, then it could presumably handle streaming data as well. Or if it requires other operations that are specific to the location of bytes on a disk (or tape), then the WG will restrict its attention to those cases.
I would expect this group to place a strong requirement on the OGSA WG to provide a naming system that can specify whatever data sets this WG wants to move.
Dave.
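Dave's two readings of what makes a file movable can be made concrete with a small sketch. It is only an illustration under stated assumptions: `NamedSource`, `OneShotStream`, `movable`, and the `mem://` and `stream://` names are hypothetical and come from no GGF specification. The sketch equates "supports random access" with Python's `seekable()` check, so a RAM buffer qualifies under the stricter reading, while a forward-only stream qualifies only under the looser "named sequence of bytes" reading.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class NamedSource:
    """A named sequence of bytes (hypothetical type, for illustration only)."""
    name: str         # any name the service accepts, e.g. a URL or a handle
    stream: BinaryIO  # the bytes themselves, wherever they live


class OneShotStream(io.RawIOBase):
    """A forward-only byte stream, standing in for live streaming data."""

    def __init__(self, payload: bytes) -> None:
        super().__init__()
        self._buf = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False  # no random access

    def readinto(self, b) -> int:
        return self._buf.readinto(b)


def movable(source: NamedSource, needs_random_access: bool) -> bool:
    """Decide whether a mover with the given requirement can handle the source."""
    if not source.name:
        return False  # unnamed data is out of scope under either reading
    return source.stream.seekable() or not needs_random_access


# Data held in RAM: named and randomly accessible, so no copy to disk is needed.
ram = NamedSource("mem://result-buffer", io.BytesIO(b"derived data"))
# Streaming data: named, but readable only front to back.
live = NamedSource("stream://camera-1", OneShotStream(b"frames"))

print(movable(ram, needs_random_access=True))
print(movable(live, needs_random_access=True))
print(movable(live, needs_random_access=False))
```

The point of the design is that the mover states its requirement once and each source advertises what it can do, which is one way the WG could decide scope without enumerating storage media.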
-----Original Message-----
From: owner-ogsa-d-wg@ggf.org [mailto:owner-ogsa-d-wg@ggf.org] On Behalf Of Malcolm Atkinson
Sent: 14 September 2005 17:20
To: William E. Allcock; ogsa-d-wg@ggf.org; gsm-wg@ggf.org; byte-io-wg@ggf.org; Peter Kunszt; James Casey; Ravi Madduri
Subject: RE: [ogsa-d-wg] Draft Charter for Data Movement Interface Standardization WG
Hi Bill
I agree that such a standard interface is needed. When you look at files, I presume you consider files wherever they are, on secondary or tertiary storage at least.
When you say any data, there is the possibility of trivial or large amounts of data moved between RAM, as well as data from files and databases, with an enormous set of possibilities for the way it may be selected and identified. Eventually it gets close to Byte-IO, Streams (BoF), InfoD, etc.
So I'm agreeing with you that if you go beyond files then scope control is difficult.
Would it be better to do the standardisation of file movement first and look at other forms of data movement later?
Malcolm
-----Original Message-----
From: owner-ogsa-d-wg@ggf.org [mailto:owner-ogsa-d-wg@ggf.org] On Behalf Of William E. Allcock
Sent: 14 September 2005 17:04
To: ogsa-d-wg@ggf.org; gsm-wg@ggf.org; byte-io-wg@ggf.org; Peter Kunszt; James Casey; Ravi Madduri
Subject: [ogsa-d-wg] Draft Charter for Data Movement Interface Standardization WG
Sorry for the re-send, I typoed the byte-io mail list.
All,
Sorry for the SPAM, but I sent this to the "likely suspects" who might be interested. I have a proposed BOF (waiting for AD approval) to discuss standardizing an interface for invoking data movement. There are several of them out there already: CERN has the File Transfer System (FTS), the gsm-wg has SRM copy, Globus has the Reliable File Transfer (RFT) service, etc. I don't think there will be any argument that there is a need for such standardization; the hard part will be scoping the extent of what we will work on. For instance, all the examples above are file based, but ideally this interface would work for any data that can be addressed.
I expect that the BOF will be centered around scoping the working group, but I think we should get (and approval of the BOF depends on getting) some initial discussion around the scope. So... here it goes:
I think the obvious thing is that it needs to have the basic functionality presented by FTS, RFT, and SRM-copy. However, the devil is in the details, so I will break this up into "blocks of functionality":
Let's start with naming. What will this service accept as valid names for entities that it will move? URLs? EPRs? Will logical file names be accepted or should they be translated outside this service?
Related to the naming is what type of data this service will move. Files? Video streams? The output of simulations? The output of database queries? Can we make this a service that any service wanting to move data can simply invoke? Note that I am differentiating data from messages. You would not use this to send the result from a service that summed a bunch of numbers; that would simply be a SOAP response... IMHO :-).
Can we make a generic module that would allow this functionality to be applied to any service that exposes the byte-io interface? Does that affect the interface or is it just an implementation issue?
Can we make this service transport-mechanism agnostic, both for application transport (GridFTP vs HTTP vs ...) and for network transport (TCP vs UDP vs UDT vs ...)? My concern here is that I am not sure SOAP has the functionality we need. To do this, I wonder if we need the equivalent of a union in C, so that the parameters specified are based on the transport(s) chosen. For instance, if you use TCP you need to specify a buffer size, but not for UDP. GridFTP specifies streams and data channel authentication, but HTTP does not.
What about security / authorization? This is a broad category and we should push as much as possible outside of scope via callouts and Policy Enforcement Points (PEPs), but what about delivery guarantees such as AT MOST ONCE, AT LEAST ONCE, EXACTLY ONCE, non-repudiation, etc.? I know Dieter has a set of use cases that require some of this type of delivery guarantee functionality.
A potentially contentious issue is whether these services will use WSRF and notifications to expose the state (push from the service) or methods to query the state (pull from the service). Hopefully, we can find a way to make each optional.
If we start making many parts of the interface optional, what is exposed as service metadata for brokering will become more important. I would propose that we should make, at a minimum, a recommendation for what facts about the service should be exposed.
All of the existing services accept "bulk" inputs, i.e., move these 100 files. This can be a problem when the requests become very large, due to de-serialization. Should we provide a "chunking" interface so that requests can be of unlimited size?
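One way to read the "chunking" question is that the client splits a bulk request into bounded batches, so the service never has to de-serialize the whole list at once. The following is a minimal sketch of that reading only; `Transfer`, `chunked`, and the `src/` and `dst/` names are hypothetical, and nothing here reflects how FTS, RFT, or SRM copy actually accept bulk input.

```python
from typing import Iterable, Iterator, List, Tuple

# (source name, destination name); both are placeholder strings in this sketch.
Transfer = Tuple[str, str]


def chunked(transfers: Iterable[Transfer], chunk_size: int) -> Iterator[List[Transfer]]:
    """Yield successive batches of at most `chunk_size` transfers.

    The input is consumed lazily, so the full request never has to be
    held (or de-serialized) in one piece.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    batch: List[Transfer] = []
    for transfer in transfers:
        batch.append(transfer)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:  # final, possibly short, batch
        yield batch


# A bulk request of 250 file pairs, submitted 100 at a time.
request = [(f"src/{i}", f"dst/{i}") for i in range(250)]
sizes = [len(batch) for batch in chunked(request, 100)]
print(sizes)
```

Whether the batches belong to one logical request (with shared state and a single completion notification) or are independent requests is exactly the kind of interface decision the paragraph above leaves open.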
Please feel free to make comments on the above and, more importantly, suggest other important issues we need to address.
btw, once we have a mail list of our own we will quit spamming the other lists :-).
Bill
--
William E. Allcock
Argonne National Laboratory
Bldg 221, Office C-115A
9700 South Cass Ave
Argonne, IL 60439-4844
Office Phone: +1-630-252-7573
Office Fax: +1-630-252-1997
Cell Phone: +1-630-854-2842
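The "union in C" remark in the transport question of the message above can be illustrated with a tagged union: each transport carries only the parameters that make sense for it. This is a rough sketch under assumptions, not a proposed interface. All class and field names (`TcpParams`, `GridFtpParams`, `parallel_streams`, and so on) are invented here; only the examples themselves (buffer size for TCP but not UDP, streams and data channel authentication for GridFTP but not HTTP) come from the message.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TcpParams:
    buffer_size: int  # bytes; needed for TCP


@dataclass(frozen=True)
class UdpParams:
    pass  # no buffer size to specify


@dataclass(frozen=True)
class GridFtpParams:
    parallel_streams: int
    data_channel_auth: bool


@dataclass(frozen=True)
class HttpParams:
    pass  # neither streams nor data channel authentication


# The "unions": exactly one alternative is present per request.
NetworkParams = Union[TcpParams, UdpParams]
ApplicationParams = Union[GridFtpParams, HttpParams]


@dataclass(frozen=True)
class TransferRequest:
    source: str
    destination: str
    application: ApplicationParams
    network: NetworkParams


def describe(request: TransferRequest) -> str:
    """Render only the parameters that apply to the chosen transports."""
    parts = []
    app = request.application
    if isinstance(app, GridFtpParams):
        parts.append(f"gridftp streams={app.parallel_streams} dcau={app.data_channel_auth}")
    else:
        parts.append("http")
    net = request.network
    if isinstance(net, TcpParams):
        parts.append(f"tcp buffer={net.buffer_size}")
    else:
        parts.append("udp")
    return ", ".join(parts)


req = TransferRequest("a", "b", GridFtpParams(4, True), TcpParams(1048576))
print(describe(req))
```

In an XML Schema description of the message, the nearest analogue to this pattern is probably an `xsd:choice` element, though whether that is expressive enough for the WG's needs is the open question raised above.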
The BOF has been approved:
2:00 pm - 3:30 pm
Data Movement Interface Standardization (Data Movement Interface Standardization) Charter-Discussion BOF
Many services have the need to move data (as opposed to messages invoking the service). The characteristics / semantics required can vary greatly. There are several existing interfaces (RFT, FTS, , all file based, that are similar, but incompatible. A working group spawned by this BOF would work towards a standardized set of WSDL that could invoke a service that met the requirements of existing services as well as non-File based sources.
Location: Imperial Ballroom
See you there!
Bill
Bill, it is 2:00 pm on Monday, right?
On Sep 20, 2005, at 3:21 PM, William E. Allcock wrote:
The BOF has been approved:
2:00 pm - 3:30 pm
--
Ravi K Madduri
The Globus Alliance | Argonne National Laboratory
http://www-unix.mcs.anl.gov/~madduri
Sorry, yes, on Monday :-)
Ravi Madduri wrote:
Bill, it is 2:00 pm on Monday, right?
participants (3)
- Malcolm Atkinson
- Ravi Madduri
- William E. Allcock