
Hi,

Please find attached my proposals to include more than one source/sink pair in the RequestDataTransferInstance, and another proposal to include the retry information in the transfer request. I can talk about the proposals more in tomorrow's call.

Thanks
--
Ravi K Madduri
The Globus Alliance | Argonne National Laboratory | University of Chicago
http://www-unix.mcs.anl.gov/~madduri

I have looked at Ravi's proposals and would like to offer some comments.

The second proposal, on retries, basically seems okay to me. I suspect that there is an issue to be addressed regarding what is done with data that is partially transferred when a transfer is retried. But, perhaps, that is covered if the operation eventually succeeds (presumably no orphan data) or fails (covered by the DMI spec regarding cleanup after failure).

I have serious concerns about the first proposal, for multiple objects to be transferred. I see two major problems:

1. The proposal makes no restrictions on what the source/sink DEPRs refer to. Ravi's clear goal is that they refer to different objects on the same server. However, the proposal makes no such restriction. Thus the proposal permits the N sources to be on N different servers, perhaps widely separated both geographically and organizationally. The complications that this creates for the DTF and DTI are almost unimaginable and, I feel, unacceptable. I see no way to make a simple change to this proposal to address this problem.

2. There is nothing in the proposal that ties a particular source DEPR to a particular sink DEPR. Since the source/sink sequences are ordered, perhaps the desire is to make the connection via that ordering. Thus this can probably be handled fairly simply.

As a matter of principle, I think that this whole issue is not one that the DMI spec either should or needs to address. Rather, the multiplicity of entities to be transferred should solely be an issue to be addressed by the client when getting the DEPR from the source (sink). The source (sink) should mint the DEPR in a way that encodes the multiplicity of entities to be transferred. Then, when the DTF and DTI pass that DEPR back to the source (sink), it can take the DEPR apart and properly handle the multiplicity of objects.

I can see backing away from this rigid approach to allow ***implementations*** of a DTI to be aware of the way that certain DEPRs are minted (e.g., if we are transferring multiple files, the source/sink file systems and the FTP DTI could agree on the way multiple files are denoted in the DEPRs). In either case, it should not be the responsibility of the DMI spec to specify this behavior.

Allen Luniewski
IBM WebSphere Cross Brand Services
IBM Silicon Valley Laboratory
555 Bailey Ave.
San Jose, CA 95141
408-463-2255
408-930-1844 (mobile)

Ravi Madduri <madduri@mcs.anl.gov>
Sent by: ogsa-dmi-wg-bounces@ogf.org
12/04/2007 02:24 PM
To: ogsa-dmi-wg@ogf.org
Subject: [ogsa-dmi-wg] Proposals

> Hi,
>
> Please find attached my proposals to include more than one source/sink pair in the RequestDataTransferInstance, and another proposal to include the retry information in the transfer request. I can talk about the proposals more in tomorrow's call.
>
> Thanks

[attachment "DMI_Proposal_Retries" deleted by Allen Luniewski/Almaden/IBM]

> --
> Ravi K Madduri
> The Globus Alliance | Argonne National Laboratory | University of Chicago
> http://www-unix.mcs.anl.gov/~madduri

--
ogsa-dmi-wg mailing list
ogsa-dmi-wg@ogf.org
http://www.ogf.org/mailman/listinfo/ogsa-dmi-wg

Hi all,

some comments from me inline:

On 4 Dec 2007, at 23:48, Allen Luniewski wrote:
[...]
> The second proposal, on retries, basically seems okay to me. I suspect that there is an issue to be addressed regarding what is done with data that is partially transferred when a transfer is retried. But, perhaps, that is covered if the operation eventually succeeds (presumably no orphan data) or fails (covered by the DMI spec regarding cleanup after failure).
I agree that the second proposal is less controversial. However, I would argue that the concept of retrying a data transfer should not differ in its semantics from a single attempt to transfer data. The proposal of integrating the number of retries in the transfer requirements seems fine. But I would not want to introduce a new state reflecting that previous attempts have failed. I'd rather propose another optional DTI attribute that indicates how many attempts have failed already.

Regarding the effects of failed, partial attempts, I would also argue not to change the overall semantics. For this I see two suitable alternatives:

a) Leave it as an implementation detail how exactly the DTI handles failed attempts, as long as the overall semantics are kept, or
b) explicitly require that each failed attempt MUST abide by the cleanup rules given for the chosen data transfer protocol.

Personally, I would prefer option b.
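To make the retry semantics above concrete, here is a minimal sketch under stated assumptions. `TransferInstance`, `run_transfer`, `max_attempts`, `failed_attempts`, and the state names are all hypothetical illustrations and are not taken from the DMI spec or from Ravi's proposal. The sketch keeps a single "Transferring" state across attempts, exposes a counter of failed attempts, and runs the protocol's cleanup after every failed attempt (option b).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransferInstance:
    """Hypothetical stand-in for a DTI; not a type defined by the DMI spec."""
    max_attempts: int          # assumed to come from the transfer requirements
    failed_attempts: int = 0   # the suggested optional attribute
    state: str = "Created"     # illustrative state names only


def run_transfer(dti: TransferInstance,
                 attempt: Callable[[], None],
                 cleanup: Callable[[], None]) -> bool:
    """Try the transfer until it succeeds or max_attempts attempts have failed.

    No extra state is introduced for retries: the DTI stays in
    "Transferring" and only the failed_attempts counter changes.
    """
    dti.state = "Transferring"
    while dti.failed_attempts < dti.max_attempts:
        try:
            attempt()
        except OSError:
            dti.failed_attempts += 1
            cleanup()  # option b: clean up partial data after each failure
            continue
        dti.state = "Done"
        return True
    dti.state = "Failed"
    return False
```

A client polling such an instance would see the same state during the first and the third attempt, and could read `failed_attempts` if it cares about the history.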
> I have serious concerns about the first proposal, for multiple objects to be transferred. I see two major problems:
>
> 1. The proposal makes no restrictions on what the source/sink DEPRs refer to. Ravi's clear goal is that they refer to different objects on the same server. However, the proposal makes no such restriction. Thus the proposal permits the N sources to be on N different servers, perhaps widely separated both geographically and organizationally. The complications that this creates for the DTF and DTI are almost unimaginable and, I feel, unacceptable. I see no way to make a simple change to this proposal to address this problem.
>
> 2. There is nothing in the proposal that ties a particular source DEPR to a particular sink DEPR. Since the source/sink sequences are ordered, perhaps the desire is to make the connection via that ordering. Thus this can probably be handled fairly simply.
>
> As a matter of principle, I think that this whole issue is not one that the DMI spec either should or needs to address. Rather, the multiplicity of entities to be transferred should solely be an issue to be addressed by the client when getting the DEPR from the source (sink). The source (sink) should mint the DEPR in a way that encodes the multiplicity of entities to be transferred. Then, when the DTF and DTI pass that DEPR back to the source (sink), it can take the DEPR apart and properly handle the multiplicity of objects.
>
> I can see backing away from this rigid approach to allow ***implementations*** of a DTI to be aware of the way that certain DEPRs are minted (e.g., if we are transferring multiple files, the source/sink file systems and the FTP DTI could agree on the way multiple files are denoted in the DEPRs). In either case, it should not be the responsibility of the DMI spec to specify this behavior.
I second this entirely.

Cheers,
Michel

Hi,
> I have looked at Ravi's proposals and would like to offer some comments.
Me too.
> The second proposal, on retries, basically seems okay to me. I suspect that there is an issue to be addressed regarding what is done with data that is partially transferred when a transfer is retried. But, perhaps, that is covered if the operation eventually succeeds (presumably no orphan data) or fails (covered by the DMI spec regarding cleanup after failure).
A clarification for me: do the retries reuse the same protocol? There is nothing complex going on there, right?
> I have serious concerns about the first proposal, for multiple objects to be transferred. I see two major problems:
>
> 1. The proposal makes no restrictions on what the source/sink DEPRs refer to. Ravi's clear goal is that they refer to different objects on the same server. However, the proposal makes no such restriction. Thus the proposal permits the N sources to be on N different servers, perhaps widely separated both geographically and organizationally. The complications that this creates for the DTF and DTI are almost unimaginable and, I feel, unacceptable. I see no way to make a simple change to this proposal to address this problem.
This could be an issue, but if you made it an optional mode of behaviour then any implementation could discard more than one DMIDataTransferUnitElement if present; hence a DTF need not deal with this. In some instances, though, there might be advantages to supporting this mode of operation. If it was optional, would you be OK with this, Allen? I can see the potential complexity that you are alluding to.
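One way to picture the optional mode is the sketch below. It rests on several assumptions that are not in the proposal: DEPR pairs are simplified to plain (source URL, sink URL) strings, "discard" is read as "ignore every unit after the first", and the same-host check is merely one conceivable restriction aimed at the N-servers concern. `accept_units` and `supports_multiple` are hypothetical names.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# (source URL, sink URL): a simplified stand-in for a source/sink DEPR pair.
Unit = Tuple[str, str]


def accept_units(units: List[Unit], supports_multiple: bool) -> List[Unit]:
    """Return the units a (hypothetical) DTF agrees to transfer."""
    if not units:
        raise ValueError("at least one transfer unit is required")
    if not supports_multiple:
        # Assumed reading of "discard": ignore every unit after the first.
        return units[:1]
    # Illustrative restriction: all sources on one host, all sinks on one host.
    source_hosts = {urlsplit(src).hostname for src, _ in units}
    sink_hosts = {urlsplit(dst).hostname for _, dst in units}
    if len(source_hosts) > 1 or len(sink_hosts) > 1:
        raise ValueError("all sources (and all sinks) must share one host")
    return units
```

A DTF that does not implement the optional mode would never see more than one unit, while one that does could still refuse requests that span several servers.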
> 2. There is nothing in the proposal that ties a particular source DEPR to a particular sink DEPR. Since the source/sink sequences are ordered, perhaps the desire is to make the connection via that ordering. Thus this can probably be handled fairly simply.
My call would be that there should be no associated ordering. If using this for optimisation then you may want to perform all the transfers in parallel. However, what is the failure policy? Best effort? One failure leads to the transfer being aborted?
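The two failure policies raised here can be contrasted in a short sketch. Everything in it is an assumption for illustration: `transfer_all`, `transfer_one`, and `abort_on_failure` are hypothetical names, units are explicit (source, sink) pairs rather than positionally matched sequences, and a thread pool stands in for whatever parallelism a real DTI would use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

# (source, sink), paired explicitly; no ordering between units is implied.
Unit = Tuple[str, str]


def transfer_all(units: List[Unit],
                 transfer_one: Callable[[str, str], None],
                 abort_on_failure: bool,
                 max_workers: int = 4) -> Dict[Unit, str]:
    """Run all units in parallel and report a per-unit outcome.

    abort_on_failure=False is "best effort": every unit is attempted.
    abort_on_failure=True cancels units that have not started yet as soon
    as one unit fails; units already running still finish.
    """
    results: Dict[Unit, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transfer_one, s, d): (s, d) for s, d in units}
        for fut in as_completed(futures):
            unit = futures[fut]
            if fut.cancelled():
                results[unit] = "cancelled"
            elif fut.exception() is not None:
                results[unit] = "failed"
                if abort_on_failure:
                    for other in futures:
                        other.cancel()  # no effect on running/finished units
            else:
                results[unit] = "done"
    return results
```

Which units end up "cancelled" under the abort policy depends on timing, which is itself an argument for the spec stating the failure policy explicitly if multiple units are allowed.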
> As a matter of principle, I think that this whole issue is not one that the DMI spec either should or needs to address. Rather, the multiplicity of entities to be transferred should solely be an issue to be addressed by the client when getting the DEPR from the source (sink). The source (sink) should mint the DEPR in a way that encodes the multiplicity of entities to be transferred. Then, when the DTF and DTI pass that DEPR back to the source (sink), it can take the DEPR apart and properly handle the multiplicity of objects.
>
> I can see backing away from this rigid approach to allow ***implementations*** of a DTI to be aware of the way that certain DEPRs are minted (e.g., if we are transferring multiple files, the source/sink file systems and the FTP DTI could agree on the way multiple files are denoted in the DEPRs). In either case, it should not be the responsibility of the DMI spec to specify this behavior.
We coined the DataEPR term, right? If we do not define the basics of this, who are we expecting to do it? I guess we want to be clear enough to be able to describe the syntax and semantics so that these objects may be used in DMI.

Mario

+-----------------------------------------------------------------------+
|Mario Antonioletti:EPCC,JCMB,The King's Buildings,Edinburgh EH9 3JZ. |
|Tel:0131 650 5141|mario@epcc.ed.ac.uk|http://www.epcc.ed.ac.uk/~mario/ |
+-----------------------------------------------------------------------+
participants (4)
- Allen Luniewski
- Mario Antonioletti
- Michel Drescher
- Ravi Madduri