
Andrew's e-mail bounced...
---------- Forwarded message ----------
Date: Mon, 6 Jun 2005 15:18:05 -0400
From: Andrew Grimshaw <grimshaw@cs.virginia.edu>
To: jsdl-wg@gridforum.org
Subject: Question on file stage-in
All,
I am new to this working group and have a question on the specification.
In the data staging elements there is a creation flag that indicates whether to over-write or append. Often, one only wants to overwrite if the target and the source are different, for example if I am using a large data set that changes infrequently or several jobs on the same resource will use the "same" input file. Is there a way to do conditional stage in?
I know the document says "More complex file transfers, for example, conditional transfers based on job termination status are out of scope." But we're talking about a major performance optimization.
Andrew

Hi Andrew, first off, welcome to JSDL. :-)
In the data staging elements there is a creation flag that indicates whether to over-write or append. Often, one only wants to overwrite if the target and the source are different, for example if I am using a large data set that changes infrequently or several jobs on the same resource will use the "same" input file. Is there a way to do conditional stage in?
Not at the moment. This may be different in the future.
But we're talking about a major performance optimization.
I am not objecting in general, but I do wonder:
- How is "different" defined? By creation date? By modification date? Size? MD5 hash?
- Who carries out the assessment and decides whether source and target are different or the same?
I think a more prominent use case for a performance enhancement is partial file transfers (in your example, this would transfer only the changed bits of your data set).
But both partial file transfers and conditional file transfers can already be realised today using extensions to JSDL. Note that, referring to JSDL 1.0 draft 19 (http://tinyurl.com/cvfmk), the source and target elements in a JSDL document instance do not have to have a jsdl:URI child element. You can add as many child elements and attributes as you want.
Cheers, Michel
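To make the question of how "different" is defined concrete, here is a minimal Python sketch of three of the candidate criteria Michel lists (size, modification date, MD5 hash). It assumes both copies are readable as local paths by whoever carries out the assessment, which the thread leaves open; the function names and the `criterion` values are hypothetical and are not part of JSDL.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differs(source, target, criterion="md5"):
    """Decide whether `target` must be refreshed from `source`.

    `criterion` is a hypothetical policy name, not a JSDL value.
    """
    if not os.path.exists(target):
        return True  # nothing has been staged yet
    src, dst = os.stat(source), os.stat(target)
    if criterion == "size":
        return src.st_size != dst.st_size
    if criterion == "mtime":
        # Source modified more recently than the staged copy.
        return src.st_mtime > dst.st_mtime
    if criterion == "md5":
        # Cheap size check first; hash both files only when sizes match.
        return src.st_size != dst.st_size or md5_of(source) != md5_of(target)
    raise ValueError(f"unknown criterion: {criterion}")
```

Each criterion can give a different answer for the same pair of files (equal size but different content, for instance), which is why the definition matters before any conditional stage-in can be specified.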

Partial (incremental) transfers can be done without extensions by specifying an appropriate URI. The spec already includes this example:
<jsdl:DataStaging>
  <jsdl:Source>
    <jsdl:URI>rsync://foo.bar.com/~me/job1/input</jsdl:URI>
  </jsdl:Source>
  ...
</jsdl:DataStaging>
Michel Drescher wrote:
Hi Andrew,
first off, welcome to JSDL. :-)
In the data staging elements there is a creation flag that indicates whether to over-write or append. Often, one only wants to overwrite if the target and the source are different, for example if I am using a large data set that changes infrequently or several jobs on the same resource will use the "same" input file. Is there a way to do conditional stage in?
Not at the moment. This may be different in the future.
But we're talking about a major performance optimization.
I am not objecting in general, but I do wonder - how is "different" defined? By creation date? By modification date? Size? MD5 hash? - Who carries out the assessment and decides if source and target are different or the same?
I think a more prominent use case for a performance enhancement are partial file transfers (in your example, this would only transfer the changed bits of your data set).
But both, partial file transfers and conditional file transfers can be already realised today using extensions to JSDL: Note that, referring to JSDL 1.0 draft 19 (http://tinyurl.com/cvfmk), both source and target elements in a JSDL document instance do not have to have a jsdl:URI child element. You can add as many child elements and attributes as you want.
-- Andreas Savva Fujitsu Laboratories Ltd
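As a rough illustration of how a consumer might act on the URI scheme in the example above, the following Python sketch classifies a staging URI as an incremental or a whole-file transfer. The set of "incremental" schemes and the strategy names are assumptions made for illustration; JSDL itself does not define them.

```python
from urllib.parse import urlparse

# Hypothetical: schemes whose protocol can transfer only the changed parts.
INCREMENTAL_SCHEMES = {"rsync"}


def transfer_strategy(uri):
    """Classify a staging URI as an incremental or whole-file transfer."""
    scheme = urlparse(uri).scheme.lower()
    if not scheme:
        raise ValueError(f"no scheme in URI: {uri!r}")
    return "incremental" if scheme in INCREMENTAL_SCHEMES else "whole-file"


strategy = transfer_strategy("rsync://foo.bar.com/~me/job1/input")
```

The point of the design is that the job description stays unchanged: the optimisation is selected by the URI alone, and a consumer that does not understand the scheme can reject the job rather than silently ignore an extension.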

Michel, I did not see rsync. It is very close, but depending on how it is implemented may end up moving a lot of data around anyway.
Thanks!
Andrew
-----Original Message-----
From: owner-jsdl-wg@ggf.org [mailto:owner-jsdl-wg@ggf.org] On Behalf Of Andreas Savva
Sent: Wednesday, June 08, 2005 10:38 PM
To: Michel Drescher
Cc: Ali Anjomshoaa; JSDL WG
Subject: Re: [jsdl-wg] Question on file stage-in (fwd)
Partial (incremental) transfers can be done without extensions by specifying an appropriate URI. The spec already includes this example:
<jsdl:DataStaging>
  <jsdl:Source>
    <jsdl:URI>rsync://foo.bar.com/~me/job1/input</jsdl:URI>
  </jsdl:Source>
  ...
</jsdl:DataStaging>

Michel,
Your questions:
"I am not objecting in general, but I do wonder - how is "different" defined? By creation date? By modification date? Size? MD5 hash? - Who carries out the assessment and decides if source and target are different or the same?
I think a more prominent use case for a performance enhancement are partial file transfers (in your example, this would only transfer the changed bits of your data set).
But both, partial file transfers and conditional file transfers can be already realised today using extensions to JSDL: Note that, referring to JSDL 1.0 draft 19 (http://tinyurl.com/cvfmk), both source and target elements in a JSDL document instance do not have to have a jsdl:URI child element. You can add as many child elements and attributes as you want."
"Different" may depend on semantics. We've used the last update time from stat in the past ... just see if they are different - of course we kept track, in a separate place, of the mod time of the cached copy. In general "difference" depends on the semantics of the "file".
The problem with partial file transfers is that they may require extensive book-keeping on the server side of writes, or diffs between different versions. I think that in the long term we'll end up with different "types" of file services that can, in fact, do differential synchronization. But that is beyond where we are now.
I'm not sure how I would use the extensibility option you mention.
Thanks for the reply.
Andrew
-----Original Message-----
From: owner-jsdl-wg@ggf.org [mailto:owner-jsdl-wg@ggf.org] On Behalf Of Michel Drescher
Sent: Wednesday, June 08, 2005 10:55 AM
To: Ali Anjomshoaa
Cc: JSDL WG
Subject: Re: [jsdl-wg] Question on file stage-in (fwd)
Hi Andrew,
first off, welcome to JSDL. :-)
In the data staging elements there is a creation flag that indicates whether to over-write or append. Often, one only wants to overwrite if the target and the source are different, for example if I am using a large data set that changes infrequently or several jobs on the same resource will use the "same" input file. Is there a way to do conditional stage in?
Not at the moment. This may be different in the future.
But we're talking about a major performance optimization.
I am not objecting in general, but I do wonder - how is "different" defined? By creation date? By modification date? Size? MD5 hash? - Who carries out the assessment and decides if source and target are different or the same? I think a more prominent use case for a performance enhancement are partial file transfers (in your example, this would only transfer the changed bits of your data set). But both, partial file transfers and conditional file transfers can be already realised today using extensions to JSDL: Note that, referring to JSDL 1.0 draft 19 (http://tinyurl.com/cvfmk), both source and target elements in a JSDL document instance do not have to have a jsdl:URI child element. You can add as many child elements and attributes as you want. Cheers, Michel
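The bookkeeping Andrew describes in this message (compare the source's last update time from stat against a mod time recorded separately for the cached copy) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: both files are local paths, the record is a small JSON file, and the function and parameter names are hypothetical rather than taken from any system mentioned in the thread.

```python
import json
import os
import shutil


def stage_in_if_changed(source, target, record_path):
    """Copy `source` to `target` only when the source mtime has changed.

    The mtime seen at the last copy is kept in a separate JSON file
    (`record_path`), keyed by the source path.
    Returns True when a copy happened.
    """
    records = {}
    if os.path.exists(record_path):
        with open(record_path) as handle:
            records = json.load(handle)

    current = os.stat(source).st_mtime_ns
    if os.path.exists(target) and records.get(source) == current:
        return False  # cached copy is still current, skip the transfer

    shutil.copyfile(source, target)
    records[source] = current
    with open(record_path, "w") as handle:
        json.dump(records, handle)
    return True
```

Recording the source's mtime, instead of comparing it with the target's own mtime, avoids depending on the clocks of the two machines agreeing; the test is simply whether the source has changed since the last copy.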

Hi Andrew,
here's an example of how to do that (see attached files). You may want to adjust the xs:schemaLocation definitions so that the example document validates against the two schemas.
Cheers, Michel
On 9 Jun 2005, at 14:27, Andrew Grimshaw wrote:
Michel,
Your questions: " I am not objecting in general, but I do wonder - how is "different" defined? By creation date? By modification date? Size? MD5 hash? - Who carries out the assessment and decides if source and target are different or the same?
I think a more prominent use case for a performance enhancement are partial file transfers (in your example, this would only transfer the changed bits of your data set).
But both, partial file transfers and conditional file transfers can be already realised today using extensions to JSDL: Note that, referring to JSDL 1.0 draft 19 (http://tinyurl.com/cvfmk), both source and target elements in a JSDL document instance do not have to have a jsdl:URI child element. You can add as many child elements and attributes as you want."
"Different" may depend on semantics. We've used the last update time from stat in the past ... just see if they are different - of course we kept track, in a separate place, of the mod time of the cached copy. In general "difference" depends on the semantics of the "file".
The problem with partial file transfers is that they may require extensive book-keeping on the server side of writes, or diffs between different versions. I think that in the long term we'll end up with different "types" of file services that can, in fact, do differential synchronization. But that is beyond where we are now.
I'm not sure how I would use the extensibility option you mention.
Thanks for the reply.
Andrew
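The attached files are not included in this text, so the following Python sketch only illustrates the general idea of the extensibility option: a jsdl:Source element that carries an extra child element from a foreign namespace. The extension namespace, the `TransferIf` element and its `criterion` attribute are all invented for illustration, and the JSDL namespace URI is an assumption (the draft under discussion may have used a different one).

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the published JSDL 1.0 schema; the draft
# discussed in this thread may have used a different one.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
# Hypothetical extension namespace, invented for this sketch.
EXT_NS = "http://example.org/jsdl-ext/conditional"


def conditional_source(uri, criterion):
    """Build a jsdl:Source element with a hypothetical ext:TransferIf child."""
    source = ET.Element(f"{{{JSDL_NS}}}Source")
    ET.SubElement(source, f"{{{JSDL_NS}}}URI").text = uri
    condition = ET.SubElement(source, f"{{{EXT_NS}}}TransferIf")
    condition.set("criterion", criterion)
    return source


# Use readable prefixes when serialising.
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("ext", EXT_NS)
element = conditional_source("rsync://foo.bar.com/~me/job1/input", "md5-differs")
print(ET.tostring(element, encoding="unicode"))
```

A consumer that does not know the extension namespace has no way to honour the condition, which is the interoperability concern raised elsewhere in this thread.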

Andrew Grimshaw wrote:
In the data staging elements there is a creation flag that indicates whether to over-write or append.
Often, one only wants to overwrite if the target and the source are different, for example if I am using a large data set that changes infrequently or several jobs on the same resource will use the "same" input file.
Is there a way to do conditional stage in?
My understanding is that the implementation is free to use some mechanism outside the JSDL scope to optimize downloads. From the JSDL perspective (not that this is the only valid perspective, of course) it doesn't matter all that much. Looking at the latest version of the spec though, I see that the value "jsdl:dontOverwrite" could be interpreted to mean "transfer only if not pre-existing". Of course, at that point you then have to worry about whether the two ends refer to the same data, but if you're piling the data into some job-specific dir that's IMO not a huge issue.
I know the document says
"More complex file transfers, for example, conditional transfers based on job termination status are out of scope. "
But we're talking about a major performance optimization.
Well, perhaps. But if we add it, we have to worry about systems that don't have a mechanism for doing this sort of thing in the first place. Indeed, the JSDL 1.0[*] spec leaves out many things that could be major performance wins (e.g. compressed data transfers) either because they're complicated in their own right, or because we decided to get a spec going *this* decade as opposed to the next one. :^) None of which means that we will not go back and revisit these issues once there is some more data and experience reports available on actual deployed implementations. In particular, as it is a spec put together by mainly compute guys, we know that the data-related part is in need of fleshing out in the future.
We've not reached the end of the story. Maybe just the end of the chapter instead. ;^)
Donal.
[*] As a side note, it should be done just about now as I write this. :)
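Donal's reading of "jsdl:dontOverwrite" as "transfer only if not pre-existing" can be sketched in Python as below. This is one possible interpretation applied to local file copies, not the behaviour mandated by the specification; the function name is hypothetical, and the three flag strings follow the values mentioned in this thread (over-write, append, dontOverwrite).

```python
import os
import shutil


def stage_in(source, target, creation_flag):
    """Apply one reading of the three creation flags to a local copy.

    Returns True when data was written to `target`.
    """
    if creation_flag == "overwrite":
        shutil.copyfile(source, target)
        return True
    if creation_flag == "append":
        # Add the source bytes to the end of any existing target.
        with open(source, "rb") as src, open(target, "ab") as dst:
            shutil.copyfileobj(src, dst)
        return True
    if creation_flag == "dontOverwrite":
        if os.path.exists(target):
            return False  # pre-existing target: skip the transfer
        shutil.copyfile(source, target)
        return True
    raise ValueError(f"unknown creation flag: {creation_flag}")
```

Under this reading the existence of the target is the only test, so a source that has changed since the last stage-in is not picked up; the flag acts as a cache only when the two ends are known to refer to the same data.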

I think at this point we should remember that JSDL 1.0 is a minimum specification. All consumers need to be able to deal with the elements we define.
I do agree with Andrew that optimised transfers are very useful and it's a prime example of an extension to JSDL that should be done. Then at some point in the future you'll have services which handle the useful extensions and some which don't. People will submit jobs to services which have the extensions they desire - though could use the others if they have to. This will hopefully cause services which don't have the extensions to upgrade or "mimic" the features so they do get jobs. This is something I've already seen signs of.
steve..
Donal K. Fellows wrote:
Andrew Grimshaw wrote:
In the data staging elements there is a creation flag that indicates whether to over-write or append.
Often, one only wants to overwrite if the target and the source are different, for example if I am using a large data set that changes infrequently or several jobs on the same resource will use the "same" input file. Is there a way to do conditional stage in?
My understanding is that the implementation is free to use some mechanism outside the JSDL scope to optimize downloads. From the JSDL perspective (not that this is the only valid perspective, of course) it doesn't matter all that much. Looking at the latest version of the spec though, I see that the value "jsdl:dontOverwrite" could be interpreted to mean "transfer only if not pre-existing". Of course, at that point you then have to worry about whether the two ends refer to the same data, but if you're piling the data into some job-specific dir that's IMO not a huge issue.
I know the document says
"More complex file transfers, for example, conditional transfers based on job termination status are out of scope. "
But we're talking about a major performance optimization.
Well, perhaps. But if we add it, we have to worry about systems that don't have a mechanism for doing this sort of thing in the first place. Indeed, the JSDL 1.0[*] spec leaves out many things that could be major performance wins (e.g. compressed data transfers) either because they're complicated in their own right, or because we decided to get a spec going *this* decade as opposed to the next one. :^) None of which means that we will not go back and revisit these issues once there is some more data and experience reports available on actual deployed implementations. In particular, as it is a spec put together by mainly compute guys, we know that the data-related part is in need of fleshing out in the future.
We've not reached the end of the story. Maybe just the end of the chapter instead. ;^)
Donal. [*] As a side note, it should be done just about now as I write this. :)
--
Dr A. Stephen McGough
Research Associate, Imperial College London, Department of Computing, 180 Queen's Gate, London SW7 2BZ, UK
tel: +44 (0)207-594-8310 fax: +44 (0)207-581-8024
Assistant Warden, Brabazon House, Pimlico, 5 Moreton Street, London SW1V 2PN, UK
tel: +44 (0)207-828-4733 fax: +44 (0)207-233-8105

All,
Is there a way to do condition statements in JSDL, in other words the implementation that is "executing" the JSDL can choose which of several options?
Andrew
-----Original Message-----
From: owner-jsdl-wg@ggf.org [mailto:owner-jsdl-wg@ggf.org] On Behalf Of Steve McGough
Sent: Thursday, June 09, 2005 7:07 AM
Cc: JSDL WG
Subject: Re: [jsdl-wg] Question on file stage-in (fwd)
I think at this point we should remember that JSDL 1.0 is a minimum specification. All consumers need to be able to deal with the elements we define.
I do agree with Andrew that optimised transfers are very useful and it's a prime example of an extension to JSDL that should be done. Then at some point in the future you'll have services which handle the useful extensions and some which don't. People will submit jobs to services which have the extensions they desire - though could use the others if they have to. This will hopefully cause services which don't have the extensions to upgrade or "mimic" the features so they do get jobs. This is something I've already seen signs of.
steve..

JSDL 1.0 does not support conditionals. These were removed (in their last form - profiles) from the JSDL spec at GGF 13. It is a subject we wish to revisit after the JSDL 1.0 release.
steve..
Andrew Grimshaw wrote:
All, Is there a way to do condition statements in JSDL, in other words the implementation that is "executing" the JSDL can choose which of several options?
Andrew

Andrew Grimshaw wrote:
Is there a way to do condition statements in JSDL, in other words the implementation that is "executing" the JSDL can choose which of several options?
We used to have such a mechanism (called Profiles) but we dropped it because it didn't sit well with the space of things offered by WS-Agreement and was rather too complex for us to want to describe properly at the time. Such mechanisms are therefore outside the scope of JSDL 1.0 (i.e. properly done through extensibility).
Donal.

Donal,
See my embedded comments <AG> ... </AG>
-----Original Message-----
From: Donal K. Fellows [mailto:donal.k.fellows@manchester.ac.uk]
Sent: Wednesday, June 08, 2005 11:04 AM
To: Andrew Grimshaw
Cc: JSDL WG
Subject: Re: [jsdl-wg] Question on file stage-in (fwd)
Andrew Grimshaw wrote:
In the data staging elements there is a creation flag that indicates whether to over-write or append.
Often, one only wants to overwrite if the target and the source are different, for example if I am using a large data set that changes infrequently or several jobs on the same resource will use the "same" input file.
Is there a way to do conditional stage in?
My understanding is that the implementation is free to use some mechanism outside the JSDL scope to optimize downloads. From the JSDL perspective (not that this is the only valid perspective, of course) it doesn't matter all that much. Looking at the latest version of the spec though, I see that the value "jsdl:dontOverwrite" could be interpreted to mean "transfer only if not pre-existing". Of course, at that point you then have to worry about whether the two ends refer to the same data, but if you're piling the data into some job-specific dir that's IMO not a huge issue.
<AG> Exactly, if I specify "don't overwrite" AND the base data has in fact "changed" ... then I may WANT to overwrite. </AG>
I know the document says
"More complex file transfers, for example, conditional transfers based on job termination status are out of scope. "
But we're talking about a major performance optimization.
Well, perhaps. But if we add it, we have to worry about systems that don't have a mechanism for doing this sort of thing in the first place. Indeed, the JSDL 1.0[*] spec leaves out many things that could be major performance wins (e.g. compressed data transfers) either because they're complicated in their own right, or because we decided to get a spec going *this* decade as opposed to the next one. :^) None of which means that we will not go back and revisit these issues once there is some more data and experience reports available on actual deployed implementations. In particular, as it is a spec put together by mainly compute guys, we know that the data-related part is in need of fleshing out in the future.
We've not reached the end of the story. Maybe just the end of the chapter instead. ;^)
<AG> I'm not suggesting changing anything at this late date ... though it will likely come up later. </AG>
Donal.
[*] As a side note, it should be done just about now as I write this. :)
participants (6)
- Ali Anjomshoaa
- Andreas Savva
- Andrew Grimshaw
- Donal K. Fellows
- Michel Drescher
- Steve McGough