
Hallihallo!
[For the attentive reader: right, the subject of the 'other' mail should have been "Jobs and GetState", not "Tasks and GetState" - sorry]
Another comment I got about jobs in SAGA is: how is sandboxing supported? Can I at least determine if my job runs in a sandbox? Or at least what its 'cwd' is?
Does a job have a unique job ID I can use to identify it? (That question is related to the session persistency discussed in another thread, I think.)
Any opinions?
Cheers, Andre.
--
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www: http://www.merzky.net  |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+

On 29/7/05 10:47, "Andre Merzky" <andre@merzky.net> wrote:
Another comment I got about jobs in SAGA is: how is sandboxing supported? Can I at least determine if my job runs in a sandbox? Or at least what its 'cwd' is?
You can specify the job's CWD at submission time, but there is no attribute from which to retrieve this information. Perhaps it should be added to the JobInfo class?
As for supporting sandboxes, what does that actually mean? In a chroot jail? With a restricted user id (whatever that means)? Why should I care? What's the use case?
Does a job have a unique job ID I can use to identify it? (That question is related to the session persistency discussed in another thread I think).
There is a getJobId method on the Job interface for this purpose. It's up to the backend to provide the ID, so uniqueness is not something SAGA can guarantee. -- Chris

Sorry Chris, late answer... Quoting [Christopher Smith] (Aug 04 2005):
On 29/7/05 10:47, "Andre Merzky" <andre@merzky.net> wrote:
Another comment I got about jobs in SAGA is: how is sandboxing supported? Can I at least determine if my job runs in a sandbox? Or at least what its 'cwd' is?
You can specify the job's CWD at submission time, but there is no attribute from which to retrieve this information. Perhaps it should be added to the JobInfo class?
Yes, perhaps. I think the basic use case is:
- you run a job
- the job writes a data file ./out.dat
- the job finishes
- you want to retrieve the data file
Without knowing the cwd, you have trouble finding the file. Setting it beforehand does not help if the scheduler creates a sandbox. So adding that info to the JobInfo seems to make sense.
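A minimal Python sketch of that use case, assuming the JobInfo gains a way to report the working directory. The `JobInfo` class, its `cwd` field, and `locate_output` are hypothetical stand-ins for illustration, not part of any SAGA API.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobInfo:
    # Hypothetical stand-in for the JobInfo discussed in this thread.
    job_id: str
    # Working directory as reported by the backend; None if it reports nothing.
    cwd: Optional[str] = None


def locate_output(info: JobInfo, relative_path: str) -> str:
    """Resolve a path the job wrote relative to its cwd (e.g. ./out.dat)."""
    if info.cwd is None:
        # Without the cwd there is no way to tell where the file ended up.
        raise LookupError("backend did not report a working directory")
    return posixpath.normpath(posixpath.join(info.cwd, relative_path))
```

With a cwd reported after the fact, the client can find ./out.dat even when the scheduler picked a sandbox directory that was not known at submission time.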
As for supporting sandboxes, what does that actually mean? In a chroot jail? With a restricted user id (whatever that means)? Why should I care? What's the use case?
I guess you are right: sandbox is by definition transparent to the end user, isn't it? So while it might be useful to know where your job runs (see above), it may not make sense to enforce sandboxing (either it's used or it isn't - what can SAGA do about this? nothing).
Does a job have a unique job ID I can use to identify it? (That question is related to the session persistency discussed in another thread I think).
There is a getJobId method on the Job interface for this purpose. It's up to the backend to provide the ID, so uniqueness is not something SAGA can guarantee.
I semi-agree. For finding your job again, you need more than the backend job-id - you also need the contact point for the backend. Your SAGA implementation might know about that, so it may be able to create a 'better' job id.
In GAT, we did that, and had the distinction between a Native-JobID (the backend's) and a GAT-JobID (globally unique). That might be overkill to mandate for SAGA at this point, unless we have a clear use case wanting it, I guess.
So, bottom line, I guess you are right, backend-id should be sufficient unless we run into problems with that.
Thanks, Andre.
-- Chris

On 11/8/05 13:31, "Andre Merzky" <andre@merzky.net> wrote:
Quoting [Christopher Smith] (Aug 04 2005):
On 29/7/05 10:47, "Andre Merzky" <andre@merzky.net> wrote:
Another comment I got about jobs in SAGA is: how is sandboxing supported? Can I at least determine if my job runs in a sandbox? Or at least what its 'cwd' is?
You can specify the job's CWD at submission time, but there is no attribute from which to retrieve this information. Perhaps it should be added to the JobInfo class?
Yes, perhaps. I think the basic use case is:
- you run a job
- the job writes a data file ./out.dat
- the job finishes
- you want to retrieve the data file
Without knowing the cwd, you have trouble finding the file. Setting it beforehand does not help if the scheduler creates a sandbox.
So adding that info to the jobinfo seems to make sense.
Ok ... I'll add this to the JobInfo. It's on a list of updates I need to make to the docs ... that I haven't got to yet. *blush*
As for supporting sandboxes, what does that actually mean? In a chroot jail? With a restricted user id (whatever that means)? Why should I care? What's the use case?
I guess you are right: sandbox is by definition transparent to the end user, isn't it? So while it might be useful to know where your job runs (see above), it may not make sense to enforce sandboxing (either it's used or it isn't - what can SAGA do about this? nothing).
Right. In a sandboxed environment, you basically need to a) stage files in and out in-line with the job using relative paths, or b) use some kind of "third party" storage service that you can then retrieve files from (basically you use fully qualified paths and service endpoints). I'm not sure there is much in between.
Does a job have a unique job ID I can use to identify it? (That question is related to the session persistency discussed in another thread I think).
There is a getJobId method on the Job interface for this purpose. It's up to the backend to provide the ID, so uniqueness is not something SAGA can guarantee.
I semi-agree. For finding your job again, you need more than the backend job-id - you also need the contact point for the backend. Your SAGA implementation might know about that, so it may be able to create a 'better' job id.
In GAT, we did that, and had the distinction between a Native-JobID (the backend's) and a GAT-JobID (globally unique). That might be overkill to mandate for SAGA at this point, unless we have a clear use case wanting it, I guess.
So, bottom line, I guess you are right, backend-id should be sufficient unless we run into problems with that.
I think that the idea of a SAGA-JobID that is some kind of composite of the backend ID and some "SAGA decoration" is a good idea ... especially if a SAGA session is used to access multiple back ends. Generating global IDs within one implementation is easy enough, but do we want to take a stab at defining a format that all implementations should support? How hard do you think it would be? The idea is that two SAGA implementations (running concurrently) would have globally unique job id spaces. -- Chris

Sorry, late reply... Quoting [Christopher Smith] (Aug 12 2005):
As for supporting sandboxes, what does that actually mean? In a chroot jail? With a restricted user id (whatever that means)? Why should I care? What's the use case?
I guess you are right: sandbox is by definition transparent to the end user, isn't it? So while it might be useful to know where your job runs (see above), it may not make sense to enforce sandboxing (either it's used or it isn't - what can SAGA do about this? nothing).
Right. In a sandboxed environment, you basically need to a) stage files in and out in-line with the job using relative paths, or b) use some kind of "third party" storage service that you can then retrieve files from (basically you use fully qualified paths and service endpoints).
I'm not sure there is much in between.
Right, I agree. So we should leave it as is.
Does a job have a unique job ID I can use to identify it? (That question is related to the session persistency discussed in another thread I think).
There is a getJobId method on the Job interface for this purpose. It's up to the backend to provide the ID, so uniqueness is not something SAGA can guarantee.
I semi-agree. For finding your job again, you need more than the backend job-id - you also need the contact point for the backend. Your SAGA implementation might know about that, so it may be able to create a 'better' job id.
In GAT, we did that, and had the distinction between a Native-JobID (the backend's) and a GAT-JobID (globally unique). That might be overkill to mandate for SAGA at this point, unless we have a clear use case wanting it, I guess.
So, bottom line, I guess you are right, backend-id should be sufficient unless we run into problems with that.
I think that the idea of a SAGA-JobID that is some kind of composite of the backend ID and some "SAGA decoration" is a good idea ... especially if a SAGA session is used to access multiple back ends. Generating global IDs within one implementation is easy enough, but do we want to take a stab at defining a format that all implementations should support? How hard do you think it would be?
I think it's difficult enough to leave it out of the spec - we kept away from backend-dependent definitions until now, and I think that's good. However, we could make a decent proposal, which should hold for the most common use cases, and which we should try to get included into the reference implementation. That would help a lot, I think.
The idea is that two SAGA implementations (running concurrently) would have globally unique job id spaces.
I can think of two ways to create such IDs.
A: create a unique string (MD5 or so) and make an external entity responsible for maintaining the mapping between that ID and the backend instance and native job id. One could also allow a non-random string (e.g. a user-specified name), but the naming collision problem is then moved into user space.
B: combine the backend URL and native job ID in a well-defined (i.e. parsable) way. Possibly allow adding another, user-specific part.
<free string>-<backend url>-<nativeID>
<MyJob>-<gram://www.test.net:1234/>-<SAD12412SDF>
B seems simpler, and does not introduce an external dependency. The free string would allow the user to recognize the jobs - nice for browsing. It could default to the executable name or so.
I am pretty sure it breaks for some cases. E.g. the backend may have a moving URL, or may reuse IDs (as Unix does with PIDs). However, as long as it is not mandatory, it might just help...
my $0.02
Andre.
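Options A and B could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `JobRef`, `compose_job_id`, `parse_job_id` and `JobRegistry` are hypothetical, the angle brackets from the example are kept literally as field delimiters, fields are assumed not to contain `<` or `>`, and a plain dict stands in for the external mapping entity of option A.

```python
import hashlib
import re
from typing import Dict, NamedTuple, Optional


class JobRef(NamedTuple):
    label: str        # free string chosen by the user (may be empty)
    backend_url: str  # contact point of the backend
    native_id: str    # job ID as issued by the backend


# Option B: "<free string>-<backend url>-<nativeID>", brackets kept literally.
_COMPOSITE = re.compile(r"^<([^<>]*)>-<([^<>]*)>-<([^<>]*)>$")


def compose_job_id(ref: JobRef) -> str:
    """Build the composite ID; assumes no field contains '<' or '>'."""
    for part in ref:
        if "<" in part or ">" in part:
            raise ValueError("fields must not contain '<' or '>'")
    return "<{}>-<{}>-<{}>".format(*ref)


def parse_job_id(job_id: str) -> JobRef:
    """Split a composite ID back into its three parts."""
    match = _COMPOSITE.match(job_id)
    if match is None:
        raise ValueError("not a composite job ID: " + job_id)
    return JobRef(*match.groups())


# Option A: opaque ID plus a mapping kept by some external entity.
class JobRegistry:
    def __init__(self) -> None:
        # A dict stands in for the external mapping service.
        self._jobs: Dict[str, JobRef] = {}

    def register(self, ref: JobRef) -> str:
        """Store the reference and return its MD5 hex digest as the job ID."""
        digest = hashlib.md5("\0".join(ref).encode("utf-8")).hexdigest()
        self._jobs[digest] = ref
        return digest

    def lookup(self, job_id: str) -> Optional[JobRef]:
        return self._jobs.get(job_id)
```

The option B ID is self-describing, so the native ID can be recovered without any lookup. The option A ID is opaque and only useful while the registry is reachable, which is the external dependency mentioned above.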

On 19/8/05 10:15, "Andre Merzky" <andre@merzky.net> wrote:
The idea is that two SAGA implementations (running concurrently) would have globally unique job id spaces.
I can think of two ways to create such IDs.
A: create a unique string (MD5 or so) and make an external entity responsible for maintaining the mapping between that ID and the backend instance and native job id. One could also allow a non-random string (e.g. a user-specified name), but the naming collision problem is then moved into user space.
B: combine the backend URL and native job ID in a well-defined (i.e. parsable) way. Possibly allow adding another, user-specific part.
<free string>-<backend url>-<nativeID>
<MyJob>-<gram://www.test.net:1234/>-<SAD12412SDF>
B seems simpler, and does not introduce an external dependency. The free string would allow the user to recognize the jobs - nice for browsing. It could default to the executable name or so.
I like this mechanism as well, although I'm ok with just <backend url>-<nativeID> without the user specified identifier (there are still job names after all).
I am pretty sure it breaks for some cases. E.g. the backend may have a moving URL, or may reuse IDs (as Unix does with PIDs). However, as long as it is not mandatory, it might just help...
You're right, but it would be useful for many cases. -- Chris

Hi Chris, Quoting [Christopher Smith] (Aug 19 2005):
On 19/8/05 10:15, "Andre Merzky" <andre@merzky.net> wrote:
The idea is that two SAGA implementations (running concurrently) would have globally unique job id spaces.
I can think of two ways to create such IDs.
A: create a unique string (MD5 or so) and make an external entity responsible for maintaining the mapping between that ID and the backend instance and native job id. One could also allow a non-random string (e.g. a user-specified name), but the naming collision problem is then moved into user space.
B: combine the backend URL and native job ID in a well-defined (i.e. parsable) way. Possibly allow adding another, user-specific part.
<free string>-<backend url>-<nativeID>
<MyJob>-<gram://www.test.net:1234/>-<SAD12412SDF>
B seems simpler, and does not introduce an external dependency. The free string would allow the user to recognize the jobs - nice for browsing. It could default to the executable name or so.
I like this mechanism as well, although I'm ok with just <backend url>-<nativeID> without the user specified identifier (there are still job names after all).
Right, I agree.
I am pretty sure it breaks for some cases. E.g. the backend may have a moving URL, or may reuse IDs (as Unix does with PIDs). However, as long as it is not mandatory, it might just help...
You're right, but it would be useful for many cases.
So I guess we should include it, because it is simple, does not seem to break anything (it's not mandatory, right?), and seems to allow a number of useful use cases. As it's not mandatory, we should add it to the notes section, I guess.
Does anybody else on the list disagree?
Thanks, Andre.
-- Chris

On Fri, 19 Aug 2005, Andre Merzky wrote:
I am pretty sure it breaks for some cases. E.g. the backend may have a moving URL, or may reuse IDs (as Unix does with PIDs). However, as long as it is not mandatory, it might just help...
You're right, but it would be useful for many cases.
So I guess we should include it, because it is simple, does not seem to break anything (it's not mandatory, right?), and seems to allow a number of useful use cases. As it's not mandatory, we should add it to the notes section, I guess.
Does anybody else on the list disagree?
I think it should be a hint for implementers, rather than required by the spec, since as far as end users are concerned it should be an opaque string.
Tom

Hi Tom, Quoting [Tom Goodale] (Aug 25 2005):
On Fri, 19 Aug 2005, Andre Merzky wrote:
I am pretty sure it breaks for some cases. E.g. the backend may have a moving URL, or may reuse IDs (as Unix does with PIDs). However, as long as it is not mandatory, it might just help...
You're right, but it would be useful for many cases.
So I guess we should include it, because it is simple, does not seem to break anything (it's not mandatory, right?), and seems to allow a number of useful use cases. As it's not mandatory, we should add it to the notes section, I guess.
Does anybody else on the list disagree?
I think it should be a hint for implementers, rather than required by the spec, since as far as end users are concerned it should be an opaque string.
Right, that is what we mean it to be, a hint. However, I think it's ok to put that in the spec (as long as it's clear that it is not normative). As the spec is used for developing, it's the best place to have that type of information. The user's guide should probably be silent on that issue, I agree.
Cheers, Andre.
Tom

On 25/8/05 07:29, "Tom Goodale" <goodale@cct.lsu.edu> wrote:
On Fri, 19 Aug 2005, Andre Merzky wrote:
I am pretty sure it breaks for some cases. E.g. the backend may have a moving URL, or may reuse IDs (as Unix does with PIDs). However, as long as it is not mandatory, it might just help...
You're right, but it would be useful for many cases.
So I guess we should include it, because it is simple, does not seem to break anything (it's not mandatory, right?), and seems to allow a number of useful use cases. As it's not mandatory, we should add it to the notes section, I guess.
Does anybody else on the list disagree?
I think it should be a hint for implementers, rather than required by the spec, since as far as end users are concerned it should be an opaque string.
The benefit of exposing the format of the string is that users can use the back end system identifier directly with the back end system commands and APIs if they so choose. -- Chris

I misread .... I see ... users can just treat it as an opaque string (doesn't mean it doesn't have structure).
-- Chris
On 25/8/05 09:58, "Christopher Smith" <csmith@platform.com> wrote:
On 25/8/05 07:29, "Tom Goodale" <goodale@cct.lsu.edu> wrote:
On Fri, 19 Aug 2005, Andre Merzky wrote:
I am pretty sure it breaks for some cases. E.g. the backend may have a moving URL, or may reuse IDs (as Unix does with PIDs). However, as long as it is not mandatory, it might just help...
You're right, but it would be useful for many cases.
So I guess we should include it, because it is simple, does not seem to break anything (it's not mandatory, right?), and seems to allow a number of useful use cases. As it's not mandatory, we should add it to the notes section, I guess.
Does anybody else on the list disagree?
I think it should be a hint for implementers, rather than required by the spec, since as far as end users are concerned it should be an opaque string.
The benefit of exposing the format of the string is that users can use the back end system identifier directly with the back end system commands and APIs if they so choose.
-- Chris