Comments on the job API (revision 1.4)

Graeme Pound

3 Mar 2006

11:35 a.m.

Hi,

I have some comments on the job section of the Strawman API (revision 1.4).

Thanks
Graeme

-3.37 I do not understand the purpose of job_service.get_self(), which returns a Job object.

-3.38 I like the removal of the JobInfo and JobExitStatus objects from the API, and adding this information as attributes to Job. This streamlines the API; the concept of read-only attributes also makes sense [to me] within the context of the Job object.

-3.39 I like the simplification of the job_state enumeration. NB: this is called 'state' in the SIDL and should be corrected to 'job_state'.

-3.41 The separation of the JobService.submitJob() method into two methods, JobService.createJob() [create the job object] and Job.run() [start the job], has not been seamless, and there are several conceptual and practical problems. This needs to be fine-tuned further.

On many [all?] resource managers there is no separation between submitting a job to the resource manager and manually starting the job. This raises the following problems:

- Job objects are identified via the job_id, which is described as the "job identifier as returned by the resource manager". Unfortunately, since this information will only be available upon submission of the job (via Job.run()), this breaks the methods JobService.list() and JobService.getJob(). It is now impossible to manage an index of Job objects within JobService based upon the job ID.

- What is the conceptual relationship between the JobService and the resource manager? At present this is a little confused in a couple of ways.

  #1 Should there be a one-to-one relationship between JobService instances and resource managers; i.e. should the resource manager endpoint be specified in the JobService constructor (or otherwise as an argument to JobService.createJob())?

  #2 The term JobService implies a close relationship to the resource manager. Previously JobService.submitJob() corresponded to communication with the resource manager. Now JobService.createJob() corresponds to the creation of instances of the Job class; the JobService is acting as a factory. It may be beneficial to rename JobService to JobFactory to clarify the relationship.

-3.42 Should Job objects created by runJob() be added to the index managed by the JobService? Or in other terms, should JobService.runJob() be a Java 'static' method?


Graeme Pound

3 Mar

11:46 a.m.

New subject: [saga-rg] Comments on the job API (revision 1.4)

...

-3.42 Should Job objects created by runJob() be added to the index managed by the JobService? Or in other terms, should JobService.runJob() be a Java 'static' method?

Actually the first question here is independent of the second, but the first point should be clarified. Graeme

Christopher Smith

6 Mar

8:43 p.m.


FWIW

I see the JobService as being closely coupled to the resource manager, such that the constructor of the JobService could take the address of an RM endpoint.

I'm also not sure of the usefulness of having separate createJob and run methods, rather than submitJob ... and I can't recall the conversations that led to the distinction. :-)

I do believe that jobs started with runJob are still "jobs" and should be added to indices. runJob is just a shortcut method (we had lots of discussions about whether to add it or not).

-- Chris

On 3/3/06 03:35, "Graeme Pound" <G.E.POUND@soton.ac.uk> wrote:

...


Andre Merzky

9:22 p.m.


Hi Chris, Quoting [Christopher Smith] (Mar 06 2006):

...

FWIW

I see the JobService as being closely coupled to the resource manager, such that the constructor of the JobService could take the address of an RM endpoint.

I'm also not sure of the usefulness of having separate createJob and run methods, rather than submitJob ... and I can't recall the conversations that lead to the distinction. :-)

The main reason is the required ability to handle large bulks of jobs:

  saga::task_container tc;

  for ( int i = 0; i < 100000; i++ )
  {
    saga::job j = js.create_job (description);

    // remember: job implements task
    tc.add_task (j);
  }

  // run these jobs as bulk:
  tc.run ();
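The bulk argument can be illustrated with a small self-contained mock. This is only a sketch under stated assumptions: `resource_manager`, its request counter, and the one-step `submit_job` are hypothetical stand-ins, not the SAGA API; the point is merely that deferring `run()` lets a container hand all jobs to the resource manager in one request.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the types named in the thread.
struct job_description { std::string executable; };

enum class job_state { New, Running };

struct job {
    job_description description;
    job_state       state = job_state::New;
};

// Mock resource manager that counts the requests it receives.
struct resource_manager {
    std::size_t requests = 0;

    // A single request may carry any number of jobs.
    void submit(const std::vector<std::shared_ptr<job>>& batch) {
        ++requests;
        for (const auto& j : batch) j->state = job_state::Running;
    }
};

class job_service {
public:
    explicit job_service(resource_manager& rm) : rm_(rm) {}

    // Creating a job does not contact the resource manager.
    std::shared_ptr<job> create_job(const job_description& jd) const {
        auto j = std::make_shared<job>();
        j->description = jd;
        return j;
    }

    // One-step variant: create and start immediately, one request per job.
    std::shared_ptr<job> submit_job(const job_description& jd) {
        auto j = create_job(jd);
        rm_.submit({j});
        return j;
    }

private:
    resource_manager& rm_;
};

class task_container {
public:
    explicit task_container(resource_manager& rm) : rm_(rm) {}

    void add_task(std::shared_ptr<job> j) { tasks_.push_back(std::move(j)); }

    // Start every collected job with a single request.
    void run() { rm_.submit(tasks_); }

    std::size_t size() const { return tasks_.size(); }

private:
    resource_manager&                 rm_;
    std::vector<std::shared_ptr<job>> tasks_;
};

// Creates n jobs, runs them as one bulk, returns the number of RM requests.
std::size_t run_bulk(std::size_t n) {
    resource_manager rm;
    job_service      js(rm);
    task_container   tc(rm);
    for (std::size_t i = 0; i < n; ++i)
        tc.add_task(js.create_job(job_description{"/bin/date"}));
    assert(tc.size() == n);
    tc.run();
    return rm.requests;
}

// Submits n jobs one by one, returns the number of RM requests.
std::size_t run_individually(std::size_t n) {
    resource_manager rm;
    job_service      js(rm);
    for (std::size_t i = 0; i < n; ++i)
        js.submit_job(job_description{"/bin/date"});
    return rm.requests;
}
```

In this mock, `run_bulk(n)` always costs one request while `run_individually(n)` costs `n`; whether a real resource manager offers a bulk entry point at all is outside what the thread states.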

...

I do believe that jobs started with runJob are still "jobs" and should be added to indices. runJob is just a shortcut method (we had lots of discussions about whether to add it or not).

Right, run_job is just a shortcut, avoiding the job_description altogether. Cheers, Andre.

...

-- Chris

On 3/3/06 03:35, "Graeme Pound" <G.E.POUND@soton.ac.uk> wrote:

...

-- "So much time, so little to do..." -- Garfield

Andre Merzky

9:17 p.m.


Quoting [Graeme Pound] (Mar 03 2006):

...

Hi,

I have some comments on the job section of the Strawman API (revision 1.4).

Great :-) Let's see...

...

Thanks Graeme

-3.37 I do not understand the purpose of job_service.get_self() which returns a Job object.

Use case is the following (pseudo code-ish):

  main ()
  {
    saga::job_service js ("xyz");
    saga::job         me = js.get_self ();

    saga::job_status  s  = me.get_status (); // should be running ;-)

    // start to do some work

    // after a while, do something to MY instance
    me.migrate ("some new big resource");

    return (0);
  }

Basically, the job returned represents the application calling get_self, and allows performing actions on that application (like suspending myself via the resource manager).

Should make that more clear in the spec I guess... ;-)

...

-3.38 I like the removal of the JobInfo and JobExitStatus objects from the API, and adding this information as attributes to Job. This streamlines the API, the concept of read only attributes also makes sense [to me] within the context of the Job object.

great :-)

...

-3.39 I like the simplification of the job_state enumeration. NB This is called 'state' in the SIDL and should be corrected to 'job_state'.

Thanks, fixed.

...

-3.41 The separation of the JobService.submitJob() method into two methods JobService.createJob() [create the job object] and Job.run() [start the job] has not been seamless and there are several of conceptual and practical problems. This needs to be fine-tuned further. On many [all?] resource managers there is no separation between submitting a job to the resource manager and manually starting the job,

Interesting point. How does BES envision the use of its New state? I guess the job is then not bound to a resource manager in that state...

...

this raises the following problem: - Job objects are identified via the job_id which is described as the "job identifier as returned by the resource manager". Unfortunately since this information will only be available upon submission of the job (via Job.run()) this breaks the methods JobService.list() and JobService.getJob(). It is now impossible to manage an index of Job objects within JobService based upon the job ID.

Right, that's impossible. I think a job-id should only be assigned after the job got run(). Also, the job should not be listed before. But see below.

...

- What is the conceptual relationship between the JobService and the resource manager? At present this is a little confused in a couple of ways. #1 Should there be a one to one relationship between JobService instances and resource managers; i.e. should the resource manager endpoint be specified in the JobService constructor (or otherwise as an argument to JobService.createJob())?

You pointed the missing resource manager specification out before - and we concluded that it should be specified in the create-job method. That is what we have right now. As such, the relation job_service to RM would be 1:n (or even n:m I guess).

However, I wonder how list_jobs is supposed to work: should the job_service query all _available_ RMs? Or should list also allow/require a RM to be specified? *scratch*

The job should be associated to a RM only if it is running I guess (as you state above, the job-id and job-listing get useless otherwise).

As I stated earlier, I have not much experience with RM. However, pondering about your comments, the following seems possible as well:

  1) // we need job description
     saga::job_description jdes;
     // fill it...

     // keep job and job_service independent
     saga::job         j  (jdes);
     saga::job_service js ("RM");

     // associate job and js
     js.submit_job (j);

  2) // we need job description
     saga::job_description jdes;
     // fill it...

     // keep job and job_service independent
     saga::job         j (jdes);
     saga::job_service js;

     // associate job and js, and RM
     js.submit_job (j, "RM");

Both versions would imply that a job_id is only available after submit. The js.run_job method would be unaffected (there seems no issue with that).

...

#2 The term JobService implies a close relationship to the resource manager. Previously JobService.submitJob() corresponded to communication with the resource manager. Now JobService.createJob() corresponds to the creation of instances of the Job class the JobService is acting as a factory. It may beneficial to rename JobService to JobFactory to clarify the relationship.

-3.42 Should Job objects created by runJob() be added to the index managed by the JobService?

Yes.

...

Or in other terms, should JobService.runJob() be a Java 'static' method?

Uhm, is that related? Cheers, Andre. -- "So much time, so little to do..." -- Garfield

Andre Merzky

9:54 p.m.


Hi Graeme,

I should think before I write: the code I sent is useless, as the delayed _run_ poses the problem (there is no submit_job as I used). Also, run() must not take a parameter, as it comes from task.

So, the association between job and job_service _has_ to happen at creation time. Anyway, as you rightly point out, a job_id won't be available before run() gets called:

  // we need job description
  saga::job_description jdes;
  // fill it...

  // create a job service, bound to a resource manager
  saga::job_service js ("RM");

  // create a job instance which is not yet running. No
  // job id available (set to "Unknown" or so)
  saga::job j = js.create (jdes);

  // run the job, get a job_id from RM, put into
  // job list for the associated js
  j.run ();

Sorry, Andre.

Quoting [Andre Merzky] (Mar 06 2006):

...


-- "So much time, so little to do..." -- Garfield
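The lifecycle Andre settles on (bind the job to its job_service at creation, assign the job_id only in run(), and list the job only from then on) can be sketched with a self-contained mock. Everything here is an assumption made for illustration: the `submit` helper, the `"RM-<n>"` id format, and the use of `create_job`, `list` and `get_job` as method names are hypothetical and not taken from the specification.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct job_description { std::string executable; };

class job_service;

class job : public std::enable_shared_from_this<job> {
public:
    job(job_service& js, job_description jd)
        : service_(js), description_(std::move(jd)) {}

    // Takes no arguments, matching the constraint that run() comes from task.
    void run();

    const std::string& get_job_id() const { return id_; }

private:
    job_service&    service_;         // association made at creation time
    job_description description_;
    std::string     id_ = "Unknown";  // placeholder until the RM assigns an id
};

class job_service {
public:
    explicit job_service(std::string rm) : rm_(std::move(rm)) {}

    // Creates a job bound to this service; the RM is not contacted yet.
    std::shared_ptr<job> create_job(const job_description& jd) {
        return std::make_shared<job>(*this, jd);
    }

    // Lists the ids of jobs that have been run through this service.
    std::vector<std::string> list() const {
        std::vector<std::string> ids;
        for (const auto& entry : index_) ids.push_back(entry.first);
        return ids;
    }

    // Returns the job for an id, or an empty pointer if the id is unknown.
    std::shared_ptr<job> get_job(const std::string& id) const {
        auto it = index_.find(id);
        if (it == index_.end()) return std::shared_ptr<job>();
        return it->second;
    }

private:
    friend class job;

    // Mock of the RM handing out an identifier on submission (hypothetical).
    std::string submit(const std::shared_ptr<job>& j) {
        std::string id = rm_ + "-" + std::to_string(++counter_);
        index_[id] = j;
        return id;
    }

    std::string                                 rm_;
    int                                         counter_ = 0;
    std::map<std::string, std::shared_ptr<job>> index_;
};

void job::run() {
    if (id_ != "Unknown") return;  // already submitted, keep the first id
    id_ = service_.submit(shared_from_this());
}
```

With this shape the index is keyed by job id without ever holding a job that has no id, which is the problem Graeme raised for list() and getJob().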

Graeme Pound

7 Mar

9:55 a.m.


Andre, Some comments on get_self() below. Graeme Andre Merzky wrote:

...

Quoting [Graeme Pound] (Mar 03 2006): [...]

...

I am afraid that I find this a little bizarre.

As I understand it, job_service.get_self() returns a representation of the _local_ client application which has instantiated the job_service object (is that correct?). This would allow the client application to perform operations upon itself via the 'job' interface.

This raises several questions (how and why), but I am unsure if my understanding is correct.

Andre Merzky

12:54 p.m.


Hi Graeme, Quoting [Graeme Pound] (Mar 07 2006):

...

I am afraid that I find this a little bizarre.

As I understand it; job_service.get_self() returns a representation of the _local_ client application which has instantiated the job_service object (is that correct?). This would allow the client application to perform operations upon itself via the 'job' interface.

Exactly! :-)

We call those applications 'Grid aware', but I am not sure if that is a good term. However, the app 'knows' it is running in a Grid, and can actively perform actions, also on itself. For example it can migrate itself, if there is need to do so (think agents). Also, it can spawn copies of itself, to perform some partial analysis (it gets the job object for self, it gets the job description from that, and resubmits that description with some changed parameters: ergo the app gets cloned).

I think that the concept opens a range of very dynamic scenarios...

...

This raises several questions (how and why), but I am unsure if my understanding is correct.

What issues does it raise? :-) Cheers, Andre. -- "So much time, so little to do..." -- Garfield
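The self-cloning scenario (get the job object for self, take its description, change a parameter, resubmit) can be sketched as follows. This is a mock under assumptions: `get_description`, `submit`, the `arguments` field and the `"job-<n>"` id format are hypothetical names chosen for the example, not part of the Strawman API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct job_description {
    std::string              executable;
    std::vector<std::string> arguments;
};

struct job {
    std::string     id;
    job_description description;

    const job_description& get_description() const { return description; }
};

class job_service {
public:
    // self_id and self_desc stand for what the RM knows about the caller.
    job_service(std::string self_id, job_description self_desc) {
        jobs_.push_back(job{std::move(self_id), std::move(self_desc)});
    }

    // Returns a handle to the job representing the calling application.
    job get_self() const { return jobs_.front(); }

    // Mock submission: registers a new job and assigns the next id.
    job submit(const job_description& jd) {
        job j{"job-" + std::to_string(jobs_.size() + 1), jd};
        jobs_.push_back(j);
        return j;
    }

    std::size_t job_count() const { return jobs_.size(); }

private:
    std::vector<job> jobs_;
};

// Resubmits the caller's own description with one additional argument.
job clone_self(job_service& js, const std::string& extra_argument) {
    job_description jd = js.get_self().get_description();  // copy, then modify
    jd.arguments.push_back(extra_argument);
    return js.submit(jd);
}
```

The clone works on a copy of the description, so the job representing the running application is left unchanged.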

Graeme Pound

1:25 p.m.


Andre Merzky wrote:

...


Andre,

I really do not like this concept. It has been added as an aside to the jobmanagement package but opens a large can of worms.

The practical problems that this poses to implementations of the API are huge. For starters, each SAGA implementation must provide an implementation of the Job interface specific to the resource which that implementation targets; a large amount of additional code would be required to perform operations on the _local_ client application.

This must be beyond the scope of the jobmanagement package, which is otherwise well defined. Operations on the local client application can only be in a fraction (if any) of the SAGA use cases. This sort of thing departs from the concept of a *simple* API for the Grid.

Graeme

Andre Merzky

8 Mar

11:20 a.m.


Hi Graeme, Quoting [Graeme Pound] (Mar 07 2006):

...


Andre,

I really do not like this concept. It has been added as an aside to the jobmanagement package but opens a large can of worms.

The practical problems that this poses to implementations of the API are huge. For starters; each SAGA implementation must provide an implementation of the Job interface specific to the resource which that implementation targets, a large amount of additional code would be required to perform operations on the _local_ client application.

No, not at all: if your implementation talks to a resource manager, but that manager has no access to your application, e.g. because you are running on a client which that RM has no access to, then you can't return a job in get_self, obviously! That should only be available if:

- your application gets submitted via an RM
- THAT application asks THAT RM for a handle for itself, in order to do things to itself.

If you do a list_jobs on that RM, you would find that application anyway, and would be able to get a job handle for it. However, you have no chance to find YOUR application ID - that's the only missing link, which is solved by that method:

get_self is a shortcut for getting my own job_id out of band, then doing a list_jobs, and then getting the job handle for my own ID.

Andre.

...


-- "So much time, so little to do..." -- Garfield
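Andre's description of get_self as a shortcut (obtain the own job_id out of band, do a list_jobs, fetch the handle for that id) can be written down as a small mock. The sketch assumes the own id is passed in by the caller; how it is actually obtained (an environment variable set by the RM would be one guess) is not stated in the thread, and `add`, `list_jobs` and `get_job` are hypothetical method names.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct job {
    std::string id;
    std::string state;
};

class job_service {
public:
    // Mock: registers a job as known to the resource manager.
    void add(const job& j) { jobs_[j.id] = j; }

    std::vector<std::string> list_jobs() const {
        std::vector<std::string> ids;
        for (const auto& entry : jobs_) ids.push_back(entry.first);
        return ids;
    }

    // Throws std::out_of_range if the id is unknown.
    job get_job(const std::string& id) const { return jobs_.at(id); }

private:
    std::map<std::string, job> jobs_;
};

// own_id is the identifier obtained out of band; nullptr means the process
// was not started through this resource manager, so no handle exists.
job get_self(const job_service& js, const char* own_id) {
    if (own_id == nullptr)
        throw std::runtime_error("application was not submitted via this RM");
    for (const std::string& id : js.list_jobs())
        if (id == own_id) return js.get_job(id);
    throw std::runtime_error("own job id is unknown to this RM");
}
```

The two error paths correspond to the cases where, as Andre puts it, a job cannot be returned: the application was not submitted via an RM, or it asks an RM other than the one that started it.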

G.E.POUND＠soton.ac.uk

1:08 p.m.


Quoting Andre Merzky <andre@merzky.net>:

...


Hmmm, OK. I am having a little difficulty imagining this working in practice. I certainly do not think that this is a frequent scenario. I believe that this would be best left out of the API.

Graeme
