Job State proposal made to SAGA-RG

Hi all,
Per Marvin's comments ...
Here is a pointer to the proposal for modelling job states that I made to the SAGA group last February.
http://www-unix.gridforum.org/mail_archive/saga-rg/2006/02/msg00107.html
-- Chris

Chris:
I don't understand why SAGA is proceeding with a job state modeling effort independent of BES. Can you explain?
Ian.
At 07:18 AM 4/20/2006 -0700, Christopher Smith wrote:
_______________________________________________________________
Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619. Web: www.ci.uchicago.edu.
Globus Alliance: www.globus.org.

Quoting [Ian Foster] (Apr 20 2006):
> Chris:
> I don't understand why SAGA is proceeding with a job state modeling effort independent of BES. Can you explain?
> Ian.
Ah, but we don't :-) SAGA has a very simple state model (new, running, done, failed, unknown). However, the states have 'substates', which can be queried and which expose the complete BES state model.
The simple model above exposes only those states which can be changed with SAGA calls. As SAGA calls get added, e.g. for hold, more states might get exposed on that level.
Please have a look at http://www-unix.gridforum.org/mail_archive/saga-rg/2006/02/png00005.png
I hope that state diagram makes that clearer. I am not sure if it is in sync with the current BES states.
Another post on exactly that topic is at http://www-unix.gridforum.org/mail_archive/saga-rg/2006/02/msg00131.html
Cheers, Andre.
-- "So much time, so little to do..." -- Garfield
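The two-level idea described in this message (a small set of top-level states, each with queryable substates) could be sketched roughly as follows. This is only an illustration under stated assumptions: the five top-level names come from the message, but the substate names, `JobStatus`, and `coarse` are hypothetical and are not taken from the SAGA or BES documents.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    """Top-level states as listed in the message above."""
    NEW = "new"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"
    UNKNOWN = "unknown"


# Hypothetical substates, for illustration only. These names are placeholders
# and are NOT taken from the BES or SAGA specifications.
SUBSTATES = {
    State.NEW: {"queued", "staging_in"},
    State.RUNNING: {"executing", "held"},
    State.DONE: set(),
    State.FAILED: set(),
    State.UNKNOWN: set(),
}


@dataclass(frozen=True)
class JobStatus:
    """A top-level state plus an optional, finer-grained substate."""
    state: State
    substate: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject a substate that does not belong to the given top-level state.
        if self.substate is not None and self.substate not in SUBSTATES[self.state]:
            raise ValueError(
                f"{self.substate!r} is not a substate of {self.state.value!r}"
            )


def coarse(status: JobStatus) -> str:
    """Return only the simple top-level state, hiding the substate."""
    return status.state.value
```

A client that only cares about the simple model would call `coarse(status)`; one that needs the detailed model would also read `status.substate`.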

And I would encourage BES to pick up this model as well. I was going to make this comment on the BES document, if it ever shows up for public comment.
-- Chris
On 20/4/06 08:15, "Andre Merzky" <andre@merzky.net> wrote:

It's certainly very desirable that we end up with a single common model!
We have the ESI model, which reflects experience in GRAM and Unicore; the BES model; and the SAGA model.
Ian.
At 11:08 AM 4/20/2006 -0700, Christopher Smith wrote:

And the SAGA model can be used (maybe with some refinement) to model all three of the ESI, BES and SAGA state diagrams...
You can probably guess which model I would vote for. :-)
-- Chris
On 20/4/06 11:11, "Ian Foster" <foster@mcs.anl.gov> wrote:

That would be great.
At 11:26 AM 4/20/2006 -0700, Christopher Smith wrote:

If people are willing to spend some time discussing this at GGF17, I would take a stab at rendering both BES and ESI using the SAGA model, and send it out before we get to Tokyo.
-- Chris
On 20/4/06 15:28, "Ian Foster" <foster@mcs.anl.gov> wrote:

Hi Chris,
That's fine by me. At this Monday's OGSA-WG call, let's discuss whether we can add this topic to the OGSA-WG EMS session at GGF17 (Thursday, May 11, 3:45 pm - 5:15 pm).
Thanks,
----
Hiro Kishimoto
Christopher Smith wrote:

The SAGA activities started earlier than BES, so SAGA has more in common with DRMAA.
-- Chris
On 20/4/06 08:02, "Ian Foster" <foster@mcs.anl.gov> wrote:

Christopher Smith wrote:
> Hi all,
> Per Marvin's comments ...
> Here is a pointer to the proposal for modelling job states that I made to the SAGA group last February.
> http://www-unix.gridforum.org/mail_archive/saga-rg/2006/02/msg00107.html
> -- Chris
Presumably suspend<-->resume is optional? That is, there are some things that cannot be suspended, or at least they suspend but cannot resume? (*)
That is something that is not in the CDDLM model (which is based on the WSDM state model), because it's a lot harder to suspend things like a database and a server hosting many active connections. It's a lot easier to shut it down and redeploy later, relying on the application to be able to continue when it is redeployed, which is a good idea for anything you want to be resilient. [1]
If you have VM images you can suspend them, but at least as far as VMware is concerned, the apps don't get warned before and after, so they have a worse experience than on a laptop, where apps and drivers get warned that they are about to suspend and told that they have woken up. All you know about on a VMware-hosted image is that the clock suddenly jumps and all your active TCP connections start throwing errors. You may have suspended, but the world still turns.
(*) Even on ACPI laptops there is evidence of a de-facto suspend state, S6: the sleep from which laptops do not recover. [2]
[1] http://swig.stanford.edu/~candea/papers/crashonly/
[2] http://www.hpl.hp.com/techreports/2000/HPL-2000-21.html
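One way to picture suspend/resume as an optional capability is a base job type that knows nothing about suspension, plus an opt-in interface that only some implementations provide. The sketch below is a hedged illustration, not anything defined by SAGA, BES, or CDDLM: `Suspendable`, `BasicJob`, `SuspendableJob`, and `try_suspend` are hypothetical names, and tracking suspension as a flag beneath the "running" state is an assumption.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Suspendable(Protocol):
    """Hypothetical extension interface; not part of any published spec."""

    def suspend(self) -> None: ...

    def resume(self) -> None: ...


class BasicJob:
    """A job with only the simple base states; it cannot be suspended."""

    def __init__(self) -> None:
        self.state = "new"

    def run(self) -> None:
        self.state = "running"


class SuspendableJob(BasicJob):
    """A job whose implementation opts in to the suspend/resume extension."""

    def __init__(self) -> None:
        super().__init__()
        # Assumption: suspension is recorded as detail beneath "running".
        self.suspended = False

    def suspend(self) -> None:
        if self.state != "running":
            raise RuntimeError("only a running job can be suspended")
        self.suspended = True

    def resume(self) -> None:
        if not self.suspended:
            raise RuntimeError("job is not suspended")
        self.suspended = False


def try_suspend(job: BasicJob) -> bool:
    """Suspend the job if it supports the extension; report whether it did."""
    if isinstance(job, Suspendable):
        job.suspend()
        return True
    return False
```

A caller can then treat every job uniformly and simply check the result of `try_suspend(job)`, instead of assuming that every implementation can suspend.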

Exactly. The thinking is that the base set of states will be fairly simple, and that capabilities such as suspend/resume will be described in extensions because they might not make sense for all implementations.
-- Chris
On 25/4/06 03:14, "Steve Loughran" <steve_loughran@hpl.hp.com> wrote:
participants (5)
- Andre Merzky
- Christopher Smith
- Hiro Kishimoto
- Ian Foster
- Steve Loughran