Hi group,

It seems I am lacking some understanding of the SAGA job states, and would appreciate it if someone with more knowledge about job management could answer some questions (Chris, are you still watching this list by any chance?).

Attached to this mail is a draft of a SAGA job state diagram. For simplicity, I did not distinguish between UserHold and SystemHold, nor between UserSuspend and SystemSuspend. The arrows show possible state changes; the method names denote methods which can initiate state changes from the API level.

I think that diagram is wrong, isn't it? Well, here are my questions:

 - If we submit a job, it's immediately Queued - is that right? Should it be pending before (e.g. as long as the queuing request travels the middleware layers)?
 - Can the hold and suspend states be reached only from 'Running', or from elsewhere as well?
 - What is the difference between 'Hold' and 'Suspend'?
 - Are there signals defined (apart from KILL) which change the job state? I guess that is not as simple as saying SUSP does suspend - that state is probably defined by the scheduler, not by the OS...
 - What is the use case for distinguishing between UserHold and SystemHold, or between UserSuspend and SystemSuspend?

I would like to understand these states, in particular in correspondence to the task states we have. Well, I put questions on that relation in another mail - to keep this mail 'short'.

Thanks, Andre.

--
"So much time, so little to do..." -- Garfield
Date: Sun, 05 Feb 2006 07:04:28 +0000
From: Steven Newhouse
To: Andre Merzky
CC: Simple API for Grid Applications WG
Subject: Re: [saga-rg] job states...

Hi Andre,

The Basic Execution Service WG has done some work on defining an activity state. You may want to have a look at that - there should be a copy on gridforge - somewhere!

Steven

--
----------------------------------------------------------------
Dr Steven Newhouse Tel:+44 (0)2380 598789
Director, Open Middleware Infrastructure Institute-UK (OMII-UK)
c/o Suite 6005, Faraday Building (B21), Highfield Campus,
Southampton University, Highfield, Southampton, SO17 1BJ, UK
Nice, thanks!

To save everybody from the need to search GridForge, here is the table from the document:

  bes::job            saga::task   saga::job
  New                 Pending      Queued
  Pending             Running      Queued
  StagingIn           Running      PreExecution
  ExecutionPending    Running      PreExecution
  Running             Running      Running
  ExecutionComplete   Running      PostExecution
  StagingOut          Running      PostExecution
  CleaningUp          Running      PostExecution
  ShuttingDown        Running      PostExecution
  Suspended           Running      Suspend
  Suspended           Running      SuspendUser
  Suspended           Running      SuspendSystem
  NotKnown            Unknown      Unknown
  Other               Unknown      Unknown
  Done                Done         DoneOK
  Terminated          Failed       DoneFail
  Exception           Failed       DoneFail
  -                   -            HoldSystem
  -                   -            HoldUser
  -                   -            Hold

With respect to the original topic (task/job), these map fairly well to the task and job states we have, but are more detailed. I could not find an equivalent for Hold.

Are there other models defined in GGF, or is that _THE_ model?

Cheers, Andre.
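As an illustration only, the table above can be written down as a lookup structure. This is a minimal sketch, not part of BES or SAGA: the names `BES_TO_SAGA`, `SAGA_JOB_ONLY`, `task_state_for` and `job_states_for` are hypothetical, and falling back to 'Unknown' for an unlisted state is an assumption.

```python
from typing import Dict, Tuple

# bes::job state -> (saga::task state, candidate saga::job states),
# transcribed from the table in this message.
BES_TO_SAGA: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "New": ("Pending", ("Queued",)),
    "Pending": ("Running", ("Queued",)),
    "StagingIn": ("Running", ("PreExecution",)),
    "ExecutionPending": ("Running", ("PreExecution",)),
    "Running": ("Running", ("Running",)),
    "ExecutionComplete": ("Running", ("PostExecution",)),
    "StagingOut": ("Running", ("PostExecution",)),
    "CleaningUp": ("Running", ("PostExecution",)),
    "ShuttingDown": ("Running", ("PostExecution",)),
    "Suspended": ("Running", ("Suspend", "SuspendUser", "SuspendSystem")),
    "NotKnown": ("Unknown", ("Unknown",)),
    "Other": ("Unknown", ("Unknown",)),
    "Done": ("Done", ("DoneOK",)),
    "Terminated": ("Failed", ("DoneFail",)),
    "Exception": ("Failed", ("DoneFail",)),
}

# saga::job states for which the table lists no BES equivalent.
SAGA_JOB_ONLY = ("HoldSystem", "HoldUser", "Hold")


def task_state_for(bes_state: str) -> str:
    """Return the saga::task state for a BES state.

    Assumption: a state missing from the table is reported as 'Unknown'.
    """
    task_state, _job_states = BES_TO_SAGA.get(bes_state, ("Unknown", ("Unknown",)))
    return task_state


def job_states_for(bes_state: str) -> Tuple[str, ...]:
    """Return the candidate saga::job states for a BES state."""
    _task_state, job_states = BES_TO_SAGA.get(bes_state, ("Unknown", ("Unknown",)))
    return job_states
```

Written this way, it is easy to see that every BES state collapses onto one task state, while 'Suspended' fans out to three job states and the Hold states have no BES counterpart at all.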
I have not had the time to review all of this, but I would like to suggest that a "task" also has the option to be suspended or resumed. You also need an introspector to actually find out whether, for example, a particular feature has been implemented or not. You may need to define NonSuspendableTasks which, for example, do not have resume or suspend options.

This actually has implications for how we intend to use this as part of future portal developments. Here the "tasks" could encapsulate a lot, including file transfers and job submissions ... As a file transfer as well as a job could be suspended, it's important that both have at least the ability to be supported in that way.

Gregor
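One way to picture the introspection suggested here is a task that can be asked which optional operations it supports. The following is only a sketch under stated assumptions: the classes `Task` and `FileTransferTask`, the `capabilities` attribute and the `supports` method are hypothetical and not taken from the SAGA API; only the name NonSuspendableTask comes from the message.

```python
class Task:
    """Hypothetical base task that supports no optional operations."""

    # Names of the optional operations this kind of task implements.
    capabilities = frozenset()

    def __init__(self):
        self.state = "Running"

    def supports(self, operation: str) -> bool:
        """Introspection hook: is the named operation implemented?"""
        return operation in self.capabilities

    def suspend(self):
        self._require("suspend")
        self.state = "Suspended"

    def resume(self):
        self._require("resume")
        self.state = "Running"

    def _require(self, operation: str):
        # Refuse operations the backend does not provide.
        if not self.supports(operation):
            raise NotImplementedError(
                f"{type(self).__name__} does not support {operation}"
            )


class NonSuspendableTask(Task):
    """A task without suspend or resume, as suggested in the message."""


class FileTransferTask(Task):
    """Hypothetical task whose backend can pause and continue a transfer."""

    capabilities = frozenset({"suspend", "resume"})
```

A caller such as a portal could then check `task.supports("suspend")` before offering a suspend button, instead of finding out through a failed call.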
Hi Gregor,

for most discussions in SAGA, tasks have been considered as equivalent to threads (so they exist purely for the sake of async operations). As such, we did not see the need for finer control (e.g. suspend or signaling). However, you are right, that might be too simplistic in the long run.

Having suspend for tasks like file transfer, and also a resume, raises the question of the state of tasks. Also, as you mention, it requires support in the backend, and hence introspection.

The use cases we have for asynchronous operations right now are, AFAIK,

 - async RPC calls (not in the API right now)
 - async file and stream read/writes

The use cases for both ops are not intended to run on a backend that supports suspend/resume. Although the saga task model now provides async. for almost _all_ operations, it's not immediately clear to me what a suspend state would imply in terms of semantics for these calls.

I think that the large file transfer example you cite is the most common use case with support for susp/resume (RFT). We do not have it explicitly in our use cases right now. The only obvious case in saga is jobs, where we have the separate suspend/resume methods. So, you are right, if we think of conversion between jobs and tasks, we should also consider suspends.

It might be best to shelve that point and come back to it in the next API version. Well, that is a perfect task for the upcoming RG! :-P

Well, having said that, I think it's still useful to have the task API for jobs (not vice versa). As the task states and methods are a proper subset of the job states and methods, that should be trivial (kind of) :-)

Cheers, Andre.
Date: Fri, 10 Feb 2006 13:41:18 -0800
Subject: Re: [saga-rg] job states...
From: Christopher Smith
To: Simple API for Grid Applications WG

On 4/2/06 11:18, "Andre Merzky" wrote:

Ok ... I'll try to answer these, at least from my viewpoint.

> I think that diagram is wrong, isn't it? Well, here are my questions:
>
> - if we submit a job, it's immediately Queued - is that right? Should it be pending before (e.g. as long as the queuing request travels the middleware layers)?

To me, Queued is the same as Pending. Pending is probably a better word for this. Can't remember where the Queued name came from, as LSF uses PEND.

> - can the hold and suspend states be reached only from 'Running', or from elsewhere as well?

You can only go into a Hold state from Pending, I think, or directly into Hold on submission.

> - What is the difference between 'Hold' and 'Suspend'?

A Hold state tells the scheduler/broker not to consider this job for scheduling/dispatch until the hold is explicitly released.

> - Are there signals defined (apart from KILL) which change the job state? I guess that is not as simple as saying SUSP does suspend - that state is probably defined by the scheduler, not by the OS...

Right ... this is implementation dependent on the mechanism used to suspend a job (might be a signal, might be some other mechanism). What is important is that there is an operation to initiate the state transition.

> - What is the use case for distinguishing between UserHold and SystemHold, or between UserSuspend and SystemSuspend?

If I preempt workload, the system will put it into a SystemSuspend state that a user cannot cause a switch out of; otherwise a system may become oversubscribed due to the preempted and preempting jobs running at the same time. A UserSuspend can be entered and exited by the user, and is often used to hold processing to check progress, etc.

By the way ... I believe that the state diagram should at least be a subset of the BES state diagram ... we should adopt the same names.

-- Chris
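The hold and suspend rules described in these answers can be sketched as a small transition table. This is a sketch under stated assumptions, not the SAGA or BES state model: the names `State`, `Actor`, `TRANSITIONS` and `step` are hypothetical. Entering Hold from Pending, and a user being unable to leave SystemSuspend, come from the answers; entering suspend only from Running, and a SystemHold being released only by the system, are assumptions made by analogy.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    USER_HOLD = "UserHold"
    SYSTEM_HOLD = "SystemHold"
    RUNNING = "Running"
    USER_SUSPEND = "UserSuspend"
    SYSTEM_SUSPEND = "SystemSuspend"


class Actor(Enum):
    USER = "user"
    SYSTEM = "system"


# (current state, operation, who asks) -> next state
TRANSITIONS = {
    # Hold is entered from Pending and left only by an explicit release.
    (State.PENDING, "hold", Actor.USER): State.USER_HOLD,
    (State.PENDING, "hold", Actor.SYSTEM): State.SYSTEM_HOLD,
    (State.USER_HOLD, "release", Actor.USER): State.PENDING,
    (State.SYSTEM_HOLD, "release", Actor.SYSTEM): State.PENDING,  # assumption
    # Only the scheduler dispatches a pending job.
    (State.PENDING, "dispatch", Actor.SYSTEM): State.RUNNING,
    # Assumption: suspend is entered from Running only.
    (State.RUNNING, "suspend", Actor.USER): State.USER_SUSPEND,
    (State.RUNNING, "suspend", Actor.SYSTEM): State.SYSTEM_SUSPEND,
    (State.USER_SUSPEND, "resume", Actor.USER): State.RUNNING,
    # A preempted job is resumed by the system, never by the user.
    (State.SYSTEM_SUSPEND, "resume", Actor.SYSTEM): State.RUNNING,
}


def step(state: State, operation: str, actor: Actor) -> State:
    """Apply one operation, or raise ValueError if it is not allowed."""
    try:
        return TRANSITIONS[(state, operation, actor)]
    except KeyError:
        raise ValueError(
            f"{actor.value} may not {operation} a job in state {state.value}"
        ) from None
```

The table makes no statement about how a suspend is carried out (signal or otherwise); as noted in the answers, only the existence of an operation that triggers the transition matters here.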
Hi Chris,

many thanks for the answers! :-)

> By the way ... I believe that the state diagram should at least be a subset of the BES state diagram ... we should adopt the same names.

I agree, kind of - I would say that the SAGA job state diagram should at _most_ be a subset of the BES state diagram. It could be _S_impler :-)

Cheers, Andre.
What I meant by that comment is that where it is a subset, it should reflect
the BES terminology. I think that the number of states represented is enough
already. ;-)
-- Chris
Quoting [Christopher Smith] (Feb 11 2006):
> What I meant by that comment is that where it is a subset, it should reflect the BES terminology. I think that the number of states represented is enough already. ;-)

Would it make sense to just copy the BES state diagram? It did not exist when we (== you ;-) drafted the SAGA job states - if it had been around then, we might have copied it already. Apart from the SystemXXX/UserXXX states, and from Hold, it is not that much different from the SAGA model anyway.

Cheers, Andre.
It makes sense to keep the state models in sync.
-- Chris
Ok, then I'll do that in the strawman. I would appreciate it if you could glance over it after commit, for a sanity check.

Thanks, Andre.
Sure.
As mentioned ... I think maybe supporting a subset of BES is ok. Much of the
state model wrt file transfer state modelling I think is not required for
SAGA.
-- Chris
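To make the "subset of BES, same names" idea concrete, here is a small sketch. It rests on assumptions: the BES state names are the ones from the table posted earlier in this thread, treating StagingIn and StagingOut as the file-transfer-related states is a guess, and `candidate_saga_states` and `names_outside_bes` are hypothetical helper names.

```python
# BES state names as listed in the table earlier in this thread.
BES_STATES = frozenset({
    "New", "Pending", "StagingIn", "ExecutionPending", "Running",
    "ExecutionComplete", "StagingOut", "CleaningUp", "ShuttingDown",
    "Suspended", "NotKnown", "Other", "Done", "Terminated", "Exception",
})

# Assumption: these are the states that model file transfer.
FILE_TRANSFER_STATES = frozenset({"StagingIn", "StagingOut"})


def candidate_saga_states() -> frozenset:
    """BES states left over once the file transfer states are dropped."""
    return BES_STATES - FILE_TRANSFER_STATES


def names_outside_bes(saga_states) -> list:
    """Return the proposed state names that do not use BES terminology."""
    return sorted(set(saga_states) - BES_STATES)
```

For example, `names_outside_bes(["Running", "Hold", "Done"])` returns `["Hold"]`, which matches the earlier observation that Hold has no BES equivalent.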
I agree - the file transfer state models are needed for SAGA. We don't have any actions on these states anyway. Andre. Quoting [Christopher Smith] (Feb 11 2006):
Sure.
As mentioned ... I think maybe supporting a subset of BES is ok. Much of the state model wrt file transfer state modelling I think is not required for SAGA.
-- Chris
On 10/2/06 18:46, "Andre Merzky"
wrote: Ok, then I'll do that in the strawman. I would appreciate if you could glance over it after commit, for a sanity check.
Thanks, Andre.
Quoting [Christopher Smith] (Feb 11 2006):
It makes sense to keep the state models in sync.
-- Chris
On 10/2/06 18:26, "Andre Merzky"
wrote: Quoting [Christopher Smith] (Feb 11 2006):
What I meant by that comment is that where it is a subset, it should reflect the BES terminology. I think that the number of states represented is enough already. ;-)
Would it make sense to just copy the BES state diagram?
It did not exist when we (== you ;-) drafted the SAGA job states - if it would have been around then, we might have had copied it already.
Apart from the SystemXXX/UserXXX states, and from Hold, it is not that much different from the SAGA model anyway.
Cheers, Andre.
-- Chris
On 10/2/06 17:30, "Andre Merzky"
wrote: Hi Chris,
many thanks for the answers! :-)
> By the way ... I believe that the state diagram should at least be a > subset > of the BES state diagram ... we should adopt the same names.
I agree, kind of - I would say that the SAGA job state diagram should at _most_ be subset of the BES state diagram. It could be _S_implier :-)
Cheers, Andre.
Quoting [Christopher Smith] (Feb 10 2006):
> Date: Fri, 10 Feb 2006 13:41:18 -0800
> Subject: Re: [saga-rg] job states...
> From: Christopher Smith
> To: Simple API for Grid Applications WG
>
> On 4/2/06 11:18, "Andre Merzky" wrote:
>
> Ok ... I'll try to answer these, at least from my viewpoint.
>
>> I think that diagram is wrong, isn't it? Well, here are my questions:
>>
>> - if we submit a job, it's immediately Queued - is that right? Should it be pending before (e.g. as long as the queuing request travels the middleware layers)?
>
> To me, Queued is the same as Pending. Pending is probably a better word for this. Can't remember where the Queued name came from, as LSF uses PEND.
>
>> - can the hold and suspend states be reached only from 'Running', or from elsewhere as well?
>
> You can only go into a Hold state from Pending, I think, or directly into Hold on submission.
>
>> - What is the difference between 'Hold' and 'Suspend'?
>
> A Hold state tells the scheduler/broker not to consider this job for scheduling/dispatch until the hold is explicitly released.
>
>> - Are there signals defined (apart from KILL) which change the job state? I guess that is not as simple as saying SUSP does suspend - that state is probably defined by the scheduler, not by the OS...
>
> Right ... this is implementation dependent on the mechanism used to suspend a job (might be a signal, might be some other mechanism). What is important is that there is an operation to initiate the state transition.
>
>> - What is the use case for distinguishing between UserHold and SystemHold, or between UserSuspend and SystemSuspend?
>
> If I preempt workload, the system will put it into a SystemSuspend state that a user cannot cause a switch out of, otherwise a system may become oversubscribed due to the preempted and preempting jobs running at the same time. A UserSuspend can be entered and exited by the user, and is often used to hold processing to check progress, etc.
>
> By the way ... I believe that the state diagram should at least be a subset of the BES state diagram ... we should adopt the same names.
>
> -- Chris
-- "So much time, so little to do..." -- Garfield
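The Hold and Suspend semantics described in the quoted reply can be sketched as a small transition table. This is only a minimal sketch, not SAGA or BES API: the operation names (`hold`, `release`, `dispatch`, `suspend`, `resume`), the `Actor` type, and the helpers `initial_state` and `next_state` are hypothetical. It also assumes that suspension is entered from Running, which the reply does not state explicitly, and it collapses UserHold and SystemHold into a single Hold state.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "Pending"
    HOLD = "Hold"
    RUNNING = "Running"
    USER_SUSPEND = "UserSuspend"
    SYSTEM_SUSPEND = "SystemSuspend"


class Actor(Enum):
    USER = "user"
    SYSTEM = "system"


# (current state, operation, who asks) -> next state.
# Anything not listed here is treated as an illegal transition.
TRANSITIONS = {
    # Hold is only reachable from Pending and lasts until explicitly released.
    (JobState.PENDING, "hold", Actor.USER): JobState.HOLD,
    (JobState.HOLD, "release", Actor.USER): JobState.PENDING,
    (JobState.PENDING, "dispatch", Actor.SYSTEM): JobState.RUNNING,
    # A user can enter and leave UserSuspend.
    (JobState.RUNNING, "suspend", Actor.USER): JobState.USER_SUSPEND,
    (JobState.USER_SUSPEND, "resume", Actor.USER): JobState.RUNNING,
    # Only the system can enter and leave SystemSuspend (e.g. preemption).
    (JobState.RUNNING, "suspend", Actor.SYSTEM): JobState.SYSTEM_SUSPEND,
    (JobState.SYSTEM_SUSPEND, "resume", Actor.SYSTEM): JobState.RUNNING,
}


def initial_state(hold_on_submit=False):
    """A job may be submitted directly into Hold instead of Pending."""
    return JobState.HOLD if hold_on_submit else JobState.PENDING


def next_state(state, operation, actor):
    """Return the state after `operation`, or raise ValueError if not allowed."""
    try:
        return TRANSITIONS[(state, operation, actor)]
    except KeyError:
        raise ValueError(
            f"{actor.value} cannot {operation} a job in state {state.value}"
        ) from None
```

With this table, `next_state(JobState.SYSTEM_SUSPEND, "resume", Actor.USER)` raises `ValueError`, which mirrors the point that a user cannot switch a preempted job back to running.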
Curious question: SAGA should align job states with both BES and DRMAA? Are they the same to begin with?
Thilo
On Sat, Feb 11, 2006 at 03:54:16AM +0100, Andre Merzky wrote:
I agree - the file transfer state models are not needed for SAGA. We don't have any actions on these states anyway.
Andre.
Quoting [Christopher Smith] (Feb 11 2006):
Sure.
As mentioned ... I think maybe supporting a subset of BES is ok. Much of the state model wrt file transfer state modelling I think is not required for SAGA.
-- Chris
On 10/2/06 18:46, "Andre Merzky" wrote:
Ok, then I'll do that in the strawman. I would appreciate if you could glance over it after commit, for a sanity check.
Thanks, Andre.
Quoting [Christopher Smith] (Feb 11 2006):
It makes sense to keep the state models in sync.
-- Chris
On 10/2/06 18:26, "Andre Merzky" wrote:
Quoting [Christopher Smith] (Feb 11 2006):
> What I meant by that comment is that where it is a subset, it should reflect the BES terminology. I think that the number of states represented is enough already. ;-)
Would it make sense to just copy the BES state diagram?
It did not exist when we (== you ;-) drafted the SAGA job states - if it had been around then, we might have copied it already.
Apart from the SystemXXX/UserXXX states, and from Hold, it is not that much different from the SAGA model anyway.
Cheers, Andre.
-- Chris
On 10/2/06 17:30, "Andre Merzky" wrote:
> Hi Chris,
>
> many thanks for the answers! :-)
>
>> By the way ... I believe that the state diagram should at least be a subset of the BES state diagram ... we should adopt the same names.
>
> I agree, kind of - I would say that the SAGA job state diagram should at _most_ be a subset of the BES state diagram. It could be _S_implier :-)
>
> Cheers, Andre.
-- Thilo Kielmann http://www.cs.vu.nl/~kielmann/
No, they are different, unfortunately. The DRMAA states are closer to the original SAGA states. However, DRMAA also spec'd their states before BES, so that is not surprising. It would be interesting to know, though, why BES came up with a new model at all, or whether they knew about the DRMAA model.
Cheers, Andre.
Quoting [Thilo Kielmann] (Feb 12 2006):
> Date: Sun, 12 Feb 2006 08:08:58 +0100
> From: Thilo Kielmann
> Subject: Re: [saga-rg] job states...
>
> Curious question: SAGA should align job states with both BES and DRMAA? Are they the same to begin with?
>
> Thilo
> No, they are different, unfortunately. The DRMAA states are closer to the original SAGA states.
>
> However, DRMAA also spec'd their states before BES, so that is not surprising. It would be interesting to know, though, why BES came up with a new model at all, or whether they knew about the DRMAA model.
I wouldn't conclude from these facts that BES did the right thing. Maybe their state diagram is the best around, but by merely being incompatible with prior work from GGF (like DRMAA) it has its own drawbacks. Mandating that SAGA use BES states is too simplistic, IMHO. I am afraid we need both groups to talk to each other and to us to resolve this. Maybe we have a chance during this GGF Meeting?
Thilo
-- Thilo Kielmann http://www.cs.vu.nl/~kielmann/
Quoting [Thilo Kielmann] (Feb 12 2006):
>> No, they are different, unfortunately. The DRMAA states are closer to the original SAGA states.
>>
>> However, DRMAA also spec'd their states before BES, so that is not surprising. It would be interesting to know, though, why BES came up with a new model at all, or whether they knew about the DRMAA model.
>
> I wouldn't conclude from these facts that BES did the right thing.

No, definitely not! But my (very personal) opinion is that the BES version is simpler and easier to understand, although it has more states! (I attach both diagrams.)

> Maybe their state diagram is the best around, but by merely being incompatible with prior work from GGF (like DRMAA) it has its own drawbacks.

Agree, but see note from Steven.

> Mandating SAGA uses BES states is too simplistic, IMHO. I am afraid we need both groups to talk to each other and to us to resolve this.

Maybe. Some assurance to a 'final' GGF version would be nice...

> Maybe we have a chance during this GGF Meeting?

Hmm, why not :-P

Cheers, Andre.
-- "So much time, so little to do..." -- Garfield
Where did those attachments go? *scratch*
A.
Quoting [Andre Merzky] (Feb 12 2006):
> Date: Sun, 12 Feb 2006 10:06:01 +0200
> Subject: Re: [saga-rg] job states...
>
> But, my (very personal) opinion is that the BES version is simpler, and easier to understand, although it has more states! (I attach both diagrams).
-- "So much time, so little to do..." -- Garfield
Andre,
> No, they are different, unfortunately. The DRMAA states are closer to the original SAGA states.
We did look at DRMAA & SAGA as the starting point to the BES state diagram discussions. And CIM which may have confused the process! It was not an attempt to deliberately come up with something new. Steven -- ---------------------------------------------------------------- Dr Steven Newhouse Tel:+44 (0)2380 598789 Director, Open Middleware Infrastructure Institute-UK (OMII-UK) c/o Suite 6005, Faraday Building (B21), Highfield Campus, Southampton University, Highfield, Southampton, SO17 1BJ, UK
Quoting [Steven Newhouse] (Feb 12 2006):
> Andre,
>
>> No, they are different, unfortunately. The DRMAA states are closer to the original SAGA states.
>
> We did look at DRMAA & SAGA as the starting point to the BES state diagram discussions. And CIM which may have confused the process!
>
> It was not an attempt to deliberately come up with something new.

Well, that is good to know! Just out of interest: where can I find the CIM one? ('GridForge' would be an insufficient answer ;-)

Thanks! Andre.

> Steven
-- "So much time, so little to do..." -- Garfield
> Just out of interest: where can I find the CIM one? ('GridForge' would be an insufficient answer ;-)

Extracted from an email from Chris Smith to the BES group - a record of which is on GridForge somewhere!

I've looked into the operational states that the CIM model enumerates for BatchJob. They are actually inherited from the ConcreteJob object from the core model. The states themselves:

JobState is an integer enumeration that indicates the operational state of a Job. It can also indicate transitions between these states, for example, 'Shutting Down' and 'Starting'. Following is a brief description of the states:

- New (2) indicates that the job has never been started.
- Starting (3) indicates that the job is moving from the 'New', 'Suspended', or 'Service' states into the 'Running' state.
- Running (4) indicates that the Job is running.
- Suspended (5) indicates that the Job is stopped, but may be restarted in a seamless manner.
- Shutting Down (6) indicates the job is moving to a 'Completed', 'Terminated', or 'Killed' state.
- Completed (7) indicates that the job has completed normally.
- Terminated (8) indicates that the job has been stopped by a 'Terminate' state change request. The job and all its underlying processes are ended and may be restarted (this is job-specific) only as a new job.
- Killed (9) indicates that the job has been stopped by a 'Kill' state change request. Underlying processes may have been left running and cleanup may be required to free up resources.
- Exception (10) indicates that the Job is in an abnormal state that may be indicative of an error condition. Actual status may be surfaced though job-specific objects.
- Service (11) indicates that the Job is in a vendor-specific state that supports problem discovery and/or resolution.

-- ---------------------------------------------------------------- Dr Steven Newhouse Tel:+44 (0)2380 598789 Director, Open Middleware Infrastructure Institute-UK (OMII-UK) c/o Suite 6005, Faraday Building (B21), Highfield Campus, Southampton University, Highfield, Southampton, SO17 1BJ, UK
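For comparison with the SAGA and BES names, the CIM values quoted above could be written down as an integer enumeration. This is only a sketch: the numeric values and state names come from the quoted description, while the Python names (`CimJobState`, `END_STATES`, `STARTABLE_STATES`, `is_end_state`) are hypothetical, and reading 'Completed', 'Terminated' and 'Killed' as end states is an interpretation of the 'Shutting Down' entry, not something the text states outright.

```python
from enum import IntEnum


class CimJobState(IntEnum):
    """JobState values as listed in the quoted CIM description."""
    NEW = 2
    STARTING = 3
    RUNNING = 4
    SUSPENDED = 5
    SHUTTING_DOWN = 6
    COMPLETED = 7
    TERMINATED = 8
    KILLED = 9
    EXCEPTION = 10
    SERVICE = 11


# States that 'Shutting Down' is described as moving towards
# (interpreted here as end states).
END_STATES = frozenset({
    CimJobState.COMPLETED,
    CimJobState.TERMINATED,
    CimJobState.KILLED,
})

# States from which 'Starting' is described as leading into 'Running'.
STARTABLE_STATES = frozenset({
    CimJobState.NEW,
    CimJobState.SUSPENDED,
    CimJobState.SERVICE,
})


def is_end_state(state):
    """True if the job has reached Completed, Terminated, or Killed."""
    return CimJobState(state) in END_STATES
```

Because `CimJobState` is an `IntEnum`, a raw integer from a CIM provider can be converted with `CimJobState(4)`, and an unlisted value raises `ValueError` instead of being silently accepted.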
It really does seem like all of these groups should stick their heads together and come to a rough consensus about what's needed and the best thing to do. --Craig At 11:59 PM 2/11/2006, Steven Newhouse wrote:
> Andre,
>
>> No, they are different, unfortunately. The DRMAA states are closer to the original SAGA states.
>
> We did look at DRMAA & SAGA as the starting point to the BES state diagram discussions. And CIM which may have confused the process!
>
> It was not an attempt to deliberately come up with something new.
>
> Steven
participants (6)
- Andre Merzky
- Christopher Smith
- Craig Lee
- Gregor von Laszewski
- Steven Newhouse
- Thilo Kielmann