Quoting [Ole Christian Weidner] (Mar 22 2010):
On Mar 22, 2010, at 8:25 AM, Andre Merzky wrote:
Quoting [Ole Christian Weidner] (Mar 22 2010):
Aloha,
what was the reason again *not* to have a "pending" state in the saga job model?
The decision on what states are on the top level of the SAGA state model was based on the operations available in the API: only those states got added which were explicitly reachable via some API method.
Ok, but why? This is IMHO a pretty random decision.
Might as well be, true - some decision had to be made though, and that seems as good a guideline as any other.
A 'Pending' state cannot be reached (or left, depending on semantics) by any SAGA API call, thus that is only available as state detail, not as top level state.
What about saga::job::New --run()--> saga::job::Pending? Also, you could say the same thing about Done and Failed: these states are not explicitly reachable via a call... and wait() doesn't really count! If it does, you could also use it to transition from Pending to Running, Failed, etc.
New->run()->Pending: Sure, but then you have a transition Pending->Running which is not expressed at API level. Done/Failed: correct of course, but yes, we counted wait() to be the point where the application can sync with the job state.
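The criterion can be made concrete with a small transition table. This is a simplified sketch, not the normative SAGA state diagram. It assumes the method names run, suspend, resume, cancel and wait, and it counts wait() as the sync point for the final states, as Andre does above. `API_TRANSITIONS`, `states_reachable_via_api` and `unexplained_transitions` are hypothetical helpers for illustration only.

```python
# Top-level SAGA job states as discussed in this thread (simplified).
TOP_LEVEL_STATES = {"New", "Running", "Suspended", "Done", "Failed", "Canceled"}

# Assumed, simplified table: (API method, from-state) -> possible to-states.
# wait() is treated as the call at which the application syncs with the
# final states Done and Failed.
API_TRANSITIONS = {
    ("run", "New"): {"Running"},
    ("suspend", "Running"): {"Suspended"},
    ("resume", "Suspended"): {"Running"},
    ("cancel", "Running"): {"Canceled"},
    ("cancel", "Suspended"): {"Canceled"},
    ("wait", "Running"): {"Done", "Failed"},
}


def states_reachable_via_api(table, initial="New"):
    """Return the initial state plus every state some API call can produce."""
    reachable = {initial}
    for targets in table.values():
        reachable |= targets
    return reachable


def unexplained_transitions(observed, table):
    """Return observed (from, to) pairs that no API call in `table` accounts for."""
    explained = {(src, dst) for (_, src), targets in table.items() for dst in targets}
    return [pair for pair in observed if pair not in explained]


# Every top-level state is the target of some call under these assumptions.
assert states_reachable_via_api(API_TRANSITIONS) == TOP_LEVEL_STATES

# A hypothetical variant where run() leads to Pending instead of Running:
with_pending = dict(API_TRANSITIONS)
with_pending[("run", "New")] = {"Pending"}
print(unexplained_transitions([("New", "Pending"), ("Pending", "Running")], with_pending))
# prints [('Pending', 'Running')]
```

The remaining Pending -> Running step is driven by the backend rather than by any call, which is Andre's objection. Counting wait() as the trigger for that step too, as Ole suggests, would make the check pass; the table encodes one reading of the thread, not a verdict.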
I'm implementing the third job adaptor (gLite CREAM) for saga and again, I don't know if I should map gLite's "pending" state to saga::job::New or saga::job::Running.
It should go to Running (as should almost all substates, IMHO). New is usually defined so that a job does not yet have a backend representation.
Usually?
Yes: most states in middleware systems are assigned to jobs which have a backend representation, and are thus in Running state from the SAGA point of view. Exceptions I can think of are the substates of Suspended (UserSuspended, SystemSuspended etc.), and the substates of the final states (UserFailed, ApplicationFailed, SystemFailed, UserCanceled, SystemCanceled etc.). Most other states we encountered, and which are specified for the various systems, describe details of a live job (after being accepted by the backend, before being suspended or finished), and can thus be mapped to Running.
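Applied to the CREAM adaptor question, the rule can be sketched as follows. A job without a backend job-id is New. Every state CREAM reports belongs to an accepted job, so those states map to Running unless they are final, and the native name is kept as a `gLite:<state>` state detail. The CREAM state names are assumptions taken from the CREAM documentation and should be checked against the actual service. `CREAM_TO_SAGA` and `map_cream_state` are hypothetical names, not part of any existing adaptor.

```python
from typing import Dict, Optional, Tuple

# Assumed CREAM state names; verify against the CREAM documentation.
# All of these describe a job the backend has already accepted, so none
# of them maps to New.
CREAM_TO_SAGA: Dict[str, str] = {
    "PENDING": "Running",         # accepted, not yet executing
    "IDLE": "Running",            # queued in the local batch system
    "RUNNING": "Running",
    "REALLY-RUNNING": "Running",
    "DONE-OK": "Done",
    "DONE-FAILED": "Failed",
    "ABORTED": "Failed",
    "CANCELLED": "Canceled",
}


def map_cream_state(native: Optional[str]) -> Tuple[str, str]:
    """Return (top-level SAGA state, state detail) for a CREAM job state.

    `native` is None while the job has no backend representation yet,
    i.e. before the backend accepted it and assigned a job-id.
    """
    if native is None:
        # No backend representation: New, and an empty state detail.
        return "New", ""
    try:
        top_level = CREAM_TO_SAGA[native]
    except KeyError:
        raise ValueError(f"no mapping decided for CREAM state {native!r}") from None
    # Keep the backend's own name as "<model>:<state>".
    return top_level, f"gLite:{native}"
```

States such as HELD or UNKNOWN deliberately raise here, since the thread does not settle whether a held job counts as Running or Suspended. Whether the detail keeps CREAM's spelling ("gLite:PENDING") or a normalized one ("gLite:Pending", as written later in this thread) is an implementation choice.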
In Pending states, however, most middleware already maintain job state.
What do you mean by maintaining a job state?
A better way to express this may be to say: the job has a representation in the backend. I.e., the backend accepted the job creation request and a job-id exists which uniquely identifies the job.
Most of the middleware APIs out there come with a plethora of states (e.g. gLite: 11), but most of them map naturally to one of the saga job model's states. "Pending" is a state pretty much used by everyone (Condor, PBS, LSF, Globus, gLite, GridSAM) and it really doesn't map to saga's model. IMHO it's a major design flaw - how could this fall through the sieve? Or is there a reason behind this?
See above. As you say, there is a plethora of states, and many are important for specific use cases. Other states have been candidates for SAGA, such as StageIn and StageOut, or Hold, all of which have interesting use cases. But again: it did not seem very useful to expose states on the top level which cannot be reached via API calls - they are then only useful for informational purposes. As such, they are still available in the state details.
But again: why didn't it seem very useful? ;-)
I would be perfectly happy using the state detail. The only problem with them is that they're absolutely useless without any formalization. Do you think it would make sense to define an extended state model (on implementation level) for the state details? This is IMHO the only way to make use of it programmatically.
The state detail format is specified in GFD.90 as:

  State details in SAGA SHOULD be formatted as follows: "<model>:<state>" with valid models being "BES", "DRMAA", or other implementation specific models. For example, a state detail for the BES state "StagingIn" would be rendered as "BES:StagingIn", and would be a substate of Running. If no state details are available, the metric is still available, but it has always an empty string value.

So, 'gLite:Pending' would be what you are looking for, and it should be interpretable by the application (it needs to have a notion of what 'Pending' means, and needs to look at the second part of the state detail). The only more convenient way to expose the state detail I could think of would be to expose the state detail components individually:

  state_detail_model = gLite
  state_detail_value = Pending
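On the application side, interpreting the format amounts to splitting on the first colon. A minimal sketch, assuming the detail has already been read as a plain string (how it is obtained from the job is not shown). `StateDetail` and `parse_state_detail` are hypothetical names that mirror the model/value split suggested above.

```python
from typing import NamedTuple


class StateDetail(NamedTuple):
    model: str  # e.g. "BES", "DRMAA", or an implementation specific model
    value: str  # the backend-native state, e.g. "StagingIn" or "Pending"


def parse_state_detail(detail: str) -> StateDetail:
    """Split a "<model>:<state>" state detail into its two components.

    An empty string means that no state detail is available.
    """
    if not detail:
        return StateDetail("", "")
    # Split at the first colon only; the state part is kept verbatim.
    model, sep, value = detail.partition(":")
    if not sep or not model:
        raise ValueError(f"state detail {detail!r} is not of the form <model>:<state>")
    return StateDetail(model, value)


detail = parse_state_detail("gLite:Pending")
if detail.model == "gLite" and detail.value == "Pending":
    print("job accepted by gLite, not yet executing")
```

The application still needs to know what each model's states mean; the parse only makes the model and value separately accessible, which is as far as the string format itself can help.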
Also, as a last point: the more states we add to SAGA, the more difficult it is to map to a specific backend state model (DRMAA, AWS, local, ssh and BES, for example, come to mind as models which do not have a Pending state).
I don't think that this is a valid point. Why does it become more difficult? Especially if we're talking about a state that cannot be reached explicitly: you don't have to worry about it at all. If SSH doesn't have a "PENDING" state, it will simply never reach it!
The state model is getting more complicated, as you need to allow direct state transitions from New to Running as well, to cater for those backends. For example, we initially considered using the DRMAA.v1 state model, as that was the state of the art at that point in time (a long time ago). DRMAA has the following states:

UNDETERMINED, QUEUED_ACTIVE, SYSTEM_ON_HOLD, USER_ON_HOLD, USER_SYSTEM_ON_HOLD, RUNNING, SYSTEM_SUSPENDED, USER_SUSPENDED, USER_SYSTEM_SUSPENDED, DONE, FAILED

It turned out to be hard to map the Globus or gLite states to that model w/o ending up with insanely complex state mapping rules. Thus we went for the simplest state model possible.

Let me turn the question around: what exactly is the use case you need the Pending state for, and why can't that be solved with the state_detail?

Finally: if you and others strongly feel that the SAGA state model is too simple, or the state detail is not accessible enough, we should certainly reopen the discussion on how those are rendered in the API. I doubt that it would be prudent to just change our implementation though, w/o revising the spec first.

Cheers, Andre.

--
Nothing is ever easy.