Hi all,

I am forwarding this email on behalf of Paul Strong.

-------- Original Message --------

Comments in line...

Regards
Paul

-----Original Message-----
From: Christopher Smith [mailto:csmith@platform.com]
Sent: Tuesday, November 28, 2006 2:25 PM
To: ogsa-bes-wg@ggf.org; Strong, Paul
Cc: Hiro Kishimoto
Subject: Re: [OGSA-BES-WG] [Fwd: [ogsa-wg] BES Comments]
ii) Why does the state model not end in one state? Are there any real differences between the 3 enumerated end states other than how the activity ended up in the state?
iii) Also, it seems somewhat counterintuitive that, if one is going to have three termination states, one should end up with one state which reflects both success and failure (Finished) and a separate one for failures that prevent the activity from exiting in a controlled fashion. It seems like the transitions to the terminated state should be something like -
a. Completed Successfully
b. Failed
c. Cancelled
iv) Is there any notion of a retry, given that the state machine captures failure? Or perhaps rescheduling? For example one could imagine additional useful transitions -
a. Pending to Pending - i.e. rescheduling
b. Running to Pending - i.e. failed/retrying
Having these in the base profile or basic model would enable greater flexibility in implementing future profiles where one could request that the BES container implement retries or allow rescheduling. Or are you categorically stating that users will never be able to reschedule or request automatic retries? Again, if this is the case, it should probably be explicitly stated. I would actually urge you to make the base specification as broad as possible in terms of the acceptable state changes at this level to avoid unnecessarily constraining yourself down the line. I think this is especially important given what is stated in 4.2.

The three states are intended to convey how the activity completes. The finished state indicates that the activity itself completed (without outside intervention), but makes no statement about whether the computation that the activity was engaged in was successful or not. This was envisioned to be extended through some extension sub-states. One of the reasons is that it's hard (generically) to determine whether an activity completes successfully or not. Using things like (say) exit code is traditional, but doesn't always help you distinguish between a failure ("file doesn't exist") and a problem in the running of the activity ("my calculated result is wrong").

Failed means that some "outside event" (e.g. a host failure) caused the activity to finish prematurely, so that you can't say whether it completed or not. And Cancelled is used to indicate a manual intervention that stops the activity from completing. At one point we discussed having one terminal state, but we felt that three was quite useful.

[PDS] I think this raises an important point with respect to defining the nature of an activity as a unit of work and defining its boundaries, pre-conditions and post-conditions in terms of the interaction between the client and the BES container.

[PDS] If I were to define an activity then I would say that, from the perspective of the BES container, it is a unique instance of a self-contained unit of work. The client creates the activity, can monitor it and can store its result upon termination. That unit of work can be repeated, but it becomes a new activity from the perspective of the BES container. I think you need something that says something like (though obviously not necessarily the same as) this. You obviously need to decide whether the container will keep a log of past activities or whether that is the responsibility of the client.

[PDS] Three terminal states is obviously fine if there really is a substantive difference in the state of the activity at termination. Otherwise three state transitions to one termination state would typically allow you to capture the three (or more) modes of termination you desire in a simpler form (see attached GIF). And of course the terminated state of the activity can be queried for additional information. Termination implies nothing about whether the activity ceases to exist, merely that there are no state transitions from that state, only to it. Also a single terminated state makes it easy to understand the flow across the interface between the client and the BES container.

[PDS] Anyway, whatever route you go requires greater clarity about the transitions and the states.
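For readers following the thread, the alternative Paul describes (several transitions into one terminated state, with the mode of termination available on query) could be sketched roughly as below. This is only an illustration of the idea, not anything taken from the BES draft: the `Activity` class, the `Reason` labels and the `terminate` method are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    TERMINATED = "Terminated"  # the single terminal state


class Reason(Enum):
    # Hypothetical labels for the three transitions into Terminated.
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


@dataclass
class Activity:
    state: State = State.PENDING
    reason: Optional[Reason] = None  # set only once the activity terminates

    def terminate(self, reason: Reason) -> None:
        # A terminal state has transitions into it but none out of it.
        if self.state is State.TERMINATED:
            raise ValueError("no transitions out of a terminal state")
        self.state = State.TERMINATED
        self.reason = reason
```

A client would then check `state` to see whether the activity is done and read `reason` for the additional information, instead of distinguishing three separate end states.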
I don't think pending to pending is necessary, as you can sub-state this to indicate any internal pending state transitions.

[PDS] Perhaps you can, but perhaps there are no sub-states, merely a transition, in which case you have to capture it this way :o)

Good point on running to pending ... maybe we should consider this one? How about failed to pending, or finished to pending?

[PDS] I think finished to pending may be problematic unless you refine what you mean by finished. Again I guess we're back to what you define an activity to be. If finished is a terminal state then you should not be able to transition from it. The important point here is that you have to very clearly define what an activity is and what are and are not terminal states.
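One way to make the "what is terminal" question concrete is to write the transitions down as a table and derive the terminal states from it. The sketch below does that under stated assumptions: the state names come from this thread, but the particular set of base transitions is a guess for illustration only, and `with_retry` is a hypothetical helper that adds the running to pending transition discussed above.

```python
# Assumed base transitions; illustrative only, not copied from the BES draft.
BASE_TRANSITIONS = {
    "Pending": {"Running", "Failed", "Cancelled"},
    "Running": {"Finished", "Failed", "Cancelled"},
    "Finished": set(),
    "Failed": set(),
    "Cancelled": set(),
}


def terminal_states(table):
    """A state is terminal when it has no outgoing transitions."""
    return {state for state, targets in table.items() if not targets}


def with_retry(table):
    """Return a copy of the table that also allows Running -> Pending."""
    extended = {state: set(targets) for state, targets in table.items()}
    extended["Running"].add("Pending")
    return extended


def can_transition(table, src, dst):
    """True if the table lists dst as a direct successor of src."""
    return dst in table.get(src, set())
```

Adding running to pending leaves the set of terminal states unchanged, whereas adding finished to pending would remove Finished from it, which is the problem Paul points out.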
vi) 4.2 - State Specialization really seems to be indicating that this specification makes no assumptions with respect to sub-states that may be incorporated within an implementation but that such sub-states are acceptable as long as they do not introduce new state transitions between the standard, specified states. It would be very helpful to simply state this before all of the examples.
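The rule as Paul words it can be read as a simple check: a transition between sub-states is acceptable if both belong to the same standard state, or if their parent states are already connected by a standard transition. The sketch below assumes a "Parent:Sub" naming scheme and a small set of base transitions; both are hypothetical and chosen only for illustration, as are the sub-state names in the usage note.

```python
# Assumed standard transitions between top-level states (illustrative only).
BASE = {
    ("Pending", "Running"),
    ("Running", "Finished"),
    ("Pending", "Failed"),
    ("Running", "Failed"),
    ("Pending", "Cancelled"),
    ("Running", "Cancelled"),
}


def parent_of(state):
    """'Running:StageIn' -> 'Running'; a plain state is its own parent."""
    return state.split(":", 1)[0]


def is_acceptable(src, dst, base=BASE):
    """Sub-state moves must not add transitions between standard states."""
    p, q = parent_of(src), parent_of(dst)
    return p == q or (p, q) in base
```

For example, `is_acceptable("Pending:Queued", "Pending:Held")` holds because both sub-states specialize Pending, while `is_acceptable("Finished", "Pending:Queued")` does not, since it would imply a finished to pending transition that the assumed base set lacks.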
Yes ... this should be made very clear.

It's definitely time for another BES conference call.....

-- Chris

--
Hiro Kishimoto
If I am not mistaken, I think there is a mismatch between Paul's expected use of the state machine and the (perhaps underspecified) assumptions of the BES authors.

I think the BES authors are taking for granted an idiomatic "Grid monitoring" viewpoint which emphasizes steady-state conditions as descriptive summaries of past activity, while downplaying the transitions or transient events. The BES implementation might have actions associated with transitions, but the main monitoring view is meant to be the conditions between events.

In other words: we do not usually assume that the client viewed all transitions, but that he wants to be able to determine the relevant actionable state from a single view of the "current steady state". There are a variety of reasons for this, including a more self-healing distributed system (observer and observed can converge to stable conditions) and more efficient aggregation and indexing (observer can export a meaningful merged model of many resources' conditions).

I think this condition/event dichotomy is the gut reasoning behind both wanting multiple container-level termination states and not wanting self-transitions on a state. As a condition, if you started in the state and ended in the state with no intervening state, then you never left the state, as it is impossible to not be in some state!

As a client, I will not get a stream of "self transitioned" events with any descriptive information. What Chris suggested is to indicate sub-conditions to indicate, in a more domain-specific way, how the system really did change temporarily to another observable steady-state condition, one that could still be interpreted as Pending in the overall condition model.

karl

--
Karl Czajkowski
karlcz@univa.com
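The condition-oriented view Karl describes can be illustrated with a client that only ever sees a snapshot of current states and never the transitions that led to them. This is a rough sketch, not a BES interface: the snapshot format (activity id mapped to state name), the function names and the choice of which states call for client attention are all assumptions made for the example.

```python
from collections import Counter


def summarize(snapshot):
    """Merge many activities' current conditions into one aggregate view."""
    return Counter(snapshot.values())


def needs_attention(snapshot, states=("Failed", "Cancelled")):
    """Pick out activities whose current condition alone calls for action."""
    return sorted(aid for aid, state in snapshot.items() if state in states)
```

Given a snapshot such as `{"a1": "Running", "a2": "Failed", "a3": "Running"}`, the client can decide what to do about `a2` without having observed how it got there. With a single terminated state, the same decision would require an extra query for the reason.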
Thanks for the clarification, Karl. I was thinking that I didn't express this very well.
-- Chris
On 28/11/06 19:37, "Karl Czajkowski" wrote:
If I am not mistaken, I think there is a mismatch between Paul's expected use of the state machine and the (perhaps underspecified) assumptions of the BES authors.
I think the BES authors are taking for granted an idiomatic "Grid monitoring" viewpoint which emphasizes steady-state conditions as descriptive summaries of past activity, while downplaying the transitions or transient events. The BES implementation might have actions associated with transitions, but the main monitoring view is meant to be the conditions between events.
In other words: we do not usually assume that the client viewed all transitions, but that he wants to be able to determine the relevant actionable state from a single view of the "current steady state". There are a variety of reasons for this, including a more self-healing distributed system (observer and observed can converge to stable conditions) and more efficient aggregation and indexing (observer can export a meaningful merged model of many resources' conditions).
I think this condition/event dichotomy is the gut reasoning behind both wanting multiple container-level termination states and not wanting self-transitions on a state. As a condition, if you started in the state and ended in the state with no intervening state, then you never left the state, as it is impossible to not be in some state!
As a client, I will not get a stream of "self transitioned" events with any descriptive information. What Chris suggested is to indicate sub-conditions to indicate, in a more domain-specific way, how the system really did change temporarily to another observable steady-state condition, one that could still be interpreted as Pending in the overall condition model.
karl
participants (3)
- Christopher Smith
- Hiro Kishimoto
- Karl Czajkowski