
Dear Mark,

Thanks for reading the API spec! :-) My 2 cents' worth of comments on your questions are below.

Quoting [Shantenu Jha] (Jul 07 2005):
---------- Forwarded message ----------
Date: Fri, 10 Jun 2005 14:12:40 +0100 (BST)
From: Mark McKeown <zzalsmm3@nessie.mcc.ac.uk>
Subject: SAGA question
I have a quick SAGA question about the strawman API - page 41:
.....
JobService myjs = SomeJobServiceFactory (...);
Job myjob = new Job ();
myjs.submitJob (jobdef, myjob);
while ( something ) {
  JobState myjobstate;
  myjob.getJobState (myjobstate);
  if ( myjobstate == Running ) etc...
......
The two issues I am concerned with are latency and partial failure (see "A Note on Distributed Computing", http://research.sun.com/techrep/1994/abstract-29.html).
->Latency
Since the job is running remotely, is it sensible for a client to make a decision based on whether it is running? By the time the client has got the status message, the state of the job may have changed.
You have the same race condition locally: the times are shorter, so races are less likely to occur, but they can. You do a ps and then a kill, and voila, the job is already gone. Since the kill (or whatever you do) will deliver a good error description ("Job does not exist", "Job is already stopped", etc.), your application should be able to handle that situation gracefully (just as in the local case, where errno or the shell return value gives similar information).
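To make the check-then-act race concrete, here is a minimal sketch in Python. It is only an illustration, not the SAGA API: FakeJob, NoSuchJobError, cancel_if_running and the between_calls hook are all hypothetical names, and the "remote" job is simulated in memory. The point is that a stale state answer is treated as a normal outcome, reported through the error description.

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "Running"
    DONE = "Done"


class NoSuchJobError(Exception):
    """Hypothetical error: the job ended before the request arrived."""


class FakeJob:
    """In-memory stand-in for a remote job (assumption, not a SAGA class)."""

    def __init__(self):
        self._state = JobState.RUNNING

    def get_state(self):
        return self._state

    def finish(self):
        # Simulates the job ending on its own between two client calls.
        self._state = JobState.DONE

    def cancel(self):
        if self._state is not JobState.RUNNING:
            raise NoSuchJobError("Job is already stopped")
        self._state = JobState.DONE


def cancel_if_running(job, between_calls=lambda: None):
    """Check the state, then act, and handle a stale answer gracefully."""
    if job.get_state() is not JobState.RUNNING:
        return "not running"
    between_calls()  # window in which the job state may change
    try:
        job.cancel()
        return "cancelled"
    except NoSuchJobError as err:
        # The state we saw was out of date; report it instead of crashing.
        return f"already gone: {err}"
```

Calling cancel_if_running(job, between_calls=job.finish) simulates the job finishing inside the window; the caller then gets the "already gone" description rather than an unhandled failure, which mirrors the ps-then-kill case.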
->Partial Failure
What happens if there is a network failure when I do
myjob.getJobState (myjobstate);
how will this error be handled by the API?
You will get a different, explicit error message ("could not contact resource/resource manager" or so) with a description of the problem, not just a failure. However, we need to define the possible error codes more specifically in the specification; you certainly have a valid point. We are working on that (Tom volunteered today to have a closer look at the error system).

Thanks for your feedback,

Cheers, Andre.
cheers Mark
--
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www: http://www.merzky.net  |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+