On Mon, 24 Jan 2005, Daniel Templeton wrote:
How is an implementation supposed to handle the case where two threads call drmaa_wait() on the same job id? The choices are:
a) Both get notified when the job ends and both get copies of the job exit and resource usage information.
b) Both get notified when the job ends. One gets the job exit and resource information and the other gets a DRMAA_ERRNO_NO_RUSAGE.
c) Both get notified when the job ends. Which gets a copy of the job exit and resource information and which gets a DRMAA_ERRNO_NO_RUSAGE depends on which thread runs when.
d) That's not allowed.
b and c are race conditions and there's no error code to represent d, so that leaves us with a. This conclusion, however, needs to be clearly stated in the spec. I believe the current SGE implementation implements c.
It is not possible to prevent the race condition except by not using drmaa_wait() the way you describe. I believe reasonable behaviour would be that one thread gets the job exit and resource information. The other gets DRMAA_ERRNO_INVALID_JOB, very much as if its drmaa_wait() had been issued after the first one had reaped the job.

Andreas
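
The behaviour proposed in the reply can be illustrated with a small simulation. This is only a sketch, not DRMAA code: MockSession, submit(), finish(), wait() and run_two_waiters() are hypothetical names made up for the example, and the job table is a plain dictionary guarded by a condition variable. The only identifier taken from the discussion is DRMAA_ERRNO_INVALID_JOB; the string constants stand in for the real error codes.

```python
import threading

# Stand-ins for the DRMAA error codes; the real values live in the C binding.
DRMAA_ERRNO_SUCCESS = "DRMAA_ERRNO_SUCCESS"
DRMAA_ERRNO_INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class MockSession:
    """Hypothetical stand-in for a DRMAA session (not the real library)."""

    def __init__(self):
        self._cond = threading.Condition()
        # job_id -> None while the job runs, exit info once it has finished.
        self._jobs = {}

    def submit(self, job_id):
        with self._cond:
            self._jobs[job_id] = None

    def finish(self, job_id, exit_info):
        # Record the exit info and wake every thread blocked in wait().
        with self._cond:
            if job_id in self._jobs:
                self._jobs[job_id] = exit_info
                self._cond.notify_all()

    def wait(self, job_id):
        with self._cond:
            # Block while the job is known and still running.
            while job_id in self._jobs and self._jobs[job_id] is None:
                self._cond.wait()
            # Another waiter already reaped the job (or it never existed).
            if job_id not in self._jobs:
                return DRMAA_ERRNO_INVALID_JOB, None
            # First waiter to get here reaps the job and removes the entry.
            return DRMAA_ERRNO_SUCCESS, self._jobs.pop(job_id)


def run_two_waiters():
    """Start two threads waiting on the same job and collect their results."""
    session = MockSession()
    session.submit("job-1")
    results = []

    def waiter():
        results.append(session.wait("job-1"))

    threads = [threading.Thread(target=waiter) for _ in range(2)]
    for t in threads:
        t.start()
    session.finish("job-1", {"exit_status": 0})
    for t in threads:
        t.join()
    return results
```

Which thread reaps the job still depends on scheduling, so the race itself remains. What the sketch shows is that the outcome is well defined either way: exactly one caller receives the exit information, and the other sees the same error it would have seen had it called wait() after the job was already reaped.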