New subject: [drmaa-wg] drmaa_wait() Clarification

24 Jan 2005

      The spec says there is no data left to reap for the latecomer.

A quality implementation could try to provide both threads with
everything if the second request arrives while the first request is
still being processed.  This is a grey area, and our policy so far has
been not to over-specify things.

    -Hrabri

-----Original Message-----
From: owner-drmaa-wg@ggf.org [mailto:owner-drmaa-wg@ggf.org] On Behalf
Of Daniel Templeton
Sent: Monday, January 24, 2005 8:35 AM
Cc: DRMAA Working Group
Subject: Re: [drmaa-wg] drmaa_wait() Clarification

Andreas Haas wrote:
...
On Mon, 24 Jan 2005, Daniel Templeton wrote:
...
How is an implementation supposed to handle the case where two threads
call drmaa_wait() on the same job id?  The choices are:
a) Both get notified when the job ends and both get copies of the job
exit and resource usage information
b) Both get notified when the job ends.  One gets the job exit and
resource information and the other gets a DRMAA_ERRNO_NO_RUSAGE.
c) Both get notified when the job ends.  Which gets a copy of the job
exit and resource information and which gets a DRMAA_ERRNO_NO_RUSAGE
depends on which thread runs when.
d) That's not allowed
b and c are race conditions and there's no error code to represent d,
so that leaves us with a.  This conclusion, however, needs to be clearly
stated in the spec.  I believe the current SGE implementation
implements c.
It is not possible to prevent a race condition except by not using
drmaa_wait() the way you describe it.
I believe reasonable behaviour would be that one gets the job exit and
resource information. The other gets DRMAA_ERRNO_INVALID_JOB, very
much as if drmaa_wait() had been issued after the first one has
reaped the job.
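
For illustration only, the reap-once behaviour suggested just above can
be sketched in Python with a toy job table. This is not the SGE code or
any real DRMAA binding: the class ReapOnceSession, its submit() and
finish() helpers, and the string error codes are hypothetical stand-ins.
Whichever waiter reaches the finished job first reaps it; any other
waiter sees the same result as a call made after the job was reaped.

```python
import threading

# Hypothetical stand-ins for the DRMAA error codes named in this thread.
DRMAA_ERRNO_SUCCESS = "DRMAA_ERRNO_SUCCESS"
DRMAA_ERRNO_INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class ReapOnceSession:
    """Toy job table: the first waiter to reap a finished job removes it."""

    def __init__(self):
        self._cond = threading.Condition()
        # job_id -> None while running, else (exit_status, rusage)
        self._jobs = {}

    def submit(self, job_id):
        with self._cond:
            self._jobs[job_id] = None

    def finish(self, job_id, exit_status, rusage):
        with self._cond:
            self._jobs[job_id] = (exit_status, rusage)
            self._cond.notify_all()

    def wait(self, job_id):
        with self._cond:
            # Block while the job is known but still running.
            while job_id in self._jobs and self._jobs[job_id] is None:
                self._cond.wait()
            if job_id not in self._jobs:
                # Already reaped by another thread, or never submitted.
                return DRMAA_ERRNO_INVALID_JOB, None
            # Reap: hand out the info and forget the job.
            return DRMAA_ERRNO_SUCCESS, self._jobs.pop(job_id)
```

With two threads blocked in wait() on the same job id, which of them
receives the exit and usage information still depends on scheduling;
the sketch only guarantees that exactly one does.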
How is there a race condition in choice a?  All threads waiting for a
job get copies of the exit status and resource usage when the job
exits, then the info is disposed of.  Everyone is happy.  Latecomers
get a DRMAA_ERRNO_INVALID_JOB.
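
As a rough sketch of choice a, assuming a simple in-process job table:
each waiter registers itself, every registered waiter receives its own
copy of the result, and the last one to leave disposes of the info.
The class BroadcastSession, its submit() and finish() helpers, and the
string error codes are hypothetical and are not taken from any DRMAA
implementation.

```python
import threading

# Hypothetical stand-ins for the DRMAA error codes named in this thread.
DRMAA_ERRNO_SUCCESS = "DRMAA_ERRNO_SUCCESS"
DRMAA_ERRNO_INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class BroadcastSession:
    """Toy job table: every registered waiter gets a copy of the result."""

    def __init__(self):
        self._cond = threading.Condition()
        # job_id -> {"result": None or (exit_status, rusage), "waiters": int}
        self._jobs = {}

    def submit(self, job_id):
        with self._cond:
            self._jobs[job_id] = {"result": None, "waiters": 0}

    def finish(self, job_id, exit_status, rusage):
        with self._cond:
            self._jobs[job_id]["result"] = (exit_status, dict(rusage))
            self._cond.notify_all()

    def wait(self, job_id):
        with self._cond:
            job = self._jobs.get(job_id)
            if job is None:
                # Latecomer: the info has already been disposed of.
                return DRMAA_ERRNO_INVALID_JOB, None
            job["waiters"] += 1
            while job["result"] is None:
                self._cond.wait()
            job["waiters"] -= 1
            exit_status, rusage = job["result"]
            if job["waiters"] == 0:
                # Last registered waiter disposes of the job info.
                self._jobs.pop(job_id, None)
            # Each caller receives its own copy of the usage data.
            return DRMAA_ERRNO_SUCCESS, (exit_status, dict(rusage))
```

In this sketch a thread that calls wait() after the job has finished,
but before the last registered waiter has returned, still receives a
copy; only calls made after disposal see DRMAA_ERRNO_INVALID_JOB.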
Being the one who has to implement this stuff, I realize that this is a
lot harder than it sounds, but it is decidedly possible to implement.
However, if we decide that my use case is not valid, that needs to be
explicitly stated in the spec.

Daniel
