RE: [drmaa-wg] drmaa_wait() Clarification
The spec says there is no reaping data for the latecomer. A quality implementation could try to provide both threads with everything if the second request comes in while the first request is being processed. This is a grey area, and our policy so far has been not to over-specify things.
-Hrabri
-----Original Message-----
From: owner-drmaa-wg@ggf.org [mailto:owner-drmaa-wg@ggf.org] On Behalf Of Daniel Templeton
Sent: Monday, January 24, 2005 8:35 AM
Cc: DRMAA Working Group
Subject: Re: [drmaa-wg] drmaa_wait() Clarification
Andreas Haas wrote:
On Mon, 24 Jan 2005, Daniel Templeton wrote:
How is an implementation supposed to handle the case where two threads call drmaa_wait() on the same job id? The choices are:
a) Both get notified when the job ends, and both get copies of the job exit and resource usage information.
b) Both get notified when the job ends. One gets the job exit and resource usage information, and the other gets DRMAA_ERRNO_NO_RUSAGE.
c) Both get notified when the job ends. Which one gets a copy of the job exit and resource usage information and which one gets DRMAA_ERRNO_NO_RUSAGE depends on which thread runs when.
d) That's not allowed.
b and c are race conditions and there's no error code to represent d, so that leaves us with a. This conclusion, however, needs to be clearly stated in the spec. I believe the current SGE implementation implements c.
It is not possible to prevent a race condition other than by not using drmaa_wait() the way you describe.
I believe reasonable behaviour would be that one thread gets the job exit and resource usage information and the other gets DRMAA_ERRNO_INVALID_JOB, very much as if its drmaa_wait() had been issued after the first one had reaped the job.
How is there a race condition in choice a? All threads waiting for a job get copies of the exit status and resource usage when the job exits, and then the information is disposed of. Everyone is happy. Latecomers get DRMAA_ERRNO_INVALID_JOB. Being the one who has to implement this stuff, I realize that this is a lot harder than it sounds, but it is decidedly possible to implement. However, if we decide that my use case is not valid, that needs to be explicitly stated in the spec.
Daniel
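To make the difference between choice a and Andreas's proposal concrete, here is a minimal sketch in C using POSIX threads. It is a hypothetical model, not DRMAA or SGE code: `struct job_record`, `wait_for_job()`, `job_finished()`, `run_scenario()` and the `WAIT_*` codes are invented for illustration, and a single `int` stands in for the exit status and resource usage data. `run_scenario()` has two threads block on the same job before it ends, then makes one more call as a latecomer.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical result codes standing in for DRMAA_ERRNO_SUCCESS and
 * DRMAA_ERRNO_INVALID_JOB; this is not DRMAA library code. */
enum wait_result { WAIT_OK, WAIT_INVALID_JOB };

/* ALL_WAITERS_GET_COPY is choice (a); SINGLE_REAPER is Andreas's proposal,
 * where the first thread to wake reaps and the rest look like latecomers. */
enum wait_policy { ALL_WAITERS_GET_COPY, SINGLE_REAPER };

/* Hypothetical per-job bookkeeping kept by an implementation. */
struct job_record {
    pthread_mutex_t lock;
    pthread_cond_t done;
    enum wait_policy policy;
    bool finished;    /* job has ended, exit info is available */
    bool reaped;      /* exit info has been handed out and discarded */
    int waiters;      /* threads currently waiting inside wait_for_job() */
    int exit_status;  /* stand-in for exit status plus resource usage */
};

enum wait_result wait_for_job(struct job_record *job, int *status_out)
{
    enum wait_result rc = WAIT_INVALID_JOB;   /* latecomer unless proven otherwise */
    pthread_mutex_lock(&job->lock);
    if (!job->reaped) {
        job->waiters++;
        while (!job->finished)
            pthread_cond_wait(&job->done, &job->lock);
        job->waiters--;
        if (!job->reaped) {   /* under SINGLE_REAPER another waiter may have won */
            *status_out = job->exit_status;
            rc = WAIT_OK;
            /* SINGLE_REAPER discards the info immediately; ALL_WAITERS_GET_COPY
             * keeps it until the last blocked waiter has taken its copy. */
            if (job->policy == SINGLE_REAPER || job->waiters == 0)
                job->reaped = true;
        }
    }
    pthread_mutex_unlock(&job->lock);
    return rc;
}

void job_finished(struct job_record *job, int exit_status)
{
    pthread_mutex_lock(&job->lock);
    job->exit_status = exit_status;
    job->finished = true;
    pthread_cond_broadcast(&job->done);   /* wake every blocked waiter */
    pthread_mutex_unlock(&job->lock);
}

struct waiter { struct job_record *job; enum wait_result rc; int status; };

static void *waiter_main(void *arg)
{
    struct waiter *w = arg;
    w->rc = wait_for_job(w->job, &w->status);
    return NULL;
}

struct outcome { int copies; enum wait_result latecomer; };

/* Two threads wait on the same job, then the job ends.
 * A third call made afterwards plays the latecomer. */
struct outcome run_scenario(enum wait_policy policy, int exit_status)
{
    struct job_record job = { .policy = policy };
    pthread_mutex_init(&job.lock, NULL);
    pthread_cond_init(&job.done, NULL);

    struct waiter w[2] = { { &job, WAIT_INVALID_JOB, -1 },
                           { &job, WAIT_INVALID_JOB, -1 } };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, waiter_main, &w[i]);

    /* Spin until both threads are blocked, so the job really ends last. */
    int blocked = 0;
    while (blocked < 2) {
        sched_yield();
        pthread_mutex_lock(&job.lock);
        blocked = job.waiters;
        pthread_mutex_unlock(&job.lock);
    }
    job_finished(&job, exit_status);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    struct outcome out = { 0, WAIT_OK };
    for (int i = 0; i < 2; i++)
        if (w[i].rc == WAIT_OK && w[i].status == exit_status)
            out.copies++;
    int ignored;
    out.latecomer = wait_for_job(&job, &ignored);

    pthread_cond_destroy(&job.done);
    pthread_mutex_destroy(&job.lock);
    return out;
}
```

Under `ALL_WAITERS_GET_COPY` both blocked threads receive the status. Under `SINGLE_REAPER` exactly one does, and which one depends on scheduling, which is the race in choice c. Both policies return `WAIT_INVALID_JOB` to the latecomer. With `ALL_WAITERS_GET_COPY`, a thread that calls in after the job has ended but before the last blocked waiter has woken also receives a copy, which is the grey area Hrabri describes.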
I'm not talking about skittish grey areas. I mean:
1) Thread 1 does drmaa_wait ("1", -1)
2) Thread 2 does drmaa_wait ("1", -1)
3) Job 1 ends
Who gets what in that case? The spec does not address concurrent access. It says that if you swap steps 2 and 3, then the second thread gets an error.
Daniel
Rajic, Hrabri wrote:
[quoted text snipped]
Thinking about it a little further, I can see where you might have wanted the spec to say that, but I don't think it's clear. The spec says that all calls subsequent to a *successful* call get an error. In the case below, neither call is successful until step 3, and then it's arguable whether the second call is subsequent to the first call's success or not. Regardless, my point is that we need to make this clear in the spec, whatever we decide it should say. So, is there a reason why the spec is supposed to say that two concurrent calls can't both succeed? That sounds very limiting, and I completely fail to see the advantage. You can't tell me that it's for ease of implementation, after the weeks of work I put into implementing the PartialTimestamp class.
Daniel
Daniel Templeton wrote:
I'm not talking about skittish grey areas. I mean:
1) Thread 1 does drmaa_wait ("1", -1)
2) Thread 2 does drmaa_wait ("1", -1)
3) Job 1 ends
Who gets what in that case? The spec does not address concurrent access. It says that if you swap steps 2 and 3, then the second thread gets an error.
Daniel
Rajic, Hrabri wrote:
[quoted text snipped]
participants (2): Daniel Templeton; Rajic, Hrabri