RE: [drmaa-wg] Simultaneous waits on the same job id?
Good question. There is no such provision in the spec. One thread would need to be the first ...
Hrabri
-----Original Message-----
From: owner-drmaa-wg@ggf.org [mailto:owner-drmaa-wg@ggf.org] On Behalf Of Ed Baskerville
Sent: Friday, June 23, 2006 3:32 PM
To: DRMAA Working Group
Subject: [drmaa-wg] Simultaneous waits on the same job id?
With all the discussion of wait in multithreaded contexts, I thought I'd throw out another related question...
Are multiple threads allowed to wait simultaneously on the same job id and get back results, or is it required that one of them gets back DRMAA_ERRNO_INVALID_JOB? That is, is it possible for the data to be reaped simultaneously for multiple waiting threads, or must only one of them be lucky enough to get the data back?
For Xgrid, either way is straightforward to implement; obviously having the option of returning data to multiple simultaneous calls would be nice, but I want to get the semantics right.
--Ed
This implies that the second call should fail when interpreted in the context of a multithreaded application, but it doesn't really seem to be written with a multithreaded application in mind. There's no error code that makes sense here: INVALID_JOB implies that the job data has already been reaped, but that's not necessarily true, because you could have something like this:
thread 1: wait(jobId)
thread 2: wait(jobId), immediately returns INVALID_JOB_ID because there's already a wait in progress
thread 1: wait times out
thread 2: wait(jobId)...completes successfully
So thread 2 is first told that the job data has already been reaped, then told that the job is valid (because thread 1 happened to time out). That's just weird.
Another option is to simply not return INVALID_JOB_ID until the data *has* been reaped (or not), but that seems weird too--why make subsequent threads wait if they're probably just going to get an error message?
If this hasn't been decided, I would propose that a provision be added saying that multiple threads are allowed to wait simultaneously, and *all of them* get back the job data. It's not too hard to implement, at least for Xgrid, and the semantics seem cleaner.
--Ed
On Jun 23, 2006, at 1:52 PM, Rajic, Hrabri wrote:
Good question. There is no such provision in the spec. One thread would need to be the first ...
Hrabri
Ed,
I'm with you 100%, but this was discussed at length, and the group (minus me) decided that it was important to follow the POSIX wait(4) semantics, in which only one thread will succeed in waiting and the others will fail when the job is reaped.
At some point I had found another POSIX wait variant which allowed all waiting threads to receive the exit status information, but I have since forgotten what it was, and Sun's new email retention policy has deleted the email.
Daniel
Ed Baskerville wrote:
This implies that the second call should fail when interpreted in the context of a multithreaded application, but it doesn't really seem to be written with a multithreaded application in mind. There's no error code that makes sense here: INVALID_JOB implies that the job data has already been reaped, but that's not necessarily true, because you could have something like this:
thread 1: wait(jobId)
thread 2: wait(jobId), immediately returns INVALID_JOB_ID because there's already a wait in progress
thread 1: wait times out
thread 2: wait(jobId)...completes successfully
So thread 2 is first told that the job data has already been reaped, then told that the job is valid (because thread 1 happened to time out). That's just weird.
Another option is to simply not return INVALID_JOB_ID until the data *has* been reaped (or not), but that seems weird too--why make subsequent threads wait if they're probably just going to get an error message?
If this hasn't been decided, I would propose that a provision be added saying that multiple threads are allowed to wait simultaneously, and *all of them* get back the job data. It's not too hard to implement, at least for Xgrid, and the semantics seem cleaner.
--Ed
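To make the two candidate semantics concrete, here is a minimal Python sketch. It is not a DRMAA binding: `ToySession`, `share_results`, `waiting_on` and `run_two_waiters` are hypothetical names invented for illustration, and the status strings are placeholders. With `share_results=False` it mimics the POSIX-like rule described above (one waiter reaps the job, the other gets INVALID_JOB); with `share_results=True` it mimics the proposal that every thread already waiting receives the job data.

```python
import threading
import time

# Label borrowed from the thread; the string value is only illustrative.
INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class ToySession:
    """Hypothetical in-memory stand-in for a DRMAA session, not a real binding."""

    def __init__(self, share_results=False):
        self._cond = threading.Condition()
        self._known = set()    # submitted jobs whose data has not been reaped
        self._finished = {}    # job id -> exit status, kept until reaped
        self._waiters = {}     # job id -> number of threads inside wait()
        self._share = share_results

    def submit(self, job_id):
        with self._cond:
            self._known.add(job_id)

    def finish(self, job_id, status):
        with self._cond:
            self._finished[job_id] = status
            self._cond.notify_all()

    def waiting_on(self, job_id):
        with self._cond:
            return self._waiters.get(job_id, 0)

    def wait(self, job_id, timeout=None):
        with self._cond:
            if job_id not in self._known:
                return (INVALID_JOB, None)      # already reaped or never submitted
            self._waiters[job_id] = self._waiters.get(job_id, 0) + 1
            try:
                woke = self._cond.wait_for(
                    lambda: job_id in self._finished or job_id not in self._known,
                    timeout,
                )
                if not woke:
                    return ("TIMEOUT", None)
                if job_id not in self._known:
                    return (INVALID_JOB, None)  # another waiter reaped it first
                status = self._finished[job_id]
                last_waiter = self._waiters[job_id] == 1
                if not self._share or last_waiter:
                    # Reap: exclusive mode reaps at once, shared mode reaps
                    # when the last concurrent waiter leaves.
                    self._known.discard(job_id)
                    del self._finished[job_id]
                    self._cond.notify_all()
                return ("OK", status)
            finally:
                self._waiters[job_id] -= 1


def run_two_waiters(share_results):
    """Start two threads waiting on one job, then let the job finish."""
    session = ToySession(share_results)
    session.submit("job-5")
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(session.wait("job-5", 5.0)))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    while session.waiting_on("job-5") < 2:   # make sure both are blocked first
        time.sleep(0.01)
    session.finish("job-5", 0)
    for t in threads:
        t.join()
    return sorted(results, key=lambda r: r[0])


if __name__ == "__main__":
    print(run_two_waiters(share_results=False))
    print(run_two_waiters(share_results=True))
```

In the shared mode of this sketch, a thread that calls `wait` only after the last concurrent waiter has left still gets INVALID_JOB, so the data is shared among simultaneous waiters but is not kept forever.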
OK, I'll do that. But one more question: how do these issues apply to synchronize? Consider the following sequence:
thread 1: wait(job id 5)
thread 2: synchronize(job ids 5,6,7)
[job id 5 finishes]
¿thread 2 synchronize call fails?
[job id 7 finishes]
[job id 6 finishes]
¿thread 2 synchronize call succeeds?
Should synchronize fail with INVALID_JOB as soon as any of the ids it's waiting on are reaped? Or should it eventually succeed?
--Ed
On Jun 23, 2006, at 4:22 PM, Daniel Templeton wrote:
Ed,
I'm with you 100%, but this was discussed at length, and the group (minus me) decided that it was important to follow the POSIX wait(4) semantics, in which only one thread will succeed in waiting and the others will fail when the job is reaped.
At some point I had found another POSIX wait variant which allowed all waiting threads to receive the exit status information, but I have since forgotten what it was, and Sun's new email retention policy has deleted the email.
Daniel
On Fri, 23 Jun 2006, Ed Baskerville wrote:
OK, I'll do that. But one more question: how do these issues apply to synchronize? Consider the following sequence:
thread 1: wait(job id 5)
thread 2: synchronize(job ids 5,6,7)
[job id 5 finishes]
¿thread 2 synchronize call fails?
[job id 7 finishes]
[job id 6 finishes]
¿thread 2 synchronize call succeeds?
Should synchronize fail with INVALID_JOB as soon as any of the ids it's waiting on are reaped? Or should it eventually succeed?
There is no reason for synchronize to fail, as long as none of the jobs was reaped when synchronize() gets issued.
Andreas
Ed,
What Andreas is trying to get at is that since synchronize() doesn't return job information, multiple simultaneous synchronize() calls will all succeed. Even if synchronize() reaps job info, since other calls to synchronize() don't care whether they actually are able to find the job info, it's all good. As long as synchronize() is running, reaped jobs are viewed as simply having ended, so not even a call to wait() will cause a call to synchronize() to fail.
Is that really correct? Can anyone else confirm? I think we should probably have a tracker to clarify that point in the (g2) spec.
Daniel
Andreas.Haas@Sun.COM wrote:
There is no reason for synchronize to fail, as long as none of the jobs was reaped when synchronize() gets issued.
Andreas
Ed,
What Andreas is trying to get at is that since synchronize() doesn't return job information, multiple simultaneous synchronize() calls will all succeed. Even if synchronize() reaps job info, since other calls to synchronize() don't care whether they actually are able to find the job info, it's all good. As long as synchronize() is running, reaped jobs are viewed as simply having ended, so not even a call to wait() will cause a call to synchronize() to fail.
Is that really correct? Can anyone else confirm? I think we should probably have a tracker to clarify that point in the (g2) spec.
I confirm. The DRMAA group agreed (in another discussion) that sync() has call-time context semantics. The text from the spec says more or less the same. Therefore, your synchronize call should succeed if jobs 5,6,7 existed at the time when thread 2 invokes the operation.
Peter.
Daniel
Peter,
I think this issue is different from call-time semantics. This issue is about the use of ALL being forgiving. For example, if I do a control(ALL, TERMINATE) and one of the jobs ends before the control() call can kill it, I would not expect the call to fail. Same thing with synchronize(). If I do a synchronize(ALL) and a job is reaped before the call completes, that's OK. To me, that's a very different statement from saying that the ALL constant represents all of the jobs that were submitted at the time of the call, and I think that should be clarified in the spec for synchronize() and control().
The statement that I think we're making here is that a call that uses the ALL constant will operate on the list of jobs as it was at submission time minus any jobs that go out of scope for the operation during the run time of the operation. Am I wrong?
Daniel
Peter Troeger wrote:
Ed,
What Andreas is trying to get at is that since synchronize() doesn't return job information, multiple simultaneous synchronize() calls will all succeed. Even if synchronize() reaps job info, since other calls to synchronize() don't care whether they actually are able to find the job info, it's all good. As long as synchronize() is running, reaped jobs are viewed as simply having ended, so not even a call to wait() will cause a call to synchronize() to fail.
Is that really correct? Can anyone else confirm? I think we should probably have a tracker to clarify that point in the (g2) spec.
I confirm. The DRMAA group agreed (in another discussion) that sync() has call-time context semantics. The text from the spec says more or less the same. Therefore, your synchronize call should succeed if jobs 5,6,7 existed at the time when thread 2 invokes the operation.
Peter.
Daniel
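The synchronize behaviour agreed on above can be sketched in Python as follows. This is only an illustration under stated assumptions: `ToySession`, `blocked` and `replay_sequence` are hypothetical names, not part of any DRMAA binding, the status strings are placeholders, and `synchronize` here never disposes of job data. The sketch checks job ids when `synchronize` is issued and afterwards treats a job reaped by another thread as simply having ended.

```python
import threading
import time

# Label borrowed from the thread; the string value is only illustrative.
INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class ToySession:
    """Hypothetical in-memory stand-in for a DRMAA session, not a real binding."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = set()   # submitted jobs that are still running
        self._finished = {}    # job id -> exit status, kept until reaped
        self._blocked = 0      # threads currently blocked in wait/synchronize

    def submit(self, job_id):
        with self._cond:
            self._active.add(job_id)

    def finish(self, job_id, status):
        with self._cond:
            self._active.discard(job_id)
            self._finished[job_id] = status
            self._cond.notify_all()

    def blocked(self):
        with self._cond:
            return self._blocked

    def _block_until(self, predicate, timeout):
        # Caller must hold self._cond.
        self._blocked += 1
        try:
            return self._cond.wait_for(predicate, timeout)
        finally:
            self._blocked -= 1

    def wait(self, job_id, timeout=None):
        """Exclusive reap: only one caller gets the exit status."""
        with self._cond:
            if job_id not in self._active and job_id not in self._finished:
                return (INVALID_JOB, None)
            if not self._block_until(lambda: job_id not in self._active, timeout):
                return ("TIMEOUT", None)
            if job_id not in self._finished:
                return (INVALID_JOB, None)   # someone else reaped it
            return ("OK", self._finished.pop(job_id))

    def synchronize(self, job_ids, timeout=None):
        with self._cond:
            # Validity is checked once, when the call is issued.
            for job_id in job_ids:
                if job_id not in self._active and job_id not in self._finished:
                    return INVALID_JOB
            # After that, a job counts as ended once it is no longer running,
            # whether or not another thread has reaped its data meanwhile.
            pending = set(job_ids)
            done = self._block_until(lambda: not (pending & self._active), timeout)
            return "OK" if done else "TIMEOUT"


def replay_sequence():
    """Replay the wait(5) / synchronize(5,6,7) sequence from the thread."""
    s = ToySession()
    for job_id in (5, 6, 7):
        s.submit(job_id)
    out = {}
    t1 = threading.Thread(target=lambda: out.update(wait=s.wait(5, 5.0)))
    t2 = threading.Thread(target=lambda: out.update(sync=s.synchronize([5, 6, 7], 5.0)))
    t1.start()
    t2.start()
    while s.blocked() < 2:        # both calls must be issued before jobs end
        time.sleep(0.01)
    for job_id in (5, 7, 6):
        s.finish(job_id, 0)
    t1.join()
    t2.join()
    out["late_sync"] = s.synchronize([5], 0.1)   # issued after job 5 was reaped
    return out


if __name__ == "__main__":
    print(replay_sequence())
```

The last call in `replay_sequence` shows the other side of the rule: a `synchronize` issued after job 5 has already been reaped by `wait` is rejected, because the check happens at call time.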
participants (5)
- Andreas.Haas@Sun.COM
- Daniel Templeton
- Ed Baskerville
- Peter Troeger
- Rajic, Hrabri