Ed, I'm with you 100%, but this was discussed at length, and the group (minus me) decided that it was important to follow the POSIX wait(4) semantics, in which only one thread will succeed in waiting and the others will fail when the job is reaped. At some point I had found another POSIX wait variant which allowed all waiting threads to receive the exit status information, but I have since forgotten what it was, and Sun's new email retention policy has deleted the email.
Daniel
Ed Baskerville wrote:
This implies that the second call should fail when interpreted in the context of a multithreaded application, but it doesn't really seem to be written with a multithreaded application in mind. There's no error code that makes sense here: INVALID_JOB implies that the job data has already been reaped, but that's not necessarily true, because you could have something like this:
thread 1: wait(jobId)
thread 2: wait(jobId), immediately returns INVALID_JOB_ID because there's already a wait in progress
thread 1: wait times out
thread 2: wait(jobId)...completes successfully
So thread 2 is first told that the job data has already been reaped, then told that the job is valid (because thread 1 happened to time out). That's just weird.
Another option is to simply not return INVALID_JOB_ID until the data *has* been reaped (or not), but that seems weird too--why make subsequent threads wait if they're probably just going to get an error message?
If this hasn't been decided, I would propose that a provision be added saying that multiple threads are allowed to wait simultaneously, and *all of them* get back the job data. It's not too hard to implement, at least for Xgrid, and the semantics seem cleaner.
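The "all waiters get the data" semantics proposed above can be sketched roughly as follows. This is only an illustration in Python, not DRMAA or Xgrid code: the `JobResult` class, its `finish` and `wait` methods, and the `exit_status` field are hypothetical names made up for the example. It assumes the job information is kept around after completion rather than reaped by the first waiter; when it would eventually be discarded is left open here.

```python
import threading


class JobResult:
    """Hypothetical holder for one job's exit information (not a DRMAA type)."""

    def __init__(self):
        self._done = threading.Event()
        self._info = None

    def finish(self, info):
        # Called once by whatever component learns that the job has ended.
        self._info = info
        self._done.set()  # wakes every thread currently blocked in wait()

    def wait(self, timeout=None):
        # Any number of threads may block here at the same time.
        if not self._done.wait(timeout):
            raise TimeoutError("job did not finish within the timeout")
        # The data is not consumed, so every waiter sees the same info.
        return self._info


def demo():
    job = JobResult()
    results = {}

    def waiter(name):
        results[name] = job.wait(timeout=5.0)

    threads = [
        threading.Thread(target=waiter, args=("thread 1",)),
        threading.Thread(target=waiter, args=("thread 2",)),
    ]
    for t in threads:
        t.start()
    job.finish({"exit_status": 0})  # assumed shape of the job data
    for t in threads:
        t.join()
    return results
```

With this shape, a waiter that times out does not affect the others, and a thread that calls `wait` after the job has finished gets the same data immediately instead of an error.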
--Ed
On Jun 23, 2006, at 1:52 PM, Rajic, Hrabri wrote:
Good question. There is no such provision in the spec. One thread would need to be the first ...
Hrabri
-----Original Message-----
From: owner-drmaa-wg@ggf.org [mailto:owner-drmaa-wg@ggf.org] On Behalf Of Ed Baskerville
Sent: Friday, June 23, 2006 3:32 PM
To: DRMAA Working Group
Subject: [drmaa-wg] Simultaneous waits on the same job id?
With all the discussion of wait in multithreaded contexts, I thought I'd throw out another related question...
Are multiple threads allowed to wait simultaneously on the same job id and get back results, or is it required that one of them gets back DRMAA_ERRNO_INVALID_JOB? That is, is it possible for the data to be reaped simultaneously for multiple waiting threads, or must only one of them be lucky enough to get the data back?
For Xgrid, either way is straightforward to implement; obviously having the option of returning data to multiple simultaneous calls would be nice, but I want to get the semantics right.
--Ed