drmaa_job_ps on finished jobs
Hi,

I have a question concerning drmaa_job_ps behaviour for jobs that have finished (normally or failed). The DRMAA 1.0 spec says:

"drmaa_job_ps DRMAA SHOULD always get the status of job_id from DRM system, unless the previous status has been DRMAA_PS_FAILED or DRMAA_PS_DONE and the status has been successfully cached. Terminated jobs get DRMAA_PS_FAILED status."

Does that mean that the DRMAA library should cache the job status (_PS_FAILED or _PS_DONE) and return it even after the job data has been reaped by drmaa_wait/drmaa_synchronize? I'm a bit confused because SGE's implementation returns DRMAA_ERRNO_INVALID_JOB after drmaa_wait(), whereas Condor's library caches that information forever (unless the library's log files are deleted).

-- 
Piotr Domagalski
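To make the quoted rule concrete, here is a small C simulation. It is not a real DRMAA client: the job record, the stub query_drm() and job_status() are hypothetical names. The enum values only echo DRMAA_PS_DONE and DRMAA_PS_FAILED and are not the values from drmaa.h. The sketch assumes one record per job id and shows the one case in which the DRM is not consulted, namely when a DONE or FAILED status has already been cached. It leaves open the question asked above: what happens to that cache once the job has been reaped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job states; names echo DRMAA_PS_*, values are arbitrary. */
typedef enum {
    JOB_RUNNING,
    JOB_DONE,    /* stands in for DRMAA_PS_DONE   */
    JOB_FAILED   /* stands in for DRMAA_PS_FAILED */
} job_state;

/* Hypothetical per-job record kept by the library (simulation only). */
typedef struct {
    job_state cached;      /* last status obtained from the DRM        */
    bool      has_cached;  /* true once a status has been stored       */
    job_state drm_state;   /* what the simulated DRM would report now  */
    int       drm_queries; /* number of round trips to the DRM         */
} job_record;

static bool is_terminal(job_state s) {
    return s == JOB_DONE || s == JOB_FAILED;
}

/* Stub for the (expensive) status query sent to the DRM system. */
static job_state query_drm(job_record *job) {
    job->drm_queries++;
    return job->drm_state;
}

/* job_ps-style lookup following the quoted rule: always ask the DRM,
   unless a terminal (DONE/FAILED) status has already been cached. */
job_state job_status(job_record *job) {
    assert(job != NULL);
    if (job->has_cached && is_terminal(job->cached))
        return job->cached;        /* terminal status: no DRM round trip */
    job->cached = query_drm(job);  /* non-terminal: refresh from the DRM */
    job->has_cached = true;
    return job->cached;
}
```

While the job is running, every call goes to the DRM; once DONE or FAILED is cached, further calls return it without a DRM query, even if the DRM no longer knows the job.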
SGE is doing it the right way. The catch is that no sentence in the spec prohibits making the job status available even after the wait operation. There is such a sentence only for the drmaa_wait() operation itself, but nothing comparable for drmaa_job_ps():

"The routine reaps jobs on a successful call, so any subsequent calls to drmaa_wait SHOULD fail returning an error DRMAA_ERRNO_INVALID_JOB meaning that the job has been already reaped."

I would therefore say that Condor currently does not violate the spec, but interprets it in a very unusual way. There is already a corresponding note in the TODO section of the Condor DRMAA documentation. I will fix this for the next version, which will hopefully make it into Condor 6.7.20.

Thank you!

Regards,
Peter.
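The two behaviours discussed in this thread can be contrasted in another hypothetical C simulation. No real DRMAA library is used; job_wait(), job_ps(), the RC_* return codes and the reap_policy switch are invented for illustration, and their values are not those from drmaa.h. In both policies a successful wait reaps the job, so a second wait fails with an INVALID_JOB-style error, as the quoted sentence requires. Only the job_ps result after reaping differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes; names echo DRMAA_ERRNO_SUCCESS and
   DRMAA_ERRNO_INVALID_JOB, values are arbitrary. */
typedef enum { RC_SUCCESS, RC_INVALID_JOB } rc_code;

typedef enum { ST_RUNNING, ST_DONE, ST_FAILED } ps_status;

/* Hypothetical switch for what job_ps does once a job has been reaped. */
typedef enum {
    FORGET_ON_REAP, /* behaviour described for SGE              */
    KEEP_FOREVER    /* behaviour described for Condor's library */
} reap_policy;

typedef struct {
    bool        known;   /* job id is still valid for this session */
    bool        reaped;  /* a wait has already consumed the job    */
    ps_status   status;  /* last known (cached) status             */
    reap_policy policy;
} job_entry;

/* wait-style call: succeeds once and reaps the job; any later call
   fails with RC_INVALID_JOB, as the spec requires for drmaa_wait. */
rc_code job_wait(job_entry *job, ps_status *out) {
    assert(job != NULL && out != NULL);
    if (!job->known || job->reaped)
        return RC_INVALID_JOB;
    /* Simulation: a real wait would block until the job finishes;
       here the job must already be DONE or FAILED. */
    assert(job->status != ST_RUNNING);
    *out = job->status;
    job->reaped = true;
    if (job->policy == FORGET_ON_REAP)
        job->known = false;  /* drop all job data on reap */
    return RC_SUCCESS;
}

/* job_ps-style call: the DRMAA 1.0 text quoted in this thread says
   nothing about reaped jobs, so either policy is spec-conforming. */
rc_code job_ps(const job_entry *job, ps_status *out) {
    assert(job != NULL && out != NULL);
    if (!job->known)
        return RC_INVALID_JOB;
    *out = job->status;  /* cached DONE/FAILED survives the reap */
    return RC_SUCCESS;
}
```

With FORGET_ON_REAP, job_ps on a reaped job returns RC_INVALID_JOB (the SGE behaviour); with KEEP_FOREVER it keeps returning the cached ST_DONE (the behaviour reported for Condor).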
Participants (2): Peter Troeger, Piotr Domagalski