Comments on latest DRMAA documents
Hi,

We still have some comments on DRMAA specs:

1. The IDL version says that drmaa_wait(SESSION_ANY) should wait only on those jobs that were in the session up until the drmaa_wait call. The latest DRMAA 1.0 spec (June, 2007) doesn't mention it.

I remember quite a long discussion [1] where Daniel argued that these waiting semantics prevent some useful use cases. As I don't see any clear consensus in that discussion, am I right in assuming that the IDL version has the final agreed semantics?

2. I remember asking questions about DRMAA semantics concerning the behaviour of drmaa_job_ps on finished (drmaa_wait'ed) jobs [2]. In the latest DRMAA 1.0 spec, in 3.1.2 it says:

"Succesfull drmaa_wait() and drmaa_synchronize(), with dispose = true parameter, calls will make job id's invalid by reaping the job run usage data.".

I think this is the only hint in the spec on what might happen to drmaa_job_ps on disposed jobs. In [2] Peter suggested adding something clearer, but I can't see anything like that in the current docs. OK, so I first assume that the term "job run usage data" doesn't mean that only the "rusage" structure used in drmaa_wait is disposed, but that the whole jobid is from now on considered invalid, so that drmaa_job_ps returns INVALID_JOB. This term is a bit confusing, especially since the next point, 3.1.3, talks about "run usage data" with a different meaning.

But assuming I was right in the previous paragraph, how is one supposed to implement drmaa_job_ps which supports jobids from other sessions (or jobs submitted not using DRMAA at all)? When I call drmaa_wait, it disposes the job, so drmaa_job_ps now returns INVALID_JOB, but if I ask for the status in another session, it will return it happily. Is this how it's supposed to work?

[1] http://www.ogf.org/pipermail/drmaa-wg/2006-June/000505.htm
[2] http://www.ogf.org/pipermail/drmaa-wg/2006-May/000453.html

--
Piotr Domagalski
Piotr Domagalski wrote:
Hi,
We still have some comments on DRMAA specs:
1. The IDL version says that drmaa_wait(SESSION_ANY) should wait only on those jobs that were in the session up until the drmaa_wait call. The latest DRMAA 1.0 spec (June, 2007) doesn't mention it.
I remember quite a long discussion [1] where Daniel argued that these waiting semantics prevent some useful use cases. As I don't see any clear consensus in that discussion, am I right in assuming that the IDL version has the final agreed semantics?
Yep. The IDL doc is the final agreement. I gave up that fight a long time ago. :)
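The agreed SESSION_ANY behaviour (wait only on jobs that were in the session when drmaa_wait was called) can be illustrated with a small model. This is only a sketch under stated assumptions: the Session class and its submit, finish and begin_wait_any methods are hypothetical names invented for illustration and are not part of any DRMAA binding, and polling stands in for the real blocking call.

```python
class Session:
    """Toy model of a session; hypothetical, not a real DRMAA binding."""

    def __init__(self):
        # job id -> True once the job has finished
        self.jobs = {}

    def submit(self, job_id):
        self.jobs[job_id] = False

    def finish(self, job_id):
        self.jobs[job_id] = True

    def begin_wait_any(self):
        """Model of wait(SESSION_ANY): freeze the candidate set at call time."""
        candidates = frozenset(self.jobs)  # snapshot of the jobs known right now

        def poll():
            # Only jobs from the snapshot are eligible, even if newer
            # jobs have been submitted and finished in the meantime.
            for job_id in sorted(candidates):
                if self.jobs.get(job_id):
                    del self.jobs[job_id]  # reap the finished job
                    return job_id
            return None  # nothing from the snapshot has finished yet

        return poll
```

In this model a job submitted after the wait started is never returned by that wait, which is the kind of use case the earlier discussion argued about.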
2. I remember asking questions about DRMAA semantics concerning the behaviour of drmaa_job_ps on finished (drmaa_wait'ed) jobs [2]. In the latest DRMAA 1.0 spec, in 3.1.2 it says:
"Succesfull drmaa_wait() and drmaa_synchronize(), with dispose = true parameter, calls will make job id's invalid by reaping the job run usage data.".
I think this is the only hint in the spec on what might happen to drmaa_job_ps on disposed jobs. In [2] Peter suggested adding something clearer, but I can't see anything like that in the current docs. OK, so I first assume that the term "job run usage data" doesn't mean that only the "rusage" structure used in drmaa_wait is disposed, but that the whole jobid is from now on considered invalid, so that drmaa_job_ps returns INVALID_JOB. This term is a bit confusing, especially since the next point, 3.1.3, talks about "run usage data" with a different meaning.
I guess we need to open a tracker to clarify the behavior of the drmaa_job_ps routine when a job has been reaped.
But assuming I was right in the previous paragraph, how is one supposed to implement drmaa_job_ps which supports jobids from other sessions (or jobs submitted not using DRMAA at all)? When I call drmaa_wait, it disposes the job, so drmaa_job_ps now returns INVALID_JOB, but if I ask for the status in another session, it will return it happily. Is this how it's supposed to work?
Submitting a job creates a record in the session context. When that job ends, that job's status can be permanently set in the record as DONE or FAILED. When the job is reaped, the record is reaped.

Now, before a job has finished, or when querying a job from outside the submitting session, the drmaa_job_ps routine must talk to the DRM to get the job's status. After a job has finished, the DRM is allowed to claim the job doesn't exist. As long as you're in the same session that submitted the job, and you haven't reaped the job, you can use the job's session record instead of asking the DRM. If the job has been reaped or you're not in the same session, and the DRM doesn't know about the job, you get an INVALID_JOB. If the DRM always knows about every job that was ever submitted, then you'll never get an INVALID_JOB, even if the job has been reaped.

Daniel
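The lookup order described above can be sketched as a small model. Everything here is an assumption made for illustration: FakeDRM, Session, InvalidJob and the method names are hypothetical and not taken from any DRMAA binding, and wait only handles jobs that have already finished.

```python
class InvalidJob(Exception):
    """Stands in for the INVALID_JOB error code."""


class FakeDRM:
    """Hypothetical DRM that may or may not remember finished jobs."""

    def __init__(self, keeps_history=False):
        self.keeps_history = keeps_history
        self.status = {}

    def submit(self, job_id):
        self.status[job_id] = "RUNNING"

    def finish(self, job_id, ok=True):
        if self.keeps_history:
            self.status[job_id] = "DONE" if ok else "FAILED"
        else:
            self.status.pop(job_id, None)  # the DRM forgets the job

    def query(self, job_id):
        if job_id not in self.status:
            raise InvalidJob(job_id)
        return self.status[job_id]


class Session:
    def __init__(self, drm):
        self.drm = drm
        self.records = {}  # job id -> final status, or None while unfinished

    def run_job(self, job_id):
        self.drm.submit(job_id)
        self.records[job_id] = None  # submitting creates a session record

    def note_finished(self, job_id, ok=True):
        # Pin the final status in the session record, if we own the job.
        if job_id in self.records:
            self.records[job_id] = "DONE" if ok else "FAILED"

    def wait(self, job_id):
        # Reaping removes the session record (finished jobs only here).
        if self.records.get(job_id) is None:
            raise InvalidJob(job_id)
        return self.records.pop(job_id)

    def job_ps(self, job_id):
        final = self.records.get(job_id)
        if final is not None:
            return final  # same session, finished, not yet reaped
        return self.drm.query(job_id)  # otherwise the DRM decides
```

With a forgetful DRM, the submitting session still reports DONE until it reaps the job, while any other session gets InvalidJob as soon as the DRM drops it. With keeps_history=True, job_ps keeps answering even after the reap.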
[1] http://www.ogf.org/pipermail/drmaa-wg/2006-June/000505.htm
[2] http://www.ogf.org/pipermail/drmaa-wg/2006-May/000453.html
On 7/27/07, Daniel Templeton <Dan.Templeton@sun.com> wrote:
Submitting a job creates a record in the session context. When that job ends, that job's status can be permanently set in the record as DONE or FAILED. When the job is reaped, the record is reaped. Now, before a job has finished, or when querying a job from outside the submitting session, the drmaa_job_ps routine must talk to the DRM to get the job's status. After a job has finished, the DRM is allowed to claim the job doesn't exist. As long as you're in the same session that submitted the job, and you haven't reaped the job, you can use the job's session record instead of asking the DRM. If the job has been reaped or you're not in the same session, and the DRM doesn't know about the job, you get an INVALID_JOB. If the DRM always knows about every job that was ever submitted, then you'll never get an INVALID_JOB, even if the job has been reaped.
Thanks! This explanation made it clearer than any spec before. We'd love to see something like this in the documents.

--
Piotr Domagalski
On 7/27/07, Piotr Domagalski <szalik@szalik.net> wrote:
Thanks! This explanation made it clearer than any spec before. We'd love to see something like this in the documents.
Something else that came to mind: wouldn't it be good to discuss some standard resource usage fields (e.g. cpuUserTime, cpuSystemTime, etc.) that are returned by drmaa_wait()? Unfortunately they're currently left entirely up to the implementation...

--
Piotr Domagalski
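To make the suggestion concrete, here is a rough sketch of how a caller might pick a fixed set of fields out of implementation-defined usage data. It assumes the usage data arrives as a list of "name=value" strings; that format, the helper names, and the choice of cpuUserTime and cpuSystemTime as the "standard" set are assumptions for illustration only, not something the spec defines.

```python
# Hypothetical set of agreed field names, taken from the examples in the message.
STANDARD_FIELDS = ("cpuUserTime", "cpuSystemTime")


def parse_rusage(entries):
    """Turn 'name=value' strings into a dict, skipping malformed entries."""
    usage = {}
    for entry in entries:
        name, sep, value = entry.partition("=")
        if not sep:
            continue  # no '=' present, so not a name=value pair
        usage[name.strip()] = value.strip()
    return usage


def standard_usage(entries):
    """Return only the agreed fields; a missing field maps to None."""
    usage = parse_rusage(entries)
    return {field: usage.get(field) for field in STANDARD_FIELDS}
```

Without agreed names, each application needs this kind of mapping per implementation, which is the portability problem the message points at.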
Participants (2): Daniel Templeton, Piotr Domagalski