On 29 April 2011 13:10, Nadav Brandes <nadavbrandes@gmail.com> wrote:
Hi guys,
My team and I have finished going over the latest draft of DRMAA2, and we have some comments, suggestions and questions about it. We want to hear your opinion about these issues.
Given a jobId, you can easily get its Job object with the method JobSession::getJobs(in JobInfo filter) by passing a JobInfo filter that contains the wanted jobId (a shorthand method JobSession::getJob(string jobId) might be more convenient, but that is a separate issue). Given a jobArrayId, however, there is no way to get its JobArray object. This is a serious limitation of DRMAA, because it prevents users from using job arrays the way they are used in most batch systems. I think a similar method JobSession::getJobArrays(in JobArrayInfo filter) should be added, or at least a method JobSession::getJobArray(string jobArrayId).

A very important feature that many batch systems support is the ability to limit the number of jobs in a job array that may run simultaneously (in LSF it is called "Slot Limit", and you can read about it at http://www-cecpv.u-strasbg.fr/Documentations/lsf/html/lsf6.1_admin/G_jobarra...). I think that DRMAA can also support this feature by:
Change the method JobSession::runBulkJobs so that it also accepts an optional argument "in long slotLimit" (if it is UNSET, no slot limit is assigned to the new job array).
Torque also supports this feature. What about Grid Engine?
Add a new method JobArray::changeSlotLimit(in long slotLimit)
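
To make the job array proposals above more concrete, here is a rough Python sketch of how they might fit together. It is only an in-memory mock, not DRMAA2 or any real binding: the names run_bulk_jobs, get_job_array and change_slot_limit are hypothetical Python renderings of the proposed runBulkJobs(..., slotLimit), getJobArray(jobArrayId) and changeSlotLimit(slotLimit), the job template argument is left out for brevity, and UNSET is modelled as None.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: DRMAA's UNSET is modelled as None in this sketch.
UNSET = None


@dataclass
class JobArray:
    """Hypothetical stand-in for the proposed JobArray interface."""
    job_array_id: str
    job_ids: List[str]
    slot_limit: Optional[int] = UNSET

    def change_slot_limit(self, slot_limit: Optional[int]) -> None:
        # Proposed JobArray::changeSlotLimit; UNSET removes the limit.
        if slot_limit is not UNSET and slot_limit < 1:
            raise ValueError("slot_limit must be positive or UNSET")
        self.slot_limit = slot_limit


class JobSession:
    """In-memory mock; a real session would talk to the batch system."""

    def __init__(self) -> None:
        self._arrays: Dict[str, JobArray] = {}

    def run_bulk_jobs(self, begin: int, end: int, step: int = 1,
                      slot_limit: Optional[int] = UNSET) -> JobArray:
        # Proposed runBulkJobs with the optional slotLimit argument.
        array_id = f"array-{len(self._arrays) + 1}"
        job_ids = [f"{array_id}[{i}]" for i in range(begin, end + 1, step)]
        array = JobArray(array_id, job_ids)
        array.change_slot_limit(slot_limit)
        self._arrays[array_id] = array
        return array

    def get_job_array(self, job_array_id: str) -> JobArray:
        # Proposed JobSession::getJobArray(string jobArrayId).
        return self._arrays[job_array_id]
```

With something like this, a client that stored only the jobArrayId could later call get_job_array(...) to recover the JobArray and adjust its slot limit, which is the use case that is missing today.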
There are some parameters that most batch systems allow changing for already submitted jobs, but DRMAA doesn't support changing them. For example, DRMAA doesn't let you change the priority or queue of an already submitted job. I think that the methods Job::changePriority(in long priority) and Job::changeQueue(in string queueName) should be added.

Many batch systems allow rerunning existing jobs. Although DRMAA has a field called rerunnable in the JobTemplate struct, it doesn't allow users to actually rerun jobs. Maybe a method Job::rerun() could be added to DRMAA.

I have a question: does DRMAA support Generic Resources? For example, if I have a cluster where some of the nodes have GPU cards and I want to submit jobs that require a certain number of GPUs, I would like the batch system to manage this for me, as many batch systems know how to do.
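
As an illustration of the job-modification methods proposed above (changePriority, changeQueue and rerun), here is a small Python sketch. Everything in it is an assumption made for illustration: the state names are simplified and are not the DRMAA2 state model, and the rules that only queued jobs may change queue and only finished jobs may be rerun are guesses at reasonable behaviour, not something any specification defines.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # Simplified, hypothetical states; not the DRMAA2 state model.
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


@dataclass
class Job:
    """Hypothetical stand-in for a Job with the proposed methods."""
    job_id: str
    queue_name: str
    priority: int = 0
    state: JobState = JobState.QUEUED
    run_count: int = 0

    def change_priority(self, priority: int) -> None:
        # Proposed Job::changePriority(in long priority).
        self.priority = priority

    def change_queue(self, queue_name: str) -> None:
        # Proposed Job::changeQueue(in string queueName).
        # Assumption: only jobs that are still queued may be moved.
        if self.state is not JobState.QUEUED:
            raise RuntimeError("only queued jobs can change queue")
        self.queue_name = queue_name

    def rerun(self) -> None:
        # Proposed Job::rerun().
        # Assumption: only finished jobs may be put back in the queue.
        if self.state not in (JobState.DONE, JobState.FAILED):
            raise RuntimeError("only finished jobs can be rerun")
        self.state = JobState.QUEUED
        self.run_count += 1
```

A real implementation would of course forward these calls to the batch system and report an error where the underlying system does not support the operation.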
Thank you for reading all of this. I would very much like to hear what you think about each of the bullets above.
Regards, Nadav
-- drmaa-wg mailing list drmaa-wg@ogf.org http://www.ogf.org/mailman/listinfo/drmaa-wg
-- Mariusz