Hi, Quoting [Peter Tröger] (Mar 23 2010):
Hi,
As long as the MonitoringSession::drmsQueueNames is nothing more than an opaque set of strings that are the valid values for JobTemplate::queueName, I can live with it. I can see where that would be useful for a portal. I thought, however, that we had come to the conclusion previously that portals and user interfaces were not really our target applications. (Anyone remember what feature spawned that conclusion?) I thought that DRMAA was specifically focused on applications integrating with clusters. If so, a list of opaque strings is useless.
We dropped the portal example, that's true. The most convincing DRMAA applications at the moment are high-level APIs and meta-schedulers on top of / with DRMAA support.
I did a small field study to get the picture right. LSF, PBS, SGE, LoadLeveler, SAGA, Globus and GridWay can submit jobs to particular queues. In LoadLeveler, queues are called "classes". Condor, JSDL and OGSA-BES seem to have no queue concept - correct me if I am wrong. Retrieving the list of queue names is only supported in:
LSF: bqueues (http://www.vub.ac.be/BFUCC/LSF/man/bqueues.1.html)
PBS: qstat -q (http://linux.die.net/man/1/qstat)
SGE: qstat -f
LoadLeveler: llclass -l (http://www.ccs.ornl.gov/Cheetah/LL.html#Classes)
So if we add the monitoring facility, an empty return value must still be valid.
By the way, you'll also have to give a little thought to reconciling the 1:1 queue/host model with the 1:n and n:m models, as far as identifying them in a list goes.
This is the true counter-argument. If DRMAA monitoring gives no additional hints here, invalid combinations of individually valid machine / queue names could occur in the job template. Let's wait and see whether any defender of queue list monitoring stands up. Otherwise, I propose to keep only the queue name attribute in the job template.
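To make that concrete, here is a minimal sketch (plain Python with made-up names, not taken from any real DRMAA binding) of how an n:m queue/host model produces pairs that are invalid even though each name on its own is valid:

```python
# Hypothetical n:m mapping from queue names to the hosts they serve.
# Both the queue names and the host names are illustrative only.
QUEUE_HOSTS = {
    "short": {"node01", "node02"},
    "long":  {"node02", "node03"},
}

def is_valid_combination(queue: str, host: str) -> bool:
    """Return True only if the queue actually dispatches to the host."""
    return host in QUEUE_HOSTS.get(queue, set())

# "short" is a valid queue and "node03" is a valid host, yet the
# combination is invalid -- exactly the trap a job template user
# could fall into without additional hints from monitoring:
print(is_valid_combination("short", "node01"))  # True
print(is_valid_combination("short", "node03"))  # False
```

In a 1:1 model this check degenerates to simple name equality, which is why reconciling the models matters for any list-based monitoring interface.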
It's not a hard requirement from our side, but I think it's genuinely useful. In general, our end users complain more often than not when they have to manually retrieve resource details like service contact URLs or queue names from some obscure web page, instead of being able to retrieve that information programmatically. The manual way is simply too error-prone, tedious and static. Cheers, Andre.
/Peter.
Daniel
On 3/23/10 10:27 AM, Peter Tröger wrote:
As I said in the email I just wrote, I'm willing to be convinced of the value of adding queues to the job submission side of things. I am, however, fundamentally opposed to adding queues to the monitoring side.
I will heavily insist on queue support in DRMAA v2. This is a long-demanded feature, which also popped up again in the survey.
The various concepts of queues are too different for that to make any sense. There is absolutely no way we will be able to model both LSF and SGE queues in a way that is abstract enough to be consistent and still specific enough to be meaningful and accurate. We'll talk on the next call. :)
The intention of the current model is that JobTemplate::queueName and MonitoringSession::drmsQueueNames act as counterparts. DRMAA would promise that all strings that show up in MonitoringSession::drmsQueueNames are valid input for JobTemplate::queueName. Nothing more. The use cases are DRMAA-based portals and command-line applications. The interpretation of what a queue is can be left to the library implementation - in the end, the user has to reason about the meaning of queue names anyway.
We could relax the conditions so that other values are also allowed in JobTemplate::queueName. This would allow MonitoringSession::drmsQueueNames to return nothing in SGE. This must be possible anyway - Condor has no queue concept at all.
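A sketch of that relaxed contract, using Python stub classes that merely stand in for the proposed IDL interfaces (no real DRMAA library is involved, and the attribute names mirror the spec draft only loosely): every advertised name is guaranteed valid, the list may be empty, and other values are not rejected up front:

```python
class MonitoringSession:
    """Stub: a real implementation would query the DRM system."""
    def __init__(self, queue_names=()):
        self._queues = list(queue_names)

    @property
    def drms_queue_names(self):
        # May legitimately be empty, e.g. for Condor,
        # which has no queue concept at all.
        return list(self._queues)

class JobTemplate:
    def __init__(self):
        self.queue_name = None

# Every advertised name is valid input for JobTemplate.queue_name;
# if nothing is advertised, a site-specific value is still allowed.
session = MonitoringSession(["all.q", "batch"])
jt = JobTemplate()
if session.drms_queue_names:
    jt.queue_name = session.drms_queue_names[0]
else:
    jt.queue_name = "site-specific-queue"  # not advertised, still legal
print(jt.queue_name)  # all.q
```

The point of the sketch is the asymmetry: monitoring gives a safe subset of valid values, not an exhaustive one.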
I could also agree to remove MonitoringSession::queueMaxWallclockTime and MonitoringSession::queueMaxSlotsAllowed, since these two attributes are the ones that demand a particular understanding of what a queue is.
Best, Peter.
-- Nothing is ever easy.