Re: [SAGA-RG] quick poll: SPMD attributes
hi Andre, Quoting [Andre Luckow] (Dec 11 2009):
Hi,
Quoting [Ole Christian Weidner] (Dec 10 2009):
Some of the concerns may get addressed by resource discovery and reservation, but not all. So, we would like to propose the following replacement list for these attributes:
  SPMDVariation
  NumberOfProcesses
  ProcessesPerMachine
  ProcessesPerSlot
  ProcessesPerCore
(Slot and Machine are the DRMAA terms for CPU and Host).
Let's go with CPU and host then. Especially slot is IMHO a rather weird term - cpu is so much easier to understand.
Peter corrected me: it's socket, not slot. Anyway, I am by no means sold on those terms, but rather want to nail down the concept and hierarchy. Machine and CPU would be fine with me, as would be
node host machine
and
cpu socket processor
I agree that the above attributes are better than the existing SAGA ones. However, I believe using these attributes can become quite complicated. It requires detailed advance knowledge of the Grid resources that will be used. Further, the user can make contradictory specifications, e.g. if the ProcessesPerMachine attribute does not match the ProcessesPerSockets attribute.
Correct. For knowledge: that system would only make sense (or at least make more sense) if combined with a resource discovery and description. Contradiction: well, that is often the case anyway, like you specify an interactive job AND IO/redirection into files, or specify 4 processes but no SPMDVariation. Not sure what to do about that...
On a space sharing system a user usually cannot influence the number of cpus/cores he gets per node. E.g. on LONI you must use 8 cores per node on QB and 4 cores per node on all other machines. Thus, the most common usage mode will probably be:
  NumberOfProcesses = x
  ProcessesPerMachine == number_of_cores_per_cpu * number_of_sockets
  ProcessesPerSockets == number_of_cores_per_sockets
  ProcessesPerCore = 1
The question is: How can this 80% case be efficiently supported without losing the flexibility in the other cases? Does it make sense to declare default values for the ProcessesXXX attributes and to declare them optional?
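The common case described above can be sketched in a few lines of Python. This is only an illustration, not part of SAGA or DRMAA: the function name `common_case_attributes` and its parameters are hypothetical, and the 2-socket, 4-core layout in the usage line is an assumed example of an 8-core node.

```python
def common_case_attributes(number_of_processes, sockets_per_machine, cores_per_socket):
    """Derive the ProcessesPerXYZ values for the 'one process per core' case.

    Hypothetical helper: the topology numbers would have to come from
    resource discovery or from the user's knowledge of the machine.
    """
    return {
        "NumberOfProcesses": number_of_processes,
        # All cores of a machine are used: cores per socket times sockets.
        "ProcessesPerMachine": cores_per_socket * sockets_per_machine,
        "ProcessesPerSockets": cores_per_socket,
        "ProcessesPerCore": 1,
    }


# Assumed layout for an 8-core node: 2 sockets with 4 cores each.
attrs = common_case_attributes(16, sockets_per_machine=2, cores_per_socket=4)
```

If sensible defaults exist, a user in this common case would only need to set NumberOfProcesses and could leave the rest to the backend.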
Agree, they absolutely should be optional, and default to

  ProcessesPerMachine = unspecified
  ProcessesPerSockets = unspecified
  ProcessesPerCore = 1

And right, you can't enforce the number of cpus/cores, but you can set upper bounds, either to make sure you are not getting on a core where xxx processes are already running, or, on the other end, to allow the DRM to use a core for multiple process instances. Makes sense?
The NumberOfProcesses is to be interpreted as exact number by the backend. The ProcessesPerXYZ are to be interpreted as upper bounds by the backend.
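A minimal sketch of these semantics, assuming the defaults proposed in this thread: NumberOfProcesses is checked as an exact count, and each ProcessesPerXYZ value is checked as an upper bound, with `None` standing for "unspecified". The class `SPMDRequest`, the function `satisfies`, and the `(machine, socket, core)` placement tuples are hypothetical names chosen for illustration, not an existing API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SPMDRequest:
    number_of_processes: int                     # exact count
    processes_per_machine: Optional[int] = None  # upper bound, None = unspecified
    processes_per_socket: Optional[int] = None   # upper bound, None = unspecified
    processes_per_core: Optional[int] = 1        # upper bound, defaults to 1


def satisfies(request, placement):
    """Check a placement, given as one (machine, socket, core) tuple per process."""
    if len(placement) != request.number_of_processes:
        return False  # NumberOfProcesses must match exactly
    per_machine = Counter(m for m, _, _ in placement)
    per_socket = Counter((m, s) for m, s, _ in placement)
    per_core = Counter(placement)
    checks = [
        (request.processes_per_machine, per_machine),
        (request.processes_per_socket, per_socket),
        (request.processes_per_core, per_core),
    ]
    for bound, counts in checks:
        # An unspecified bound places no restriction on the backend.
        if bound is not None and any(n > bound for n in counts.values()):
            return False
    return True


# 16 processes, at most 2 per core, all on 8 cores of one socket.
request = SPMDRequest(number_of_processes=16, processes_per_core=2)
placement = [(0, 0, core) for core in range(8) for _ in range(2)]
ok = satisfies(request, placement)
```

A backend that cannot express one of the bounds would, under this reading, have to reject the job rather than silently ignore the attribute.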
So, for example, one could specify a 16 process MPICH job as
  SPMDVariation = MPICH
  NumberOfProcesses = 16
  ProcessesPerCore = 2
Ok, but what if the backend doesn't understand these things? RSL, for example, only understands number of nodes and number of processes. In this case the adaptor itself would have to (a) figure out the number of cores per cpu, (b) figure out the number of cpus per node, and (c) make a reservation for (16/2/#cores_per_cpu/#cpus_per_node) nodes. The Globus GRAM adaptor already does something similar to this due to the shortcomings of RSL. Not pretty.
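The node-count arithmetic an adaptor would need can be sketched as follows, reading the two divisors as cores per CPU and CPUs per node, as in steps (a) and (b). The function `nodes_needed` is a hypothetical helper, not taken from the GRAM adaptor, and it rounds up so that a partially filled node is still reserved.

```python
def nodes_needed(number_of_processes, processes_per_core, cores_per_cpu, cpus_per_node):
    """Number of nodes to reserve when the backend only understands node counts.

    Hypothetical helper: the topology values would have to be looked up
    by the adaptor for each target machine.
    """
    processes_per_node = processes_per_core * cores_per_cpu * cpus_per_node
    # Integer ceiling division: a partially used node still has to be reserved.
    return (number_of_processes + processes_per_node - 1) // processes_per_node


# The 16-process example with ProcessesPerCore = 2, on assumed
# nodes with 1 CPU of 4 cores each: 8 processes fit per node.
n = nodes_needed(16, processes_per_core=2, cores_per_cpu=4, cpus_per_node=1)
```

The rounding matters whenever the process count is not a multiple of the per-node capacity; plain division would under-reserve in that case.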
Right - but it is even worse if the user has to do it on application space. So, I guess it is like with all other JD attributes: if it is specified, it must be honored - which may effectively limit the number of available backends...
Even worse, different Globus installations interpret the "count" RSL attribute differently. Just try to use Abe, Ranger and QB with the same RSL file and you will see different behavior. It's just too easy to "hack" the JobManager Perl script of Globus.
:-( Can't do nothin about that I guess... Thanks! Andre.
Regards, Andre
-- Nothing is ever easy.
On Dec 11, 2009, at 11:26 AM, Andre Merzky wrote:
hi Andre,
Quoting [Andre Luckow] (Dec 11 2009):
Correct. For knowledge: that system would only make sense (or at least make more sense) if combined with a resource discovery and description.
I totally agree that SAGA's current attributes could be improved. So I am in favour of something better in a DRMAA2 package... The proposal looks good (at first sight). I suggest having a look at the XtreemOS resource package that I had mailed here earlier. It also has a resource description class; we could fit the proposal made here in there... (I do not see the direct need for a resource discovery though. It would be "nice to have" but not essential...)

Thilo
--
Thilo Kielmann http://www.cs.vu.nl/~kielmann/
participants (2)
- Andre Merzky
- Thilo Kielmann