quick poll: SPMD attributes

Hi all,

we had a very productive meeting with the DRMAA guys today, and will continue tomorrow. We'll send around notes next week I think, but for now I'd love to have some quick feedback on a very specific topic.

At the moment, SAGA has the following job description attributes to specify job multiplicity, which are derived from the JSDL core and the JSDL SPMD extension:

- SPMDVariation - MPI type etc.
- TotalCPUCount - total number of CPUs required by the job
- NumberOfProcesses - number of instances of the Executable that the consuming system MUST start
- ProcessesPerHost - number of instances of the Executable that the consuming system MUST start per host
- ThreadsPerProcess - number of threads per process (i.e., per instance of the Executable)

There is not much discussion about SPMDVariation (anymore), but the others are apparently somewhat inconsistent:

- processes can be started by the backend, threads can't, so why specify ThreadsPerProcess?
- hosts can have multiple CPUs - this is not reflected
- CPUs can have multiple cores - this is not reflected
- in general, the attributes seem somewhat inconsistent, and it's hard to specify the values for specific use cases.

Some of the concerns may get addressed by resource discovery and reservation, but not all. So, we would like to propose the following replacement list for these attributes:

  SPMDVariation
  NumberOfProcesses
  ProcessesPerMachine
  ProcessesPerSlot
  ProcessesPerCore

(Slot and Machine are the DRMAA terms for CPU and Host).

The NumberOfProcesses is to be interpreted as an exact number by the backend. The ProcessesPerXYZ are to be interpreted as upper bounds by the backend.

So, for example, one could specify a 16 process MPICH job as

  SPMDVariation     = MPICH
  NumberOfProcesses = 16
  ProcessesPerCore  = 2

That would allow one to run on a 2-way QuadCore host, placing two processes on each of the 8 cores. It would also allow one to run on two 4-way SingleCore hosts, placing one process on each core.
Well, try to specify that with the current attributes - very difficult!

So, the questions we have are:

- does the above proposal indeed capture the SAGA MPI use cases?
- if not, which use cases are not covered? How can they be addressed?

To be clear: this is not intended to be an erratum to the current SAGA spec, but rather a consideration for the next SAGA version...

Thanks, Andre.

-- 
Nothing is ever easy.
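
The proposed semantics (NumberOfProcesses as an exact count, ProcessesPerXYZ as upper bounds) can be sketched in Python. This is only an illustration under stated assumptions, not SAGA or DRMAA API: the Machine and Request classes and the capacity()/fits() helpers are hypothetical names, and the sketch assumes that when several bounds are given, the tightest one applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    """Hypothetical description of one machine (host)."""
    slots: int            # CPUs (sockets) per machine
    cores_per_slot: int   # cores per CPU


@dataclass
class Request:
    """Hypothetical holder for the proposed attributes."""
    number_of_processes: int                    # exact count
    processes_per_machine: Optional[int] = None  # upper bound
    processes_per_slot: Optional[int] = None     # upper bound
    processes_per_core: Optional[int] = None     # upper bound


def capacity(machine: Machine, req: Request) -> Optional[int]:
    """Most processes one machine may take; None means unbounded."""
    limits = []
    if req.processes_per_core is not None:
        cores = machine.slots * machine.cores_per_slot
        limits.append(req.processes_per_core * cores)
    if req.processes_per_slot is not None:
        limits.append(req.processes_per_slot * machine.slots)
    if req.processes_per_machine is not None:
        limits.append(req.processes_per_machine)
    # Assumption: the tightest of the given upper bounds wins.
    return min(limits) if limits else None


def fits(machines: List[Machine], req: Request) -> bool:
    """True if exactly number_of_processes can be placed within the bounds."""
    if not machines:
        return False
    caps = [capacity(m, req) for m in machines]
    if any(c is None for c in caps):
        return True  # at least one machine has no bound at all
    return sum(caps) >= req.number_of_processes


req = Request(number_of_processes=16, processes_per_core=2)
print(fits([Machine(slots=2, cores_per_slot=4)], req))   # one 2-way QuadCore
print(fits([Machine(slots=4, cores_per_slot=1)] * 2, req))  # two 4-way SingleCore
```

Both layouts from the example offer 8 cores and therefore exactly 16 places under ProcessesPerCore = 2; a layout with 16 or more cores could run one process per core instead.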

Hi,

On Dec 10, 2009, at 3:34 PM, Andre Merzky wrote:
Hi all,
we had a very productive meeting with the DRMAA guys today, and will continue tomorrow. We'll send around notes next week I think, but for now I'd love to have some quick feedback on a very specific topic.
At the moment, SAGA has the following job description attributes to specify job multiplicity, which are derived from the JSDL core and the JSDL SPMD extension:
- SPMDVariation - MPI type etc.
- TotalCPUCount - total number of CPUs required by the job
- NumberOfProcesses - number of instances of the Executable that the consuming system MUST start
- ProcessesPerHost - number of instances of the Executable that the consuming system MUST start per host.
- ThreadsPerProcess - number of threads per process (i.e., per instance of the Executable)
There is not much discussion about SPMDVariation (anymore), but the others are apparently somewhat inconsistent:
- processes can be started by the backend, threads can't, so why specify ThreadsPerProcess?
- hosts can have multiple CPUs - this is not reflected
- CPUs can have multiple cores - this is not reflected
- in general, the attributes seem somewhat inconsistent, and it's hard to specify the values for specific use cases.
I agree. The current set of attributes is somewhat clumsy and not always intuitive to use. The notion of a thread should not be reflected in the job description - it should be handled on application side.
Some of the concerns may get addressed by resource discovery and reservation, but not all. So, we would like to propose the following replacement list for these attributes:
  SPMDVariation
  NumberOfProcesses
  ProcessesPerMachine
  ProcessesPerSlot
  ProcessesPerCore
(Slot and Machine are the DRMAA terms for CPU and Host).
Let's go with CPU and host then. Especially slot is IMHO a rather weird term - cpu is so much easier to understand.
The NumberOfProcesses is to be interpreted as exact number by the backend. The ProcessesPerXYZ are to be interpreted as upper bounds by the backend.
So, for example, one could specify a 16 process MPICH job as
  SPMDVariation     = MPICH
  NumberOfProcesses = 16
  ProcessesPerCore  = 2
OK, but what if the backend doesn't understand these things? (RSL, for example, only understands number of nodes and number of processes.) In this case the adaptor itself would have to (a) figure out the number of cores per CPU, (b) figure out the number of CPUs per node, and (c) make a reservation for (16/2/#cores_per_cpu/#cpus_per_node) nodes. The Globus GRAM adaptor already does something similar to this due to the shortcomings of RSL. Not pretty. And the more possible attributes you have, the more options you give the user to define the same thing. E.g
That would allow one to run on a 2-way QuadCore host, placing two processes on each of the 8 cores.
It would also allow one to run on two 4-way SingleCore hosts, placing one process on each core.
Well, try to specify that with the current attributes - very difficult!
So, the questions we have are:
- does the above proposal indeed capture the SAGA MPI use cases?
- if not, which use cases are not covered? How can they be addressed?
To be clear: this is not intended to be an erratum to the current SAGA spec, but rather a consideration for the next SAGA version...
Thanks, Andre.
Cheers, Ole
-- 
Nothing is ever easy.
-- 
saga-rg mailing list
saga-rg@ogf.org
http://www.ogf.org/mailman/listinfo/saga-rg

Hi Ole,

Quoting [Ole Christian Weidner] (Dec 10 2009):
Some of the concerns may get addressed by resource discovery and reservation, but not all. So, we would like to propose the following replacement list for these attributes:
  SPMDVariation
  NumberOfProcesses
  ProcessesPerMachine
  ProcessesPerSlot
  ProcessesPerCore
(Slot and Machine are the DRMAA terms for CPU and Host).
Let's go with CPU and host then. Especially slot is IMHO a rather weird term - cpu is so much easier to understand.
Peter corrected me: it's socket, not slot. Anyway, I am by no means sold on those terms, but rather want to nail down the concept and hierarchy. Machine and CPU would be fine with me, as would be node / host / machine and cpu / socket / processor.
The NumberOfProcesses is to be interpreted as exact number by the backend. The ProcessesPerXYZ are to be interpreted as upper bounds by the backend.
So, for example, one could specify a 16 process MPICH job as
  SPMDVariation     = MPICH
  NumberOfProcesses = 16
  ProcessesPerCore  = 2
OK, but what if the backend doesn't understand these things? (RSL, for example, only understands number of nodes and number of processes.) In this case the adaptor itself would have to (a) figure out the number of cores per CPU, (b) figure out the number of CPUs per node, and (c) make a reservation for (16/2/#cores_per_cpu/#cpus_per_node) nodes. The Globus GRAM adaptor already does something similar to this due to the shortcomings of RSL. Not pretty.
Right - but it is even worse if the user has to do it in application space. So, I guess it is like with all other JD attributes: if it is specified, it must be honored - which may effectively limit the number of available backends...
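
The adaptor-side computation discussed here (turning NumberOfProcesses and ProcessesPerCore into a node count for a backend that only knows nodes and processes) might look roughly like the Python sketch below. It is not taken from the Globus GRAM adaptor: nodes_to_request() is a hypothetical helper, and the sketch assumes homogeneous nodes whose cores-per-CPU and CPUs-per-node figures are already known, and that a fractional result is rounded up.

```python
def nodes_to_request(number_of_processes: int,
                     processes_per_core: int,
                     cores_per_cpu: int,
                     cpus_per_node: int) -> int:
    """Smallest node count that can hold the job (hypothetical helper).

    processes_per_core is treated as an upper bound, so each node can
    take at most processes_per_core * cores_per_cpu * cpus_per_node
    processes. Assumes all nodes are identical.
    """
    per_node = processes_per_core * cores_per_cpu * cpus_per_node
    if per_node <= 0:
        raise ValueError("per-node capacity must be positive")
    # Integer ceiling division: round up so every process gets a place.
    return -(-number_of_processes // per_node)


# The 16-process example with ProcessesPerCore = 2:
print(nodes_to_request(16, 2, cores_per_cpu=4, cpus_per_node=2))  # 2-way QuadCore nodes
print(nodes_to_request(16, 2, cores_per_cpu=1, cpus_per_node=4))  # 4-way SingleCore nodes
```

The backend would then be asked for that many nodes and 16 processes; anything finer-grained (which core gets which process) is outside what such a backend can express.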
And the more possible attributes you have, the more options you give the user to define the same thing. E.g
eg? :-)

Thanks! Andre.

-- 
Nothing is ever easy.
participants (2)
- Andre Merzky
- Ole Weidner