
Hi all,

we had a very productive meeting with the DRMAA guys today, and will continue tomorrow. We'll send around notes next week, I think, but for now I'd love to get some quick feedback on a very specific topic.

At the moment, SAGA has the following job description attributes to specify job multiplicity, which are derived from the JSDL core and the JSDL SPMD extension:

  - SPMDVariation: MPI type etc.
  - TotalCPUCount: total number of CPUs required by the job
  - NumberOfProcesses: number of instances of the Executable that the consuming system MUST start
  - ProcessesPerHost: number of instances of the Executable that the consuming system MUST start per host
  - ThreadsPerProcess: number of threads per process (i.e., per instance of the Executable)

There is not much discussion about SPMDVariation (anymore), but the others are apparently somewhat inconsistent:

  - processes can be started by the backend, threads can't, so why specify ThreadsPerProcess?
  - hosts can have multiple CPUs - this is not reflected
  - CPUs can have multiple cores - this is not reflected
  - in general, the attributes seem somewhat inconsistent, and it's hard to specify the values for specific use cases.

Some of the concerns may get addressed by resource discovery and reservation, but not all.

So, we would like to propose the following replacement list for these attributes:

  SPMDVariation
  NumberOfProcesses
  ProcessesPerMachine
  ProcessesPerSlot
  ProcessesPerCore

(Slot and Machine are the DRMAA terms for CPU and Host.)

NumberOfProcesses is to be interpreted as an exact number by the backend. The ProcessesPerXYZ attributes are to be interpreted as upper bounds by the backend.

So, for example, one could specify a 16 process MPICH job as

  SPMDVariation     = MPICH
  NumberOfProcesses = 16
  ProcessesPerCore  = 2

That would allow the job to run on a 2-way QuadCore host, placing two processes on each of the 8 cores. It would also allow the job to run on two 4-way SingleCore hosts, placing one process on each core.
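
To make the "exact number" vs. "upper bound" reading concrete, here is a minimal Python sketch of how a backend might check whether a set of machines can hold such a job. This is only an illustration: the Machine and JobDescription classes, their field names, and the fits() helper are made up for this mail and are not part of SAGA, JSDL or DRMAA.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    """Hypothetical host description (not a SAGA or DRMAA type)."""
    slots: int           # number of CPUs ("slots" in DRMAA terms)
    cores_per_slot: int  # number of cores on each CPU


@dataclass
class JobDescription:
    """The proposed attributes; None means no upper bound was given."""
    number_of_processes: int                     # exact number
    processes_per_machine: Optional[int] = None  # upper bound
    processes_per_slot: Optional[int] = None     # upper bound
    processes_per_core: Optional[int] = None     # upper bound


def machine_capacity(jd: JobDescription, m: Machine) -> int:
    """Most processes one machine may hold without exceeding any bound."""
    limits = []
    if jd.processes_per_machine is not None:
        limits.append(jd.processes_per_machine)
    if jd.processes_per_slot is not None:
        limits.append(jd.processes_per_slot * m.slots)
    if jd.processes_per_core is not None:
        limits.append(jd.processes_per_core * m.slots * m.cores_per_slot)
    # Assumption: without any bound, a machine may take the whole job.
    return min(limits) if limits else jd.number_of_processes


def fits(jd: JobDescription, machines: List[Machine]) -> bool:
    """True if exactly number_of_processes can be placed within the bounds."""
    return sum(machine_capacity(jd, m) for m in machines) >= jd.number_of_processes


# The 16 process example from above, with ProcessesPerCore = 2.
job = JobDescription(number_of_processes=16, processes_per_core=2)

# One 2-way QuadCore host: 8 cores, so at most 16 processes.
print(fits(job, [Machine(slots=2, cores_per_slot=4)]))

# Two 4-way SingleCore hosts: 8 cores in total, so at most 16 processes.
print(fits(job, [Machine(slots=4, cores_per_slot=1)] * 2))
```

The sketch only answers whether a placement exists; how the processes are actually distributed within the bounds would still be up to the backend.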

Well, try to specify that with the current attributes - very difficult!

So, the questions we have are:

  - does the above proposal indeed capture the SAGA MPI use cases?
  - if not, which use cases are not covered? How can they be addressed?

To be clear: this is not intended to be an erratum to the current SAGA spec, but rather a consideration for the next SAGA version...

Thanks, Andre.

--
Nothing is ever easy.