
Hi guys,

I'd like to share some thoughts and discussion points from the SAGA group with respect to the JSDL SPMD spec. Sorry for the long post...

At the moment, SAGA has the following job description attributes to specify job multiplicity, which are derived from the JSDL core and the JSDL SPMD extension:

  - SPMDVariation     - MPI type etc.
  - TotalCPUCount     - total number of CPUs required by the job
  - NumberOfProcesses - number of instances of the Executable that the consuming system MUST start
  - ProcessesPerHost  - number of instances of the Executable that the consuming system MUST start per host
  - ThreadsPerProcess - number of threads per process (i.e., per instance of the Executable)

There is not much discussion about SPMDVariation (anymore), but the others are apparently somewhat inconsistent:

  - processes can be started by the backend, threads can't, so why specify ThreadsPerProcess? How is it to be used by the backend?
  - hosts can have multiple CPUs - this is not reflected
  - CPUs can have multiple cores - this is not reflected
  - cores can have multiple hardware threads - this is not reflected, unless this is what is meant by 'ThreadsPerProcess'?

In general, the attributes seem somewhat inconsistent, and not easy to use. Examples:

  limit the number of hosts to 2, no matter the number of CPUs

    NumberOfProcesses = 10
    ProcessesPerHost  = 5     // specify proc per resource

  limit the number of CPUs to 2, no matter the number of CPUs per host

    NumberOfProcesses = 10
    TotalCPUCount     = 2     // specify total resource num

Also, there is no way to ensure that an application instance obtains exactly one CPU, unless one limits ProcessesPerHost to 1, which will waste significant resources on multi-CPU systems.

I guess the problem really is that JSDL tries to stay out of the resource description business, and that's probably a wise decision. That is what Glue&Co are meant to deal with. Nevertheless, the current specs are cumbersome from an *application* perspective.
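To make the first example above concrete, here is a minimal Python sketch of the arithmetic a user has to do in their head today: the host limit is never stated directly, it only falls out of the two process attributes. The helper name implied_host_count is made up for illustration and is not part of SAGA or JSDL, and the plain dict is just a stand-in for a job description.

```python
def implied_host_count(number_of_processes, processes_per_host):
    """Smallest number of hosts that can hold all processes.

    Hypothetical helper: it assumes every host runs at most
    processes_per_host instances of the Executable.
    """
    if number_of_processes < 1 or processes_per_host < 1:
        raise ValueError("both values must be positive integers")
    # Integer ceiling division, avoids floating point rounding.
    return -(-number_of_processes // processes_per_host)


# Values taken from the first example: 10 processes, 5 per host.
job = {"NumberOfProcesses": 10, "ProcessesPerHost": 5}
hosts = implied_host_count(job["NumberOfProcesses"], job["ProcessesPerHost"])
print(hosts)  # 2
```

So "use 2 hosts" can only be expressed indirectly, and the result changes as soon as either of the two values is touched.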

We are currently considering changing our attributes to

  SPMDVariation
  NumberOfProcesses
  TotalNodeCount
  TotalSocketCount
  TotalCoreCount
  TotalHWThreadCount
  ProcessesPerNode
  ProcessesPerSocket (*)
  ProcessesPerCore
  ProcessesPerHWThread

The 'ProcessesPerXYZ' attributes are interpreted here as upper limits: the backend MUST NOT start more processes per XYZ, but MAY start fewer.

Those attributes are not fully translatable into the JSDL attributes, but so far they map pretty well to other job descriptions.

The typical use case for us would then boil down to

  SPMDVariation        = MPI
  NumberOfProcesses    = 32
  ProcessesPerHWThread = 1

which seems to be what most users want.

Other use cases we would be interested in, for example starting one additional IO process per node, would require more attributes and semantics than we are willing to introduce right now, like

  additional attribs:

    HWThreadsPerCore
    HWThreadsPerSocket
    HWThreadsPerNode
    CoresPerSocket
    CoresPerNode
    SocketsPerNode

  specification:

    SPMDVariation     = MPI
    NumberOfProcesses = 32 + TotalNodeCount
    ProcessesPerNode  = HWThreadsPerNode + 1

So, here is the biggie: Is JSDL considering revising the SPMD spec at some point? If so, can we expect something along the lines above, or is that, ahem, 'out of scope'? If not, how would you propose to align the SAGA use cases with JSDL/Glue/..., so that we still end up with an implementable standards stack?

We don't really want to invent our own schemas, as they most likely would not map to JSDL then - but we need to cater for our use cases one way or the other...

Best, Andre.

(*) Socket basically stands for CPU, but is supposed to clarify that we are not talking about cores.

--
Nothing is ever easy.