
Hi Andre,

Thank you for the detailed email about SPMD. If you don't mind, I'd like to answer the last points first and then comment on the rest.
So, here is the biggie: Is JSDL at some point considering to revise the SPMD spec?
Yes, it's definitely possible given new use cases and sufficient enthusiasm.
If so, can we expect something along the lines above, or is that, aehem, 'out of scope'? If not, how would you propose to align the SAGA use cases with JSDL/Glue/..., so that we still end up with an implementable standards stack?
We don't have an automatic 'out of scope' filter, contrary to popular belief. :-) But for obvious reasons I cannot guarantee you a specific outcome ahead of time. The end result depends on which use cases people think are important. From what I have heard in the past, a number of people seem interested in adding 'core' support to SPMD. If cores are adequately represented in the Glue schema, which we said we are going to use for future JSDL resource requirements, I would imagine it would not be a big deal once the other pieces are in place.
We don't really want to invent our own schemas, as it's most likely that they will not map to JSDL then - but we need to cater for our use cases one way or the other...
I hope you don't feel that you have to produce your own schemas. Let's talk about your use cases at OGF. I can try to arrange some time for this topic in the JSDL general session if you like.

Rest inline:

On Fri, 05 Mar 2010 20:33:14 +0900, Andre Merzky <andre@merzky.net> wrote:
Hia guys,
I'd like to share some thoughts and discussion points from the SAGA group, in respect to the JSDL SPMD spec. Sorry for the long post...
At the moment, SAGA has the following job description attributes to specify job multiplicity, which are derived from the JSDL core and JSDL SPMD extension:
- SPMDVariation - MPI type etc.
- TotalCPUCount - total number of CPUs required by the job
- NumberOfProcesses - number of instances of the Executable that the consuming system MUST start
- ProcessesPerHost - number of instances of the Executable that the consuming system MUST start per host.
- ThreadsPerProcess - number of threads per process (i.e., per instance of the Executable)
There is not much discussion about SPMDVariation (anymore), but the others are apparently somewhat inconsistent:
- processes can be started by the backend, threads can't, so why specify ThreadsPerProcess? How is it to be used by the backend?
Looking back at the SPMD tracker we introduced ThreadsPerProcess for the OpenMP use case, as an indicator of computational weight. One expected usage pattern is described on page 9, 5.4.4. Sec. Attributes: actualIndividualCPUCount—An optional attribute. If true, the value of the individual number of CPUs allocated to the job on each host is used as the value of the ThreadsPerProcess element. Nowadays I would imagine we would have specified this as cores instead of cpus. In any case for a straight MPI application you would not need to use this element.
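As a minimal sketch of the actualIndividualCPUCount behaviour described above: the function and parameter names below are hypothetical, and passing the result to an OpenMP program through OMP_NUM_THREADS is only an assumption about how a consuming system might use the value, not something the spec prescribes.

```python
def resolve_threads_per_process(explicit_value, actual_individual_cpu_count,
                                cpus_allocated_on_host):
    """Return the effective ThreadsPerProcess for one host.

    If actualIndividualCPUCount is true, the number of CPUs allocated to the
    job on this host is used; otherwise the explicitly requested value
    (possibly None) is returned unchanged.
    """
    if actual_individual_cpu_count:
        return cpus_allocated_on_host
    return explicit_value


def openmp_environment(threads):
    """Hypothetical helper: environment a launcher might hand to OpenMP code."""
    return {} if threads is None else {"OMP_NUM_THREADS": str(threads)}


# Example: no explicit value, attribute set to true, 4 CPUs allocated on the host.
threads = resolve_threads_per_process(None, True, cpus_allocated_on_host=4)
print(openmp_environment(threads))
```

For a straight MPI application neither function would come into play, since the element is simply omitted.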
- hosts can have multiple CPUs - this is not reflected
I think this can be expressed in the JSDL resource requirements.
- CPUs can have multiple cores - this is not reflected
- cores can have multiple hardware threads - this is not reflected, unless this is what is meant by 'ThreadsPerProcess'?
The JSDL specs were done before cores were widely available. So I think it is to be expected that support for multi-cores is not present. (Unless you want to think of fractional CPU values as a way of expressing cores; let's not.)
In general, the attributes seem somewhat inconsistent, and not easy to use. Examples:
I won't argue the 'not easy to use' but in the examples below you should not be using the spmd elements for resource allocation. The spec (p4 & p8) is clear that spmd elements are intended to describe the application not its resources. I'll use jsdl schema examples below:
limit the number of hosts to 2, no matter the number of CPUs

  NumberOfProcesses = 10
  ProcessesPerHost  = 5    // specify proc per resource

The above does not guarantee only two hosts. The following does

  <jsdl:TotalResourceCount>
    <jsdl:Exact>2.0</jsdl:Exact>
  </jsdl:TotalResourceCount>

limit the number of CPUs to 2, no matter the number of CPUs per host

  NumberOfProcesses = 10
  TotalCPUCount     = 2    // specify total resource num

Only this part is effective

  <jsdl:TotalCPUCount>
    <jsdl:Exact>2.0</jsdl:Exact>
  </jsdl:TotalCPUCount>
Also, there is no way to ensure that an application instance obtains exactly one CPU, unless one limits ProcessesPerHost to 1, which will waste significant resources on multi-CPU systems.
No, exactly one CPU does not depend on ProcessesPerHost. It is simply

  <jsdl:TotalCPUCount>
    <jsdl:Exact>1.0</jsdl:Exact>
  </jsdl:TotalCPUCount>

Unless you use <ExclusiveExecution> the default semantics are to share resources so you shouldn't be wasting resources on multi-cpu systems.
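For anyone generating these resource constraints from a tool rather than by hand, here is a minimal Python sketch that builds the same elements with the standard library. The helper name exact_resources is hypothetical, and the namespace URI is what I believe JSDL 1.0 uses; please verify it against the schema you target.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for JSDL 1.0; check it against your schema.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)


def exact_resources(total_resource_count=None, total_cpu_count=None):
    """Build a jsdl:Resources element holding exact-valued constraints."""
    resources = ET.Element(f"{{{JSDL_NS}}}Resources")
    for name, value in (
        ("TotalCPUCount", total_cpu_count),
        ("TotalResourceCount", total_resource_count),
    ):
        if value is None:
            continue  # leave unconstrained elements out entirely
        element = ET.SubElement(resources, f"{{{JSDL_NS}}}{name}")
        exact = ET.SubElement(element, f"{{{JSDL_NS}}}Exact")
        exact.text = f"{float(value):.1f}"
    return resources


# Two hosts, independent of whatever the SPMD elements say about processes.
two_hosts = exact_resources(total_resource_count=2)
print(ET.tostring(two_hosts, encoding="unicode"))
```

The sketch keeps the resource request separate from NumberOfProcesses and ProcessesPerHost, which mirrors the split between describing the application and describing its resources.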
I guess the problem really is that JSDL tries to stay out of the resource description business, and that's probably a wise decision. That is what Glue&Co are about to deal with.
Nevertheless, the current specs are cumbersome from an *application* perspective.
We are currently considering changing our attributes to

  SPMDVariation
  NumberOfProcesses
  TotalNodeCount
  TotalSocketCount
  TotalCoreCount
  TotalHWThreadCount
  ProcessesPerNode
  ProcessesPerSocket (*)
  ProcessesPerCore
  ProcessesPerHWThread
The 'ProcessesPerXYZ' attributes are here interpreted as upper limits: the backend MUST NOT start more processes per XYZ, but MAY start less. Those attributes are not fully translatable into the JSDL attributes, but map pretty well to other job descriptions so far.
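To make the 'upper limit' reading concrete, here is a small Python sketch that checks a given process placement against such limits. The data layout (one dict per process, keyed by level name) and the function name are assumptions made for illustration only; they are not part of SAGA or JSDL.

```python
from collections import Counter


def violated_limits(placement, limits):
    """Return {level: worst_count} for every limit the placement exceeds.

    placement: one dict per started process, mapping a level name
               ("node", "socket", "core", "hwthread") to an ID that is
               unique for that resource.
    limits:    maximum number of processes per single resource of a level.
    """
    violations = {}
    for level, limit in limits.items():
        counts = Counter(process[level] for process in placement)
        worst = max(counts.values(), default=0)
        if worst > limit:
            violations[level] = worst
    return violations


# Four processes on two nodes with two hardware threads each.
placement = [
    {"node": node, "hwthread": (node, thread)}
    for node in ("n0", "n1")
    for thread in (0, 1)
]
print(violated_limits(placement, {"hwthread": 1, "node": 2}))
print(violated_limits(placement, {"node": 1}))
```

Starting fewer processes than a limit allows never produces a violation, which matches the 'MUST NOT start more, MAY start less' wording.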
The typical use case for us would then boil down to
  SPMDVariation        = MPI
  NumberOfProcesses    = 32
  ProcessesPerHWThread = 1
which seems to be what most users want.
Other use cases we would be interested in, for example to start one additional IO process per node, would require more attributes and semantics than we are willing to introduce right now, like
additional attribs:

  HWThreadsPerCore
  HWThreadsPerSocket
  HWThreadsPerNode
  CoresPerSocket
  CoresPerNode
  SocketsPerNode

specification:

  SPMDVariation     = MPI
  NumberOfProcesses = 32 + TotalNodeCount
  ProcessesPerNode  = HWThreadsPerNode + 1
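A worked instance of that specification, as a Python sketch: the node count (4) and hardware threads per node (8) are made-up values, and the function name is hypothetical. The capacity check is an extra sanity test added here, not something the proposal itself states.

```python
def io_process_layout(compute_processes, total_node_count, hw_threads_per_node):
    """Derive attribute values for 'one extra I/O process per node'."""
    layout = {
        "SPMDVariation": "MPI",
        "NumberOfProcesses": compute_processes + total_node_count,
        "ProcessesPerNode": hw_threads_per_node + 1,
    }
    # ProcessesPerNode is an upper limit, so all nodes together must be
    # able to hold every requested process.
    capacity = total_node_count * layout["ProcessesPerNode"]
    if layout["NumberOfProcesses"] > capacity:
        raise ValueError("requested processes do not fit on the given nodes")
    return layout


# 32 compute processes on 4 nodes with 8 hardware threads each.
layout = io_process_layout(32, total_node_count=4, hw_threads_per_node=8)
print(layout)
```

With these numbers NumberOfProcesses becomes 36 and ProcessesPerNode becomes 9, so the four nodes are filled exactly.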
So, here is the biggie: Is JSDL at some point considering to revise the SPMD spec? If so, can we expect something along the lines above, or is that, aehem, 'out of scope'? If not, how would you propose to align the SAGA use cases with JSDL/Glue/..., so that we still end up with an implementable standards stack?
We don't really want to invent our own schemas, as it's most likely that they will not map to JSDL then - but we need to cater for our use cases one way or the other...
Best, Andre.
(*) Socket basically stands for CPU, but is supposed to clarify we are not talking about cores.
Take care,
--
Andreas Savva
Fujitsu Laboratories Ltd