[SAGA-RG] quick poll: SPMD attributes

10 Dec 2009

Hi all,

we had a very productive meeting with the DRMAA guys today, and will
continue tomorrow.  We'll send around notes next week I think, but
for now I'd love to have some quick feedback on a very specific
topic.

At the moment, SAGA has the following job description attributes to
specify job multiplicity, which are derived from the JSDL core and
the JSDL SPMD extension:

  - SPMDVariation
    - MPI type etc.

  - TotalCPUCount
    - total number of CPUs required by the job

  - NumberOfProcesses
    - number of instances of the Executable that the consuming
      system MUST start

  - ProcessesPerHost
    - number of instances of the Executable that the consuming
      system MUST start per host.

  - ThreadsPerProcess
    - number of threads per process (i.e., per instance of the
      Executable)

There is not much discussion about SPMDVariation (anymore), but the
others are apparently somewhat inconsistent:

  - processes can be started by the backend, threads can't, so why
    specify ThreadsPerProcess?
  - hosts can have multiple CPUs - this is not reflected
  - CPUs can have multiple cores - this is not reflected
  - in general, the attributes seem somewhat inconsistent, and it's
    hard to specify the values for specific use cases.

Some of the concerns may get addressed by resource discovery and
reservation, but not all.  So, we would like to propose the
following replacement list for these attributes:

  SPMDVariation
  NumberOfProcesses
  ProcessesPerMachine
  ProcessesPerSlot
  ProcessesPerCore

(Slot and Machine are the DRMAA terms for CPU and Host).

NumberOfProcesses is to be interpreted by the backend as an exact
number.  The ProcessesPerXYZ attributes are to be interpreted by the
backend as upper bounds.

So, for example, one could specify a 16-process MPICH job as

  SPMDVariation     = MPICH
  NumberOfProcesses = 16
  ProcessesPerCore  = 2

That would allow the job to run on a 2-way QuadCore host, placing two
processes on each of the 8 cores.

It would also allow the job to run on four 4-way SingleCore hosts,
placing one process on each of the 16 cores.

Well, try to specify that with the current attributes - very
difficult!
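
To make the intended semantics a bit more concrete, here is a minimal
Python sketch of how a backend might check whether a set of machines can
host a job under the proposed attributes.  This is an illustration only:
the class and function names (Machine, JobDescription, capacity, can_run)
are hypothetical and not part of SAGA, DRMAA, or JSDL, and the fallback
of one process per core when no ProcessesPerXYZ bound is given is an
assumption, not something the proposal defines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    """Hypothetical machine model: slots are CPUs, each with some cores."""
    slots: int
    cores_per_slot: int


@dataclass
class JobDescription:
    """Hypothetical holder for the proposed attributes."""
    number_of_processes: int                      # exact number
    processes_per_machine: Optional[int] = None   # upper bound
    processes_per_slot: Optional[int] = None      # upper bound
    processes_per_core: Optional[int] = None      # upper bound


def capacity(machine: Machine, jd: JobDescription) -> int:
    """Largest number of processes the machine may host under all bounds."""
    cores = machine.slots * machine.cores_per_slot
    limits = []
    if jd.processes_per_machine is not None:
        limits.append(jd.processes_per_machine)
    if jd.processes_per_slot is not None:
        limits.append(jd.processes_per_slot * machine.slots)
    if jd.processes_per_core is not None:
        limits.append(jd.processes_per_core * cores)
    # Assumption: with no bound specified, allow one process per core.
    return min(limits) if limits else cores


def can_run(machines: List[Machine], jd: JobDescription) -> bool:
    """True if the machines together can host exactly NumberOfProcesses."""
    return sum(capacity(m, jd) for m in machines) >= jd.number_of_processes


# The 16-process example: NumberOfProcesses = 16, ProcessesPerCore = 2
jd = JobDescription(number_of_processes=16, processes_per_core=2)
quad_host = Machine(slots=2, cores_per_slot=4)     # 2-way QuadCore
single_host = Machine(slots=4, cores_per_slot=1)   # 4-way SingleCore

print(capacity(quad_host, jd))     # bound for the 2-way QuadCore host
print(capacity(single_host, jd))   # bound for one 4-way SingleCore host
print(can_run([quad_host], jd))
```

Under these assumptions, the 2-way QuadCore host has a capacity of 16
with ProcessesPerCore = 2, and a single 4-way SingleCore host has a
capacity of 8.  Since the ProcessesPerXYZ values are upper bounds, a
backend is free to place fewer processes per core when more cores are
available.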

So, the questions we have are:

  - does the above proposal indeed capture the SAGA MPI use cases?
  - if not, which use cases are not covered?  How can they be
    addressed?

To be clear: this is not intended to be an erratum to the current
SAGA spec, but rather a consideration for the next SAGA version.

Thanks, Andre.

-- 
Nothing is ever easy.

Andre Merzky