The way SGE (and I think LSF) handles parallel jobs is that there is always a master/slave concept. The DRM system allocates the nodes, starts the master task, and tells it where all the slaves are. The master task is then responsible for starting the slave tasks, usually via the DRM.

Maybe I'm missing some context, but this conversation sounds *way* outside of the context of DRMAA to me. DRMAA has nothing to do with how a job is launched. DRMAA is purely on the job management side: submission, monitoring, and control.

Daniel

On 03/24/10 05:54, Peter Tröger wrote:
Hi Yves,
Thanks for the good discussion in Munich. I hope we can also rely on your user perspective in the future.
I understand why you don't want to provide a means of getting the hostnames file for an MPI code, since that should be handled transparently by the configName (if I remember the name correctly).
But I thought of a different use case: a code is simply launched on all machines. This code is socket based, so each instance needs to know the names of the other machines in order to run correctly. Of course, this could be worked around with an external machine running a daemon with which the running codes register -- something like a running omniNames server, for example. Another solution is to wrap the application in an MPI code just to, perhaps, obtain that information.
To me, this sounds like getting the information about the machines allocated to a job on each of the execution hosts. I wonder whether this information is provided by the different DRM systems. Does it depend on the parallelization technology, such as the chosen MPI library?
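As a rough illustration of the question above, here is a minimal sketch of how a job could look up its allocated hosts from the environment. It is not part of DRMAA. It assumes the conventional variables are set inside the job: PE_HOSTFILE (SGE parallel environments), PBS_NODEFILE (PBS/Torque) and LSB_HOSTS (LSF). The function name allocated_hosts is hypothetical, and whether every task of a parallel job actually sees these variables depends on how the DRM system and the site start the tasks.

```python
import os


def allocated_hosts(env=None):
    """Return the distinct host names allocated to the current job.

    Hypothetical helper: returns [] when none of the assumed
    variables is present in the environment.
    """
    env = os.environ if env is None else env

    # SGE (PE_HOSTFILE) and PBS/Torque (PBS_NODEFILE) point to a file.
    # In both formats the host name is the first field of each line;
    # SGE appends slot count, queue and processor range, while PBS
    # repeats the host name once per allocated slot.
    for var in ("PE_HOSTFILE", "PBS_NODEFILE"):
        path = env.get(var)
        if path:
            with open(path) as f:
                names = [line.split()[0] for line in f if line.strip()]
            # dict.fromkeys drops duplicates and keeps first-seen order
            return list(dict.fromkeys(names))

    # LSF: LSB_HOSTS holds the host names directly, space separated,
    # with one entry per slot.
    hosts = env.get("LSB_HOSTS")
    if hosts:
        return list(dict.fromkeys(hosts.split()))

    return []


if __name__ == "__main__":
    print(allocated_hosts())
```

A socket-based code could call this at start-up on each execution host and connect to the other names in the list, provided the variables are visible there.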
Best, Peter.
But don't you think that the cost is very high (if it is possible at all: many sites have a policy of not letting user code run on the front end, and each machine only knows that it is itself taking part in the parallel run) compared with at least being able to copy the file containing the hostnames to all reserved nodes?
Good luck with the discussions today! Cheers.
.Yves.
--
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
  * in Information Technology Center, The University of Tokyo,
    2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
    tel: +81-3-5841-0540
  * in National Institute of Informatics
    2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
    tel: +81-3-4212-2412
http://graal.ens-lyon.fr/~ycaniou/
--
drmaa-wg mailing list
drmaa-wg@ogf.org
http://www.ogf.org/mailman/listinfo/drmaa-wg