FW: Comments to the HPC Use Cases: Base Case and Common Cases

-----Original Message-----
From: Marvin Theimer [mailto:theimer@microsoft.com]
Sent: Monday, April 24, 2006 9:06 PM
To: Balle, Susanne
Cc: Treadwell, Jem; Marvin Theimer
Subject: RE: Comments to the HPC Use Cases: Base Case and Common Cases

Hi;

Thanks for your input! My comments on your comments are in-line. How do you feel about my posting this email to the ogsa-wg mailing list so that others can see the issues you've raised plus my responses? No problem if you'd rather not, but I think it would be interesting to the larger community if you do feel comfortable with it. Let me know -- I will keep this email thread private until you tell me otherwise.

Marvin.

-----Original Message-----
From: Balle, Susanne [mailto:Susanne.Balle@hp.com]
Sent: Monday, April 24, 2006 4:44 PM
To: Marvin Theimer
Cc: Balle, Susanne; Treadwell, Jem
Subject: Comments to the HPC Use Cases: Base Case and Common Cases

Marvin,

I read the document "HPC Use Cases: Base Case and Common Cases" and thought it was a good start. I do have a couple of comments about the document, which I have enclosed below:

1. I would re-title the document "HPC Job Scheduling Use Cases -- Base Case and Common Case". The main reason is that too many topics that are important to HPC are "out-of-band" areas with regard to the focus of this document. The document focuses only on job scheduling, so why not just call it that?

[MT] Good point. I'll try it and see how others react.

2. Under "Base Case", I would like to add a point about users being able to query for available resources. I do that all the time before I launch a job. It is nice to know which resources are currently available or currently up, so that I don't submit a job for 512 nodes when only 510 of the 512 nodes are up.

[MT] This seems reasonable. A key thing will be to define what the "minimal" set of useful information is. Additional "commonly desired" information should then be defined as common-case extensions.
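As an illustration of the kind of "minimal" availability information discussed in point 2, here is a rough Python sketch of a pre-submission resource check. All names (ClusterAvailability, nodes_up, can_submit) are hypothetical; they are not taken from the document or from any draft specification.

    # Hypothetical sketch of a minimal resource-availability result; the
    # field names are illustrative only, not part of any proposed spec.
    from dataclasses import dataclass

    @dataclass
    class ClusterAvailability:
        total_nodes: int   # nodes the cluster is configured with
        nodes_up: int      # nodes currently up and reachable
        nodes_free: int    # up nodes not currently allocated to jobs

    def can_submit(avail: ClusterAvailability, nodes_requested: int) -> bool:
        # Pre-submission sanity check: is the request satisfiable at all
        # given the nodes that are currently up?
        return nodes_requested <= avail.nodes_up

    # The scenario from point 2: only 510 of 512 nodes are up.
    avail = ClusterAvailability(total_nodes=512, nodes_up=510, nodes_free=498)
    print(can_submit(avail, 512))   # False -- caught before submission

The point of such a check is simply the scenario described above: finding out that only 510 of the 512 nodes are up before queuing a 512-node job rather than after.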
3. Page 3 (top). Reading this section made me decide that renaming the document makes more sense. This is NOT "HPC Use Cases" but "HPC Job Scheduling Base Cases". This will allow others to follow your lead and create "HPC <Other Topic> Use Cases" documents (e.g. data management).

[MT] Agreed.

4. Page 3 (bottom). One important aspect that you have left out here is that the individual clusters should remain under the control of their local system administrators and/or their local policies. You cannot impose a FIFO policy onto clusters with a different scheduling policy.

[MT] You raise a good point. My goal in specifying FIFO was to say that the "simplest" scheduling policy would be the only thing "required" of all schedulers. But I think the true base case is that a scheduler is free to pick whatever scheduling policy (or policies) it wishes. That is, specification of required scheduling policies is out of scope for the base case.

5. Page 3 (bottom). I am not quite sure what you mean by "the only scheduling policy supported is FIFO". Where do you mean? Aren't you planning on passing the jobs on to the local scheduler, which will then apply whatever policy it is set to obey? Do you mean that the scheduler infrastructure you want to create will only support FIFO, which in fact will just consist of passing the job on to the local scheduler? I am a little confused after reading this section.

[MT] My comment on point 4 is applicable here. My goal was to define the "smallest" or "simplest" set of interop requirements possible for the base case. Extensions could then define scheduling policies that a scheduler promises to provide.

Does "only supporting FIFO" mean that you will only submit one job at a time to a cluster?

[MT] No, if more than one job will fit on the cluster then there is no reason not to run two or more jobs simultaneously. But I'm changing the base case to not require any particular scheduling policy in any case.

6. "A job consists of a single program running on a single node". I believe this is too restrictive. MPI programs need to be considered to make sure that we have the right level of confidence that the design/infrastructure will work for parallel programs as well. How about OpenMP or threaded programs? Have you taken these types of programs into consideration? I do not see them mentioned anywhere.

[MT] These fall under the "common cases that will be handled via extensions" category. Note that support for MPI programs typically involves infrastructure (such as SMPD daemons) that not all scheduling systems necessarily support; hence MPI programs are not explicitly supported in the base case. The whole purpose of defining the various common cases in the next section of the document is to ensure that we address how to handle things like MPI programs correctly by means of thought-out extensions. The same applies to threaded/OpenMP programs.

(A rough sketch of one possible shape for such an extension follows this message.)

7. After having read Section 2, I believe that the common case is too restrictive. I understand that you want things to be simple, but maybe they have become too simple. I am worried that you cannot extend this simple base case to fit most common requirements for HPC applications.

[MT] Figuring out how to extend the base case to cover all the listed common cases is the task of the working group. :-) Personally, I believe that we can define suitable extensions that will enable the simple base case I've defined to be extended to the common cases. If that's not the case, then we'll certainly change the base case as necessary. But I want to start with the simplest possible base case, since it's a constant battle to keep peoples' pet features out of the base design, and it's a slippery slope to hell once you step away from the absolutely most minimal base case that lets the whole edifice (base case plus extensions) hang together.

8. Section 3.1. You forgot SLURM in the list, and also Moab, the commercial product from Cluster Resources, who also support Maui. Moab's Grid scheduler is very interesting and offers a lot of very desirable features for a Grid environment. More info on SLURM is available at: http://www.llnl.gov/linux/slurm/slurm.html. I am working on SLURM and would be happy to provide you with what you need for your document or to answer any questions you have about SLURM.

[MT] Great! I really appreciate it.

I have some more comments on the remaining sections. I will send them in a separate email tomorrow.

Regards,
Susanne

-----------------------------------------
Susanne M. Balle
Hewlett-Packard
High Performance Computing R&D Organization
MS ZKO01-3
110 Spit Brook Road
Nashua, NH 03062
Phone: 603-884-7732
Fax: 603-884-0630
Susanne.Balle@hp.com
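Following up on point 6, here is a rough Python sketch of how the single-program, single-node base-case job description might be extended for MPI programs. It is only an illustration under assumed names (BaseJob, MpiExtension, process_count); none of these come from the document or from any existing scheduler API, and whether a given scheduler can honor the extension depends on the supporting infrastructure (e.g. SMPD-style daemons) mentioned above.

    # Hypothetical sketch only: a base-case job (one program, one node)
    # plus an optional MPI extension.  All field names are illustrative.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class BaseJob:
        executable: str                        # the single program to run
        arguments: List[str] = field(default_factory=list)
        # Base case: exactly one program on exactly one node, so no
        # node- or process-count fields are needed here.

    @dataclass
    class MpiExtension:
        process_count: int                     # number of MPI ranks requested
        nodes: Optional[int] = None            # None: let the local scheduler decide

    @dataclass
    class ExtendedJob(BaseJob):
        # Optional extension; a scheduler lacking the supporting
        # infrastructure can reject jobs that carry it.
        mpi: Optional[MpiExtension] = None

    job = ExtendedJob(executable="./solver",
                      arguments=["input.dat"],
                      mpi=MpiExtension(process_count=64, nodes=8))

A scheduler that does not support the extension can simply refuse jobs that carry it, which leaves the minimal base case untouched.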