
Ok ... here are my thoughts on process topology and what's currently expressible in JSDL.

First, I'll list some use cases (they're all parallel jobs):

1. Simple MPI job. Wants 32 processors with 1 processor per resource (in JSDL, a host is a "resource").
2. OpenMPI job. Wants 32 processors with 8 processors per resource.
3. An OpenMP job. Wants 32 processors. Shared mem of course, so one resource.
4. A "homegrown" master/slave parallel job (say a ligand docking job). Wants 32 processors. No tiling constraints at all.

* Note that I'm specifically leaving out the Naregi "coupled simulation" use case (sorry guys), since we determined at the last GGF that it was a case which could be decomposed into multiple JSDL documents.

Second ... what is process topology? It provides the user a way to express how resources should be _allocated_ given the characteristics of the job (usually in terms of IO patterns ... e.g. network communication, disk IO channel contention, etc). Thus, it's used when the resource manager is _allocating_ the resources, not when the job is being started/launched. Therefore, none of the elements used to express process topology should be in the POSIXApplication section.

What we have in JSDL now:

ResourceCount (how many "resources", i.e. hosts, I want)
CPUCount (how many processors _per resource_)
TileSize (how many processors to allocate per resource as a unit)
ProcessCount (total number of _processes_ that the job will use to execute the job)

I will argue that ProcessCount is useless for the purposes of process topology, since a) it isn't about allocation, and b) there isn't enough information to tell me how to start/launch a parallel job. It isn't about allocation since it is irrelevant to the scheduler whether I'll be computing using threads or processes. It isn't useful for launching because it doesn't tell me how to spread the ProcessCount processes given a particular allocated topology.

So that leaves the rest of them.

TileSize and CPUCount are pretty much the same thing, at least for 80% (or more) of the uses I've seen. The only thing that might cause them to differ is that I could possibly allocate more than one tile on a host. Given that CPUCount is a range and that we could express step values in the range (we can express step values in the range, right?), we don't need TileSize any more.

Note: I'm making an assumption here that CPUCount is the number of cpus that I want from the resource, rather than an expression of how many cpus the host needs to have configured. If it is the latter, then we do need TileSize; replace CPUCount in my examples below with TileSize.

So let's see how these map to the use cases.

1. ResourceCount == 32, CPUCount == 1
   -> LSF : "-n 32 -R span[ptile=1]"
   -> PBS : "-l nodes=32:ppn=1" (ppn=1 might be the default)
2. ResourceCount == 4, CPUCount == 8
   -> LSF : "-n 32 -R span[ptile=8]"
   -> PBS : "-l nodes=4:ppn=8"
3. ResourceCount == 1, CPUCount == 32
   -> LSF : "-n 32 -R span[hosts=1]" (hosts=1 is equivalent to ptile=<-n val>)
   -> PBS : "-l nodes=1:ppn=32"
4. ResourceCount == 32, CPUCount == 1 -> oops ... it doesn't care about tiling
   ResourceCount == 1, CPUCount == 32 -> hmm ... artificial constraint ... would suck on a blade cluster
   ResourceCount == 1-32, CPUCount == 1-32 -> oops again ... I might get a total allocation of 32*32 cpus

* there seems to be a gap!

If we had a term called "TotalCPUCount" for the entire job, I could do:

4. TotalCPUCount == 32
   -> LSF : "-n 32"
   -> PBS : not sure how to express

It basically means grab 32 cpus, regardless of how they are spread. Basically I just need cpus. This is used a whole hell of a lot within our customer base.

So ... in summary ... I propose:

CPUCount (as is, if it's allocated cpus per resource)
TileSize (iff CPUCount is an expression of configured cpus in a host)
ResourceCount (as is ... hmmm ... maybe the default value needs to change)
TotalCPUCount (how many cpus this job needs to run in total)

-- Chris
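For concreteness, here is how use case 2 might render in JSDL-style syntax under the reading that CPUCount means allocated cpus per resource, followed by use case 4 using the TotalCPUCount term proposed above. This is an illustrative sketch only: TotalCPUCount is not an element of the current draft, and the range-value syntax mirrors the fragments quoted later in this thread.

Use case 2 (32 cpus, 8 per resource):

<Resource>
  <ResourceCount>
    <Exact>4.0</Exact>
  </ResourceCount>
  <CPUCount>
    <Exact>8.0</Exact>
  </CPUCount>
</Resource>

Use case 4 (just give me 32 cpus, spread however):

<Resource>
  <!-- TotalCPUCount is the proposed term, not in the current draft -->
  <TotalCPUCount>
    <Exact>32.0</Exact>
  </TotalCPUCount>
</Resource>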

Just to follow up with one issue I've identified. The current specification defines the default value of ResourceCount to be 1. In order for the TotalCPUCount thing to work properly, the default should be undefined (much like CPUCount). By the way ... setting a default TotalCPUCount of 1 implies a ResourceCount of 1 when allocation is done.

-- Chris
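To illustrate the clash with the current default: under the proposal above, a use-case-4 request would carry only the (proposed, not-yet-in-draft) TotalCPUCount. If ResourceCount still silently defaulted to 1, the sketch below would force all 32 cpus onto a single host, which is exactly the artificial constraint use case 4 is trying to avoid.

<Resource>
  <!-- proposed element; illustrative syntax only -->
  <TotalCPUCount>
    <Exact>32.0</Exact>
  </TotalCPUCount>
  <!-- ResourceCount omitted: with a default of 1 this silently becomes a one-host job -->
</Resource>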

Chris:

One concern I have is that in typical use with GRAM our users expect a certain number of job processes to be launched across all resources. In other words, their typical job is a fixed-layout SPMD-style executable or a master/slave executable, and they do not do any explicit process startup in the program.

I am nearly certain that we pass the GRAM "count" attribute as the allocation parameter (-n in your example) to the local scheduler, but depending on the value of our jobType=single|multi|mpi attribute we also construct a job script which does the launching of all processes. We also have a "hostCount" which we use to sort out SMP allocation issues in a simple round-robin fashion. So, we view a single set of declarative parameters as having a projection into both allocation and job-control functions.

We do not handle fancier topology options except through some site-specific extensions that people have tried. I am not sure what your note means when it says to allocate "as a unit", so I'd like clarification on that before we move on.

I think it is a little too extreme to say JSDL shouldn't express job process topology. That seems just as important to me as resource topology, and the "isomorphism" or mapping between the two is also important. What you rightly point out is that the operational behavior to _get_ the mapping is dependent on some combination of the job and local scheduler runtime environments, e.g. does the manager or the job take care of task launch.

I never much liked our jobType attribute, but it is there because this problem had to be addressed somehow. I would accept an alternate rendering where separate logical managers accepted a homogeneous job type and had a fixed behavior, so this would be implicit context. This would also better handle the esoteric "which mpirun to use" problem we can have with "mpi" type jobs on systems with native MPI and MPICH-G, etc.

So to reiterate: shouldn't JSDL express job process topology or mapping into resources even if it is underspecified "how" the processes will get that way? Is it necessary to express two topologies, or just to expect that they have identical expression in one place in JSDL, and that it requires context and/or extended content to define how the processes get that way?

karl

-- Karl Czajkowski karlcz@univa.com

If we need to specify different mechanisms for starting up tasks of a parallel job a la the RSL jobType, then I'd like that to be separate from the description of the resource allocation required.

For what it's worth, queuing systems like LSF/PBS/SGE don't handle this startup phase (it's up to the job), so I'd like to see some example terms describing job process topology (basically the simple|multi|mpi use cases), since I'm not too sure what they would look like, or what semantics would be required.

Allocate "as a unit" just means that if I'm going to allocate any cpus from a resource, I have to allocate "tileSize" cpus.

-- Chris

Well, I am struggling because I do not want to propose creeping featurism for JSDL... if possible, I think the startup mechanism should be left to extensions because it is such a rich and messy thing, as I will try to describe below.

What I am struggling to understand w.r.t. JSDL is whether there is some aspect of job layout that is a meaningful part of the job definition but not as simple as the resource topology stuff you were discussing. Because of the RSL legacy, I keep wanting to see some generic concepts for process count etc. that are orthogonal to the specific startup mechanism but which in essence parameterize both allocation and job startup. Perhaps if resource topology is precise enough, there is nothing more needed? Maybe a precise description of allocated resources defines a "job shaped hole" into which an implied job topology would fit? :-)

The constellation of resource requirements and posix limits (and any other extensions?) is what defines the virtual resource or "job shaped hole" within which the executable is activated. A practical runtime environment feature might be for a job system like GRAM to expose a "resource map" in the form of JSDL resource syntax in a file or environment variable, so the job can introspect on the actual allocation it received... this is a different but related portability/interop problem for job execution systems when you include runtime middleware in the executable. For example, if a future MPICH-Gx release supports the dynamic task features of MPI, the runtime implementation might require this sort of information from the scheduler so it can work within its allocation?

karl

OK, here is the messy stuff I hope can remain somehow out of scope but still feasible. Basically, GRAM is a higher-level job submission model than what you describe for LSF/PBS/SGE, where we try to provide a more generic user-oriented job model instead of the very low-level "job script" model of the local scheduler. Job types in RSL are different activation methods:

single: one instance of the executable is activated and it must, through site-specific means, do whatever else is needed; for example, read a scheduler-specific HOSTS file and use some site-specific launch mechanism to start tasks on each allocated host.

multi: all "count" instances of the executable are activated, so the job need not do anything but calculate; for example, GRAM generates a job script that does the site-specific stuff described in "single".

mpirun: the parameters are mapped through to an 'mpirun' invocation to launch the runtime required for the job. In practice, I think this is a wrapped form of "single" where the user executable is mapped to an argument and the scheduled executable is mpirun, but I'd need to check to be sure.

condor: the job is submitted to a condor flock, if my memory serves me.

Missing is a funny SMP-aware hybrid one can imagine:

spreadsingle: one instance of the executable is activated on each allocated resource, but it must start additional parallel tasks itself if it wants parallelism on a resource. So: handle site-specific resource activation for the job, but leave the job to "expand" on each host (node).

-- Karl Czajkowski karlcz@univa.com
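To sketch what "left to extensions" could look like: a hypothetical extension element could carry the activation method alongside, but separate from, the allocation terms. The ext namespace and all element names below are invented for illustration; they are not part of any draft or of GRAM.

<Application>
  <jsdl-posix:POSIXApplication>
    <jsdl-posix:Executable>/usr/local/bin/mysim</jsdl-posix:Executable>
  </jsdl-posix:POSIXApplication>
  <!-- hypothetical extension mirroring the RSL jobType choices -->
  <ext:ProcessStartup>
    <!-- one of: single | multi | mpirun | spreadsingle -->
    <ext:Method>multi</ext:Method>
  </ext:ProcessStartup>
</Application>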

Chris,

Thanks for kick-starting this discussion. I'm commenting somewhat out of order, so I've extracted some of the points here:
ProcessCount (total number of _processes_ that the job will use to execute the job)
As I mention in section 8 of the spec, ProcessCount is misnamed. (The usage of the term 'process' is unfortunate.) It isn't intended to control how many actual processes the job will use at execution time, but to indicate the aggregate processing power that the application needs. So I see it as a 1-1 mapping to a CPU, and I have actually translated it as such in the examples of section 8.
Note: I'm making an assumption here that CPUCount is the number of cpus that I want from the resource, rather than an expression of how many cpus the host needs to have configured. If it is the latter, then we do need TileSize, and replace CPUCount in my examples below with TileSize.
CPUCount is defined as the number of CPUs you want the resource to have. So it is the latter one.
4. A "homegrown" master/slave parallel job (say a ligand docking job). Wants 32 processors. No tiling constraints at all.
[...]
4. ResourceCount == 32, CPUCount == 1 -> oops ... it doesn't care about tiling
   ResourceCount == 1, CPUCount == 32 -> hmm ... artificial constraint ... would suck on a blade cluster
   ResourceCount == 1-32, CPUCount == 1-32 -> oops again ... I might get a total allocation of 32*32 cpus
* there seems to be a gap!
If we had a term called "TotalCPUCount" for the entire job, I could do:
4. TotalCPUCount == 32 -> LSF : "-n 32" -> PBS : "not sure how to express"
Using the current terminology (bear with me) I would translate it as

<Application>
  ...
  <jsdl-posix:POSIXApplication>
    ...
    <ProcessCount>32</ProcessCount>
  </jsdl-posix:POSIXApplication>
</Application>
<Resource>
  ...
  <ResourceCount>
    <LowerBoundedRange>1.0</LowerBoundedRange>
  </ResourceCount>
</Resource>

I translate "no tiling constraints" as meaning TileSize=1 and, since it is the default value, I have omitted it.
So ... in summary ... I propose:
CPUCount (as is if it's allocated cpus per resource) TileSize (iff CPUCount is an expression of configured cpus in a host) ResourceCount (as is ... hmmm ... maybe the default value needs to change) TotalCPUCount (how many cpus this jobs needs to run in total)
We need TileSize. I agree that the default ResourceCount=1 definition should be changed to 'undefined', as you mention in a subsequent email.

So there are two issues:

1. Whether the topology requirements should be in the Application section or not. If they are in the Application section then the terms used should not be resource-flavored, i.e., not TotalCPUCount but something else.
2. How to rename 'ProcessCount' to eliminate the confusion with the term 'process'.

My answer to (1) is to keep these in the Application section. I am not sure how to rename ProcessCount, though.

Could I also ask you to let me know if the examples in section 8 of the spec actually make sense or not?

Andreas
-- 
Andreas Savva
Fujitsu Laboratories Ltd

On Apr 20, Andreas Savva loaded a tape reading: ...
CPUCount is defined as the number of CPUs you want the resource to have. So it is the latter one.
Unfortunately, this answer does not clarify (for me) the point Chris is asking. I would think we want to express the number of CPUs allocated from a resource, and a far secondary concern would be the total number of CPUs in the resource (including the ones not allocated to us). By saying "the latter", you are saying it is the total CPUs including unallocated ones. Do you really mean that?

I think I can agree with Chris's statement that this is a sufficient set of terms (spelled out to be unambiguous):

1. number of CPUs allocated per resource (a range value)
2. number of allocated resources (a range value)
3. total number of CPUs allocated to the job (a range value)

Given that draft 17 shows a multiplicity of 0-1 for Resource, I think all three of these should be given there and left out of the Application section. I also think TileSize is redundant with these three, as Chris suggested. If we want a fourth:

4. total number of CPUs in an allocated resource, including unallocated CPUs (idle or allocated to other jobs)

then we should be blatantly obvious about the difference, e.g.

jsdl:PerResourceAllocatedCPUs
jsdl:PerResourceInstalledCPUs
jsdl:TotalJobAllocatedCPUs

or something like that!

If we were in fact going to allow multiple Resource elements (to express heterogeneous resource mixes), it would be awkward to put the total number of CPUs allocated within a Resource element. It would be better to keep that somewhere else which obviously has "global" scope within the job definition. In fact, this applies to all "global metrics" like total RAM, etc.

karl

-- Karl Czajkowski karlcz@univa.com
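A sketch of how these might sit in a document, keeping the job-wide total outside the Resource element so its "global" scope is obvious. The element names are Karl's suggestions above (prefixes omitted), not elements of any draft.

<Resource>
  <PerResourceAllocatedCPUs>
    <Exact>8.0</Exact>
  </PerResourceAllocatedCPUs>
</Resource>
<!-- a global metric, kept outside any single Resource element -->
<TotalJobAllocatedCPUs>
  <Exact>32.0</Exact>
</TotalJobAllocatedCPUs>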

Karl Czajkowski wrote:
On Apr 20, Andreas Savva loaded a tape reading: ...
CPUCount is defined as the number of CPUs you want the resource to have. So it is the latter one.
Unfortunately, this answer does not clarify (for me) the point Chris is asking. I would think we want to express the number of CPUs allocated from a resource, and a far secondary concern would be the total number of CPUs in the resource (including the ones not allocated to us). By saying "the latter", you are saying it is the total CPUs including unallocated ones. Do you really mean that?
After sending it out I thought it might be unclear. Sorry. Let me use a couple of simple examples. I meant that if a resource description is given as

<Resource>
  <CPUCount>
    <Exact>2.0</Exact>
  </CPUCount>
</Resource>

it means a 2-cpu host, not 2 CPUs from a possibly larger N-CPU machine. If on the other hand I said

<Resource>
  <CPUCount>
    <LowerBoundedRange>2.0</LowerBoundedRange>
  </CPUCount>
</Resource>

I would accept any machine with at least 2 CPUs. So the (my?) definition would be closer to Chris' second definition.

Maybe we should leave the rest until we can clarify what is the correct definition of resource multiplicity (and what other people think the correct meaning of the above fragments is...).

Andreas
-- 
Andreas Savva
Fujitsu Laboratories Ltd

On Apr 20, Andreas Savva loaded a tape reading:
I meant that if a resource description is given as

<Resource>
  <CPUCount>
    <Exact>2.0</Exact>
  </CPUCount>
</Resource>

it means a 2-cpu host, not 2 CPUs from a possibly larger N-CPU machine.
Yes, this is indeed the "latter" case from Chris. I had hoped we didn't care about the aspects of a resource _not_ allocated to us.

Why do you actually care whether your 2 CPUs are a whole host or not? Is this actually some sort of selection preference sneaking in, like "do not give me the more rare nodes"? Or is it a QoS issue like "exclusivity" of the resource? I think it is important that we understand why you want to express this, now that it is clear what you are expressing.

Do you expect to express similar physical constraints on other resources not necessarily allocated to you, e.g. the amount of RAM or virtual memory space not necessarily allocated to you? I'm not trying to be snarky, but this avenue does confuse me lots!
If on the other hand I said

<Resource>
  <CPUCount>
    <LowerBoundedRange>2.0</LowerBoundedRange>
  </CPUCount>
</Resource>

I would accept any machine with at least 2 CPUs.
I find this ambiguous again... you need to distinguish, again, acceptance of an allocation from acceptance of characteristics of a resource that have not been allocated to you!
So the (my?) definition would be closer to Chris' second definition.
Maybe we should leave the rest until we can clarify what is the correct definition of resource multiplicity (and what other people think the correct meaning of the above fragments are....)
Andreas
It seems to me the issue of Resource element multiplicity is orthogonal... it is whether we support heterogeneous resource requirements or not, e.g. some number of resources that look like THIS and some other number of resources that look like THAT.

As for your parenthetical comment, I think it is useless to investigate "what people think the correct meaning of [...] fragments are"; rather we should focus on what meanings we want to convey in JSDL and then fix the syntax to do so unambiguously.

It sounds to me like you are advocating the full set of concepts I suggested in my earlier message:

allocated CPUs per resource
total allocated CPUs
installed CPUs per resource (your 2-cpu machine)

As Chris said, using range value expressions on "allocated CPUs per resource" will give us tile size, e.g. I want 2, 4, 8, or 16 CPUs per resource (but not 7, etc.) and I want between 32 and 128 CPUs total.

One thing that confuses me is whether there needs to be a way to express symmetry or "strict homogeneity", e.g.

  4 nodes, each with 4 cpus OR 4 nodes, each with 8 cpus

versus

  4 nodes, each with 4 or 8 cpus

I think the latter is what we know how to express right now using exact resource counts, by considering them equivalent to unrolling into a fixed number of resource elements. I do not know if others intended this or the stricter symmetric interpretation.

karl

-- Karl Czajkowski karlcz@univa.com

Karl Czajkowski wrote:
On Apr 20, Andreas Savva loaded a tape reading:
I meant that if a resource description is given as

<Resource>
  <CPUCount>
    <Exact>2.0</Exact>
  </CPUCount>
</Resource>

it means a 2-cpu host, not 2 CPUs from a possibly larger N-CPU machine.
Yes, this is indeed the "latter" case from Chris. I had hoped we didn't care about the aspects of a resource _not_ allocated to us.
Why do you actually care whether your 2 CPUs are a whole host or not? Is this actually some sort of selection preference sneaking in, like do not give me the more rare nodes? Or is it a QoS issue like "exclusivity" of the resource? I think it is important that we understand why you want to express this, now that it is clear what you are expressing.
Do you expect to express similar physical constraints on other resources not necessarily allocated to you, e.g. amount of RAM or virtual memory space not necessarily allocated to you? I'm not trying to be snarky but this avenue does confuse me lots!
My requirement is very simple. If I ask for a 2-way SMP machine I want a 2-way SMP machine and nothing else. It is not a preference. Consider that I may want to run software specifically tuned to some configuration (cpu, ram, and so on) and I don't want the system to be 'helpful' and give me something that it thinks is better for me. Unless I tell it so by using a range expression.

Obviously this has nothing (or doesn't necessarily have anything) to do with 'process topology.'
If on the other hand I said

<Resource>
  <CPUCount>
    <LowerBoundedRange>2.0</LowerBoundedRange>
  </CPUCount>
</Resource>

I would accept any machine with at least 2 CPUs.
I find this ambiguous again... you need to distinguish, again, acceptance of an allocation from acceptance of characteristics of a resource that have not been allocated to you!
Not sure what was ambiguous; maybe the "I would accept.." part? Perhaps I should say instead that the above fragment to me means "I want a host with at least 2 cpus."
So the (my?) definition would be closer to Chris' second definition.
Maybe we should leave the rest until we can clarify what is the correct definition of resource multiplicity (and what other people think the correct meaning of the above fragments are....)
Andreas
It seems to me the issue of Resource element multiplicity is orthogonal... it is whether we support heterogeneous resource requirements or not, e.g. some number of resources that look like THIS and some other number of resources that look like THAT.
I agree that we are mixing in too many concepts. For 'process topology' let's just focus on a single resource element.
As for your parenthetical comment, I think it is useless to investigate "what people think the correct meaning of [...] fragments are"; rather we should focus on what meanings we want to convey in JSDL and then fix the syntax to do so unambiguously.
True. But I assume (hope rather) there is already some consensus on what those fragments mean and would hate to try to start from scratch.
It sounds to me like you are advocating the full set of concepts I suggested in my earlier message:
I think we are not that far apart. Just to map these to what I've been using:
allocated CPUs per resource
I guess 'similar' to TileSize at the moment.
total allocated CPUs
'ProcessCount' I guess.
installed CPUs per resource (your 2-cpu machine)
CPUCount
As Chris said, using range value expressions on "allocated CPUs per resource" will give us tile size, e.g. I want 2, 4, 8, or 16 CPUs per resource (but not 7, etc.) and I want between 32 and 128 CPUs total.
One thing that confuses me is whether there needs to be a way to express symmetry or "strict homogeneity", e.g.
4 nodes, each with 4 cpus OR 4 nodes, each with 8 cpus
versus
4 nodes, each with 4 or 8 cpus
I think the latter is what we know how to express right now using exact resource counts, by considering them equivalent to unrolling into a fixed number of resource elements. I do not know if others intended this or the stricter symmetric interpretation.
karl
Yes. What we have in the spec right now can express the latter version but not the former. We don't have an 'OR' operator. I think the former is where we should start thinking of combining JSDL with other specs like WS-Agreement.

-- 
Andreas Savva
Fujitsu Laboratories Ltd

On Apr 20, Andreas Savva loaded a tape reading:
My requirement is very simple. If I ask for a 2-way SMP machine I want a 2-way SMP machine and nothing else. It is not a preference. Consider that I may want to run software specifically tuned to some configuration (cpu, ram, and so on) and I don't want the system to be 'helpful' and give me something that it thinks is better for me. Unless I tell it so by using a range expression.
I cannot help thinking you have made some assumption of exclusivity or free rein, and it is keeping you from seeing what is confusing us. I think I (and Chris) have had an assumption that resource requirements are a request for allocation, and these allocations MAY always be a portion of a larger (shared) resource. For us, allocation is a virtualization of resources: allocating 2 cpus from a 4-way SMP should nearly always be indistinguishable from allocation of 2 cpus from a 2-way SMP. That's a scheduler/admin prerogative.

Placing your job on a 4-way SMP does not imply that you have access to 4 CPUs. If you asked for exactly 2 CPUs per resource, I understand LSF might well allocate you 2 from a 4-way node if it wants to. This is not to give you a "better" resource, but because it is the "closest sufficient fit" that the scheduler could find for whatever reasons it has. You would be violating the scheduler's allocation decision if you somehow used more than 2 of those CPUs. Some schedulers might enforce the allocation with dynamic OS partitioning directives etc., while others might require cooperative behavior from the application and penalize violations through social or accounting means.

Going the other way should not happen in my book: I cannot allocate you 4 CPUs on a 2-way SMP. I don't know if Chris or other throughput-oriented folks may disagree on that point. I suppose you could "allocate" 4 1-GHz virtual CPUs on a 2-way SMP with 2-GHz CPUs... :-)

If I am charitable and assume I am completely misunderstanding you above, I understand your requirement to express hardware selection as a very advanced or esoteric performance requirement, e.g. your selection of exactly 2-way nodes somehow implicitly selects for a cache or bus architecture, local process scheduler behavior, etc. I could then see wanting to exactly select physical RAM sizes (which affect cache associativity and miss rates) or any other attribute in the same way. I did not think such selection of resource properties beyond the reach of the allocation requirements was in scope.

It is not something that I (nor Chris, I am guessing) was expecting to express. We figured that by the time you are making such specialized selection criteria, you are probably choosing nodes by name or type using the hostname mechanism, or something weirder and even more site-specific.
Not sure what was ambiguous; maybe the "I would accept.." part? Perhaps I should say instead that the above fragment to me means "I want a host with at least 2 cpus."
This is exactly the confusion. "I want a host with X" is ambiguous because there exist hosts with X, but you do not "get" hosts. You "get" allocations of X from a host with X or from a host with Y>X.

karl

-- Karl Czajkowski karlcz@univa.com

Given the preceding discussion, it does make me wonder whether we need an Exclusive boolean in the Resource section. I also don't see the problem in keeping CPUCount as is ... it just won't make any sense for my use cases, as Karl says. :-)

-- Chris
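A sketch of what such a flag might look like; this is purely illustrative, since no Exclusive element exists in the draft and the name is just the one Chris floats above.

<Resource>
  <CPUCount>
    <Exact>2.0</Exact>
  </CPUCount>
  <!-- hypothetical element: request the whole host, not a shared slice of it -->
  <Exclusive>true</Exclusive>
</Resource>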

On 19/4/05 22:05, "Andreas Savva" <andreas.savva@jp.fujitsu.com> wrote:
If we had a term called "TotalCPUCount" for the entire job, I could do:
4. TotalCPUCount == 32 -> LSF : "-n 32" -> PBS : "not sure how to express"
Using the current terminology (bear with me) I would translate it as
<Application>
  ...
  <jsdl-posix:POSIXApplication>
    ...
    <ProcessCount>32</ProcessCount>
  </jsdl-posix:POSIXApplication>
</Application>
<Resource>
  ...
  <ResourceCount>
    <LowerBoundedRange>1.0</LowerBoundedRange>
  </ResourceCount>
</Resource>
This works ... I didn't think of using the range on ResourceCount. :-)
I translate "No tiling constraints" as meaning TileSize=1 and since it is the default value I have omitted it.
Nope ... "no tiling constraints" means "no tiling constraints" (i.e. TileSize undefined). TileSize=1 is a tiling constraint.
We need TileSize. I agree that the default ResourceCount=1 definition should be changed to 'undefined' as you mention in a subsequent email.
So there are two issues:
1. Whether the topology requirements should be in the Application section or not. If they are in the Application section then the terms used should not be resource-flavored, i.e., not TotalCPUCount but something else.
2. How to rename 'ProcessCount' to eliminate the confusion with the term 'process'.
My answer to (1) is to keep these in the Application section. I am not sure how to rename ProcessCount though.
Ha ... so my answer to (1) is to put it somewhere else (near the Resource section). My view on this is that TotalCPUCount and TileSize are resource requirements on the global allocation, and not really tied to the application at all (i.e. they apply equally to POSIX applications, a clustered service instance, etc, etc).

I basically like to categorize things based on whether they are associated with allocating resources, or whether they are associated with binding the "work unit" to the allocation, since these are often two separate phases of getting work done in a batch system or other execution management system.

You can also subcategorize allocation requirements based on whether they apply globally to the entire allocation (e.g. TotalCPUCount) or whether they apply at an individual resource level (e.g. CPUCount). I don't think we have the notion of the former, do we?
Could I also ask you to let me know if the examples in section 8 of the spec actually make sense or not?
Sort of? I won't be sure until we agree on the terminology changes. I actually think that my 4 use cases cover it pretty well (from the allocation point of view for parallel jobs), although some examples could be added to illustrate the use of CPUCount ... perhaps in conjunction with an "Exclusive" flag.

-- Chris