Re: [ogsa-rss-wg] Re: [ogsa-wg] Teleconference minutes - 2 November 2005

Karl Czajkowski wrote:
Dave, I think the interesting point of collision between "candidate set generation"/"planning" and traditional scheduling comes when you consider non-trivial QoS regimes: for example, when services do not give best-effort service to all callers, and you cannot just assume statistical averaging to predict future service from past behaviour, etc.
I prefer to think of this in terms of resource selector services having a non-trivial dependency on the security context, and in particular on the user's role. The key is that optimization from a user's perspective is only possible within the context of the relevant user role(s), since those roles determine which resources are visible to that user. Information given to one role is not going to be useful when acting under another.
Then, there is a significant difference between a scheduler or broker who returns advisory information and one who returns authoritative information.
I don't see a major difference, or at least, not until you start having nailed-down reservations attached. Without any committed reservations, a candidate execution plan is indistinguishable whether it is issued by an advisory or an authoritative broker. Furthermore, it was agreed at the last GGF that neither of the RSS services would cause any reservations to be entered into. By that, I mean that no consumption of resources would start that could be charged to the user; systems are naturally free to try to pre-stage configurations in response to queries for candidates if they so choose, but they will be doing the service-grid equivalent of speculative execution (and, for example, won't be able to start staging files using the user's identity) and if things fall through, it will be the provider who will have to swallow the costs. (I foresee this sort of risk being the foundation of a viable business model in some circumstances, but would not want to force anyone to adopt it.)
In the advisory case, he is essentially a discovery service helping you locate services with nice properties observed in the usefully recent past. However, until you act on that information by attempting to acquire service, you have no idea whether the service is actually nice or not for your goals. It may be suddenly congested, or have differentiated policies which make your request impossible to fulfill despite its overall nice status.
In the authoritative case, the scheduler actually has some control over those remote services, i.e. (part of) their capacity is reserved to be used at his discretion. In this case, he can actually determine when and how much service you should obtain, and inform you of a stateful adjustment to his overall resource plans that includes you.
I see that as part of an advance-reservation protocol, and not as part of the CSG/EPS. The reason for this is that different applications will need different strategies for recovery from this sort of failure. Some will want to go straight back to the user and say "it can't be done", but others will wish to try some number of execution plans first. Because of the wide variety of possibilities involved, it looks like a generalized workflow problem (not just a job workflow problem) and it is therefore outside the scope of RSS and instead inside the "Job Manager" wooly cloud^W^Wblack box.

The other point is that there is no way to stop a resource from going away unexpectedly or a client from failing to consume its allocation. The resource might get hit by a disaster (natural or man-made), the client might die, etc. But this isn't a new problem; it occurs in the world of business every day and they deal with it. We should learn from them (e.g. by describing the consequences of such failures in terms of financial penalties) and not reinvent this particular wheel.
There might be protocol differences to support these cases, e.g. an authoritative answer might carry some rights assertion that you can present to the service, or the broker may have to go update remote services behind the scenes but before you manage to contact them. Also, the authoritative broker may demand that you "decline" allocations he has issued you, while you can presumably walk away from an advisory information source since no stateful allocation has been made.
What you think of as brokering is more than what I think of as brokering, it seems. In my view of things, what you call brokering is what I would describe as a higher-level service built on top of brokering (i.e. brokering plus reservation). While this is an interesting topic to work on, I'm not convinced that enough of the answers exist in more than one place for it to be worth pushing forward with this wider stuff at this stage. Once we (as a community) have a bit more implementation experience, the time for standardizing this will be a lot riper.
I do think you are on the right track to call a ranking function "policy", and in non-trivial scheduling regimes I think you will always have to present such policy rather than being able to obtain a sufficient picture of the environment to make decisions yourself. This is because of several things:
1. You will not get an atomically consistent view of the environment to act on, while the remote manager may have such a mechanism.
That's true. An atomically consistent view would require locking of databases (or equivalent) across organizations, and is therefore a total non-starter.
2. You will most likely not get a complete picture of the future service allocation plans, in the event of advance reservation. It would be too much data to exchange for each request; it might include confidential information; and it might not be clearly separable from the scheduling algorithm that might include statistical approximations and/or other heuristics.
At the University of Manchester we're looking into these things in more detail. At the moment, it looks like the brokering world is going to be categorizable into multiple types of service depending on the quality of information available. This work is still actively ongoing though, so it is a bit too soon to report on it.
3. You will not get a complete picture of the differentiated policies that different brokers or services apply to specific individuals, because it may be confidential and is also meaningless without a global view of other competing activities.
(I think I've covered this point already.)
Thus, I think a significant step of ANY distributed planning exercise with non-trivial QoS will be the sort of handshake captured in WS-Agreement: make an "offer" bearing policy expressions that describe what you want; allow the authoritative resource manager to consider this offer among others and its stateful policy and resource availability models; obtain an answer of whether the manager can arrange services according to your offer; and (optionally) introspect to find out HOW the manager will provide you service.
All interface "refactorings" are not equivalent across protocol boundaries between parties with different authority and trust roles...
True, but I'm not at all convinced that WS-Ag is quite in the sweet spot either. It would help a lot if it were easier to tackle the document parts of the spec separately from the service parts, because at the moment the sheer size of the overall spec and the feeling that you have to read virtually all of it to understand it[*] is scaring some people off. Donal. [* FWIW, I found I had to read not just the spec but also some of the presentations about it before I understood what was going on. To me, that indicates that the spec document itself has not yet captured all that you intend it to. ]
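To make the offer/consider/introspect handshake described above concrete, here is a minimal sketch in Python. All of the names are illustrative only; none of them are taken from the WS-Agreement specification or from any RSS document.

```python
# Minimal sketch of the offer / consider / introspect handshake described
# above. All names are illustrative; nothing here comes from WS-Agreement
# or the RSS documents.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Offer:
    """What the requester wants, expressed as policy/service terms."""
    service_terms: dict                 # e.g. {"nodes": 32, "start": "2005-12-20T09:00Z"}
    constraints: dict = field(default_factory=dict)


@dataclass
class Decision:
    accepted: bool
    agreement_id: Optional[str] = None  # handle for later introspection
    reason: Optional[str] = None


class ResourceManager:
    """Authoritative side: holds a stateful availability model that the
    requester never sees directly."""

    def __init__(self, free_nodes: int):
        self.free_nodes = free_nodes
        self.agreements = {}

    def consider(self, offer: Offer) -> Decision:
        # Weigh this offer against current policy and availability.
        wanted = offer.service_terms.get("nodes", 0)
        if wanted <= self.free_nodes:
            agreement_id = "agr-%d" % (len(self.agreements) + 1)
            self.agreements[agreement_id] = offer
            self.free_nodes -= wanted
            return Decision(True, agreement_id)
        return Decision(False, reason="insufficient capacity")

    def introspect(self, agreement_id: str) -> dict:
        # Optional step: find out HOW the manager plans to provide service.
        return {"terms": self.agreements[agreement_id].service_terms}


# Requester side: make an offer and see whether the manager can arrange it.
manager = ResourceManager(free_nodes=64)
decision = manager.consider(Offer(service_terms={"nodes": 32}))
if decision.accepted:
    print(manager.introspect(decision.agreement_id))
```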

On Dec 12, Donal K. Fellows modulated: ...
I don't see a major difference, or at least, not until you start having nailed-down reservations attached. Without any committed reservations, a candidate execution plan is indistinguishable whether it is issued by an advisory or an authoritative broker. Furthermore, it was agreed at the last GGF that neither of the RSS services would cause any reservations to be entered into. By that, I mean that no consumption of resources would start that could be charged to the user; systems are ...
OK, it sounds like you are clearly scoped to have "advisory" brokers, with my definition of authoritative meaning that the selection response is "what will be allowed", e.g. an obligating decision. :-) BTW, I was responding to how I read Dave's inquiry about generalizing the problem, and not specifically to scoping of RSS...
The other point is that there is no way to stop a resource from going away unexpectedly or a client from failing to consume its allocation. The resource might get hit by a disaster (natural or man-made), the client might die, etc. But this isn't a new problem; it occurs in the world of business every day and they deal with it. We should learn from them (e.g. by describing the consequences of such failures in terms of financial penalties) and not reinvent this particular wheel.
Yes, I entirely agree. I guess it is out of scope though, if the RSS result is just a list of candidates rather than some obligation of service. Filtering and ordering should be sufficient, rather than the added difficulty of precise monetized (linear) values.
2. You will most likely not get a complete picture of the future service allocation plans, in the event of advance reservation. It would be too much data to exchange for each request; it might include confidential information; and it might not be clearly separable from the scheduling algorithm that might include statistical approximations and/or other heuristics.
At the University of Manchester we're looking into these things in more detail. At the moment, it looks like the brokering world is going to be categorizable into multiple types of service depending on the quality of information available. This work is still actively ongoing though, so it is a bit too soon to report on it.
Right; as I was trying to describe in response to Dave's question, I think the boundaries of what is to be accomplished must be set pretty specifically, because different protocols will be appropriate depending on the kind of information that can be revealed (and the sizes of the various data for realistic environments).
All interface "refactorings" are not equivalent across protocol boundaries between parties with different authority and trust roles...
True, but I'm not at all convinced that WS-Ag is quite in the sweet spot either. It would help a lot if it were easier to tackle the document parts of the spec separately from the service parts, because at the moment the sheer size of the overall spec and the feeling that you have to read virtually all of it to understand it[*] is scaring some people off.
Donal. [* FWIW, I found I had to read not just the spec but also some of the presentations about it before I understood what was going on. To me, that indicates that the spec document itself has not yet captured all that you intend it to. ]
Yes, public comments have said as much... the problem area is pretty large, unfortunately, and GRAAP-WG is trying to trim the spec down to the basic normative aspects. The group also plans to develop a sort of primer to provide more useful non-normative introduction to the approach. (This was decided because earlier feedback indicated the specification was too large to digest for people looking for all the normative bits.) The difficulty is that the core abstraction of agreement-about-service is pretty abstract stuff, and the domain-specific examples that can help illustrate it are definitely non-normative to the base spec. In an ideal world, we could present some fully-developed profiles of WS-Agreement with specific service/resource management domains to illustrate the base model. In practice, I think we need to get a "version 1" out there so that domain profiles can be developed and usage experience gathered to inform further versions of WS-Agreement. karl -- Karl Czajkowski karlcz@univa.com

Karl Czajkowski wrote:
Yes, I entirely agree. I guess it is out of scope though, if the RSS result is just a list of candidates rather than some obligation of service. Filtering and ordering should be sufficient, rather than the added difficulty of precise monetized (linear) values.
Getting rid of the obligation simplifies many other things too (e.g. the relationship of resource selection to deployment, which is what prompted the decision in the first place). Pricing models are a topic for some other time; for now, stating that they need to exist and will inform the decisions taken by the agents in the system is probably good enough while we work on the other details of the arch...
Right; as I was trying to describe in response to Dave's question, I think the boundaries of what is to be accomplished must be set pretty specifically, because different protocols will be appropriate depending on the kind of information that can be revealed (and the sizes of the various data for realistic environments).
This sort of thing is why I was pleased to start thinking of things in terms of ordered sets delivered through WS-Enumeration (and I'm told that RSS permits similar stuff, but I don't know the details of that). It means that the "ten thousand candidates" case goes away in practice because you can probably use the first few and have things be good enough. Once you no longer have to deal with enormous quantities of data, you can get away with using any description system that works. (Really large data wouldn't get shipped through the brokering system anyway; that would go by something like GridFTP to the agreed target using agreed network resources, or something like that.) Donal.
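A sketch of the consumption pattern only (not the WS-Enumeration or RSS interfaces themselves; the function names are made up): the candidates arrive best-first in small pages, and the client stops pulling as soon as it has something good enough, so the "ten thousand candidates" never materialise on the wire.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def enumerate_candidates(ranked_source: Iterable[dict],
                         page_size: int = 10) -> Iterator[List[dict]]:
    """Yield the ordered candidate set a page at a time, best-ranked first."""
    page = []
    for candidate in ranked_source:
        page.append(candidate)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        yield page


def pick_good_enough(pages: Iterator[List[dict]],
                     acceptable: Callable[[dict], bool]) -> Optional[dict]:
    # In practice the first page or two is usually enough; stop pulling
    # from the enumeration as soon as something acceptable turns up.
    for page in pages:
        for candidate in page:
            if acceptable(candidate):
                return candidate
    return None
```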

Hi all, I am happy to see this discussion. Let me state my position in two steps: wearing my researcher hat and my OGSA-RSS co-chair hat.

As a researcher, I think we need advance reservation in grids for a huge class of jobs. I would like to have an agreement between the client/consumer and the provider/producer for each job (possibly not for very small tasks). The provider should decide which terms he can offer, and the client should accept or refuse. A scheduler/resource broker then has the duty of bringing clients and providers together. There is no such thing as atomicity in distributed systems, therefore I would like to have an economic model that punishes participants that violate any contracts. Basically, since the provider needs to execute a given job, he needs to know the dependencies of the job and may then map the job to a set of local resources, or start provisioning of resources. The broker gathers offers from various providers, which consist of multiple properties. The client provides information about his preference structure. The broker is then responsible for matching the offers to the client's needs. The client is then presented with this solution and may confirm it. If one of the parties cancels the agreement, the broker should propagate the cancellation cost to the entity that caused the cancellation.

As an OGSA-RSS co-chair, I also have to state that this is out of scope. RSS deals with the Execution Planning System (EPS) and the Candidate Set Generator (CSG). As Donal already wrote, there is no change to the environment during the execution of those services. Therefore, the CSG can point to an (ordered) set that may suit the needs of a given job, and the EPS can then choose an appropriate execution location. There is no reservation support at the moment. If the Job Manager chooses to reserve resources, he is on his own. There is no way of supporting e.g. coallocation of resources in the RSS. But nevertheless, I would love to see a short primer document about WS-Agreement. -Mathias
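A toy outline of the broker role described above: gather multi-property offers from providers, rank them against the client's preference structure, and push the cancellation cost back onto whichever party broke the agreement. This is only a model of the economics, not a proposed interface; all the field names are invented.

```python
def rank_offers(offers, preferences):
    """Score each multi-property offer using the client's preference weights."""
    def score(offer):
        return sum(preferences.get(prop, 0.0) * value
                   for prop, value in offer["properties"].items())
    return sorted(offers, key=score, reverse=True)


def settle_cancellation(agreement, cancelled_by):
    """Charge the agreed penalty to the entity that caused the cancellation."""
    return {"charged_to": cancelled_by, "amount": agreement["cancellation_fee"]}


offers = [
    {"provider": "A", "properties": {"cpus": 16, "start_soon": 1.0},
     "cancellation_fee": 5.0},
    {"provider": "B", "properties": {"cpus": 64, "start_soon": 0.2},
     "cancellation_fee": 20.0},
]
best = rank_offers(offers, preferences={"cpus": 0.1, "start_soon": 2.0})[0]
# The client confirms `best`; if the provider later pulls out:
print(settle_cancellation(best, cancelled_by=best["provider"]))
```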

Mathias Dalheimer wrote:
There is no way of supporting e.g. coallocation of resources in the RSS.
That's not quite true. Provided there is a way of expressing it (JSDL does not give us enough), coallocation can be supported by the EPS, though it would just be returning possible coallocation plans and not ones backed up by hard reservations. (As you said in text I elided, that is stuff that is up to the job manager and reservation service.) Just as with the CSG, arranging for the EPS to return an ordered set of plans is likely to be the right approach, especially as it allows the JM to decide what to do if the first plan falls through (and different plans might be needed depending on exactly what happened, too). Donal.
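One way a Job Manager might walk such an ordered set of plans, falling back to the next one when a plan cannot be realised. This is a sketch only: `try_to_reserve` stands in for whatever reservation service the JM actually talks to and is not part of the RSS interfaces.

```python
def execute_first_workable_plan(plans, try_to_reserve):
    """plans: ordered best-first; each plan is a list of (resource, slice) pairs."""
    for plan in plans:
        reservations = []
        try:
            for resource, slice_spec in plan:
                reservations.append(try_to_reserve(resource, slice_spec))
            return plan, reservations            # every leg of the plan held
        except RuntimeError:
            # One leg failed: release what we already got, try the next plan.
            for reservation in reservations:
                reservation.release()
    return None, []                              # nothing worked; report upwards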

Mathias and Donal, if the co-allocation issue has to be addressed in the context of RSS, I agree with your idea of letting the EPS generate an ordered set of possible co-allocation plans. In fact, that's exactly what we have been exploring for the last two years in the NAREGI project. Let's assume that you want to run an MPI job that requires a large number of nodes, and also assume that there is no single resource with that number of nodes. In this case, there is no way to run the MPI job on a single resource: either the MPI job fails to run, or it has to run across multiple resources with the help of some co-allocation mechanism. To deal with this situation, the NAREGI Super Scheduler (consisting of EPS and CSG) works as follows: the CSG just returns a set of candidate resources to the EPS, and the EPS then generates a set of feasible co-allocation plans based on the candidate resources returned from the CSG and the application's requirements (e.g., network bandwidth, latency, etc.). In our current implementation, since we haven't defined any ranking function yet, the EPS randomly chooses one of the possible co-allocation plans and then tries to make hard reservations with the local resources associated with the selected plan. If it succeeds in making the reservations, the EPS returns the co-allocation plan to the job manager.

I am wondering what kinds of job types EPS/CSG should be able to handle as input. Examples of different job types include a single job, a set of independent jobs (i.e., no dependency constraints between jobs), workflow jobs (i.e., dependency constraints between jobs), or MPI jobs (where maybe co-allocation is one of the key requirements). I am also wondering how these different job types will affect the design of the EPS/CSG interface and protocol. The way to describe resource requirements for job execution may be different depending on the type of job. Is the resource element of the JSDL v1 spec sufficient to describe the requirements of jobs of different types? I suspect that depending on the job type, the parameters used for the ranking function/mechanism are likely to be different, and that we might end up having different EPS/CSG interfaces for each job type. Soonwook
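The flow described above, paraphrased in code. This is not NAREGI source, just the shape of the algorithm; the helper functions are placeholders.

```python
import random
from itertools import combinations


def eps_coallocate(candidates, nodes_needed, bandwidth_ok, reserve):
    """candidates: e.g. [{"site": "a", "nodes": 64}, ...] from the CSG.
    bandwidth_ok(combo) checks the application's network requirements;
    reserve(plan) attempts hard reservations and returns True on success."""
    # 1. Generate the feasible co-allocation plans from the candidate set.
    plans = []
    for size in range(2, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if (sum(c["nodes"] for c in combo) >= nodes_needed
                    and bandwidth_ok(combo)):
                plans.append(list(combo))

    # 2. No ranking function yet, so try the plans in random order and back
    #    the first one that can be covered by hard reservations.
    random.shuffle(plans)
    for plan in plans:
        if reserve(plan):
            return plan          # hand this plan back to the job manager
    return None
```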

On Dec 14, Soonwook Hwang modulated: ...
I am also wondering how these different job types will affect the design of the EPS/CSG interface and protocol. The way to describe resource requirements for job execution may be different depending on the type of job. Is the resource element of the JSDL v1 spec sufficient to describe the requirements of jobs of different types? I suspect that depending on the job type, the parameters used for the ranking function/mechanism are likely to be different, and that we might end up having different EPS/CSG interfaces for each job type.
I think you could express an MPI job in the JSDL v1 resource language, and have a co-allocation-aware selection service return different possible mappings where some of the homogeneous resources come from different managers. E.g. a "simple" JSDL job is exploded into a set of concurrent JSDL jobs to be directed to different execution sites.

However, I think you would quickly want a richer resource language extension---beyond JSDL v1---where you could describe network connectivity requirements (possibly hierarchically) and heterogeneous resource requirements. For example, describing a job where you need a certain number of "large memory" nodes with a good interconnect, a certain number of "fast CPU" nodes with a separate good interconnect, and a less stringent WAN connection between the two node sets for some coupled numerical code. This is definitely beyond the scope of the JSDL v1 resource language, though a nice extended language could encapsulate multiple v1 resource clauses, one for each homogeneous node set...

I think the big debates here will always return to which way such natural job/resource graphs will be normalized into a document tree. Should there be one job with complex resource requirements? Or some complex workflow with lots of simple (sub-)job documents? In the general case, there can be lots of cross-cutting relationships, e.g. correlating different executable specifications or data-staging requirements with different node types (or node identities), having a mixture of localized per-node, per-nodeset, and global resource allocation requirements, etc. Different normalized document trees will emphasize one job regime over another, making that one convenient to specify using nice, lexically scoped properties while either blocking other kinds of job or at least making them use less convenient cross-references and such to encode the natural graph.

Of course, the debates are not just about syntactic style, but about assumed document processing algorithms. Out of necessity, people are implementing restricted heuristics; out of convenience, I think they are also relying on document syntax to help enforce representation invariants for their implemented heuristics. I haven't seen anybody favoring a unified "big mess blob in/big mess blob out" interface where all the implementation-specific constraints must be checked internally while mapping to some more prosaic internal format. I have often wondered about this, and am not sure whether it would lead to better or worse interop than a bunch of more specialized interface standards... karl -- Karl Czajkowski karlcz@univa.com
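To make the structure of the extended-language idea above concrete, here is one hypothetical shape for it: several homogeneous node-set requirements (each of which could map onto a JSDL v1 resource clause) plus connectivity requirements between them. None of these names exist in JSDL v1 or in any published extension; they are purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeSet:
    name: str
    count: int
    min_memory_mb: int
    min_cpu_mhz: int
    interconnect: str                  # requirement internal to this node set


@dataclass
class Link:
    between: Tuple[str, str]           # names of the two node sets it couples
    min_bandwidth_mbps: float
    max_latency_ms: float


@dataclass
class CoupledJobResources:
    node_sets: List[NodeSet]
    links: List[Link]


# The coupled-code example: big-memory nodes, fast-CPU nodes, and a less
# stringent WAN link between the two pools.
req = CoupledJobResources(
    node_sets=[
        NodeSet("big-mem", count=16, min_memory_mb=8192, min_cpu_mhz=1000,
                interconnect="myrinet"),
        NodeSet("fast-cpu", count=64, min_memory_mb=1024, min_cpu_mhz=3000,
                interconnect="gigabit"),
    ],
    links=[Link(between=("big-mem", "fast-cpu"),
                min_bandwidth_mbps=100.0, max_latency_ms=50.0)],
)
```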

Karl Czajkowski wrote:
However, I think you would quickly want a richer resource language extension---beyond JSDL v1---where you could describe network connectivity requirements (possibly hierarchically) and heterogeneous resource requirements. For example, describing a job where you need a certain number of "large memory" nodes with a good interconnect, a certain number of "fast CPU" nodes with a separate good interconnect, and a less stringent WAN connection between the two node sets for some coupled numerical code. This is definitely beyond the scope of the JSDL v1 resource language, though a nice extended language could encapsulate multiple v1 resource clauses, one for each homogeneous node set...
That's the sort of situation I'd prefer to describe as two (or more) applications within a workflow with some kind of coupling constraint, rather than as a single distributed app. In any case, the "multiple coupled apps" case is probably going to be more common than the "divided single app" case, simply because it is a rare application that can be chopped apart that way without horrible performance penalties. Apart from this clarification, (I think) I agree with the rest of what you say. (The OGSA-RSS-WG is not going to tackle the Workflow Language problem, but it must be done. I'm not at all sure that BPEL is the answer, but half the problem is finding recent versions of the BPEL spec; the only readable one I've ever seen was obviously not powerful enough and does not seem to be what people are using these days anyway. But this is a topic for some other group/time.) Donal.

I was wondering whether by "divided single app" you meant "parameter sweep" types of apps. If that is the case, I don't think that they are less common than "multiple coupled apps," or that such applications are rare. I think that they are in fact the kind of applications that will perhaps benefit most from exploiting the Grid, given the current limitations in the network bandwidth and latency of WAN interconnections. As we see in the case of Condor, Nimrod and other systems, there are many existing systems aiming at tackling "parameter sweep apps." In addition, we might be able to consider a collection of single jobs submitted from different job managers to the EPS/CSG to be a kind (?) of parameter sweep app for which the EPS/CSG needs to make mapping plans. In brief, the point I am trying to make here is that when it comes to the design of the EPS/CSG interface and protocol, having the matching/scheduling of parameter-sweep types of apps (i.e., a set of independent tasks as a job type) in mind is important as well. Soonwook

Soonwook Hwang wrote:
I was wondering whether by "divided single app" you meant "parameter sweep" types of apps. If that is the case, I don't think that they are less common than "multiple coupled apps," or that such applications are rare. I think that they are in fact the kind of applications that will perhaps benefit most from exploiting the Grid, given the current limitations in the network bandwidth and latency of WAN interconnections. As we see in the case of Condor, Nimrod and other systems, there are many existing systems aiming at tackling "parameter sweep apps." In addition, we might be able to consider a collection of single jobs submitted from different job managers to the EPS/CSG to be a kind (?) of parameter sweep app for which the EPS/CSG needs to make mapping plans. In brief, the point I am trying to make here is that when it comes to the design of the EPS/CSG interface and protocol, having the matching/scheduling of parameter-sweep types of apps (i.e., a set of independent tasks as a job type) in mind is important as well.
As far as I can tell, there are two major classes of non-trivial grid usage here. One is "we have this complex plan of work to carry out" (i.e. a workflow) and the other is "we want to do this long list of fairly simple things" (i.e. a parameter sweep). These two major classes of system have very different properties. In the workflow case, the main desire is to coordinate what may be very large atomic jobs to carry out some larger task, and the resource-selection plan is focussed on maximizing the likelihood of success while minimizing the cost. In the parameter sweep, the focus is no longer on the individual executions but on maximizing "job throughput", and resource selection can be done more on the basis of "who is free for another work packet". Indeed, I suspect that the best approach might involve the use of a CSG to select a set of worker containers, with very little planning at all. Oh well, that would at least demonstrate the need for having the CSG as an entity separate from the EPS... Donal.
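A sketch of the parameter-sweep regime as just described: the CSG picks a pool of worker containers once, and after that the only "planning" is handing the next work packet to whichever worker is free. The `submit` and `poll_finished` calls are stand-ins, not RSS or BES operations.

```python
from queue import Queue


def sweep(work_packets, workers, submit, poll_finished):
    """workers: the container set the CSG selected (assumed non-empty).
    submit(worker, packet) starts a packet on a worker; poll_finished()
    blocks until at least one worker finishes and yields (worker, result)."""
    todo = Queue()
    for packet in work_packets:
        todo.put(packet)

    idle = list(workers)
    results = []
    in_flight = 0
    while not todo.empty() or in_flight:
        # Keep every free worker busy; throughput, not per-job planning, is the goal.
        while idle and not todo.empty():
            submit(idle.pop(), todo.get())
            in_flight += 1
        for worker, result in poll_finished():
            results.append(result)
            idle.append(worker)
            in_flight -= 1
    return results
```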
participants (4)
- Donal K. Fellows
- Karl Czajkowski
- Mathias Dalheimer
- Soonwook Hwang