
Karl Czajkowski wrote:
Dave, I think the interesting point of collision with "candidate set generation" and "planning", e.g. traditional scheduling, is when you consider non-trivial QoS regimes. For example, when services do not give best effort service to all callers, and you cannot just assume statistical averaging to predict future service based on past, etc.
I prefer to think of this in terms of resource selector services having a non-trivial dependency on the security context, and in particular upon the user's role. The key is that optimization from a user's perspective is only possible within the context of the relevant user role(s), since it is that which determines what resources are visible to those users. Trying to use information given to one role with another isn't going to be helpful at all.
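As a minimal sketch of the role dependency described above (all names here, including `Resource`, `candidates_for` and the role strings, are hypothetical and not taken from any RSS draft), a selector might filter by role before it ranks anything:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    name: str
    visible_to: frozenset  # roles allowed to see this resource (assumed model)
    score: float           # illustrative ranking value; higher is better

def candidates_for(role, resources):
    """Return the resources visible to `role`, best score first."""
    visible = [r for r in resources if role in r.visible_to]
    return sorted(visible, key=lambda r: r.score, reverse=True)

# Made-up inventory, purely for illustration.
RESOURCES = [
    Resource("cluster-a", frozenset({"staff", "student"}), 0.6),
    Resource("cluster-b", frozenset({"staff"}), 0.9),
    Resource("cluster-c", frozenset({"student"}), 0.4),
]

staff_view = [r.name for r in candidates_for("staff", RESOURCES)]
student_view = [r.name for r in candidates_for("student", RESOURCES)]
```

The two views differ in both membership and order, so a ranking computed for one role says little about what the other role can actually use.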
Then, there is a significant difference between a scheduler or broker who returns advisory information and one who returns authoritative information.
I don't see a major difference, or at least, not until you start having nailed-down reservations attached. Without any committed reservations, a candidate execution plan is indistinguishable in the case of it being issued by an advisory or an authoritative broker. Furthermore, it was agreed at the last GGF that neither of the RSS services would cause any reservations to be entered into. By that, I mean that no consumption of resources would start that could be charged to the user; systems are naturally free to try to pre-stage configurations in response to queries for candidates if they so choose, but they will be doing the service grid equivalent of speculative execution (and, for example, won't be able to start staging files using the user's identity) and if things fall through, it will be the provider who will have to swallow the costs. (I foresee this sort of risk being the foundation of a viable business model in some circumstances, but would not want to force anyone to adopt it.)
In the advisory case, he is essentially a discovery service helping you locate services with nice properties observed in the usefully recent past. However, until you act on that information by attempting to acquire service, you have no idea whether the service is actually nice or not for your goals. He may be suddenly congested, or have differentiated policies which make your request impossible to fulfill despite his overall nice status.
In the authoritative case, the scheduler actually has some control over those remote services, i.e. (part of) their capacity is reserved to be used at his discretion. In this case, he can actually determine when and how much service you should obtain, and inform you of a stateful adjustment to his overall resource plans that includes you.
I see that as part of an advance-reservation protocol, and not as part of the CSG/EPS. The reason for this is that different applications will need different strategies for recovery from this sort of failure. Some will want to just go straight back to the user and say "it can't be done" but others will wish to try some number of execution plans first. Because of the wide variety of possibilities involved, it looks like a generalized workflow problem (not just a job workflow problem) and it is therefore outside the scope of RSS and instead inside the "Job Manager" wooly cloud^W^Wblack box. The other point is that there is no way to stop a resource from going away unexpectedly or a client from failing to consume its allocation. The resource might get hit by a disaster (natural or man-made), the client might die, etc. But this isn't a new problem; it occurs in the world of business every day and they deal with it. We should learn from them (e.g. by describing the consequences of such failures in terms of financial penalties) and not reinvent this particular wheel.
There might be protocol differences to support these cases, e.g. an authoritative answer might carry some rights assertion that you can present to the service, or the broker may have to go update remote services behind the scenes but before you manage to contact them. Also, the authoritative broker may demand that you "decline" allocations he has issued you, while you can presumably walk away from an advisory information source since no stateful allocation has been made.
What you think of as brokering is more than what I think of as brokering, it seems. In my view of things, what you call brokering is what I would describe as a higher-level service built on top of brokering (i.e. brokering plus reservation). While this is an interesting topic to work on, I'm not convinced that enough of the answers exist in more than one place for it to be worth pushing forward with this wider stuff at this stage. Once we (as a community) have a bit more implementation experience, the time for standardizing this stuff will be a lot riper.
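To make the layering in this exchange concrete, here is a rough sketch, assuming a toy model in which hosts have a recently observed load and a number of free slots. `AdvisoryBroker`, `ReservingBroker`, `allocate` and `decline` are hypothetical names, not part of any agreed interface. The advisory layer only ranks; the layer built on top of it holds state, hands out a token standing in for a rights assertion, and expects unused allocations to be declined.

```python
import uuid

class AdvisoryBroker:
    """Ranks hosts by recently observed load; keeps no per-caller state."""
    def __init__(self, observed_load):
        self.observed_load = dict(observed_load)  # host -> load in [0, 1]

    def candidates(self):
        # Least-loaded host first. Purely advisory: nothing is reserved.
        return sorted(self.observed_load, key=self.observed_load.get)

class ReservingBroker:
    """Hypothetical higher-level service: brokering plus reservation."""
    def __init__(self, advisory, capacity):
        self.advisory = advisory
        self.free = dict(capacity)  # host -> slots under this broker's control
        self.allocations = {}       # token -> host

    def allocate(self):
        """Commit one slot on the best candidate that still has capacity."""
        for host in self.advisory.candidates():
            if self.free.get(host, 0) > 0:
                self.free[host] -= 1
                token = uuid.uuid4().hex  # stand-in for a rights assertion
                self.allocations[token] = host
                return token, host
        return None

    def decline(self, token):
        """Release an allocation the caller will not use."""
        host = self.allocations.pop(token)
        self.free[host] += 1
```

A caller can walk away from `AdvisoryBroker.candidates()` at no cost, whereas an unused token from `ReservingBroker.allocate()` ties up a slot until `decline` is called.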
I do think you are on the right track to call a ranking function "policy", and in non-trivial scheduling regimes I think you will always have to present such policy rather than being able to obtain a sufficient picture of the environment to make decisions yourself. This is because of several things:
1. You will not get an atomically consistent view of the environment to act on, while the remote manager may have such a mechanism.
That's true. An atomically consistent view would require locking of databases (or equivalent) across organizations, and is therefore a total non-starter.
2. You will most likely not get a complete picture of the future service allocation plans, in the event of advance reservation. It would be too much data to exchange for each request; it might include confidential information; and it might not be clearly separable from the scheduling algorithm that might include statistical approximations and/or other heuristics.
At the University of Manchester we're looking into these things in more detail. At the moment, it looks like the brokering world is going to be categorizable into multiple types of service depending on the quality of information available. This work is still actively ongoing though, so it is a bit too soon to report on it.
3. You will not get a complete picture of the differentiated policies that different brokers or services apply to specific individuals, because it may be confidential and is also meaningless without a global view of other competing activities.
(I think I've covered this point already.)
Thus, I think a significant step of ANY distributed planning exercise with non-trivial QoS will be the sort of handshake captured in WS-Agreement: make an "offer" bearing policy expressions that describe what you want; allow the authoritative resource manager to consider this offer among others and its stateful policy and resource availability models; obtain an answer of whether the manager can arrange services according to your offer; and (optionally) introspect to find out HOW the manager will provide you service.
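The handshake described above might be sketched roughly as follows. This is not the WS-Agreement API or schema; `Offer`, `Agreement`, `ResourceManager.consider` and `introspect` are hypothetical names, and the slot-based capacity model is an assumption made only to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Offer:
    cpus: int      # CPUs wanted
    start: int     # earliest acceptable start slot
    deadline: int  # slot by which the work must have finished
    duration: int  # number of consecutive slots needed

@dataclass(frozen=True)
class Agreement:
    offer: Offer
    start: int     # start slot chosen by the manager

class ResourceManager:
    """Authoritative manager holding private, stateful allocation plans."""
    def __init__(self, total_cpus, horizon):
        self.total_cpus = total_cpus
        self.used = [0] * horizon  # CPUs already committed in each slot

    def consider(self, offer) -> Optional[Agreement]:
        """Accept the offer at the earliest feasible slot, or reject it."""
        last_start = min(offer.deadline, len(self.used)) - offer.duration
        for s in range(offer.start, last_start + 1):
            window = range(s, s + offer.duration)
            if all(self.used[t] + offer.cpus <= self.total_cpus for t in window):
                for t in window:
                    self.used[t] += offer.cpus  # stateful commitment
                return Agreement(offer, s)
        return None

    def introspect(self, agreement):
        """Optional step: reveal HOW service will be provided."""
        return {"start": agreement.start,
                "end": agreement.start + agreement.offer.duration,
                "cpus": agreement.offer.cpus}
```

The caller never sees `used`; it learns only whether its offer was accepted and, if it asks, the plan for its own agreement. That matches the three points above: no atomic view, no full future plan, and no visibility of other callers' policies.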
All interface "refactorings" are not equivalent across protocol boundaries between parties with different authority and trust roles...
True, but I'm not at all convinced that WS-Ag is quite in the sweet spot either. It would help a lot if it were easier to tackle the document parts of the spec separately from the service parts, because at the moment the sheer size of the overall spec, and the feeling that you have to read virtually all of it to understand it[*], is scaring some people off.
Donal.
[* FWIW, I found I had to read not just the spec but also some of the presentations about it too to understand what was going on. To me, that indicates that the spec document itself has not yet captured all that you intend it to.]