RE: [ogsa-wg] Teleconference minutes - 2 November 2005

Donal, I see from the RSS description on GridForge that RSS will consist of a Candidate Set Generator, which I assume could be loosely described as a "matching" service, and an Execution Planning Service, which I assume could loosely be described as a "scheduling service". Come to think of it, I guess the combination of the two could be loosely described as a "broking service". A couple of questions come to mind.

First, at this year's London F2F people began talking about the need to do scheduling over all services, not just execution services, i.e. we may need to schedule access to data services and network services. It would be desirable to have a framework generic enough to handle all sorts of resources (as long as they in turn provide the necessary information). Has anyone considered this w.r.t. RSS?

Second, there are lots of ways to implement scheduling. Would it be possible just to specify the interfaces and allow implementations of the services to use whichever algorithms they want? E.g. in the data architecture we say almost nothing about the functioning of a data federation service, because we can abstract from most of the inner workings. Do we clearly understand what the output of the RSS service should be?

Third, you mention the idea of passing in a ranking function parameter. This sounds similar to a question I raised recently on this list about selecting data sources, in which I suggested passing in a policy argument. Admittedly I was making that suggestion in the particular context of passing a policy argument to a reference renewing service, but the two cases do seem sufficiently similar to merit a comparison. The replies I received then expressed a strong preference for letting the selection be done by the client rather than attempting to parameterise a service. I think it's worth drilling down on this issue, and this is what I attempt to do for the rest of this message.

At a sufficiently high level, we are attempting to do something simple.
We have two inputs, a list of activities and a list of resources, and one output: a mapping of activities to resources. In the simplest case, we are selecting one resource (e.g. the replica to read data from, or the protocol/source from which to transfer the data), and this may be so simple that we treat it differently (e.g. always in client code instead of a separate service). More complicated cases might produce a set of pairs, or a more complicated conditional mapping. In each case we could either (i) move the data to the client and perform the selection in client code, (ii) pass the algorithm as a parameter to another service, or (iii) have a separate service that incorporates some aspects of the algorithm while allowing some parameterisation by policy.

One factor that influences this choice might be who has access to the necessary information. If the client has privileged information that should not be made public, then the choice might have to be made by the client; or the framework should encrypt the policy parameter and provide a trust relationship with the selection service. Conversely, the client may want to see only an abstraction of the detailed selection process.

If we have parameterisation, then the question arises of how expressive a language we need to specify the parameterisation. Do we need a general-purpose programming language (e.g. a Python script) or will the draft WS-Agreement language be sufficient? The replies I received suggested both that the parameterisation would require a general programming language and that the client may have privileged knowledge that it does not wish to share, and therefore it would be better to leave decisions to the client. I'm rather sceptical of the first point, but it would be good to examine real systems to determine what is really needed. There are clearly many possible answers in this space, which brings us back to Ian's list of schedulers.
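To make the two-inputs/one-output model concrete, here is a minimal Python sketch of option (iii): the service owns the matching loop and the caller supplies only a scoring policy. The name map_activities, the dict-of-attributes representation, the attribute names and the greedy one-activity-per-resource strategy are all assumptions for illustration, not anything defined by RSS.

```python
from typing import Callable, Dict, List

# Hypothetical representation: activities and resources are attribute dicts.
Activity = Dict[str, float]
Resource = Dict[str, float]
# A policy scores an (activity, resource) pair; higher is better.
Policy = Callable[[Activity, Resource], float]


def map_activities(activities: List[Activity],
                   resources: List[Resource],
                   policy: Policy) -> Dict[int, int]:
    """Greedily map each activity index to the best-scoring unused resource index."""
    mapping: Dict[int, int] = {}
    free = set(range(len(resources)))
    for a_idx, activity in enumerate(activities):
        if not free:
            break  # more activities than resources: the rest stay unmapped
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(free), key=lambda r: policy(activity, resources[r]))
        mapping[a_idx] = best
        free.remove(best)
    return mapping


# Simplest case from the text: one activity (a read), choose one replica.
replicas = [{"bandwidth": 10.0, "latency": 50.0},   # MB/s, ms
            {"bandwidth": 100.0, "latency": 80.0}]
reads = [{"size": 1000.0}]                           # MB


def shortest_transfer(act: Activity, res: Resource) -> float:
    # Negative estimated seconds, so the fastest replica scores highest.
    return -(res["latency"] / 1000.0 + act["size"] / res["bandwidth"])


print(map_activities(reads, replicas, shortest_transfer))  # {0: 1}
```

Running the same function inside client code corresponds to option (i); only in options (ii) and (iii) does the policy need a wire representation, which is where the expressiveness question arises.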
One question might be whether we can identify an interface, or a small number of interfaces, that generalises a significant number of practical use cases.

Best wishes,
Dave.

-----Original Message-----
From: owner-ogsa-wg@ggf.org [mailto:owner-ogsa-wg@ggf.org] On Behalf Of Donal K. Fellows
Sent: 09 December 2005 09:37
To: 'ogsa-wg'
Cc: ogsa-rss-wg@ggf.org
Subject: Re: [ogsa-wg] Teleconference minutes - 2 November 2005

Steven Newhouse wrote:
A selector interface is being defined as part of the RSS activities (https://forge.gridforum.org/projects/ogsa-rss-wg/). They may have a comment to make on the nature of the interface they are considering - and how it could/could not encompass these services.
I (as OGSA-RSS co-chair) welcome any input people have on possible strategies. At the moment, I'm thinking in terms of ordered sets where the caller supplies a ranking function[*] (or input to such a function). That gives plenty of flexibility and power. I'd love to have input from other people as to what sorts of things ought to go in the ranking function parameter; we could leave it unspecified (easy for the RSS WG!), but that's not good for working towards an interoperable solution, so I'd like to seed the space with at least a minimal level of things that must be supported.

Donal Fellows.

[* Experience with Condor indicates that a "general acceptability" filter is also useful, but a first hack at that is a straight resource satisfaction check, which ought to be performed anyway.]
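One possible shape for "a minimal level of things that must be supported" is a declarative ranking parameter (attribute weights) combined with the Condor-style acceptability filter from the footnote. The sketch below assumes hypothetical Resource and Request types, hypothetical attribute names, and a weighted-sum rank; it illustrates ordered candidate-set generation and is not a proposal from the RSS WG.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    name: str
    attrs: Dict[str, float]


@dataclass
class Request:
    # Minimum attribute values: the straight resource satisfaction check.
    requirements: Dict[str, float]
    # Hypothetical declarative ranking parameter: attribute name -> weight.
    rank_weights: Dict[str, float] = field(default_factory=dict)


def acceptable(res: Resource, req: Request) -> bool:
    """'General acceptability' filter: every requirement is met or exceeded."""
    return all(res.attrs.get(k, 0.0) >= v for k, v in req.requirements.items())


def rank(res: Resource, req: Request) -> float:
    """Weighted sum over the attributes named in the ranking parameter."""
    return sum(w * res.attrs.get(k, 0.0) for k, w in req.rank_weights.items())


def candidate_set(resources: List[Resource], req: Request) -> List[str]:
    """Ordered candidate set: acceptable resources only, best-ranked first."""
    ok = [r for r in resources if acceptable(r, req)]
    ok.sort(key=lambda r: rank(r, req), reverse=True)  # stable for ties
    return [r.name for r in ok]


pool = [Resource("a", {"cpus": 4.0, "mem_gb": 8.0}),
        Resource("b", {"cpus": 16.0, "mem_gb": 4.0}),
        Resource("c", {"cpus": 8.0, "mem_gb": 32.0})]
req = Request(requirements={"mem_gb": 8.0}, rank_weights={"cpus": 1.0})
print(candidate_set(pool, req))  # ['c', 'a']
```

Because the weights are plain data, they could travel inside an agreement-style document; anything richer than a weighted sum pushes back towards shipping code, which is the expressiveness question raised in Dave's reply.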

Dave, I think the interesting point of collision with "candidate set generation" and "planning", e.g. traditional scheduling, is when you consider non-trivial QoS regimes: for example, when services do not give best-effort service to all callers, and you cannot just assume statistical averaging to predict future service based on past service, etc.

Then there is a significant difference between a scheduler or broker who returns advisory information and one who returns authoritative information. In the advisory case, he is essentially a discovery service helping you locate services with nice properties observed in the usefully recent past. However, until you act on that information by attempting to acquire service, you have no idea whether the service is actually nice for your goals. The service may be suddenly congested, or have differentiated policies which make your request impossible to fulfill despite its overall nice status.

In the authoritative case, the scheduler actually has some control over those remote services, i.e. (part of) their capacity is reserved to be used at his discretion. In this case, he can actually determine when and how much service you should obtain, and inform you of a stateful adjustment to his overall resource plans that includes you. There might be protocol differences to support these cases, e.g. an authoritative answer might carry some rights assertion that you can present to the service, or the broker may have to update remote services behind the scenes before you manage to contact them. Also, the authoritative broker may demand that you explicitly "decline" allocations he has issued you, while you can presumably walk away from an advisory information source, since no stateful allocation has been made.
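A minimal Python sketch of the advisory/authoritative distinction, assuming hypothetical AdvisoryBroker and AuthoritativeBroker classes and a simple "units of capacity per service" model. The uuid token merely stands in for a rights assertion; nothing here signs or verifies it.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hint:
    # Advisory answer: a recent observation with no promise attached.
    service: str
    observed_load: float


@dataclass
class Allocation:
    # Authoritative answer: capacity set aside, plus a token standing in for
    # a rights assertion the client would present to the service.
    service: str
    units: int
    token: str


class AdvisoryBroker:
    def __init__(self, observed_load: Dict[str, float]):
        self.observed_load = observed_load

    def query(self) -> List[Hint]:
        """Least-loaded first; no state changes, so the caller can walk away."""
        hints = [Hint(s, load) for s, load in self.observed_load.items()]
        return sorted(hints, key=lambda h: h.observed_load)


class AuthoritativeBroker:
    def __init__(self, delegated_capacity: Dict[str, int]):
        self.free = dict(delegated_capacity)  # capacity reserved for this broker
        self.issued: Dict[str, Allocation] = {}

    def allocate(self, units: int) -> Optional[Allocation]:
        """First service with enough free capacity; updates the broker's plan."""
        for service, free in self.free.items():
            if free >= units:
                self.free[service] -= units
                alloc = Allocation(service, units, uuid.uuid4().hex)
                self.issued[alloc.token] = alloc
                return alloc
        return None

    def decline(self, token: str) -> None:
        """Client hands back an allocation it will not use."""
        alloc = self.issued.pop(token)
        self.free[alloc.service] += alloc.units


broker = AuthoritativeBroker({"svcA": 4, "svcB": 10})
grant = broker.allocate(6)
print(grant.service, broker.free)  # svcB {'svcA': 4, 'svcB': 4}
broker.decline(grant.token)
print(broker.free)                 # {'svcA': 4, 'svcB': 10}
```

The asymmetry shows up directly: query() leaves no state behind, whereas allocate() changes the broker's plan until the client either uses the grant or declines it.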
I do think you are on the right track to call a ranking function "policy", and in non-trivial scheduling regimes I think you will always have to present such policy rather than being able to obtain a sufficient picture of the environment to make decisions yourself. This is because of several things:

1. You will not get an atomically consistent view of the environment to act on, while the remote manager may have such a mechanism.
2. You will most likely not get a complete picture of the future service allocation plans, in the event of advance reservation. It would be too much data to exchange for each request; it might include confidential information; and it might not be clearly separable from the scheduling algorithm, which might include statistical approximations and/or other heuristics.
3. You will not get a complete picture of the differentiated policies that different brokers or services apply to specific individuals, because they may be confidential and are also meaningless without a global view of other competing activities.

Thus, I think a significant step of ANY distributed planning exercise with non-trivial QoS will be the sort of handshake captured in WS-Agreement: make an "offer" bearing policy expressions that describe what you want; allow the authoritative resource manager to consider this offer among others and against its stateful policy and resource availability models; obtain an answer of whether the manager can arrange services according to your offer; and (optionally) introspect to find out HOW the manager will provide you service. Not all interface "refactorings" are equivalent across protocol boundaries between parties with different authority and trust roles...

(More specific comments inline.)

On Dec 10, Dave Berry modulated: ...
One factor that influences this choice might be who has access to the necessary information. If the client has privileged information that should not be made public, then the choice might have to be made by the client; or the framework should encrypt the policy parameter and provide a trust relationship with the selection service. Conversely, the client may want to see only an abstraction of the detailed selection process.
Right, I think it is critically important to find out the nature of the roles here before trying to design a protocol.
If we have parameterisation, then the question arises of how expressive a language we need to specify the parameterisation. Do we need a general-purpose programming language (e.g. Python script) or will the draft WS-Agreement language be sufficient?
The replies I received suggested both that the parameterisation would require a general programming language and that the client may have privileged knowledge that it does not wish to share, and therefore it would be better to leave decisions to the client. I'm rather sceptical of the first point but it would be good to examine real systems to determine what is really needed.
Just a reminder: one often-discussed "trick" for handling such isolation of data and abstraction is to allow delegation of resource management authority. An explicit advance reservation interface can abstract away the application goals (and confidential information) to describe required capabilities. The remote manager can schedule and allocate a solution to these without understanding the application, policy permitting. The result of the reservation is a new "virtual resource" and a policy giving the client-side manager authority over it. Then, the client-side logic can perform the fine-grained planning of that resource capability for its specific application functions, revealing only the minimum operational information.
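A minimal Python sketch of the offer/accept handshake combined with this delegation trick, assuming hypothetical RemoteManager, Offer and VirtualResource types and a simple per-time-slot capacity model. The offer carries only capability requirements; the manager checks it against its private reservation plan and either rejects it or returns a grant; the client then does its own fine-grained planning inside the grant. This is an illustration of the idea, not the WS-Agreement schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Offer:
    # Capability-only request: "units of capacity in every slot in [start, end)".
    # No application goals or confidential detail travel with it.
    units: int
    start: int
    end: int


@dataclass
class VirtualResource:
    # The grant: a slice of capacity the client-side manager now controls.
    units: int
    slots: range


class RemoteManager:
    def __init__(self, capacity: int, horizon: int):
        self.capacity = capacity
        self.reserved: List[int] = [0] * horizon  # private plan, never exported

    def consider(self, offer: Offer) -> Optional[VirtualResource]:
        """Accept or reject an offer against the current reservation plan."""
        if not 0 <= offer.start < offer.end <= len(self.reserved):
            return None
        slots = range(offer.start, offer.end)
        # Check and commit in the same call; a concurrent service would hold a lock here.
        if any(self.reserved[s] + offer.units > self.capacity for s in slots):
            return None  # the client learns only the outcome, not the plan
        for s in slots:
            self.reserved[s] += offer.units
        return VirtualResource(offer.units, slots)


def plan_locally(vr: VirtualResource, tasks: Dict[str, int],
                 priority: List[str]) -> Dict[str, int]:
    """Client side: fit tasks (units per slot) into the grant by private priority."""
    left = vr.units
    placed: Dict[str, int] = {}
    for name in priority:
        if tasks[name] <= left:
            placed[name] = tasks[name]
            left -= tasks[name]
    return placed


site = RemoteManager(capacity=10, horizon=24)
grant = site.consider(Offer(units=6, start=2, end=5))
clash = site.consider(Offer(units=6, start=4, end=6))  # slot 4 would need 12 > 10
print(clash)                                           # None
print(plan_locally(grant, {"sim": 4, "viz": 3, "log": 1}, ["sim", "viz", "log"]))
# {'sim': 4, 'log': 1}
```

The remote side never sees the task names or priorities, and the client never sees the full reservation plan; each side reveals only what the other needs to act.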
There are clearly many possible answers in this space - which brings us back to Ian's list of schedulers. One question might be whether we can identify an interface, or a small number of interfaces, that generalises a significant number of practical use cases.
Best wishes,
Dave.
karl

--
Karl Czajkowski
karlcz@univa.com
participants (2): Dave Berry, Karl Czajkowski