
Donal K. Fellows wrote:
First, at this year's London F2F people began talking about the need to do scheduling over all services, not just execution services. I.e. we may need to schedule access to data services and network services. It would be desirable to have a framework generic enough to handle all sorts of resources (as long as they in turn provide the necessary information). Has anyone considered this w.r.t. RSS?
It's not being actively considered as such, but it is certainly at the back of my mind and I'm trying to not take any decisions that would close off such use.
Second, there are lots of ways to implement scheduling. Would it be possible just to specify the interfaces and allow implementations of the services to use whichever algorithms they want? E.g. in the data architecture we say almost nothing about the functioning of a data federation service, because we can abstract from most of the inner workings. Do we clearly understand what the output of the RSS service should be?
We're most certainly not saying much about the innards of the services! That would be a mistake as it is clear from the pre-existing work (done by the GSA-RG) that there are many approaches in this space. From my own experience, I suspect that it is likely that there will be many different "brokering" systems about, many of which are specialized to dealing with a particular domain. At the moment, I'm thinking about the abstract output of the RSS services as ordered sets of candidate execution plans, probably encapsulated within WS-Agreement Templates. The ground elements of the plans (or at least those parts that relate to computational activity, of course) will likely be JSDL documents suitable for submission to BES containers. When it comes to data stuff, I don't understand the requirements well enough to say much, but I believe that the suggested outer structure (ordered set of agreement templates) will extend to that sort of thing nicely; it is just the leaves of such a tree that I don't understand.
Third, you mention the idea of passing in a ranking function parameter. This sounds similar to a question I raised recently on this list about selecting data sources, in which I suggested passing in a policy argument. Admittedly I was making that suggestion in the particular context of passing a policy argument to a reference renewing service, but the two cases do seem sufficiently similar to merit a comparison. The replies I received then expressed a strong preference for letting the selection be done by the client rather than attempting to parameterise a service. I think it's worth drilling down on this issue, and this is what I attempt to do for the rest of this message.
The problem with the other way (caller gets all offers and then chooses) is that it doesn't scale well when the number of potential candidates goes up. By supplying a ranking function to the generator side and using a service concretization based on something like WS-Enumeration, you can come up with protocols that take reasonably high quality decisions with very little network traffic or local computation. Note that I'm looking for "good enough" decisions and not optimal ones. I think that optimality is something of a chimera, and that you can get within a few percent of it for far less effort. This will produce an overall system that works very well cheaply most of the time, and in those cases where the costs are so massive that even a small difference is expensive it will be possible to use a non-standard service that tries harder for optimality (while probably charging extra for the privilege of course). [...]
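To make the traffic argument concrete, here is a minimal sketch of server-side ranking with paged retrieval. It only imitates the pull-style pattern of something like WS-Enumeration and is not that protocol; the names `enumerate_ranked`, `rank_key`, `page_size` and the offer fields are all hypothetical, invented for illustration.

```python
import heapq

def enumerate_ranked(candidates, rank_key, page_size):
    """Yield pages of candidates in rank order, one page per pull.

    Hypothetical stand-in for an enumeration-style service: the caller
    supplies rank_key, and the ordering happens on the generator side.
    """
    # The index i breaks ties so the candidates themselves are never compared.
    heap = [(rank_key(c), i, c) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap:
        page = []
        while heap and len(page) < page_size:
            page.append(heapq.heappop(heap)[2])
        yield page

# Made-up offers; "cost" is an illustrative ranking criterion only.
offers = [
    {"host": "a", "cost": 5.0},
    {"host": "b", "cost": 2.0},
    {"host": "c", "cost": 9.0},
    {"host": "d", "cost": 3.5},
]

pages = enumerate_ranked(offers, lambda o: o["cost"], page_size=2)
first_page = next(pages)  # the client pulls only the best two offers
```

The client still makes the final choice; it just never has to fetch the offers that rank below the first page unless it asks for more.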
If we have parameterisation, then the question arises of how expressive a language we need to specify the parameterisation. Do we need a general-purpose programming language (e.g. Python script) or will the draft WS-Agreement language be sufficient?
As far as I can see from reading the version that went out to public comment, WS-Agreement currently just punts on this issue. I'm hoping that it will be possible to avoid putting in a general programming language expression, though; it's remarkably hard to secure such things, and mandating one language over another (necessary for interoperability) would trigger a lot of arguments. This is an area that's definitely going to need more work. (If we can't specify something XMLish, we need to choose a suitable language based on capabilities and safety properties.)
The replies I received suggested both that the parameterisation would require a general programming language and that the client may have privileged knowledge that it does not wish to share, and therefore it would be better to leave decisions to the client. I'm rather sceptical of the first point but it would be good to examine real systems to determine what is really needed.
I think that as long as you are just ordering and not selecting, you have far less of a problem. The final decision rests with the client still; all they're doing is exporting the first stage of their selection process (the initial sort) to the server side. But you can only make that efficient if you use something like WS-Enumeration. The other advantage of doing this is that you end up with the equivalent of a distributed merge sort when you start doing tricky things with delegated candidate set generation.
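The merge-sort remark can be sketched as follows, assuming each delegated generator already returns its candidates in rank order. The broker streams, plan names and scores below are made up for illustration; only `heapq.merge` and `itertools.islice` are real library calls.

```python
import heapq
from itertools import islice

def merged_candidates(streams, rank_key):
    """Lazily merge already-ranked candidate streams into one ranked stream."""
    # heapq.merge pulls from each stream only as far as it needs to.
    return heapq.merge(*streams, key=rank_key)

# Two hypothetical delegated brokers, each yielding (plan, score) in rank order.
broker_a = iter([("plan-a1", 1.0), ("plan-a2", 4.0)])
broker_b = iter([("plan-b1", 2.0), ("plan-b2", 3.0)])

merged = merged_candidates([broker_a, broker_b], lambda p: p[1])
top3 = list(islice(merged, 3))  # stop after the three best candidates
```

Because the merge is lazy, taking the top few candidates touches only the head of each delegated stream, which is where the saving in traffic would come from.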
There are clearly many possible answers in this space - which brings us back to Ian's list of schedulers. One question might be whether we can identify an interface, or a small number of interfaces, that generalises a significant number of practical use cases.
I thought that (modulo that RSS isn't working on reservation; out of scope according to the current charter) was what we were working on. :-) Donal.