
Trying to wrap several tangents back together here... I agree that at-most-once submission could be supported by an "almost universally used extension" and everyone should be happy on that front. Having a few extensions such as "hold+release" and "annotation" might even be workable, if less than ideal. What is less desirable is having many different extensions that "sort of" provide it, so that a typical heterogeneous-environment client/metascheduler must mode-switch with every remote scheduler to try to get the same QoS, because each scheduler has a different mechanism. What I fear more is the case where the client/metascheduler cannot even detect the presence of extensions and must rely on detailed out-of-band knowledge to perform this feat of re-synthesizing reliable submission.

As for your other comments, I suspect we are all being too abstract to communicate. I cannot imagine what it is about the "common HPC use cases" that is mutually exclusive with the "grid HPC use cases", so I have trouble understanding the dichotomy you are implying. In my world-view, the Grid case is the common use case with additional federation and distribution complexities.

I think the crux of the issue for practical interoperability is defining the extension discovery mechanism(*) AND getting community-based standardization of some actual everyday extensions, and it sounds like we may agree on that point. If this part is not a primary focus, I believe interop will fail and the rest of the discussion will be pointless.

karl

(*) Extension discovery problem: determining what extensions are supported by a service endpoint in order to customize the request. Not to be confused with the harder, and almost as critical, extension-aware discovery problem: choosing an appropriate service endpoint based on the availability of the extensions one wishes to employ.

On Mar 21, Marvin Theimer modulated:
Hi;
I have no doubt that it would be relatively easy to add transactional semantics to most, if not all, job schedulers. In a separate email to Ian and this mailing list I talk about the potential challenge of doing so in a manner that is efficient enough to support the "ultra-high-throughput" HPC use cases that I'm aware of. ASSUMING that it is indeed difficult to support these existing use cases, I argue it's better to support transactional job submission semantics as an almost universally used extension than to simply exclude the use case by requiring those semantics in the base case.
As I point out in the email, my assumption may be wrong and in fact the main scheduler vendors/suppliers may all (or mostly all) say that supporting transactional semantics is either something they already do or something they would have no objection to adding. In that case, we should definitely add this requirement to the base case and happily move forward.
Regarding your concern that I'm trying to define as small a base case as possible, I'm not sure how to respond. An important thing to keep in mind is that I want to define an HPC profile that covers the common HPC use cases, not just the common HPC grid use cases. If the HPC grid profile doesn't cover the common "in-house" use cases then a second set of web service protocols will be needed to cover those cases (interoperability among heterogeneous clusters within an organization is definitely a common case). If that happens then we risk almost certain failure, because vendors will not be willing to support two separate protocol sets and the in-house use cases are currently far more common than the grid use cases. Vendors will extend the in-house protocol set to cover grid use cases and "grid-only" protocols will very likely get ignored.
That said, I agree with your last paragraph about the requirements for a design, namely the need for an interoperable interface subset plus a robust extensibility mechanism that covers the topics you listed. But I will argue that transactional semantics are not a REQUIREMENT for interoperability -- merely something that in MOST cases is enormously useful.
Marvin.
--
Karl Czajkowski
karlcz@univa.com
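
For concreteness, here is a minimal sketch of the two mechanisms the thread keeps circling: a client querying an endpoint for the extensions it supports, and the client getting at-most-once behavior by reusing a client-generated submission token when that extension is advertised. Every name in it (Endpoint, AT_MOST_ONCE, submit, reliable_submit) is a hypothetical illustration, not an API from any actual scheduler, profile, or toolkit.

```python
# Sketch only: the Endpoint class is an in-memory stand-in for a remote
# scheduler; a real client would speak the profile's web-service protocol.
import uuid

AT_MOST_ONCE = "urn:example:ext:at-most-once-submission"  # hypothetical extension URI


class Endpoint:
    def __init__(self, extensions):
        self._extensions = set(extensions)
        self._jobs = {}  # submission token -> job id

    def supported_extensions(self):
        # Extension discovery against a single endpoint (problem (*) above).
        return set(self._extensions)

    def submit(self, job_desc, token=None):
        # With the extension, the token makes submission idempotent: a retry
        # after a lost reply returns the same job instead of creating a duplicate.
        if token is not None and token in self._jobs:
            return self._jobs[token]
        job_id = "job-%d" % (len(self._jobs) + 1)
        if token is not None:
            self._jobs[token] = job_id
        return job_id


def reliable_submit(endpoint, job_desc):
    if AT_MOST_ONCE in endpoint.supported_extensions():
        token = str(uuid.uuid4())  # client-generated, reused on every retry
        return endpoint.submit(job_desc, token=token)
    # Base-case fallback: the client must tolerate possible duplicates, or
    # re-synthesize safety out of band.
    return endpoint.submit(job_desc)


if __name__ == "__main__":
    ep = Endpoint(extensions=[AT_MOST_ONCE])
    token = str(uuid.uuid4())
    first = ep.submit({"executable": "/bin/date"}, token=token)
    retry = ep.submit({"executable": "/bin/date"}, token=token)  # e.g. after a lost reply
    assert first == retry  # no duplicate job was created
    print(first)
```

The point of the sketch is only that the client's behavior branches on what the endpoint advertises; without a standardized discovery mechanism and a standardized extension, that branch becomes per-scheduler special-casing driven by out-of-band knowledge.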