
Hi;

I have no doubt that it would be relatively easy to add transactional semantics to most, if not all, job schedulers. In a separate email to Ian and this mailing list I discuss the potential challenge of doing so in a manner that is efficient enough to support the "ultra-high-throughput" HPC use cases that I'm aware of. ASSUMING that it is indeed difficult to support these existing use cases, I argue that it's better to support transactional job-submission semantics as an almost universally used extension than to simply exclude the use case by requiring those semantics in the base case. As I point out in that email, my assumption may be wrong: the main scheduler vendors/suppliers may all (or mostly all) say that supporting transactional semantics is either something they already do or would have no objection to adding. In that case, we should definitely add the requirement to the base case and happily move forward.

Regarding your concern that I'm trying to define as small a base case as possible, I'm not sure how to respond. An important thing to keep in mind is that I want to define an HPC profile that covers the common HPC use cases, not just the common HPC grid use cases. If the HPC grid profile doesn't cover the common "in-house" use cases, then a second set of web service protocols will be needed to cover those cases (interoperability among heterogeneous clusters within an organization is definitely a common case). If that happens, we risk almost certain failure, because vendors will not be willing to support two separate protocol sets, and the in-house use cases are currently far more common than the grid use cases. Vendors will extend the in-house protocol set to cover grid use cases, and "grid-only" protocols will very likely be ignored.

That said, I agree with your last paragraph about the requirements for a design, namely the need for an interoperable interface subset plus a robust extensibility mechanism that covers the topics you listed.
But I will argue that transactional semantics are not a REQUIREMENT for interoperability -- merely something that in MOST cases is enormously useful.

Marvin.

-----Original Message-----
From: Karl Czajkowski [mailto:karlcz@univa.com]
Sent: Tuesday, March 21, 2006 12:50 PM
To: Marvin Theimer
Cc: Carl Kesselman; humphrey@cs.virginia.edu; ogsa-wg@ggf.org
Subject: Re: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"

On Mar 21, Marvin Theimer modulated:
Hi;
Whereas I agree with you that at-most-once semantics are very desirable, I would like to point out that not all existing job schedulers implement them. I know that both LSF and CCS (the Microsoft HPC job scheduler) don't. I've been trying to find out whether PBS and SGE do or don't.
Aside from Ian's comment that at-most-once message semantics are potentially useful even when there is some possibility of a local failure in message processing (the handoff from the message layer to the local scheduler), I believe LSF does support "hold" states, whereby a job can be submitted and later released as a two-phase interaction. Such a mechanism is sufficient to implement complete end-to-end at-most-once submission: the message engine logs the association between the client message and a local job handle before submitting. Most schedulers also support job naming/annotation fields that are exposed through the job query interface. These fields can be used to maintain a reliable correlation between message/request IDs and the local job, and hence to synthesize at-most-once semantics in front of the scheduler by checking whether a local job with the given name already exists before resubmitting. This behavior can be hidden in the message engine and "local adapter".
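To make the pattern concrete, here is a minimal sketch of such a "local adapter". The scheduler API (`find_job_by_name`, `submit`, `release`) is hypothetical, standing in for whatever the real scheduler (e.g. LSF) exposes; the point is the ordering of the steps, not the names.

```python
# Illustrative sketch (hypothetical scheduler API) of the at-most-once
# submission pattern described above: tag each job with the client's
# request ID, check for an existing job with that tag before submitting,
# and use a two-phase hold/release handoff around the durable log write.

class LocalSchedulerAdapter:
    """Wrapper around any scheduler object that supports job naming
    and hold/release submission."""

    def __init__(self, scheduler):
        self.scheduler = scheduler  # any object with the methods used below

    def submit_at_most_once(self, request_id, job_spec):
        # 1. Idempotency check: has this request already produced a job?
        existing = self.scheduler.find_job_by_name(request_id)
        if existing is not None:
            return existing  # duplicate message; return the prior handle

        # 2. Submit in a held state so the job cannot run before the
        #    message engine has durably recorded the correlation.
        handle = self.scheduler.submit(job_spec, name=request_id, hold=True)

        # 3. Durably record request_id -> handle (the "logging in the
        #    message engine" step), then release the held job.
        self.log_correlation(request_id, handle)
        self.scheduler.release(handle)
        return handle

    def log_correlation(self, request_id, handle):
        pass  # stand-in for a write to the message engine's durable log
```

If the adapter crashes between steps 2 and 3, recovery can re-run the name lookup of step 1 and either complete or discard the held job, so no job ever runs twice for one request ID.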
So, this brings up the following slightly more general question: should the simplest base case be the simplest case that does something useful, or should it be more complicated than that? I can see good arguments on both sides:
I find it a little disconcerting that this question is still being asked about job systems, because there is a history of having made and retracted this decision before. We did it in Globus with GRAM, and I think several of the other Grid projects did as well... The subset interface is not sufficient for users. A solution MUST incorporate an interoperable subset plus a robust extensibility mechanism to allow any of:

1. incremental evolution of the core subset
2. vendor-specific localization/extension
3. community/site-specific localization/extension
4. discovery of extended mode support
5. graceful degradation in the absence of extended mode support

In my opinion, anything short of this will just add another non-interoperable interface to the hodge-podge of non-interoperable solutions that already exist.

karl

--
Karl Czajkowski
karlcz@univa.com
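Items 4 and 5 of the list above can be sketched as follows. All names here (the extension URI, the client class, how extensions are advertised) are made up for illustration; in a real profile the advertised extensions would come from a metadata/discovery query against the service.

```python
# Minimal sketch (hypothetical names throughout) of extension discovery
# with graceful degradation: the client checks whether the service
# advertises an extension and falls back to base behavior if not.

TRANSACTIONAL_SUBMIT = "urn:example:ext:transactional-submit"  # made-up URI

class JobServiceClient:
    def __init__(self, advertised_extensions):
        # Assumed to be populated from a discovery query in practice.
        self.extensions = set(advertised_extensions)

    def supports(self, extension_uri):
        return extension_uri in self.extensions

    def submit(self, job_spec, request_id=None):
        if self.supports(TRANSACTIONAL_SUBMIT) and request_id is not None:
            # Extended mode: the service guarantees at-most-once submission.
            return ("transactional", request_id, job_spec)
        # Base mode: plain submit; the caller must tolerate possible
        # duplicates (e.g. by querying job state before retrying).
        return ("plain", None, job_spec)
```

The interoperable subset is whatever `submit` does in the base branch; everything else is negotiated per-service, which is what keeps vendor and site extensions from fragmenting the core protocol.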