
On Mar 21, Christopher Smith modulated:
No Ian ... I’m not saying that the ability to tell whether your job has been submitted is not important. What I am saying is that for systems like LSF, implementing this in the submission protocol is not necessary as there are other ways of figuring this out (such as Karl outlined in another email). Thus, having this protocol implemented is not important for our customers, who might rather see other features added to the product.
-- Chris
Chris, that is not an accurate characterization of what I wrote. I said that it is easy to implement an idempotent message protocol in front of two different LSF local submission mechanisms (hold+release and job name annotations), and therefore was implying that it is not a significant implementation burden to support idempotence in a standard protocol!

I suspect that most schedulers have some "client provided job name" option that can be used in a general adapter solution:

  1. standard protocol client chooses unique idempotence ID

  2. standard protocol client sends message, possibly more than once

  3. standard protocol service receives message, possibly more than once

     a. implement a persistent atomic <client-ID, engine-ID, job-ID> map

     b. # log client-ID for protocol idempotence
        if <client-ID, *, *> is not in map then
           engine-ID := new unique ID()
           enter <client-ID, engine-ID, nil> into map
        else
           find engine-ID for client-ID in map
        endif

     c. # log local ID and job ID for local idempotence
        if <client-ID, engine-ID, nil> is in map then
           if job system has a job annotated with engine-ID then
              job-ID := job system ID for job annotated with engine-ID
           else
              job-ID := submit(job annotated with engine-ID)
           endif
           enter <client-ID, engine-ID, job-ID> into map
        else
           find job-ID for engine-ID in map
        endif

This process is crash-recoverable to provide at-most-once semantics for local job submission, with as much reliability as there is in the persistent log mechanism. It also requires that jobs "linger" in the local scheduler (or accessible log files) long enough for recovery to take place. This works in practice if the service engine can formulate unique IDs that are unlikely to have a collision with any other client-specified name annotations for jobs.

I use a separate client-ID and engine-ID to clarify that this solution does not depend on the client following a locale-specific unique job naming convention.
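For concreteness, steps 3a through 3c could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: FakeScheduler, IdempotentSubmitter, and their method names are hypothetical and not part of LSF or any standard protocol, and the plain dict stands in for the persistent atomic map, which a real service would have to keep in a durable, atomically updated log.

```python
import uuid


class FakeScheduler:
    """Hypothetical stand-in for a local job system with job name annotations."""

    def __init__(self):
        self.jobs = {}      # job-ID -> annotation
        self.next_id = 1

    def find_by_annotation(self, annotation):
        # Return the job-ID of the job carrying this annotation, or None.
        for job_id, name in self.jobs.items():
            if name == annotation:
                return job_id
        return None

    def submit(self, annotation):
        job_id = self.next_id
        self.next_id += 1
        self.jobs[job_id] = annotation
        return job_id


class IdempotentSubmitter:
    """Hypothetical adapter in front of the scheduler (steps 3a to 3c)."""

    def __init__(self, scheduler, id_map=None):
        self.scheduler = scheduler
        # Step a: client-ID -> (engine-ID, job-ID or None).
        # Assumption: a real service persists this map; a dict does not.
        self.id_map = {} if id_map is None else id_map

    def handle(self, client_id):
        # Step b: log client-ID for protocol idempotence.
        if client_id not in self.id_map:
            engine_id = "engine-" + uuid.uuid4().hex
            self.id_map[client_id] = (engine_id, None)
        engine_id, job_id = self.id_map[client_id]

        # Step c: log job-ID for local idempotence.
        if job_id is None:
            # After a crash the job may already exist in the scheduler.
            job_id = self.scheduler.find_by_annotation(engine_id)
            if job_id is None:
                job_id = self.scheduler.submit(engine_id)
            self.id_map[client_id] = (engine_id, job_id)
        return job_id
```

Calling handle() twice with the same client-ID returns the same job-ID and submits only one job. Seeding the map with an entry whose job-ID is still None imitates a crash between the local submit and the final map update: the lookup by annotation then recovers the existing job instead of submitting a second one.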
Of course, the client must follow the standard protocol's conventions for unique message naming.

karl

-- 
Karl Czajkowski
karlcz@univa.com