Re: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"

22 Mar 2006

      On Mar 21, Christopher Smith modulated:
...
No Ian ... I’m not saying that the ability to tell whether your job has
been submitted is not important. What I am saying is that for systems
like LSF, implementing this in the submission protocol is not necessary
as there are other ways of figuring this out (such as Karl outlined in
another email). Thus, having this protocol implemented is not important
for our customers, who might rather see other features added to the
product.
-- Chris

Chris, that is not an accurate characterization of what I wrote.

I said that it is easy to implement an idempotent message protocol in
front of two different LSF local submission mechanisms (hold+release
and job name annotations). The implication was that supporting
idempotence in a standard protocol is not a significant implementation
burden!

I suspect that most schedulers have some "client provided job name"
option that can be used in a general adapter solution:

   1. standard protocol client chooses unique idempotence ID

   2. standard protocol client sends message, possibly more than once

   3. standard protocol service receives message, possibly more than once

      a. implement a persistent atomic <client-ID, engine-ID, job-ID> map

      b. # log client-ID for protocol idempotence
         if <client-ID, *, *> is not in map
         then 
             engine-ID := new unique ID()
             enter <client-ID, engine-ID, nil> into map
         else
             find engine-ID for client-ID in map
         endif

      c. # log local ID and job ID for local idempotence
         if <client-ID, engine-ID, nil> is in map
         then
             if job system has a job annotated with engine-ID
             then
                job-ID := job system ID for job annotated with engine-ID
             else
                job-ID := submit(job annotated with engine-ID)
             endif
             enter <client-ID, engine-ID, job-ID> into map
         else
             find job-ID for engine-ID in map
         endif

      This process is crash-recoverable and provides at-most-once
      semantics for local job submission, with as much reliability as
      there is in the persistent log mechanism.  It also requires that
      jobs "linger" in the local scheduler (or in accessible log files)
      long enough for recovery to take place.
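
For concreteness, steps 3a-3c above can be sketched in Python roughly as
follows. This is only an illustration under stated assumptions:
FakeScheduler, find_by_name, submit, and IdempotentAdapter are
hypothetical names invented for the sketch, not LSF or standard-protocol
APIs, and the in-memory dict stands in for what would have to be a
persistent, atomically updated map in a real service.

```python
import uuid


class FakeScheduler:
    """Hypothetical stand-in for a local job system that supports
    client-provided job name annotations."""

    def __init__(self):
        self.jobs = {}   # job name annotation -> local job ID
        self._next = 1

    def find_by_name(self, name):
        # Return the job ID annotated with this name, or None.
        return self.jobs.get(name)

    def submit(self, name):
        job_id = self._next
        self._next += 1
        self.jobs[name] = job_id
        return job_id


class IdempotentAdapter:
    """Sketch of the standard-protocol service side (step 3)."""

    def __init__(self, scheduler, log=None):
        self.scheduler = scheduler
        # Step a: client-ID -> [engine-ID, job-ID].  Assumption: a real
        # service would use a persistent atomic store, not a dict.
        self.log = {} if log is None else log

    def handle(self, client_id):
        # Step b: log client-ID for protocol idempotence.
        if client_id not in self.log:
            self.log[client_id] = ["engine-" + uuid.uuid4().hex, None]
        entry = self.log[client_id]
        engine_id = entry[0]

        # Step c: log job ID for local idempotence.
        if entry[1] is None:
            # After a crash, the job may already exist in the scheduler
            # even though its ID never reached the log.
            job_id = self.scheduler.find_by_name(engine_id)
            if job_id is None:
                job_id = self.scheduler.submit(engine_id)
            entry[1] = job_id
        return entry[1]
```

Calling handle() twice with the same client-ID returns the same job ID
and submits only once.  Passing in a log whose entry still has no job ID
simulates recovery after a crash between submit and the final log write.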

This works in practice if the service engine can formulate unique IDs
that are unlikely to have a collision with any other client-specified
name annotations for jobs.  I use a separate client-ID and engine-ID
to clarify that this solution does not depend on the client following
a locale-specific unique job naming convention.  Of course, the client
must follow the standard protocol's conventions for unique message
naming.
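
One possible way for the service engine to formulate such IDs is a
fixed prefix plus a random UUID, which makes an accidental match with a
human-chosen job name very unlikely. The prefix "idem-" and the helper
names below are assumptions made up for this sketch; any scheme with
similarly low collision odds would serve.

```python
import uuid

ENGINE_PREFIX = "idem-"  # hypothetical prefix reserved by the engine


def new_engine_id():
    # 128 random bits rendered as 32 lowercase hex characters.
    return ENGINE_PREFIX + uuid.uuid4().hex


def is_engine_id(name):
    # True only for names shaped like the ones new_engine_id() produces.
    suffix = name[len(ENGINE_PREFIX):]
    return (name.startswith(ENGINE_PREFIX)
            and len(suffix) == 32
            and all(ch in "0123456789abcdef" for ch in suffix))
```

During recovery, the engine would then look only for job annotations
that pass is_engine_id(), leaving names chosen directly by local users
alone.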

karl

-- 
Karl Czajkowski
karlcz@univa.com