
On Mar 21, Marvin Theimer modulated:
> I know that systems like LSF get used in high throughput settings where the service time for a job request is an issue.... ... If my assumption is correct, then this common use case in the HPC world may be one that many, if not most job schedulers would have a hard time supporting if they have to provide at-most-once transactional semantics for all job submissions.
I do not think anyone claimed that at-most-once semantics should be mandated on all requests. Certainly nobody from Globus is saying this; it is an optional feature of our job submission protocol, to be chosen by the client depending on its needs.

I think the question is much more about whether (or how many times) an optional at-most-once extension mechanism is defined. Secondarily, there is the question of efficiently determining whether the extension is available in a remote service. A third interesting question might be how the "cost" of the extension compares to the cost of losing jobs when running against an unknown remote service implementation, for example when setting up an extremely high-throughput run like the one you describe.

The high-throughput case is interesting to me, because that is precisely the user community that demanded efficient at-most-once semantics from GRAM! They are the ones who blast enough jobs through to notice statistical failure rates and the cost of recovery.

karl

-- 
Karl Czajkowski
karlcz@univa.com
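For readers less familiar with the idea, here is a minimal Python sketch of one common way to offer at-most-once submission as an optional, client-chosen extension. The client attaches a unique submission ID and reuses it on every retry; a service that supports the extension returns the already-created job instead of creating another. All names (`JobService`, `submission_id`, `supports_at_most_once`, `LossyChannel`, `submit_with_retry`) are hypothetical illustrations, not the actual GRAM, LSF, or any other scheduler's interface.

```python
import uuid
from typing import Dict, Optional


class JobService:
    """Toy job service. The optional submission_id parameter is a hypothetical
    at-most-once extension; a service without it simply ignores the ID."""

    def __init__(self, supports_at_most_once: bool = True) -> None:
        self.supports_at_most_once = supports_at_most_once  # hypothetical capability flag
        self._jobs: Dict[str, str] = {}           # job handle -> job description
        self._by_submission: Dict[str, str] = {}  # submission ID -> job handle

    def submit(self, description: str, submission_id: Optional[str] = None) -> str:
        use_id = self.supports_at_most_once and submission_id is not None
        if use_id and submission_id in self._by_submission:
            # A retry of a request already executed: return the existing job.
            return self._by_submission[submission_id]
        handle = f"job-{len(self._jobs) + 1}"
        self._jobs[handle] = description
        if use_id:
            self._by_submission[submission_id] = handle
        return handle

    def job_count(self) -> int:
        return len(self._jobs)


class LossyChannel:
    """Delivers every request but drops the first `lost_replies` replies,
    so the client cannot tell whether its job was created."""

    def __init__(self, service: JobService, lost_replies: int) -> None:
        self.service = service
        self.lost_replies = lost_replies

    def submit(self, description: str, submission_id: Optional[str] = None) -> str:
        handle = self.service.submit(description, submission_id)
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutError("reply lost")
        return handle


def submit_with_retry(channel: LossyChannel, description: str,
                      at_most_once: bool, max_attempts: int = 5) -> str:
    # The client opts in per request, and only if the service advertises support.
    wants_id = at_most_once and channel.service.supports_at_most_once
    submission_id = str(uuid.uuid4()) if wants_id else None
    for _ in range(max_attempts):
        try:
            return channel.submit(description, submission_id)
        except TimeoutError:
            continue  # outcome unknown: the job may or may not exist
    raise RuntimeError("no reply after retries")


if __name__ == "__main__":
    plain = JobService()
    submit_with_retry(LossyChannel(plain, 2), "sim.sh", at_most_once=False)
    print(plain.job_count())  # 3: each retry created a duplicate job

    safe = JobService()
    handle = submit_with_retry(LossyChannel(safe, 2), "sim.sh", at_most_once=True)
    print(safe.job_count(), handle)  # 1 job-1
```

In this sketch the extension's cost to the service is one extra lookup and one stored ID-to-job mapping per opted-in submission (a real service would also need to expire old entries), and clients that do not opt in pay nothing. Without it, a client that loses a reply must choose between retrying (risking duplicates) and not retrying (risking a lost job).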