For what it's worth, the Globus user community has been
running thousands of instances of our GRAM job submission service for
quite a few years, with many many millions of jobs running through them,
and as far as I am aware, no-one has ever asked for the ability to manage
more than one job at a time. Certainly the lack of this facility hasn't
seemed to stop anyone.
Lots of caveats can be applied here: maybe people did ask, and I didn't
hear; maybe they didn't think to ask; maybe our workloads are special
(although there is a great variety). But it is a data point.
Ian.
At 11:59 AM 4/6/2005 +0100, Mark McKeown wrote:
Hi Paul,
Moving the question from "can I suspend multiple
jobs by sending a single message to a resource (either
REST or WS-Resource)?" to "is this a good thing?".
There is a balance between simplicity and efficiency:
using a single message introduces more complexity, as
Steve Loughran illustrated, but is potentially more
efficient than sending multiple messages.
Remembering that "premature optimization is the root of all
evil" (Knuth), is adding support for suspending multiple
jobs using a single message an example of premature
optimisation?
I would imagine that this should be a straightforward
question, since there is already considerable experience
in using computational grids. Are users demanding the
ability to suspend multiple jobs using a single message?
Is it for improved efficiency reasons? From my experience,
no, but others on this list will have considerably more
experience.
Could this be a case of "worse is better", where simplicity
is more important than efficiency?
Perhaps there are other reasons for using a single message
to interact with multiple jobs?
cheers
Mark
> Ian,
>
> I agree that this is good progress. So let's bank that and see if we can
> agree on one more thing, and then I'll ask a question.
>
> Considering your list of abilities (a, b & c) below, do we agree that in
> terms of expressiveness, the ordering is:
>
> c > b > a
>
> i.e. using approach c, a client can request operations on:
>
> a) single jobs: "where (jobid = urn:guid:364)"
>
> b) sets of jobs: "where (jobid = urn:guid:364) or (jobid = urn:guid:401)"
>
> If there is agreement on this, then we could move on to discussing why
> it is felt necessary to provide more than just c for the job submission
> service.
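Paul's ordering can be illustrated with a small Java sketch. It assumes a hypothetical `Job` class and `JobSelectors` helper (neither comes from GRAM, WS-RF or any OGSA draft) and uses `java.util.function.Predicate` to stand in for the "where" clause; a real service would receive a query expression rather than a Java object. Forms (a) and (b) fall out as special cases of (c): a single-id test, and an OR of single-id tests.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical job description; the fields are illustrative only.
final class Job {
    final String id;
    final String owner;

    Job(String id, String owner) {
        this.id = id;
        this.owner = owner;
    }
}

final class JobSelectors {
    // (a) a single job: where (jobid = id)
    static Predicate<Job> byId(String id) {
        return job -> job.id.equals(id);
    }

    // (b) an explicit set of jobs: where (jobid = id1) or (jobid = id2) ...
    // built entirely out of (a)-style predicates joined with "or".
    static Predicate<Job> byIds(List<String> ids) {
        Predicate<Job> where = job -> false; // an empty list selects nothing
        for (String id : ids) {
            where = where.or(byId(id));
        }
        return where;
    }

    // (c) more abstract criteria, e.g. every job owned by one user.
    static Predicate<Job> byOwner(String owner) {
        return job -> job.owner.equals(owner);
    }

    // The service evaluates whichever predicate the client supplied.
    static List<String> select(List<Job> jobs, Predicate<Job> where) {
        return jobs.stream().filter(where).map(job -> job.id).collect(Collectors.toList());
    }
}
```

The converse does not hold: a criterion such as `byOwner` cannot be rewritten as a fixed list of ids unless the client already knows which jobs exist.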
>
> Regards
>
> Paul
>
> Ian wrote...
>
> >Savas:
> >
> >It seems that we are in agreement, then, that we want the ability to:
> >
> >a) Request operations on individual jobs identified by some sort of
> >"jobid"
> >
> >b) Request operations on sets of jobs identified by a user-supplied
> >list of "jobids"
> >
> >c) Request operations on sets of jobs identified by more abstract
> >criteria
> >
> >We also agree that (as I expressed in the email that started this
> >discussion) such requests can be expressed in a few different ways,
> >with somewhat different characteristics.
> >
> >That's progress I hope.
> >
> >Ian.
>
> ________________________________
>
> From: Ian Foster [mailto:foster@mcs.anl.gov]
> Sent: 05 April 2005 17:59
> To: Savas Parastatidis; Steve Loughran
> Cc: Mark McKeown; Karl Czajkowski; Dennis Gannon; Samuel Meder; ogsa-wg;
> dave.pearson@oracle.com; gray@microsoft.com; humphrey@cs.virginia.edu;
> grimshaw@virginia.edu; aherbert@microsoft.com; gcf@indiana.edu;
> mark.linesch@hp.com; Frank Siebenlist; Tony Hey; Dave Berry; Paul Watson
> Subject: RE: [ogsa-wg] RE: Modeling State: Technical Questions
>
> [I'm feeling increasingly bad about sending email to all of the people
> CCed here, who may not be interested in these issues at all but got
> addressed by Tony long ago...]
>
> Savas:
>
> It seems that we are in agreement, then, that we want the ability to:
>
> a) Request operations on individual jobs identified by some sort of
> "jobid"
>
> b) Request operations on sets of jobs identified by a user-supplied list
> of "jobids"
>
> c) Request operations on sets of jobs identified by more abstract
> criteria
>
> We also agree that (as I expressed in the email that started this
> discussion) such requests can be expressed in a few different ways, with
> somewhat different characteristics.
>
> That's progress I hope.
>
> Ian.
>
> At 02:44 PM 4/5/2005 +0100, Savas Parastatidis wrote:
>
> Dear Ian,
>
> I don't think that the approach I proposed forces the user to do more
> than they would have to do anyway if EPRs were used. It is still the
> case that someone has to manage the EPRs to the resources in WSRF. This
> is similar to what happens in the real world. The online bookstore will
> ask for my credit card number (a URI), or the bookstore will ask for an
> ISBN (another URI), or multiple ISBNs if I want to buy multiple books.
> The banking service will ask for my bank account number (another URI,
> perhaps).
>
> Also, there is no reason why a "kill all my jobs" message couldn't also be
> supported. But please note that this message is now addressed to the
> service (the container of resources) and not, as in the case of WSRF, to
> a specific resource. This is no different from what I am advocating.
>
> Also... to Steve's point about partial failure. If one wishes atomic
> transaction semantics, I don't see the difference between the two
> approaches...
>
> Atomic
>   Msg -> resource 1
>   Msg -> resource 2
>   Msg -> resource 3
> End Atomic
>
> Vs
>
> Msg
>   Atomic
>     Resource 1
>     Resource 2
>     Resource 3
>   End Atomic
>
> In fact, I would argue that the latter is better because:
>
> 1. It uses fewer messages (and, Steve, I am not assuming only HTTP and
> the optimisations that may be supported)
>
> 2. I can more easily deal with failures in an application-specific
> manner, since my atomic TX semantics do not span multiple msgs.
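A minimal Java sketch of the second form, where one message names several resources and the receiving service decides how atomicity is achieved. Everything here is hypothetical (`SuspendService`, its string states, `suspendAll`). All-or-nothing is approximated by validating every id before changing anything; a real transactional service would also have to cope with concurrent requests and crashes, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical service holding the state of several job resources.
final class SuspendService {
    private final Map<String, String> states = new HashMap<>();

    void register(String jobId) {
        states.put(jobId, "RUNNING");
    }

    String state(String jobId) {
        return states.get(jobId);
    }

    // One message naming several jobs, handled all-or-nothing: every id is
    // checked before any state changes, so a single unknown id leaves all
    // jobs as they were. Concurrency and crash recovery are not handled.
    boolean suspendAll(List<String> jobIds) {
        for (String id : jobIds) {
            if (!states.containsKey(id)) {
                return false; // reject the whole request; nothing was changed
            }
        }
        for (String id : jobIds) {
            states.put(id, "SUSPENDED");
        }
        return true;
    }
}
```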
>
> (Anyway... who wants to do atomic TXs over the Web anyway? :-)
>
> Regards,
>
> --
> Savas Parastatidis
> http://savas.parastatidis.name
>
> From: Ian Foster [mailto:foster@mcs.anl.gov]
> Sent: Tuesday, April 05, 2005 2:22 PM
> To: Steve Loughran; Savas Parastatidis
> Cc: Mark McKeown; Karl Czajkowski; Dennis Gannon; Samuel Meder; ogsa-wg;
> dave.pearson@oracle.com; gray@microsoft.com; humphrey@cs.virginia.edu;
> grimshaw@virginia.edu; aherbert@microsoft.com; gcf@indiana.edu;
> mark.linesch@hp.com; Frank Siebenlist; Tony Hey; Dave Berry
> Subject: Re: [ogsa-wg] RE: Modeling State: Technical Questions
>
> Steve's note raises a key point for me: do we really want to force the
> user (as Savas seems to be advocating) to keep track of jobs running at
> a remote site?
>
> I'd rather send a request "kill all my jobs" or "kill all my jobs that
> have run for more than a day" to the factory than carefully keep track
> of all jobs that I have active, and how long they have been running, so
> that I can send the big document (or stream) discussed below.
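Ian's "kill all my jobs that have run for more than a day" can be sketched as a criteria-based operation that the service evaluates against its own records. The `JobTable` class, its fields and `killOlderThan` are hypothetical; marking an entry as killed stands in for actually terminating a job.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical job table held by the service (the "factory"), not the client.
final class JobTable {
    private static final class Entry {
        final String id;
        final String owner;
        final Instant started;
        boolean killed;

        Entry(String id, String owner, Instant started) {
            this.id = id;
            this.owner = owner;
            this.started = started;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String id, String owner, Instant started) {
        entries.add(new Entry(id, owner, started));
    }

    // "Kill all my jobs that have run for more than maxAge": the service applies
    // the criteria to its own records, so the client never lists job ids.
    // Returns the ids that were newly marked as killed by this call.
    List<String> killOlderThan(String owner, Duration maxAge, Instant now) {
        List<String> killed = new ArrayList<>();
        for (Entry e : entries) {
            boolean mine = e.owner.equals(owner);
            boolean tooOld = Duration.between(e.started, now).compareTo(maxAge) > 0;
            if (mine && tooOld && !e.killed) {
                e.killed = true; // stand-in for actually terminating the job
                killed.add(e.id);
            }
        }
        return killed;
    }
}
```

Because the service owns the start times, the client only sends who it is and the threshold (for "more than a day", `Duration.ofDays(1)`).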
>
> Ian.
>
>
> At 02:10 PM 4/5/2005 +0100, Steve Loughran wrote:
>
> Savas Parastatidis wrote:
>
> Dear all,
> I think something needs to be clarified with regards to handling
> multiple jobs with one message. The beauty of document-oriented
> interactions is that you can do things like...
> <job-details-request>
>   <job-id>urn:ogsa:job:guid:bla-bla-bla-001</job-id>
>   <job-id>urn:ogsa:job:guid:bla-bla-bla-010</job-id>
>   <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
>   <job-id>urn:ogsa:job:guid:bla-bla-bla-029</job-id>
> </job-details-request>
> Or
> <job-suspend-request>
>   <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
>   <job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id>
>   <job-id>urn:ogsa:job:guid:bla-bla-bla-008</job-id>
> </job-suspend-request>
> The schema for the above document can allow anywhere from 0 to N
> <job-id> elements.
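Because the schema allows 0 to N `<job-id>` elements, the receiver simply iterates over however many are present. A minimal sketch using the JDK's built-in DOM parser (`javax.xml.parsers`), assuming the un-namespaced element names from the example request; a real schema would almost certainly use a namespace, which would call for a namespace-aware parser and `getElementsByTagNameNS`. `SuspendRequestParser` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class SuspendRequestParser {
    // Returns the text of every <job-id> element in a request document such as
    // <job-suspend-request>. Zero <job-id> elements is valid and yields an empty list.
    static List<String> jobIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("job-id");
            List<String> ids = new ArrayList<>(nodes.getLength());
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
            return ids;
        } catch (Exception e) {
            // Parser configuration, I/O and malformed-XML errors all end up here.
            throw new IllegalArgumentException("not a well-formed request document", e);
        }
    }
}
```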
>
> The trouble with any bulk operation is that you have to handle partial
> failure. You need either atomic operations (not long-lived transactions
> over HTTP, Savas, I wouldn't be that daft), or a way of indicating that
> only a bit went wrong.
>
> Hence the 207 Multi-Status response in WebDAV: "something failed, look
> in the message". WebDAV is still single-instance (here a RESTy URL), but
> you can set more than one property and so have partial failure.
>
> SOAP just has SOAPFault and extensions; there is no explicit
> multiple-failure response. WS-RF-ResourceProperties has a similar problem
> with SetResourceProperties, but a different failure model, in which any
> failure to set can result in a WS-BaseFault indicating which set failed,
> but providing no apparent information on which worked.
>
> It seems to me that if you want to do bulk operations, you need ways of
> (a) handling partial failure and (b) declaring what happens on partial
> failure. For the curious, WebDAV's failure mode on file operations
> (MOVE, COPY) is explicitly declared to be that of failed file operations
> of Win98 on a FAT32 filesystem [1,2].
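One way to make partial failure explicit, in the spirit of WebDAV's 207 Multi-Status, is to return an outcome per job instead of a single success or fault. A hypothetical Java sketch: the `BulkSuspend` class, the `Outcome` values and the string lifecycle states are illustrative assumptions, not part of WebDAV, SOAP or WS-RF.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BulkSuspend {
    enum Outcome { SUSPENDED, NOT_FOUND, NOT_SUSPENDABLE }

    // Best-effort bulk suspend: each job is attempted independently and the
    // reply carries one outcome per requested job, so the caller learns what
    // worked as well as what failed. 'states' maps job id to a lifecycle
    // state string and is updated in place.
    static Map<String, Outcome> suspend(List<String> requested, Map<String, String> states) {
        Map<String, Outcome> result = new LinkedHashMap<>();
        for (String id : requested) {
            String current = states.get(id);
            if (current == null) {
                result.put(id, Outcome.NOT_FOUND);
            } else if (!current.equals("RUNNING")) {
                result.put(id, Outcome.NOT_SUSPENDABLE);
            } else {
                states.put(id, "SUSPENDED");
                result.put(id, Outcome.SUSPENDED);
            }
        }
        return result;
    }

    // True only if every requested job was suspended.
    static boolean allSucceeded(Map<String, Outcome> result) {
        return result.values().stream().allMatch(o -> o == Outcome.SUSPENDED);
    }
}
```

Choosing this best-effort policy over all-or-nothing is itself the kind of declaration point (b) asks for: the client needs to know which behaviour it is getting.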
>
> Alternatively, you don't go for bulk operations at all, neither on
> multiple jobs nor on multiple properties of a job (remember, WS-RF doesn't
> declare atomic/transacted property operations, so all you do here is
> increase the window of instability, a window that already exists).
> Instead you just stream a series of operations over the same HTTP/1.1
> connection (assuming that everything is accessible at the same far-end
> host), and get back a series of responses (which HTTP/1.1 requires the
> server to send in the order the requests were received).
>
> This could be efficient, and you could do better handling of failure.
> But you do need a SOAP stack that can keep an HTTP/1.1 channel open for
> multiple requests. Axis doesn't, even if you get httpclient to do the
> HTTP work; I don't know about .NET/WSE. You also need developers to
> model the communication correctly. Manipulating JAX-RPC proxies as if
> they represented remote objects is *clearly* the wrong way to do it.
> You'd almost want to model a queue of requests waiting to be POSTed, a
> queue you can fill up and then push out. Something like this, in your
> Java-era language of choice:
>
> // different queues for SOAP, REST
> Queue q = new Soap12RequestQueue();
>
> q.add(new StatePut(job1.uri, Job.LIFECYCLE, Job.SUSPENDED));
> // let the queue reorder stuff if it wants to
> q.add(new StatePut(job2.uri, Job.LIFECYCLE, Job.SUSPENDED), Queue.POSITION_OPTIMAL);
> q.add(new StatePut(job3.uri, Job.LIFECYCLE, Job.SUSPENDED), Queue.POSITION_LAST);
>
> q.setEventHandler(this);
> q.nonBlockingSubmit();
>
> No, there is no code behind this example, and I am avoiding any hints as
> to what the event handler would look like. I think the key point is that
> once you embrace remote operations as async actions, you can model the
> manipulations differently. Note also that I am representing job
> suspension not as an explicit suspend() operation, but as a request to
> put a job into the suspended state. This API could work with our friend
> REST just as easily as with WS-RF...
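Steve's example deliberately has no code behind it. As a hedged illustration of the same idea (queue state-change requests, submit them without blocking, and hear about each result through a handler), here is a Java sketch built on `java.util.concurrent.CompletableFuture`. `RequestQueue`, `StatePut` and the `transport` predicate are hypothetical stand-ins, not Axis, JAX-RPC or WS-RF APIs, and the sketch does not pipeline over one HTTP/1.1 connection; it simply runs independent requests on a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical client-side queue of state-change requests.
final class RequestQueue {
    // A request to put a job into a given lifecycle state, e.g. "SUSPENDED".
    static final class StatePut {
        final String jobUri;
        final String state;

        StatePut(String jobUri, String state) {
            this.jobUri = jobUri;
            this.state = state;
        }
    }

    private final List<StatePut> pending = new ArrayList<>();
    // Hypothetical transport: sends one request and reports success.
    // Assumed to return false on failure rather than throw.
    private final Predicate<StatePut> transport;

    RequestQueue(Predicate<StatePut> transport) {
        this.transport = transport;
    }

    void add(StatePut put) {
        pending.add(put);
    }

    // Sends everything queued so far without blocking the caller. Each request
    // completes on its own and the handler hears about each one, possibly
    // from different threads, so one failure does not hide other successes.
    CompletableFuture<Void> nonBlockingSubmit(ExecutorService pool, BiConsumer<StatePut, Boolean> handler) {
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        for (StatePut put : pending) {
            inFlight.add(CompletableFuture
                    .supplyAsync(() -> transport.test(put), pool)
                    .thenAccept(ok -> handler.accept(put, ok)));
        }
        pending.clear();
        return CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0]));
    }
}
```

A caller would `add` one `StatePut` per job, call `nonBlockingSubmit(pool, handler)` and, only if it needs to wait, `join()` the returned future. Because the handler may run on pool threads, it should be thread-safe.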
>
> Anyway, Savas, to conclude: do you have any evidence that a single
> document is suboptimal compared to a sequence of requests over an open
> HTTP/1.1 connection? That is, assuming we ignore the SHOULD in the
> HTTP/1.1 specification: "Clients SHOULD NOT pipeline requests using
> non-idempotent methods or non-idempotent sequences of methods" [3].
>
> -Steve
>
>
> [1] WebDAV, RFC 2518, http://www.ietf.org/rfc/rfc2518.txt, section 8.9.2:
> "after encountering an error moving a non-collection resource as part of
> an infinite depth move, the server SHOULD try to finish as much of the
> original move operation as possible."
>
> [2] http://lists.w3.org/Archives/Public/w3c-dist-auth/1997JulSep/0177.html
>
> [3] RFC 2616, HTTP/1.1, section 8.1.2.2
>
> _______________________________________________________________
> Ian Foster www.mcs.anl.gov/~foster
> Math & Computer Science Div. Dept of Computer Science
> Argonne National Laboratory The University of Chicago
> Argonne, IL 60439, U.S.A. Chicago, IL 60637, U.S.A.
> Tel: 630 252 4619 Fax: 630 252 1997
> Globus Alliance, www.globus.org