RE: [ogsa-wg] RE: Modeling State: Technical Questions

Dear Ian,

I don't think that the approach I proposed forces the user to do more than they would have to do anyway if EPRs were used. It is still the case that someone has to manage the EPRs to the resources in WSRF. This is similar to what happens in the real world: the online bookstore will ask for my credit card number (a URI), or the bookstore will ask for an ISBN (another URI), or multiple ISBNs if I want to buy multiple books. The banking service will ask for my bank account number (another URI, perhaps).

Also, there is no reason why a "kill all my jobs" message couldn't also be supported. But please note that this message is now addressed to the service (the container of resources) and not, as in the case of WSRF, to a specific resource. This is no different from what I am advocating.

Also... to Steve's point about partial failure: if one wishes atomic transaction semantics, I don't see the difference between the two approaches...

Atomic
  Msg -> resource 1
  Msg -> resource 2
  Msg -> resource 3
End Atomic

vs.

Msg
  Atomic
    Resource 1
    Resource 2
    Resource 3
  End Atomic

In fact, I would argue that the latter is better because:
1. It uses fewer messages (and, Steve, I am not assuming only HTTP and the optimisations that may be supported).
2. I can more easily deal with the failures in an application-specific manner, since my atomic TX semantics do not span multiple msgs.

(Anyway... who wants to do atomic TXs over the Web anyway? :-)

Regards,
--
Savas Parastatidis
http://savas.parastatidis.name

________________________________
From: Ian Foster [mailto:foster@mcs.anl.gov]
Sent: Tuesday, April 05, 2005 2:22 PM
To: Steve Loughran; Savas Parastatidis
Cc: Mark McKeown; Karl Czajkowski; Dennis Gannon; Samuel Meder; ogsa-wg; dave.pearson@oracle.com; gray@microsoft.com; humphrey@cs.virginia.edu; grimshaw@virginia.edu; aherbert@microsoft.com; gcf@indiana.edu; mark.linesch@hp.com; Frank Siebenlist; Tony Hey; Dave Berry
Subject: Re: [ogsa-wg] RE: Modeling State: Technical Questions

Steve's note raises a key point for me: do we really want to force the user (as Savas seems to be advocating) to keep track of jobs running at a remote site?

I'd rather send a request "kill all my jobs" or "kill all my jobs that have run for more than a day" to the factory than carefully keep track of all the jobs I have active, and how long they have been running, so that I can send the big document (or stream) discussed below.

Ian.

At 02:10 PM 4/5/2005 +0100, Steve Loughran wrote:

Savas Parastatidis wrote:

Dear all,

I think something needs to be clarified with regards to handling multiple jobs with one message. The beauty of document-oriented interactions is that you can do things like...

<job-details-request>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-001</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-010</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-029</job-id>
</job-details-request>

or

<job-suspend-request>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-008</job-id>
</job-suspend-request>

The schema for the above document can allow anything from 0 to N <job-id> elements.
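As a side note on the schema remark above: producing such a document with an arbitrary number of <job-id> elements is only a few lines of stock JAXP. A minimal sketch, assuming nothing beyond the element names used in the example (there is no real OGSA schema behind it):

import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Builds a <job-suspend-request> carrying any number of job URIs. */
public class BulkRequestBuilder {

    public static String buildSuspendRequest(List<String> jobIds) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("job-suspend-request");
        doc.appendChild(root);
        for (String id : jobIds) {                  // 0..N <job-id> children
            Element jobId = doc.createElement("job-id");
            jobId.setTextContent(id);
            root.appendChild(jobId);
        }
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}

Called with the three job URIs from the example, this reproduces the <job-suspend-request> document above.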
The trouble with any bulk operation is that you have to handle partial failure. You need either atomic operations (not long-lived transactions over HTTP, Savas; I wouldn't be that daft) or a way of indicating that only a bit went wrong.

Hence the 207 Multi-Status response in WebDAV, the "something failed, look in the message". WebDAV is still single instance (here a RESTy URL), but you can set more than one property and so have partial failure. SOAP just has SOAPFault and extensions; there is no explicit multiple-failure response. WS-RF-ResourceProperties has a similar problem with SetResourceProperties, but a different failure model in which any failure to set can result in a WS-BaseFault, indicating which property failed but providing no apparent information on which ones worked.

It seems to me that if you want to do bulk stuff, you do need ways of (a) handling partial failure and (b) declaring what happens on partial failure. For the curious, WebDAV's failure mode on file operations (MOVE, COPY) is explicitly declared to be that of failed file operations of Win98 on a FAT32 filesystem [1,2].

Alternatively, you don't go for bulk operations at all, neither on multiple jobs nor on multiple properties of a job (remember, WS-RF doesn't declare atomic/transacted property operations, so all you do here is increase the window of instability, a window that already exists). Instead you just stream a series of operations over the same HTTP/1.1 connection (assuming that everything is accessible at the same far-end host) and get back a series of (potentially out-of-order; we are talking HTTP/1.1) responses. This could be efficient, and you could do better handling of failure. But you do need a SOAP stack that can keep an HTTP/1.1 channel open for multiple requests. Axis doesn't, even if you get httpclient to do the HTTP work; I don't know about .NET/WSE. You also need developers to model the communication correctly. Manipulating JAX-RPC proxies as if they represent remote objects is *clearly* the wrong way to do it. You'd almost want to model a queue of requests waiting to be POSTed, a queue you can fill up and then push out. Something like this, in your Java-era language of choice:

// different queues for SOAP, REST
Queue q = new Soap12RequestQueue();
q.add(new StatePut(job1.uri, Job.LIFECYCLE, Job.SUSPENDED));
// let the queue reorder stuff if it wants to
q.add(new StatePut(job2.uri, Job.LIFECYCLE, Job.SUSPENDED), Queue.POSITION_OPTIMAL);
q.add(new StatePut(job3.uri, Job.LIFECYCLE, Job.SUSPENDED), Queue.POSITION_LAST);
q.setEventHandler(this);
q.nonBlockingSubmit();

No, there is no code behind this example, and I am avoiding any hints as to what the event handler would look like. I think the key point is that once you embrace remote operations as async actions, you can model the manipulations differently. Note also that I am representing job suspension not as an explicit suspend() operation, but as a request to put a job into the suspended state. This API could work with our friend REST just as easily as with WS-RF...
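Steve deliberately leaves the event handler unspecified. As an illustration only, here is one minimal sketch of the shape such a callback could take, reusing the hypothetical StatePut type and request queue from his example (none of this is a real API); the point is simply that per-request callbacks make partial failure explicit instead of burying it in a single bulk fault.

/**
 * One possible shape for the callback Steve leaves open. StatePut and the
 * request queue are the hypothetical types from the sketch above.
 */
public interface RequestQueueEventHandler {

    /** The far end accepted this state change. */
    void onCompleted(StatePut request);

    /** This request failed; the rest of the queue is unaffected. */
    void onFailed(StatePut request, Exception fault);

    /** Every queued request has been answered, one way or the other. */
    void onQueueDrained(int succeeded, int failed);
}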
Anyway, Savas, to conclude: do you have any evidence that a single document is suboptimal compared to a sequence of requests over an open HTTP/1.1 connection? That is, assuming we ignore the SHOULD in the HTTP/1.1 specification: "Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods" [3].

-Steve

[1] WebDAV, http://www.ietf.org/rfc/rfc2518.txt, S8.9.2: "after encountering an error moving a non-collection resource as part of an infinite depth move, the server SHOULD try to finish as much of the original move operation as possible."
[2] http://lists.w3.org/Archives/Public/w3c-dist-auth/1997JulSep/0177.html
[3] RFC 2616, HTTP/1.1

_______________________________________________________________
Ian Foster                      www.mcs.anl.gov/~foster
Math & Computer Science Div.    Dept of Computer Science
Argonne National Laboratory     The University of Chicago
Argonne, IL 60439, U.S.A.       Chicago, IL 60637, U.S.A.
Tel: 630 252 4619               Fax: 630 252 1997
Globus Alliance, www.globus.org
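An aside on Steve's closing question: one rough way to gather evidence would be to time the two styles against the same endpoint. A minimal sketch, assuming a hypothetical job service URI and using the stock java.net.http client, which reuses an HTTP/1.1 connection via keep-alive but does not pipeline, so it measures the weaker of the two cases:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.List;

/** Rough timing of one bulk document vs. a sequence of small requests. */
public class BulkVsSequenceTiming {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();

    // POST one XML document and return the elapsed time in nanoseconds.
    static long post(URI endpoint, String xml) throws Exception {
        long start = System.nanoTime();
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
        CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        URI endpoint = URI.create("http://example.org/job-service"); // hypothetical endpoint
        String bulk = "<job-suspend-request>"
                + "<job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>"
                + "<job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id>"
                + "</job-suspend-request>";
        List<String> singles = Arrays.asList(
                "<job-suspend-request><job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id></job-suspend-request>",
                "<job-suspend-request><job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id></job-suspend-request>");

        long bulkNanos = post(endpoint, bulk);
        long sequenceNanos = 0;
        for (String doc : singles) {
            sequenceNanos += post(endpoint, doc); // same client, so the connection is reused
        }
        System.out.printf("bulk: %d us, sequence: %d us%n",
                bulkNanos / 1000, sequenceNanos / 1000);
    }
}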

Dear Savas,

I'm a little worried about all your oversimplifications that state that this is just the same as that... none of this stuff is trivial in distributed applications!

For example, if you send individual requests, then each of those requests is intercepted by the runtime, where some form of policy is enforced. If you group the jobIds at the application level as you propose, then you push all the associated policy enforcement into the application level and make it the responsibility of the application to do the right thing.

I remember from previous lives that we had exactly the same discussion about policy enforcement in the interceptors for CORBA and about breaching the containers in EJBs. If you are able to intercept in the SOAP runtimes / CORBA interceptors / EJB containers, then you can enforce policy transparently to the application, and any policy enforcement that you can keep out of the hands of the application programmers is GOOD.

The more the interceptor code understands about the request-message semantics, the finer-grained the policy that can be transparently enforced. If you only understand interfaces and operations inside the interceptor, then that is as fine-grained as it gets. If you understand WSRF's concept of resourceIds, then you can go one level deeper. The latter would be the same if the interceptor code knew about a convention by which resources are identified in a certain message parameter, or understood the convention used in the hierarchical structure of URIs... many application-transparent authorization solutions are built with the help of this "insider's knowledge". And if you don't do it like that inside an interceptor, then I'm very sure that you will build an additional layer at the application level which does exactly the same thing; this wheel has been reimplemented so many times...

The big issue is that you get into potential problems if you start to mix those implementation paradigms by sometimes going through the "automated" policy enforcement and sometimes going around it. You have to be very careful and explain this clearly to the application programmer, otherwise all kinds of interesting exploits will be possible.

Note that this has nothing to do with WSRF vs SOA, or interceptors vs library layers: in all cases you will try to keep the policy enforcement as hidden as possible from the application programmer, and in all cases you have to watch for the loopholes that would circumvent the anticipated policy enforcement because of ill-thought-out performance hacks.

In other words, please don't make it sound so simple. For all operations associated with the crossing of a security boundary, you have to think twice, three times before you start to optimize. And if the optimization is truly warranted, it won't be a simple hack anymore after you're done implementing it right.

Securely yours,
Frank.

Savas Parastatidis wrote:
Dear Ian,
I don't think that the approach I proposed forces the user to do more than they would have to do anyway if EPRs were used. It is still the case that someone has to manage the EPRs to the resources in WSRF. This is similar to what happens in the real world. The online bookstore will ask for my credit card number (a URI), or the bookstore will ask for an ISBN (another URI), or multiple ISBNs if I want to buy multiple books. The banking service will ask for my bank account number (another URI, perhaps).
Also, there is no reason why a “kill all my jobs” message couldn’t also be supported. But please note that this message is now addressed to the service (the container of resources) and not, as in the case of WSRF, to a specific resource. This is no different from what I am advocating.
Also… to Steve's point about partial failure: if one wishes atomic transaction semantics, I don't see the difference between the two approaches…
Atomic
Msg -> resource 1
Msg -> resource 2
Msg -> resource 3
End Atomic
Vs
Msg
Atomic
Resource 1
Resource 2
Resource 3
End Atomic
In fact, I would argue that the latter is better because:
1. It uses fewer messages (and, Steve, I am not assuming only HTTP and the optimisations that may be supported)
2. I can more easily deal with the failures in an application-specific manner, since my atomic TX semantics do not span multiple msgs.
(Anyway… who wants to do atomic TXs over the Web anyway? :-)
Regards,
-- Savas Parastatidis http://savas.parastatidis.name
-- Frank Siebenlist franks@mcs.anl.gov The Globus Alliance - Argonne National Laboratory
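To make Frank's interceptor point concrete: a runtime interceptor can enforce per-resource policy on a bulk message only if it knows where the resource ids live in that message. A minimal sketch, with every name (BulkRequest, PolicyDecisionPoint, AccessDeniedException) a made-up placeholder rather than part of any real SOAP stack:

import java.util.List;

/**
 * Sketch of an interceptor that uses "insider's knowledge" of the message
 * shape to authorize each job id before the request reaches application code.
 */
public class JobIdAuthorizationInterceptor {

    private final PolicyDecisionPoint pdp;

    public JobIdAuthorizationInterceptor(PolicyDecisionPoint pdp) {
        this.pdp = pdp;
    }

    /** Called by the runtime for every inbound request, before dispatch. */
    public void intercept(String caller, BulkRequest request) {
        List<String> jobIds = request.jobIds();  // where the resource ids live in the message
        for (String jobId : jobIds) {
            if (!pdp.permits(caller, request.operation(), jobId)) {
                // Reject the whole message rather than silently dropping one id;
                // whether to filter instead is itself a policy decision.
                throw new AccessDeniedException(caller + " may not "
                        + request.operation() + " " + jobId);
            }
        }
    }

    /** Hypothetical collaborators, included only to keep the sketch self-contained. */
    public interface PolicyDecisionPoint { boolean permits(String caller, String op, String jobId); }
    public interface BulkRequest { String operation(); List<String> jobIds(); }
    public static class AccessDeniedException extends RuntimeException {
        public AccessDeniedException(String message) { super(message); }
    }
}

If the interceptor only understood interfaces and operations, the per-job check above would have to move into the application, which is exactly the loophole Frank warns about.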

Hi Savas, see my comments inline...
Also... to Steve's point about partial failure: if one wishes atomic transaction semantics, I don't see the difference between the two approaches...
Atomic
Msg -> resource 1
Msg -> resource 2
Msg -> resource 3
End Atomic
Vs
Msg
Atomic
Resource 1
Resource 2
Resource 3
End Atomic
In fact, I would argue that the latter is better because:
1. It uses fewer messages (and, Steve, I am not assuming only HTTP and the optimisations that may be supported)
2. I can more easily deal with the failures in an application-specific manner, since my atomic TX semantics do not span multiple msgs.
I agree with 2, but some operations cannot be made atomic; it depends on what Resources you are dealing with. If the Resources are rockets and the command is "launch", it is not exactly possible to roll back if there is a failure with one of the rockets. The same logic may apply to jobs: it may not be possible to roll back a terminated job. (I am not sure what an atomic operation is in the context of Web services; I am assuming the operation has ACID properties.)

Would it be possible to use SMTP to solve the efficiency problem? If we give the jobs mailto URIs then only one message would have to be transferred across the network. RFC 821: "When the same message is sent to multiple recipients the SMTP encourages the transmission of only one copy of the data for all the recipients at the same destination host." Each resource could report its faults independently using standard SOAP faults or HTTP error codes. I am not sure this would work with WS-Addressing and SOAP; is it possible to have more than one wsa:To in a SOAP header?

cheers
Mark

PS - I am not advocating SMTP for BES.
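Purely as an illustration of Mark's SMTP aside (which he explicitly is not advocating for BES), addressing one message to several job "mailboxes" looks like this with JavaMail; whether a single copy actually crosses the wire per destination host is the MTA's optimization, as RFC 821 notes, and every host and mailbox name here is made up:

import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

/** One suspend message addressed to two jobs on the same (hypothetical) host. */
public class MultiRecipientSuspend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.org");
        Session session = Session.getInstance(props);

        MimeMessage msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("client@example.org"));
        msg.addRecipient(Message.RecipientType.TO,
                new InternetAddress("job-bla-bla-bla-002@jobs.example.org"));
        msg.addRecipient(Message.RecipientType.TO,
                new InternetAddress("job-bla-bla-bla-005@jobs.example.org"));
        msg.setSubject("job-suspend-request");
        msg.setText("<job-suspend-request/>");  // each job could reply with its own fault
        Transport.send(msg);
    }
}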