RE: [ogsa-wg] RE: Modeling State: Technical Questions

Dear all,

I think something needs to be clarified with regard to handling multiple jobs with one message. The beauty of document-oriented interactions is that you can do things like...

<job-details-request>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-001</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-010</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-029</job-id>
</job-details-request>

Or

<job-suspend-request>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-008</job-id>
</job-suspend-request>

The schema for the above documents can allow anything from 0 to N <job-id> elements.

What WS-RF, WS-Transfer and REST do is require that each message be directed to only one resource. As a result, when it comes to defining groups of resources, additional resources (representing collections) have to be created, and populating and managing those collections requires additional messages. The WS-RF/WS-Transfer/REST model is a special case of the document-oriented model I described above...

<!-- just one resource all the time -->
<job-suspend-request>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
</job-suspend-request>

In the WS-RF/WS-Transfer case the job ID will have to be part of the wsa:To (wsa -> WS-Addressing) header. In REST, it is the URI on which the operations are called (if we are using HTTP and the HTTP verbs, then the URI usually has the 'http' prefix). An example of a WS-Addressing EPR...

<my:MyEndpointReference>
  <wsa:Address>urn:ogsa:job:guid:bla-bla-bla-002</wsa:Address>
</my:MyEndpointReference>

Please note that the address doesn't have to carry transport/transfer-specific semantics (i.e. it doesn't have to be an 'http' URI). The above would require a registry lookup, if that's necessary, or perhaps a P2P network that knows how to direct the message to its destination based only on the above information.
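The registry lookup Savas mentions can be sketched as a map from logical, transport-neutral addresses to transport endpoints. This is a minimal in-memory illustration, not an OGSA or WS-Addressing API: the `AddressRegistry` class and its methods are hypothetical, and binding urn:ogsa:job:service:Newcastle-Job-Service to http://ncl.ac.uk/job-service simply reuses two addresses that appear in Savas's envelope example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry: maps logical addresses (e.g. OGSA URNs carried in
// wsa:To or wsa:Address) to transport-specific endpoints.
class AddressRegistry {
    private final Map<URI, URI> bindings = new HashMap<>();

    // Record that messages addressed to 'logical' are delivered to 'transport'.
    void register(URI logical, URI transport) {
        bindings.put(logical, transport);
    }

    // Resolve a destination. An http(s) URI is already transport-specific and
    // is returned unchanged; any other URI (such as a urn:) must be looked up.
    // An empty result means the registry cannot route the message.
    Optional<URI> resolve(URI to) {
        String scheme = to.getScheme();
        if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
            return Optional.of(to);
        }
        return Optional.ofNullable(bindings.get(to));
    }
}
```

After registering the service URN, resolving it yields the http endpoint, while the sender only ever handles the URN. A P2P overlay would replace the local map with a distributed lookup; the calling code would not change.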
The sender of the message may never actually see the transport-specific address of the receiving service. This means that a SOAP message like the one below will have to be sent...

<soap:Envelope>
  <soap:Header>
    <wsa:To>urn:ogsa:job:guid:bla-bla-bla-002</wsa:To>
  </soap:Header>
  <soap:Body>
    <job:job-suspend-request />
  </soap:Body>
</soap:Envelope>

Well... it turns out that this can be seen as a special case of a message that looks like this...

<soap:Envelope>
  <soap:Header>
    <wsa:To>urn:ogsa:job:service:Newcastle-Job-Service</wsa:To>
    <!-- again... a registry lookup, although http://ncl.ac.uk/job-service could have also been used -->
  </soap:Header>
  <soap:Body>
    <job:job-suspend-request>
      <job:job-id>urn:ogsa:job:guid:bla-bla-bla-002</job:job-id>
      <job:job-id>urn:ogsa:job:guid:bla-bla-bla-003</job:job-id>
      <job:job-id>urn:ogsa:job:guid:bla-bla-bla-004</job:job-id>
    </job:job-suspend-request>
  </soap:Body>
</soap:Envelope>

What WS-RF and WS-Transfer seem to be doing is exposing on the wire the programming abstraction that most of us are used to (i.e. calling methods on an object). As a result, systems are designed around the special case rather than the more general one. It's been our argument all along that this may not be the most efficient way of designing systems in general (perhaps in certain application domains the WS-RF/WS-Transfer approach may be appropriate), but I am prepared to be corrected on this :-)

Best regards,
--
Savas Parastatidis
http://savas.parastatidis.name
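Savas's claim that the one-resource message is a special case of the bulk document can be made concrete with a builder that takes a list of job IDs: a one-element list gives the single-resource shape, a longer list gives the bulk request. This is only a sketch; `SuspendRequestBuilder` is hypothetical, and since the thread never defines a namespace for the `job:` prefix, `urn:example:job` is a placeholder.

```java
import java.util.List;

// Hypothetical builder for the document-oriented suspend request.
class SuspendRequestBuilder {
    // Placeholder namespace: the thread does not define one for "job:".
    static final String JOB_NS = "urn:example:job";

    // Minimal escaping for XML text content ('&' must be replaced first).
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Emits one <job:job-id> per entry. The schema allows 0..N ids, so the
    // single-resource message is just the one-element case.
    static String build(List<String> jobIds) {
        StringBuilder xml = new StringBuilder();
        xml.append("<job:job-suspend-request xmlns:job=\"").append(JOB_NS).append("\">\n");
        for (String id : jobIds) {
            xml.append("  <job:job-id>").append(escape(id)).append("</job:job-id>\n");
        }
        return xml.append("</job:job-suspend-request>").toString();
    }
}
```

Passing the three IDs bla-bla-bla-002 to -004 produces the body of the Newcastle envelope; passing only bla-bla-bla-002 produces the "just one resource all the time" form from the same code path.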

Savas Parastatidis wrote:
Dear all,
I think something needs to be clarified with regards to handling multiple jobs with one message. The beauty of document-oriented interactions is that you can do things like...
<job-details-request>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-001</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-010</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-029</job-id>
</job-details-request>
Or
<job-suspend-request>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id>
  <job-id>urn:ogsa:job:guid:bla-bla-bla-008</job-id>
</job-suspend-request>
The schema for the above document can allow anything from 0 to N number of <job-id> elements.
The trouble with any bulk operation is that you have to handle partial failure. You need either atomic operations (not long-lived transactions over HTTP, Savas; I wouldn't be that daft), or a way of indicating that only a bit went wrong.

Hence the 207 Multi-Status response in WebDAV: "something failed, look in the message". WebDAV is still single-instance (here a RESTy URL), but you can set more than one property and so have partial failure.

SOAP just has SOAPFault and extensions; there is no explicit multiple-failure response. WS-RF-ResourceProperties has a similar problem with SetResourceProperties, but a different failure model, in which any failure to set can result in a WS-BaseFault indicating which update failed, but providing no apparent information on which ones worked.

It seems to me that if you want to do bulk operations, you need ways of (a) handling partial failure and (b) declaring what happens on partial failure. For the curious, WebDAV's failure mode on file operations (MOVE, COPY) is explicitly declared to be that of failed file operations of Win98 on a FAT32 filesystem [1,2].

Alternatively, you don't go for bulk operations, neither on multiple jobs nor on multiple properties of a job (remember, WS-RF doesn't declare atomic/transacted property operations, so all you do here is increase the window of instability, a window that already exists). Instead you just stream a series of operations over the same HTTP/1.1 connection, assuming that everything is accessible at the same far-end host, and get a series of (potentially out of order, we are talking HTTP/1.1) responses.

This could be efficient, and you could do better handling of failure. But you do need a SOAP stack that can keep an HTTP/1.1 channel open for multiple requests. Axis doesn't, even if you get httpclient to do the HTTP work; I don't know about .NET/WSE. You also need developers to model the communication correctly. Manipulating JAX-RPC proxies as if they represent remote objects is *clearly* the wrong way to do it.
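Steve's two requirements, (a) handling partial failure and (b) declaring what happens on partial failure, can be sketched as a bulk call that returns a per-item status and takes an explicit policy. All names here are hypothetical. The status codes borrow from HTTP and WebDAV (200, 409 Conflict, and WebDAV's 424 Failed Dependency for items never attempted); reporting 207 whenever any item did not succeed is a simplification of WebDAV's Multi-Status, not its exact rules.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// (b) The caller declares up front what happens when one item fails.
enum PartialFailurePolicy { BEST_EFFORT, STOP_ON_FIRST_FAILURE }

// (a) Per-item outcome, loosely modelled on a WebDAV 207 Multi-Status body:
// it records which items worked as well as which failed.
class BulkResult {
    final Map<String, Integer> statusById = new LinkedHashMap<>();

    // 200 if every item succeeded (or there were none); otherwise 207,
    // i.e. "something failed, look in the message".
    int overallStatus() {
        boolean allOk = statusById.values().stream().allMatch(s -> s >= 200 && s < 300);
        return allOk ? 200 : 207;
    }
}

class BulkSuspender {
    // Applies trySuspend to each job id in order. trySuspend stands in for
    // whatever actually suspends a job and reports success.
    static BulkResult suspendAll(List<String> jobIds, Predicate<String> trySuspend,
                                 PartialFailurePolicy policy) {
        BulkResult result = new BulkResult();
        boolean failed = false;
        for (String id : jobIds) {
            if (failed && policy == PartialFailurePolicy.STOP_ON_FIRST_FAILURE) {
                result.statusById.put(id, 424); // Failed Dependency: not attempted
                continue;
            }
            boolean ok = trySuspend.test(id);
            result.statusById.put(id, ok ? 200 : 409); // 409 Conflict: could not suspend
            failed |= !ok;
        }
        return result;
    }
}
```

BEST_EFFORT corresponds to the RFC 2518 behaviour Steve cites ("finish as much of the original move operation as possible"); STOP_ON_FIRST_FAILURE is the alternative a service could declare instead. Either way, unlike the WS-BaseFault model, the caller learns which items worked.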
You'd almost want to model a queue of requests waiting to be POSTed, a queue you can fill up then push out. Something like this, in your Java-era language of choice:

    //different queues for SOAP, REST
    Queue q=new Soap12RequestQueue();
    q.add(new StatePut(job1.uri,Job.LIFECYCLE,Job.SUSPENDED));
    //let the queue reorder stuff if it wants to
    q.add(new StatePut(job2.uri,Job.LIFECYCLE,Job.SUSPENDED),Queue.POSITION_OPTIMAL);
    q.add(new StatePut(job3.uri,Job.LIFECYCLE,Job.SUSPENDED),Queue.POSITION_LAST);
    q.setEventHandler(this);
    q.nonBlockingSubmit();

No, there is no code behind this example, and I am avoiding any hints as to what the event handler would look like. I think the key point is that once you embrace remote operations as async actions, you can model the manipulations differently. Note also that I am representing job suspension not as an explicit suspend() operation, but as a request to put a job into the suspended state. This API could work with our friend REST just as easily as with WS-RF...

Anyway Savas, to conclude: do you have any evidence that a single document is suboptimal compared to a sequence of requests over an open HTTP/1.1 connection? That is, assuming we ignore the SHOULD in the HTTP/1.1 specification: "Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods" [3].

-Steve

[1] WebDAV, RFC 2518, http://www.ietf.org/rfc/rfc2518.txt, S8.9.2: "after encountering an error moving a non-collection resource as part of an infinite depth move, the server SHOULD try to finish as much of the original move operation as possible."
[2] http://lists.w3.org/Archives/Public/w3c-dist-auth/1997JulSep/0177.html
[3] RFC 2616, HTTP/1.1, S8.1.2.2
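One way to reconcile Steve's request queue with the RFC 2616 rule he quotes is for the queue to plan which requests may share a pipeline. The sketch below is hypothetical (`QueuedRequest` and `PipelinePlanner` come from no stack mentioned in the thread) and does no network I/O; it only groups requests. It checks methods one at a time against the idempotent methods listed in RFC 2616 S9.1.2 (GET, HEAD, PUT, DELETE, OPTIONS, TRACE) and does not attempt the harder "non-idempotent sequences" check. Steve's `StatePut` maps naturally onto PUT. RFC 2616 S8.1.2.2 also requires a server to send pipelined responses in the same order in which the requests were received.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical queued request: method, target URI and body.
class QueuedRequest {
    final String method;
    final String uri;
    final String body;

    QueuedRequest(String method, String uri, String body) {
        this.method = method;
        this.uri = uri;
        this.body = body;
    }
}

class PipelinePlanner {
    // Methods RFC 2616 (S9.1.2) defines as idempotent.
    static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    // Splits the queue into batches, preserving the original order. A run of
    // idempotent requests forms one batch that could be pipelined on a
    // persistent connection; each non-idempotent request (e.g. POST) gets a
    // batch of its own, so a sender that waits for each batch's responses
    // before starting the next never pipelines it.
    static List<List<QueuedRequest>> plan(List<QueuedRequest> queue) {
        List<List<QueuedRequest>> batches = new ArrayList<>();
        List<QueuedRequest> run = new ArrayList<>();
        for (QueuedRequest request : queue) {
            if (IDEMPOTENT.contains(request.method)) {
                run.add(request);
                continue;
            }
            if (!run.isEmpty()) {
                batches.add(run);
                run = new ArrayList<>();
            }
            batches.add(List.of(request));
        }
        if (!run.isEmpty()) {
            batches.add(run);
        }
        return batches;
    }
}
```

With Steve's three StatePut calls expressed as PUTs to each job's lifecycle state, `plan` returns a single pipelinable batch; a POST in the middle would split it into three.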

Steve's note raises a key point for me: do we really want to force the user (as Savas seems to be advocating) to keep track of jobs running at a remote site? I'd rather send a request "kill all my jobs" or "kill all my jobs that have run for more than a day" to the factory than carefully keep track of all jobs that I have active, and how long they have been running, so that I can send the big document (or stream) discussed below.

Ian.

At 02:10 PM 4/5/2005 +0100, Steve Loughran wrote:
[...]
_______________________________________________________________
Ian Foster, www.mcs.anl.gov/~foster
Math & Computer Science Div., Argonne National Laboratory, Argonne, IL 60439, U.S.A.
Dept of Computer Science, The University of Chicago, Chicago, IL 60637, U.S.A.
Tel: 630 252 4619  Fax: 630 252 1997
Globus Alliance, www.globus.org
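Ian's alternative, sending selection criteria such as "all my jobs that have run for more than a day" and letting the factory resolve them, can be sketched with composable predicates evaluated on the service side. `JobInfo` and `JobSelectors` are hypothetical, and the sketch deliberately leaves open how the criteria would be encoded on the wire, which is the question Karl raises in his reply.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical view of a job as held by the factory.
class JobInfo {
    final String id;
    final String owner;
    final Instant started;

    JobInfo(String id, String owner, Instant started) {
        this.id = id;
        this.owner = owner;
        this.started = started;
    }
}

class JobSelectors {
    // "All my jobs".
    static Predicate<JobInfo> ownedBy(String owner) {
        return job -> job.owner.equals(owner);
    }

    // "... that have run for more than" the given duration, as of 'now'.
    static Predicate<JobInfo> runningLongerThan(Duration limit, Instant now) {
        return job -> Duration.between(job.started, now).compareTo(limit) > 0;
    }

    // The factory, not the client, turns the criteria into concrete job ids.
    static List<String> select(List<JobInfo> jobs, Predicate<JobInfo> criteria) {
        return jobs.stream().filter(criteria).map(job -> job.id).collect(Collectors.toList());
    }
}
```

A "kill" request would then carry something equivalent to `ownedBy("ian").and(runningLongerThan(Duration.ofDays(1), now))`, and the client never has to enumerate job IDs itself.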

On Apr 05, Ian Foster loaded a tape reading:
Steve's note raises a key point for me: do we really want to force the user (as Savas seems to be advocating) to keep track of jobs running at a remote site?
I'd rather send a request "kill all my jobs" or "kill all my jobs that have run for more than a day" to the factory than carefully keep track of all jobs that I have active, and how long they have been running, so that I can send the big document (or stream) discussed below.
Ian.
Isn't that a red herring, though? We're talking about services here, not user interfaces. Surely some user application or client-side configuration must keep track of which remote sites need cleanup? A user isn't going to broadcast to all sites/services "please kill anything I might have left there". That's what expiration times are for. :-) [sorry, couldn't resist]

I agree that there are an unbounded number of interesting and compact expressions of selection criteria one might use. I think this alone is justification for something more than enumerated resource IDs. Should this go in the application-specific document payload? In the WS-A "To:" field or some other header? Some combination of both? Unfortunately, that last option usually seems to be the right one in these situations.

But all of this seems to be going in circles... the underlying questions we need answered when building and operating the infrastructure are: which messages are applicable to this resource, and where do I send them? Are we somehow addressing this while we argue about whether the open parenthesis goes before or behind the message target, and I just cannot see it?

karl

--
Karl Czajkowski
karlcz@univa.com
participants (4)
- Ian Foster
- Karl Czajkowski
- Savas Parastatidis
- Steve Loughran