
On Feb 14, Takuya Araki (Biglobe) loaded a tape reading:
Alain, Karl:
Thank you for the excerpts!
So Karl, let me confirm your opinion: Are you thinking of using the method which is implemented in WS-GRAM? If so, I agree that it increases the reliability of the system, but it doesn't seem to be able to replace asynchronous operations completely.
Yes, that is my suggestion and no I do not completely agree with your assessment... I am attaching a longer description of my position which was drafted as part of a different activity. I think it addresses this topic well enough so I will only paste it and then add a few more comments specifically about WS-Agreement. I apologize for the length.

RELIABILITY: One issue that arises in job management is the potentially high stakes of errors in the management protocol. Specifically, while a client may well be expected to tolerate rejection and even execution failure, it is undesirable to have job management state "disappear" or get duplicated due to message-layer, client, or provider failures. We assert that the simple creation pattern is sufficient because there are multiple binding options available to get reliability.

One approach is to have transactional bindings which use commit and rollback to make sure a creation occurs exactly once or not at all. However, not all Internet deployments will have transactional messaging, so another approach is to use the WS-Addressing MessageID header to get idempotent invocation. This mechanism allows the client to resend an application message in case it missed the response message, and the provider must resend the response without duplicating actions such as actual execution. This "at most once" semantics is sufficient for real world EMS scenarios, as the client will eventually learn whether the execution was accepted or not.

GT 4.0 GRAM optionally uses a proprietary message-level concept that is equivalent to the WS-Addressing MessageID in order to work across any binding and because it was designed before the WS-Addressing mechanism was fully clarified by its authors. In this variant, the idempotent ID is sent as an extra field in the input message and interpreted early in the request processing to avoid duplication.
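As a rough illustration of the idempotent-ID idea described above (not GRAM's actual implementation), the following Python sketch shows a provider that checks the ID before performing any action and replays the stored response on a resend. The `Provider` class, its `create` method, and the job-naming scheme are all hypothetical names chosen for this example.

```python
import itertools


class Provider:
    """Toy provider (hypothetical) that starts each job at most once per idempotent ID."""

    def __init__(self):
        self._responses = {}                  # idempotent ID -> cached response
        self._job_numbers = itertools.count(1)
        self.executions = 0                   # how many jobs were really started

    def create(self, message_id, job_description):
        # Interpret the ID early, before any side effect takes place.
        if message_id in self._responses:
            # Resent message: replay the stored response, do not execute again.
            return self._responses[message_id]
        self.executions += 1
        response = {
            "job": f"job-{next(self._job_numbers)}",
            "description": job_description,
        }
        self._responses[message_id] = response
        return response


provider = Provider()
first = provider.create("id-1", "run simulation")
again = provider.create("id-1", "run simulation")   # client missed the first reply
other = provider.create("id-2", "run simulation")   # a genuinely new request
```

Here `first` and `again` are the same response and only two jobs are started in total. A real provider would also need to persist the cache across restarts and decide when old IDs may be forgotten, which this sketch leaves out.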
The idempotent operation style optimizes the success case by not requiring additional message exchanges unless there is an error condition or timeout. In contrast, a transactional approach requires more exchanges to set up and commit (or cancel) the invocation.

ASYNCHRONY: Another concern is how much delay may be encountered in the creation pattern. The WS architecture makes no statements about the relative duration of an "in-out" message exchange, as that is essentially a binding issue. Two camps seem to dislike long message delays for different reasons which are both somewhat inconsistent with the WS architecture model.

First, one camp confuses the WSDL message protocol with an API specification, so they believe that a WSDL "in-out" message must mean a blocking procedure call in their client bindings. They are uncomfortable with the implication that a long delay cannot be processed asynchronously by their application. We believe this is a mistaken viewpoint to take when designing protocols, because the asynchrony of the client can be addressed simply by using an appropriate tooling strategy. They should move to better tooling if their existing client stubs are indeed this limiting. For example, the C language WS tooling in GT 4.0 generates synchronous and asynchronous stubs for each WSDL operation, so our GT 4.0 GRAM client tool is able to perform the creation message exchange using asynchronous "post message" and "response callback" programmatic interfaces. The newest JAX-RPC revision is also said to have better support for asynchronous invocation.

The second dissenting camp is concerned that long response delays will be fragile because some bindings cannot tolerate the delay. For example, a SOAP over HTTP binding may not be able to wait long enough for a response before the TCP connection is lost.
Because an "in-out" message pattern addresses the response implicitly via binding-level context, it is not as durable as an explicit peer-to-peer message exchange using "in only" messages sent to explicit endpoints at both peer sites. Unfortunately, this style is also difficult in constrained binding environments because SOAP over HTTP is often valued specifically for being asymmetric and allowing simple NAT/firewall traversal from "anonymous" clients to well-known providers. If we render a peer-to-peer interface model in order to support fragile bindings, we create obstacles for these other common deployment environments.

[Please note, WS-Agreement is meant to support this peer-to-peer pattern optionally, but I admit that we may need to make some technical cleanup on the spec before completion... it seems to have lost some details in the time I have been absent from the workgroup discussions. See the optional initiator's EPR field in the create call. What is missing, I think, is clear normative text on how this will be used by the responding party and how/if it should appear in the Agreement context for correlative purposes.]

A third solution which happens to address both camps simultaneously is to render explicit "post" and "poll" interfaces to initiate the logical operation and then hold the response at the provider until the client can reconnect and retrieve the result. This supports fragile bindings in NAT/firewall environments and also yields an asynchronous interface even with naive tooling that generates synchronous stubs for "in-out" message exchanges. However, it complicates the application-level modelling and lifts transport-level message buffering into the application-level service implementation.

We argue that a simple "in-out" message exchange in combination with idempotent ID mechanisms can equally well satisfy the fragile bindings and NAT/firewall asymmetry without significant impact on the application protocol.
It still retains state at the provider, but rather than adding post/poll operations to the WSDL it simply uses message send for "post" and message resend for "poll". This also means that the application logic can be written using the more natural "in-out" pattern and a simple buffering layer at (or slightly above) the binding code can handle the resends at the provider.

This model supports asynchrony because the polling exchange can "block" at the messaging level until the binding times out. In other words, a client who logically iterates with:

    ID = new_identifier
    while is_non_response( result = EPR->create(ID, input content) )
        repeat

will not "spin" but rather post a new copy of the idempotent create message at the frequency at which the binding signals an error, e.g. a closed connection. While the binding is still functioning, the underlying protocol such as SOAP over HTTP will provide for asynchronous delivery of the response message. (Note of course that the above snippet could be written in a longer pseudo-code format by using an asynchronous post/callback model such as we use in GT4 C bindings. The synchronous create call is nothing more than a post followed by a conditional wait on the callback monitor.)

This approach does not permit visibility as to WHY the response is taking so long, but merely visibility as to WHETHER the response has been issued yet. There is no lifecycle model in WS-Agreement for the decision making that the Agreement provider performs while considering an Agreement creation request, nor should we take lightly the burden of trying to develop such a model.
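The client-side resend loop described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `BindingError`, `create_with_resend`, `FlakyProvider`, and the `max_attempts` bound are hypothetical names and choices made for this example, not part of GRAM, WS-Agreement, or any real toolkit.

```python
import uuid


class BindingError(Exception):
    """Hypothetical signal that the binding lost the response (e.g. closed connection)."""


def create_with_resend(create, content, max_attempts=5):
    """Resend the same idempotent create message until a response arrives."""
    message_id = str(uuid.uuid4())            # ID = new_identifier
    for _ in range(max_attempts):
        try:
            # In a real binding this call blocks until a response or a timeout,
            # so the loop advances only as often as the binding reports an error.
            return create(message_id, content)
        except BindingError:
            continue                          # no response: resend with the same ID
    raise BindingError("no response after %d attempts" % max_attempts)


class FlakyProvider:
    """Hypothetical provider that acts once per ID but loses its first few responses."""

    def __init__(self, drop_first):
        self.drop_remaining = drop_first
        self.responses = {}
        self.executions = 0

    def create(self, message_id, content):
        if message_id not in self.responses:
            self.executions += 1              # the real action happens only once
            self.responses[message_id] = "accepted: " + content
        if self.drop_remaining > 0:
            self.drop_remaining -= 1
            raise BindingError("connection closed before response")
        return self.responses[message_id]


flaky = FlakyProvider(drop_first=2)
result = create_with_resend(flaky.create, "run simulation")
```

In this run the first two responses are lost, the third attempt returns the stored response, and the provider executes the request only once. The attempt limit is an addition for the sketch; the loop in the text repeats until a response is received.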
(By the way, it seems that GRAM has "batch mode" as an application level asynchronous operation. That's why the current method is enough for GRAM, I think.)
No, actually our "globusrun" tool's batch mode is not about asynchronous submission. It simply turns off the subscription and state monitoring that the tool normally does after submission. The submission step itself is roughly equivalent to the WS-Agreement createAgreement operation.

karl

-- 
Karl Czajkowski
karlcz@univa.com