
Peter, et al., I'm time constrained right now, but let me give a quick response.
-----Original Message-----
From: Peter G Lane [mailto:lane@mcs.anl.gov]
Sent: Monday, May 22, 2006 6:32 PM
To: Andrew Grimshaw
Cc: ogsa-bes-wg@ggf.org
Subject: Re: [ogsa-bes-wg] Updates to document
Some questions/comments...
I'm concerned about this term "container" as used in the document. This is very confusing as this term is already used to refer to the web service hosting infrastructure (i.e. Tomcat, GT's "standalone container", or figure 1's "service container").
[Andrew Grimshaw] We've been using this term for some time - and indeed if you look at the larger EMS picture the "services" started may in fact be stateful web services. The sense is that the "service" - or legacy activity in the case of BES - is logically "contained" by its execution environment.
In section 2 under the "Naming" bullet. What exactly is being named?
Is the last bullet under section 2 saying that a BES-compliant service cannot support logical file names? Does this include converting file paths to, say, GridFTP URLs that might differ in the file path with respect to the GridFTP server? Does this mean that variables cannot be used in file paths?
[Andrew Grimshaw] There is a huge history on naming. Section 2 is referring to the activities that are being started - that their "names", "handles", "addresses", or whatever you want to call them will be EPRs. We are referring to the "type", so to speak, of the endpoints: some may be WS-Names, some may be renewable references, some may be managed jobs - the spec is silent on that, but recognizes that there are many choices.
I don't know much about WS-Names (anybody have a pointer to documents?), but it seems problematic for interoperability that in section 3 the method of exposing metadata about the AED type used is deliberately left up to the implementation. This makes it impossible to automatically find this information without either a de facto standard or some mapping of implementation to metadata discovery method. Why not at least specify how this metadata is discovered? A standard resource property would do it.
[Andrew Grimshaw] The WS-Names documents can be found in GridForge off of the OGSA-Naming WG. The issue of how you can find metadata is crucial, but may be different in different renderings of the standard, e.g., WSRF versus WS-I, versus who knows what. This has been a massive political issue, and the result is a compromise. For example, in WSRF they are resource properties.
I have issues with the hardware and software resource properties. Do these describe the capabilities of the machine the service is running on or a cluster/SMP machine sitting behind the service? If the former, this is only useful if the activity runs on the same machine as service and seems like it should be optional. Also, what about the metadata describing the cluster or SMP machine? If the latter, then it is limiting in that a cluster may have machines with different OS, CPU configuration, available memory, architecture, etc...
[Andrew Grimshaw] The resource that the BES container is representing - not the machine it is on. Recall that a BES container may "front" a whole set of other containers. I agree with your comment about "limiting" if there are hetero resources behind. The resource model at this point in time does not address multiple entries for any given field.
In section 5.1.1 I find the IsAcceptingNewActivities RP rather useless. The client will find out just as easily if the service is accepting new activities depending on whether a NotAcceptingNewActivities fault is thrown or not. Checking the RP first both creates an extra WS operation call and gives no guarantee, since there is a potential race condition between when the RP value is returned and when the submission call is sent. If there were some negotiation protocol incorporated into BES I might buy this, but otherwise I don't see the point.
[Andrew Grimshaw] There are always race conditions in DS's. What if I want to discover whether you are taking activities - but not necessarily start an activity, for example to construct a candidate set of BES containers?
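To make the two positions concrete, here is a minimal Python sketch, assuming a hypothetical in-memory stand-in for a BES container. Only the IsAcceptingNewActivities property and the NotAcceptingNewActivities fault come from the thread; the class, method, and function names are invented for illustration and are not any real client API. It shows the RP being used purely for discovery (building a candidate set), while submission still has to handle the fault because the RP value may be stale.

```python
class NotAcceptingNewActivitiesFault(Exception):
    """Stand-in for the fault named in the thread."""


class FakeBesContainer:
    """Hypothetical in-memory stand-in for a BES container (not a real API)."""

    def __init__(self, name, accepting):
        self.name = name
        self.is_accepting_new_activities = accepting  # models the RP

    def create_activity(self, jsdl):
        # The authoritative check happens at submission time.
        if not self.is_accepting_new_activities:
            raise NotAcceptingNewActivitiesFault(self.name)
        return f"{self.name}/activity-1"


def candidate_set(containers):
    """Discovery only: read the RP without starting anything."""
    return [c for c in containers if c.is_accepting_new_activities]


def submit_first_available(containers, jsdl):
    """Try candidates in order; the RP is only a hint, so still catch the fault."""
    for container in candidate_set(containers):
        try:
            return container.create_activity(jsdl)
        except NotAcceptingNewActivitiesFault:
            continue  # state changed between the RP read and the call
    return None
```

In this sketch the RP never replaces fault handling; it only narrows the list of containers worth trying.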
In section 6.1, I feel it's superfluous to suffix the operation name with "FromJSDL". Are there plans to support something other than JSDL? If not, why not just make this "CreateActivity"?
[Andrew Grimshaw] BES is the simplest, most basic, "type" of a service container in the EMS architecture. Assuming that for all time only JSDL will be used seemed overly restrictive. In BES there are no plans for anything else - but ...
Section 6.1.1: "Document" suffix seems also superfluous. Of course it's a document. What else would it be? Also, I don't see why "createInSuspendedState" needs to be outside the JSDL document. This should be an extension to JSDL especially considering that the state model doesn't have a base "suspended" state. So if a service doesn't support suspended state extensions, this flag is meaningless.
[Andrew Grimshaw] I'll look at this one later -
Section 6.1.2: absolutely unique idempotence IDs are impossible to guarantee. I'm of the opinion that these should be valid only for the life of a job. Besides being ridiculous to assume that the service should keep track of these IDs to make sure they are never ever used again, doing so serves no real purpose.
[Andrew Grimshaw] I happen to agree with you - certainly for all time is difficult, and depending on exactly what semantics you think you're getting, impossible. However, this is in there as a result of insistence and a long discussion on input (ESI document) from the Globus and Unicore teams.
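A minimal Python sketch of the job-lifetime scoping suggested above, under stated assumptions: the `ActivityFactory` class, its method names, and the handle format are hypothetical and do not come from the BES or ESI documents. The service remembers an idempotence ID only while the corresponding job exists, so a retried request returns the same handle, and the ID becomes reusable once the job is destroyed.

```python
class ActivityFactory:
    """Hypothetical service-side sketch; names are illustrative, not from the spec."""

    def __init__(self):
        self._live = {}    # idempotence ID -> activity handle, live jobs only
        self._counter = 0

    def create_activity(self, jsdl, idempotence_id):
        # A retry with the same ID while the job is alive returns the same handle.
        if idempotence_id in self._live:
            return self._live[idempotence_id]
        self._counter += 1
        handle = f"activity-{self._counter}"
        self._live[idempotence_id] = handle
        return handle

    def destroy_activity(self, handle):
        # Once the job is gone, its ID is forgotten and may be reused.
        for key, value in list(self._live.items()):
            if value == handle:
                del self._live[key]
```

The table of remembered IDs is bounded by the number of live jobs, which is the practical argument for this scoping over "unique for all time".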
Section 6.1.3: If there is an optional subscription request element in the input, then there needs to be an optional subscription reference element either in the output (preferred to avoid an RP query round trip) or a resource property.
[Andrew Grimshaw] I'm not sure I follow - you mean you want a handle to the subscription in the return?
6.1.4: The names of the fault types aren't consistent. Either suffix with "Fault" or don't.
[Andrew Grimshaw] Yes, should be fixed.
Sorry, got to go now. Will get to the rest later.
Spelling typo in section 6.2, page 19, 1st paragraph: "proceeding" -> "preceding".
6.2: I would prefer that the "*Status" elements not have "state" and "laststate" attributes but instead have two child elements "state" and "lastState" (or something similarly named). These elements would in turn have a child element that has a well defined (i.e. in this document) enumerated state representing one of the states in the general state diagram (i.e. New, Pending, Running, Canceled, Failed, or Finished) and one optional xsd:anyType element (or just an xsd:any) that specifies the state extension if one exists (e.g. StageIn, Suspended, etc.). At the very least something needs to be changed to account for the new state model.
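The proposed shape can be sketched with Python's standard xml.etree.ElementTree. This is only an illustration: the element names "ActivityStatus" and "BaseState" are assumptions chosen for the example, namespaces are omitted, and the extension child simply stands in for the xsd:any slot. The base state names and the "state"/"lastState" children are taken from the comment above.

```python
import xml.etree.ElementTree as ET

# Base states listed in the comment above.
BASE_STATES = {"New", "Pending", "Running", "Canceled", "Failed", "Finished"}


def state_element(tag, base, extension=None):
    """Build <tag> with a required base state and an optional extension child."""
    if base not in BASE_STATES:
        raise ValueError(f"unknown base state: {base}")
    elem = ET.Element(tag)
    ET.SubElement(elem, "BaseState").text = base  # "BaseState" is a made-up name
    if extension is not None:
        # Stands in for the xsd:any slot, e.g. StageIn or Suspended.
        ET.SubElement(elem, extension)
    return elem


def activity_status(state, last_state):
    """state and last_state are (base, extension-or-None) pairs."""
    status = ET.Element("ActivityStatus")  # hypothetical wrapper element
    status.append(state_element("state", *state))
    status.append(state_element("lastState", *last_state))
    return status


status = activity_status(("Running", "Suspended"), ("Pending", None))
xml_text = ET.tostring(status, encoding="unicode")
```

A basic client would read only the base state child and ignore any extension it does not understand.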
6.3: I was liking the new state model until I saw this operation. I don't like this idea at all of allowing the user to monkey with the state machine directly. I would be much happier if there were specific operations to inject specific inputs that the state machine should consider. The state machine should never be open to direct manipulation, but instead should decide for itself what state to transition to, and when, based on the current state and events. At the *very* least this operation should have a fault that means "too bad, I'm not doing that". If the desire is to allow for, say, resetting the job back to New so that it will run again, or canceling without destroying the job, then specific operations can be added to the spec as is. Since sub-states like "suspended" are treated as extensions now, a "SuspendActivities" operation would seem more appropriate as an extension as well.
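A minimal Python sketch of the event-driven alternative described above, with loud assumptions: the event names, the transition table, and the `InvalidTransitionFault` name are all invented for illustration and are not taken from the BES document or the HPC profile state model. Callers can only feed events in; the machine alone picks the next state and rejects anything it does not accept.

```python
class InvalidTransitionFault(Exception):
    """Hypothetical fault meaning "too bad, I'm not doing that"."""


# (current state, event) -> next state. Events and table are illustrative only.
TRANSITIONS = {
    ("New", "scheduled"): "Pending",
    ("Pending", "started"): "Running",
    ("Running", "exited_ok"): "Finished",
    ("Running", "exited_error"): "Failed",
    ("New", "cancel_requested"): "Canceled",
    ("Pending", "cancel_requested"): "Canceled",
    ("Running", "cancel_requested"): "Canceled",
}


class Activity:
    def __init__(self):
        self.state = "New"

    def handle(self, event):
        """Feed an event in; the machine alone decides the next state."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise InvalidTransitionFault(f"{event!r} not valid in {self.state}")
        self.state = TRANSITIONS[key]
        return self.state
```

Under this design an operation such as cancel maps to an event, and an extension like "SuspendActivities" would add its own events and rows rather than expose a set-state call.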
6.4 and 6.5: Could someone provide a use case for wanting to start and stop the acceptance of new activities via the WS interface?
6.6: This should be a resource property, not a new operation.
I'm already commenting past page 20, so I'll stop here. One last observation, though. I noticed that this document makes no attempt at defining an activity resource. Is this intentionally out of scope?
Peter
On Mon, 2006-05-22 at 14:58 -0400, Andrew Grimshaw wrote:
All,
I have done the updates to the document discussed in the last two phone calls, and in the EMS meeting in Japan.
At a high level this includes modifications to adopt text/issues from the ESI document and the HPC profile working document. Note that this is NOT a final document.
These changes revolve around:
1: Changing CreateActivityFromJSDL to
a) take additional optional arguments, specifically notification arguments and an "idempotent" argument lifted from ESI.
b) An extension to JSDL suggested in ESI to support libraries
2: Adding a resource information model section. Originally this was to be taken from 4.1 of the ESI document, but after discussion with Snelling, Stokes, et al., it is now a modified 4.1.
3: State model. This is one of the biggest jobs. At GGF in Japan the HPC profile group introduced a nice, simple, extensible state model that would allow supporting a variety of state machines while still allowing basic clients to understand what they were getting. After discussions with many people I have decided to adopt that simpler state model. CLEARLY THIS WILL NEED DISCUSSION BY INTERESTED PARTIES. I AM NOT TRYING TO PULL A FAST ONE. I have taken the text verbatim from the HPC profile paper. Note that I do not yet have permission from their lawyers - so we may need to yank this and retype the same info. I hope not.
4: Refer to managedjob "type" of EPR, just as we refer to "WS-Name" "type" of EPR.
I have not even tried to modify the WSDL and renderings. Once we decide on the rest of the content we can do that. So don't read past page 20!
I am not sure of the status of the weekly call this week. I will synch with Darren and Steven and send mail tomorrow.
I'm also including some slides from the discussion of BES in the EMS session.
A
Andrew Grimshaw
Professor of Computer Science
University of Virginia
434-982-2204
grimshaw@cs.virginia.edu