RE: [ogsa-bes-wg] Updates to document

24 May 2006

Peter, et al.,
I'm time-constrained right now, but let me give a quick response.
...
-----Original Message-----
From: Peter G Lane [mailto:lane@mcs.anl.gov]
Sent: Monday, May 22, 2006 6:32 PM
To: Andrew Grimshaw
Cc: ogsa-bes-wg@ggf.org
Subject: Re: [ogsa-bes-wg] Updates to document
Some questions/comments...
I'm concerned about this term "container" as used in the document. This
is very confusing as this term is already used to refer to the web
service hosting infrastructure (i.e. Tomcat, GT's "standalone
container", or figure 1's "service container").
[Andrew Grimshaw]
We've been using this term for some time - and indeed if you look at the
larger EMS picture the "services" started may in fact be stateful web
services. The sense is that the "service" - or legacy activity in the case
of BES, is logically "contained" by its execution environment.
...
In section 2, under the "Naming" bullet: what exactly is being named?
Is the last bullet under section 2 saying that a BES-compliant service
cannot support logical file names? Does this include converting file
paths to, say, GridFTP URLs that might differ in the file path with
respect to the GridFTP server? Does this mean that variables cannot be
used in file paths?
[Andrew Grimshaw] 
There is a huge history on naming. Section 2 is referring to the activities
that are being started: their "names", "handles", "addresses", or
whatever you want to call them, will be EPRs.
We are referring to the "type" so to speak of the endpoints, some may be
WS-Names, some may be renewable references, some may be managed jobs - the
spec is silent on that, but recognizes that there are many choices.
...
I don't know much about WS-Names (anybody have a pointer to documents?),
but it seems problematic for interoperability that in section 3 the
method of exposing metadata about the AED type used is deliberately left
up to the implementation. This makes it impossible to automatically find
this information without either a de facto standard or some mapping of
implementation to metadata discovery method. Why not at least specify
how this metadata is discovered? A standard resource property would do
it.
[Andrew Grimshaw] 
The WS-Names documents can be found in GridForge off of the OGSA-Naming WG.
The issue of how you find metadata is crucial, but the answer may be different
in different renderings of the standard, e.g., WSRF versus WS-I, versus who
knows what. This has been a massive political issue, and the result is a
compromise. For example, in WSRF they are resource properties.
...
I have issues with the hardware and software resource properties. Do
these describe the capabilities of the machine the service is running on
or a cluster/SMP machine sitting behind the service? If the former, this
is only useful if the activity runs on the same machine as service and
seems like it should be optional. Also, what about the metadata
describing the cluster or SMP machine? If the latter, then it is
limiting in that a cluster may have machines with different OS, CPU
configuration, available memory, architecture, etc...
[Andrew Grimshaw] 
They describe the resource that the BES container is representing, not the
machine it is on. Recall that a BES container may "front" a whole set of other containers.
I agree with your comment about "limiting" if there are heterogeneous resources
behind it. The resource model at this point in time does not address multiple
entries for any given field.
...
In section 5.1.1 I find the IsAcceptingNewActivities RP rather useless.
The client will find out just as easily if the service is accepting new
activities depending on whether a NotAcceptingNewActivities fault is
thrown or not. Checking the RP first both creates an extra WS operation
call and cannot be guaranteed since there is a potential race condition
between when the RP value is returned and the submission call is sent.
If there was some negotiation protocol incorporated into BES I might buy
this, but otherwise I don't see the point.
[Andrew Grimshaw] 
There are always race conditions in distributed systems. What if I want to
discover whether you are accepting activities, but not necessarily start one,
for example to construct a candidate set of BES containers?
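A minimal Python sketch of how both points can coexist, assuming a toy in-memory model: HypotheticalBES, RacyBES and submit() are made-up names, and NotAcceptingNewActivitiesFault only mirrors the fault named above (it is not a real binding). The property read serves as a cheap pre-filter for a candidate set, but the submission still has to handle the fault, because the property can go stale between the read and the CreateActivityFromJSDL call.

```python
from typing import List, Optional


class NotAcceptingNewActivitiesFault(Exception):
    """Stand-in for the BES fault of the same name (not a real binding)."""


class HypotheticalBES:
    """Toy in-memory model of a BES container; not a real API."""

    def __init__(self, name: str, accepting: bool) -> None:
        self.name = name
        self.accepting = accepting
        self.activities: List[str] = []

    def is_accepting_new_activities(self) -> bool:
        # Models a read of the IsAcceptingNewActivities resource property.
        return self.accepting

    def create_activity(self, jsdl: str) -> str:
        # Models CreateActivityFromJSDL: the fault is the authoritative answer.
        if not self.accepting:
            raise NotAcceptingNewActivitiesFault(self.name)
        self.activities.append(jsdl)
        return f"{self.name}/activity/{len(self.activities)}"


class RacyBES(HypotheticalBES):
    """Simulates another client filling the container right after our RP read."""

    def is_accepting_new_activities(self) -> bool:
        answer = self.accepting
        self.accepting = False
        return answer


def submit(containers: List[HypotheticalBES], jsdl: str) -> Optional[str]:
    # Use the RP only as a cheap pre-filter to build a candidate set ...
    candidates = [c for c in containers if c.is_accepting_new_activities()]
    for c in candidates:
        try:
            # ... but still handle the fault, since the RP value may be stale.
            return c.create_activity(jsdl)
        except NotAcceptingNewActivitiesFault:
            continue
    return None


print(submit([RacyBES("a", True), HypotheticalBES("b", True)], "<jsdl/>"))
# prints b/activity/1
```

Container "a" advertised that it was accepting, then refused the submission; the client falls through to "b" only because it handles the fault rather than trusting the property.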
...
In section 6.1, I feel it's superfluous to suffix the operation name
with "FromJSDL". Are there plans to support something other than JSDL?
If not, why not just make this "CreateActivity"?
[Andrew Grimshaw] 
BES is the simplest, most basic "type" of service container in the EMS
architecture. Assuming that only JSDL will be used for all time
seemed overly restrictive. In BES there are no plans for anything else - but
...
Section 6.1.1: "Document" suffix seems also superfluous. Of course it's
a document. What else would it be? Also, I don't see why
"createInSuspendedState" needs to be outside the JSDL document. This
should be an extension to JSDL, especially considering that the state
model doesn't have a base "suspended" state. So if a service doesn't
support suspended state extensions, this flag is meaningless.
[Andrew Grimshaw] 
I'll look at this one later -
...
Section 6.1.2: absolutely unique idempotence IDs are impossible to
guarantee. I'm of the opinion that these should be valid only for the
life of a job. Besides being ridiculous, expecting the service to keep
track of these IDs to make sure none is ever used again serves no real
purpose.
[Andrew Grimshaw] 
I happen to agree with you - certainly for all time is difficult, and
depending on exactly what semantics you think you're getting, impossible.
However, this is in there as a result of insistence and a long discussion on
input (ESI document) from the Globus and Unicore teams.
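A minimal Python sketch of Peter's alternative, in which an idempotence ID is remembered only while the activity it created still exists. HypotheticalBESService and its methods are made-up names for illustration, not part of the draft. A retried request carrying the same ID gets the same activity back; once the activity is destroyed the ID is forgotten, so the service never tracks IDs "for all time".

```python
import uuid
from typing import Dict


class HypotheticalBESService:
    """Toy service: idempotence IDs live only as long as their activity."""

    def __init__(self) -> None:
        self._by_idempotence_id: Dict[str, str] = {}  # idempotence ID -> activity ID
        self._activities: Dict[str, str] = {}         # activity ID -> JSDL

    def create_activity(self, jsdl: str, idempotence_id: str) -> str:
        # A retried request with a known ID returns the existing activity
        # instead of starting a duplicate.
        existing = self._by_idempotence_id.get(idempotence_id)
        if existing is not None:
            return existing
        activity_id = str(uuid.uuid4())
        self._activities[activity_id] = jsdl
        self._by_idempotence_id[idempotence_id] = activity_id
        return activity_id

    def destroy_activity(self, activity_id: str) -> None:
        # Destroying the activity also releases its idempotence ID(s).
        self._activities.pop(activity_id, None)
        stale = [k for k, v in self._by_idempotence_id.items() if v == activity_id]
        for k in stale:
            del self._by_idempotence_id[k]

    def known_ids(self) -> int:
        return len(self._by_idempotence_id)
```

The trade-off is that a retry arriving after the activity has been destroyed will start a new activity; under the job-lifetime scoping proposed above that is the intended behavior.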
...
Section 6.1.3: If there is an optional subscription request element in
the input, then there needs to be an optional subscription reference
element either in the output (preferred to avoid an RP query round trip)
or as a resource property.
[Andrew Grimshaw] 
I'm not sure I follow - you mean you want a handle to the subscription in
the return?
...
6.1.4: The names of the fault types aren't consistent. Either suffix them with
"Fault" or don't.
[Andrew Grimshaw] Yes, this should be fixed.

Sorry, got to go now. Will get to the rest later.
...
Spelling typo in section 6.2, page 19, 1st paragraph: "proceeding" ->
"preceding".
6.2: I would prefer that the "*Status" elements not have "state" and
"laststate" attributes but instead have two child elements "state" and
"lastState" (or something similarly named). These elements would in turn
have a child element that has a well defined (i.e. in this document)
enumerated state representing one of the states in the general state
diagram (i.e. New, Pending, Running, Canceled, Failed, or Finished) and
one optional xsd:anyType element (or just an xsd:any) that specifies the
state extension if one exists (e.g. StageIn, Suspended, etc.). At the very
least, something needs to be changed to account for the new state model.
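A minimal Python sketch of the structure proposed in this 6.2 comment, using the standard library's xml.etree.ElementTree. The element names (ActivityStatus, State, LastState, BaseState, Extension) are hypothetical placeholders, not names from the draft, and namespaces are omitted. Each of the two child elements carries one of the six base states plus an optional extension sub-state, so a basic client can read the base state and ignore extensions it does not understand.

```python
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Optional


class BaseState(Enum):
    # The six states of the general state diagram listed in the comment.
    NEW = "New"
    PENDING = "Pending"
    RUNNING = "Running"
    CANCELED = "Canceled"
    FAILED = "Failed"
    FINISHED = "Finished"


def state_element(tag: str, base: BaseState, extension: Optional[str] = None) -> ET.Element:
    # Builds <tag><BaseState>...</BaseState>[<Extension>...</Extension>]</tag>
    elem = ET.Element(tag)
    ET.SubElement(elem, "BaseState").text = base.value
    if extension is not None:
        # Optional profile-specific sub-state; stands in for the xsd:any slot.
        ET.SubElement(elem, "Extension").text = extension
    return elem


def activity_status(state: BaseState, last_state: BaseState,
                    state_ext: Optional[str] = None,
                    last_ext: Optional[str] = None) -> ET.Element:
    status = ET.Element("ActivityStatus")
    status.append(state_element("State", state, state_ext))
    status.append(state_element("LastState", last_state, last_ext))
    return status


status = activity_status(BaseState.RUNNING, BaseState.PENDING, state_ext="StageIn")
print(ET.tostring(status, encoding="unicode"))
# A basic client ignores Extension and reads only the base state.
print(status.findtext("State/BaseState"))  # prints Running
```

Pairing StageIn with Running here is purely illustrative; which base state a given extension refines would be up to the profile that defines it.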
6.3: I was liking the new state model until I saw this operation. I
don't like this idea at all of allowing the user to monkey with the
state machine directly. I would be much happier if there were specific
operations to inject specific inputs that the state machine should
consider. The state machine should never be open to direct manipulation,
but instead should decide for itself what state to transition to, and when,
based on the current state and events. At *very* least
this operation should have a fault that means "too bad, I'm not doing
that". If the desire is to allow for, say, resetting the job back to
New so that it will run again, or canceling without destroying the job,
then specific operations can be added to the spec as is. Since
sub-states like "suspended" are treated as extensions now, a
"SuspendActivities" operation would seem more appropriate as an
extension as well.
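A minimal Python sketch of the event-driven alternative described in this 6.3 comment, under assumed names: TRANSITIONS, CLIENT_EVENTS, Activity and TransitionRefusedFault are all hypothetical, and the transition table is a simplified illustration rather than the HPC profile's diagram. Clients inject specific inputs (here only "cancel"), the service decides the resulting state, and any input the machine does not accept raises the "too bad, I'm not doing that" fault.

```python
from typing import Dict, Tuple


class TransitionRefusedFault(Exception):
    """Hypothetical "too bad, I'm not doing that" fault."""


# (current state, event) -> next state. Only the service applies this table.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("New", "schedule"): "Pending",
    ("Pending", "start"): "Running",
    ("Running", "complete"): "Finished",
    ("Running", "error"): "Failed",
    ("New", "cancel"): "Canceled",
    ("Pending", "cancel"): "Canceled",
    ("Running", "cancel"): "Canceled",
}

# Inputs a client may inject through the WS interface (illustrative).
CLIENT_EVENTS = {"cancel"}


class Activity:
    def __init__(self) -> None:
        self.state = "New"

    def _fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise TransitionRefusedFault(f"{event!r} not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state

    def service_event(self, event: str) -> str:
        # Internal events raised by the service itself (scheduler, executor).
        return self._fire(event)

    def client_request(self, event: str) -> str:
        # Clients send inputs; they never name the target state directly.
        if event not in CLIENT_EVENTS:
            raise TransitionRefusedFault(f"clients may not send {event!r}")
        return self._fire(event)


act = Activity()
act.service_event("schedule")
act.service_event("start")
print(act.client_request("cancel"))  # prints Canceled
```

A "reset to New" or "suspend" capability would then be added as another named client input (or as an extension's input), rather than by opening the state field to direct writes.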
6.4 and 6.5: Could someone provide a use case for wanting to start and
stop the acceptance of new activities via the WS interface?
6.6: This should be a resource property, not a new operation.
I'm already commenting past page 20, so I'll stop here. One last
observation, though. I noticed that this document makes no attempt at
defining an activity resource. Is this intentionally out of scope?
Peter
On Mon, 2006-05-22 at 14:58 -0400, Andrew Grimshaw wrote:
...
All,
I have done the updates to the document discussed in the last two
phone calls, and in the EMS meeting in Japan.
At a high level this includes modifications to adopt text/issues from
the ESI document and the HPC profile working document. Note that this
is NOT a final document.
These changes revolve around:
1: Changing CreateActivityFromJSDL to
a) take additional optional arguments, specifically notification
arguments and an "idempotent" argument lifted from ESI.
b) An extension to JSDL, suggested in ESI, to support libraries.
2: Adding a resource information model section. Originally this was to be
taken from section 4.1 of the ESI document, but after discussion with
Snelling, Stokes, et al. it is a modified 4.1.
3: State model. This is one of the biggest jobs. At GGF in Japan the
HPC profile group introduced a nice, simple, extensible state model
that would allow supporting a variety of state machines while still
allowing basic clients to understand what they were getting. After
discussions with many people I have decided to adopt that simpler
state model. CLEARLY THIS WILL NEED DISCUSSION BY INTERESTED PARTIES.
I AM NOT TRYING TO PULL A FAST ONE.  I have taken the text verbatim
from the HPC profile paper. Note that I do not yet have permission
from their lawyers - so we may need to yank this and retype the same
info. I hope not.
4: Refer to managedjob "type" of EPR, just as we refer to "WS-Name"
"type" of EPR.
I have not even tried to modify the WSDL and renderings. Once we
decide on the rest of the content we can do that. So don't read past
page 20!
I am not sure of the status of the weekly call this week.  I will
synch with Darren and Steven and send mail tomorrow.
I'm also including some slides from the discussion of BES in the EMS
session.
A
Andrew Grimshaw
Professor of Computer Science
University of Virginia
434-982-2204
grimshaw@cs.virginia.edu