RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"

23 Mar 2006

      ...
Hi;
I have no doubt that it would be relatively easy to add transactional
semantics to most, if not all job schedulers.  In a separate email to
Ian and this mailing list I talk about the potential challenge of doing
so in a manner that is efficient enough to support
"ultra-high-throughput" HPC use cases that I'm aware of.  ASSUMING
...
it is indeed difficult to support these existing use cases then I argue
it's better to support transactional job submission semantics as an
almost universally used extension than to simply exclude the use case by
requiring those semantics in the base case.
As I point out in the email, my assumption may be wrong and in fact
Hi;

You are right about the concern of having too many combinations of
extensions, which make sensible interaction with an array of job
schedulers problematic.  One reason I want to identify common use cases
is that I expect extension profiles to be created explicitly for those
use cases.  Clients would
then mostly look for these specific (common) extension profiles rather
than for lists of the specific combinations of extensions that they
represent.  "Component" extensions would exist to (hopefully) allow
reuse of common interaction patterns that show up in more than one
common use case -- and to enable non-common combinations of extensions.
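
A minimal sketch of the profile idea above, assuming a profile is nothing
more than a named set of component extensions.  The profile and extension
names are hypothetical labels chosen for illustration; they are not taken
from any spec.

```python
# Hypothetical profile table: each profile names a set of component extensions.
# None of these identifiers come from a real specification.
PROFILES = {
    "reliable-submission": frozenset({"at-most-once-submit", "hold-release"}),
    "annotated-jobs": frozenset({"annotation"}),
}


def supported_profiles(advertised, profiles=PROFILES):
    """Return the sorted names of profiles whose component extensions
    are all present in the set a scheduler advertises."""
    advertised = frozenset(advertised)
    return sorted(name for name, parts in profiles.items() if parts <= advertised)
```

A client would then ask "does this scheduler support reliable-submission?"
rather than enumerating the individual extensions behind it, while an
unusual combination can still be checked component by component.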

I also agree that we need to have an explicit means of identifying what
extensions a scheduler implements.  Having to guess or use other
ill-defined out-of-band means would be terrible.

Concerning common HPC use cases vs. common HPC grid use cases, what I
want to ideally achieve is the following: 
- I want to allow a scheduler vendor/supplier to implement a common-use
base case that doesn't require them to deal with the "additional
federation and distribution complexities".
- I want to enable that vendor/supplier to implement the additional
complexities as extensions that they can provide later without having to
start over from scratch.

I want this because it's a lot easier to ask a
vendor/supplier to implement a staged approach to meeting a spec,
especially when the base case is the one that is their current business.
If we start with the additional federation and distribution complexities
being in the base case then the vendor/supplier can't service their
existing community until much later than in the staged approach.  That
makes creation of an "interim non-grid solution" for the non-complex
base case dangerously tempting.

So, I agree with your definition of the grid use case as being the
common HPC use case plus additional federation and distribution
complexities. :-)

I'm also mostly in agreement with your last paragraph.  The only thing I
would add is that coming up with a clean, simple -- and hence probably
restricted -- extension mechanism/approach is also a key issue.
Otherwise we risk having to deal with non-composable extension
combinations.

Marvin.

-----Original Message-----
From: Karl Czajkowski [mailto:karlcz@univa.com] 
Sent: Tuesday, March 21, 2006 8:07 PM
To: Marvin Theimer
Cc: Carl Kesselman; humphrey@cs.virginia.edu; ogsa-wg@ggf.org
Subject: Re: [ogsa-wg] Paper proposing "evolutionary vertical design
efforts"

Trying to wrap several tangents back together here...

I agree that at-most-once submission could be supported by an "almost
universally used extension" and everyone should be happy on that
front. Having a few extensions such as the "hold+release" and
"annotation" might even be workable if less than ideal.

What is less desirable is having many different extensions that "sort
of" provide it, so that a typical heterogeneous-environment
client/metascheduler must mode switch with every remote scheduler to
try to get the same QoS, because they each have a different mechanism.
The thing I fear more would be when the client/metascheduler cannot
even detect the presence of extensions and must utilize detailed
out-of-band knowledge to perform this feat of re-synthesizing reliable
submission.

As for your other comments, I suspect we are all being too abstract to
communicate.  I cannot imagine what it is about the "common HPC use
cases" that is mutually exclusive of the "grid HPC use cases", so I
have trouble understanding the dichotomy you are implying.  In my
world-view, the Grid case is the common use case with additional
federation and distribution complexities.

I think the crux of the issue for practical interoperability is
defining the extension discovery mechanism(*) AND getting
community-based standardization for some actual everyday extensions,
and it sounds like we may agree on that point.  If this part is not of
primary focus, I believe interop will fail and the rest of the
discussion would be pointless.

karl

(*) Extension discovery problem: to determine what extensions are
supported by a service endpoint in order to customize the request, not
to be confused with the harder and almost as critical extension-aware
discovery problem: to choose an appropriate service endpoint based on
the availability of extensions one wishes to employ.
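
The two problems in the footnote can be sketched as follows.  This is an
illustration under stated assumptions, not a proposed interface: the
Endpoint type, the function names, and the extension names are all
hypothetical, and an endpoint is assumed to simply carry the set of
extensions it advertises.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical service endpoint that advertises its extensions."""
    name: str
    extensions: frozenset = field(default_factory=frozenset)


def discover_extensions(endpoint):
    """Extension discovery: what does this one endpoint support?"""
    return endpoint.extensions


def build_request(endpoint, job, wanted):
    """Customize a request by enabling only the wanted extensions
    that the target endpoint actually advertises."""
    usable = sorted(set(wanted) & discover_extensions(endpoint))
    return {"job": job, "use_extensions": usable}


def select_endpoints(endpoints, required):
    """Extension-aware discovery: keep the endpoints that support
    every extension the client requires."""
    required = frozenset(required)
    return [e for e in endpoints if required <= e.extensions]
```

The first two functions need only a per-endpoint query; the last one needs
extension information for a whole population of endpoints before any one
of them has been chosen, which is why it is the harder problem.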

On Mar 21, Marvin Theimer modulated:
that the
...
main scheduler vendors/suppliers may all (or mostly all) say that
supporting transactional semantics is either something they already do
or would have no objection to adding.  In that case, we should
definitely add this requirement to the base case and happily move
forward.
Regarding your concern that I'm trying to define as small-as-possible a
base case, I'm not sure how to respond.  An important thing to keep in
mind is that I want to define an HPC profile that covers the common HPC
use cases, not just the common HPC grid use cases.  If the HPC grid
profile doesn't cover the common "in-house" use cases then a second set
of web service protocols will be needed to cover those cases
(interoperability among heterogeneous clusters within an organization is
definitely a common case).  If that happens then we risk almost certain
failure because vendors will not be willing to support two separate
protocol sets and the in-house use cases are currently far more common
than the grid use cases.  Vendors will extend the in-house protocol set
to cover grid use cases and "grid-only" protocols will very likely get
ignored.

That said, I agree with your last paragraph about the requirements for a
design, namely the need for an interoperable interface subset plus a
robust extensibility mechanism that covers the topics you listed.  But I
will argue that transactional semantics are not a REQUIREMENT for
interoperability -- merely something that in MOST cases is enormously
useful.
Marvin.
-- 
Karl Czajkowski
karlcz@univa.com