Re: [saga-rg] Fwd (hupfeld@zib.de): Re: SAGA Strawman API Version 1.0

20 Jun 2005


On Saturday 18 June 2005 17:21, John Shalf wrote:
Hi, the quoted conversation is just one side of a conversation in AIM.
As such, I think many of these statements are likely to get
misinterpreted because it does not include Andre's side of the
conversation (every other paragraph was an exchange between us).
So, in order to avert any flame wars on this list, the focus here is:
1) It's probably inappropriate for SAGA to define a consistency model,
   given it is only an API. However, it should be able to accommodate
   consistency models that exist in the underlying implementations.
   This is not to say consistency models are not important, but that
   the API can't be used to impose a consistency model. It's the other
   way around... the consistency model is going to impose conditions
   on the API.
2) At GGF it would be worthwhile to walk through some existing
   consistency models and find out how they map into the API (are there
   error states or features that are not accommodated?)
3) Felix is correct that such a model potentially exposes the
   programmer to the worst-case scenario in terms of consistency models.
   This is one of the risks involved in anything that carries the moniker
   "simple".

I see the point that it is hard to guarantee a specific consistency model
in SAGA or make it a configuration option to choose one. But if you
completely ignore this topic in the specs you will end up with a good
spec of the syntax but without semantics.

My 0.02€,
Thorsten
...
On Jun 18, 2005, at 4:26 AM, Andre Merzky wrote:
...
Hi,
in a chat with John Shalf, he offered the following opinion on
the topic of consistency, which he agreed to let me post to
the list.  The paper John refers to is the paper cited by
Jon MacLaren (see quoting below).  SSI here means Single
System Image.
John:
I think SAGA has no business defining a consistency
   model, but it should be able to accommodate consistency
   models that exist in the underlying implementations.
SAGA is just the API.  We should lay out some existing
   consistency models and make sure the error handling
   supports it.  The problem with POSIX+NFS that was pointed
   out in that paper was that they could not extend the
   error codes that were available to POSIX.
Felix is right on that account.
However, most grid technologies are trying their best to
   create an SSI consistency model.  The existence proof is
   there (WAN-GPFS provides the same consistency model
   semantics as a local FS connection).
So I reject any claim that a "Grid" filesystem necessitates
   some bizarro consistency model that is not the same as
   Single System Image (SSI).
I think any remote-filesystem strategy should have SSI
   as its goal ultimately.  I think it is somewhat of a
   religious battle as to whether a remote consistency
   model must be radically different than the local one.
   Certainly, we need to deal with different kinds of
   failures (e.g. an open filehandle that suddenly becomes
   unavailable).
It should be discussed at GGF [...].
But just to keep the discussion focused, the core comment
   is "SAGA has no business defining a consistency model,
   but it should be able to accommodate consistency models
   that exist in the underlying implementations." and also
   to say that "Felix is right about the programmer
   potentially having to worry about the worst-case."
[...]
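John's two core points (the API does not define a consistency model, but its error handling must be able to carry whatever the underlying implementation reports) can be sketched roughly as follows. This is only an illustration and not part of the SAGA strawman: the names `Consistency`, `StorageError`, `HandleUnavailable`, `Backend` and `FlakyRemoteBackend` are all hypothetical, and the list of consistency models is a placeholder.

```python
from enum import Enum


class Consistency(Enum):
    """Hypothetical labels a backend could use to describe itself."""
    SINGLE_SYSTEM_IMAGE = "ssi"
    CLOSE_TO_OPEN = "close-to-open"
    UNSPECIFIED = "unspecified"


class StorageError(Exception):
    """Base error with an open-ended detail dict.

    Unlike a fixed set of error codes, a backend can attach
    whatever extra information it has without changing the API.
    """

    def __init__(self, message, backend, detail=None):
        super().__init__(message)
        self.backend = backend
        self.detail = dict(detail or {})


class HandleUnavailable(StorageError):
    """An open file handle that suddenly became unreachable."""


class Backend:
    """The API side: it exposes the model, it does not impose one."""
    name = "generic"
    consistency = Consistency.UNSPECIFIED

    def close(self, handle):
        raise NotImplementedError


class FlakyRemoteBackend(Backend):
    """Demo backend (assumption) whose close always fails remotely."""
    name = "remote-demo"
    consistency = Consistency.CLOSE_TO_OPEN

    def close(self, handle):
        raise HandleUnavailable(
            "handle %r is no longer reachable" % (handle,),
            backend=self.name,
            detail={"retryable": True},
        )
```

A caller can then inspect `backend.consistency` before relying on any guarantee, and catch `StorageError` to see backend-specific detail such as `detail["retryable"]`.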
Best Regards,
Andre.
Quoting [Jon MacLaren] (Jun 15 2005):
...
<snip>
If SAGA chooses to give single-system like guarantees, this must be
explicitly stated. All interfaces that deal with data are unusable
without a specification of consistency guarantees.
I don't believe that it is possible, or even desirable, to try to
make distributed systems look like they are not distributed.  For
example, I don't think you should provide POSIX behaviour on a
distributed filesystem.  If you look at AFS, it doesn't fit the POSIX
model.  Most people write code that ignores what the filesystem might
be, and assume POSIX.  How many people check the failure status on a
file close?  With AFS, you can get "Host not found" when you do a
file close.  You can wait, and try again.  If you quit, your changes
are lost.  (As a library writer, you can try and "squash" the errors
by putting a clever layer of code between the app and the filesystem
that knows tricks like this.  The Condor people do this, I seem to
recall.)
The point here isn't that developers should never assume a POSIX
filesystem, it is that they should know what kind of filesystem they
are dealing with, so that they can write appropriate code.  When you
go distributed, there is a whole new set of error conditions that
can occur.  I don't think that there is anything to be gained from
pretending that remote objects are the same as local objects, so that
people's code can stay the same.  If the code doesn't know it's
dealing with something that is remote, rather than local, then at
best (i.e. if there is lots of error checking) it will fail far more
often.  Probably though, it won't be robust.
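To make the file-close point concrete, here is a minimal Python sketch of code that does not ignore a failure at close time. The helper names `write_checked` and `save_with_retry`, and the `attempts` and `delay` parameters, are made up for this illustration. The sketch retries the whole write rather than the close call alone, on the assumption that on some systems the descriptor is already released when close fails, so calling close again on it would not be safe.

```python
import os
import time


def write_checked(path, data):
    """Write bytes to path; any failure, including at close, is raised."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to the storage layer
    # Leaving the with block calls close(); an OSError raised there
    # propagates to the caller instead of being silently dropped.


def save_with_retry(path, data, attempts=3, delay=1.0, writer=write_checked):
    """Call writer(path, data) up to `attempts` times, waiting in between.

    The last OSError is re-raised if every attempt fails, so the
    caller always learns that the changes were not saved.
    """
    for attempt in range(1, attempts + 1):
        try:
            writer(path, data)
            return attempt  # number of attempts that were needed
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The `writer` parameter exists only so that a failing filesystem can be simulated in a test; real code would normally just use the default.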
It might be worth looking at the following paper, which says
eloquently what I'm grasping for.
"A note on distributed computing" by Jim Waldo et al., available
from: http://research.sun.com/techrep/1994/abstract-29.html
Cheers,
Jon.
--
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+

Thorsten Schuett