Re: [SAGA-RG] Service Discovery spec updated at last ...

18 Dec 2008

      2008/12/18 Andre Merzky <andre@merzky.net>:
...
Quoting [Steve Fisher] (Dec 18 2008):
...
...
In the attached version, I tried to address the things we
agreed upon, and changed the service types/names table as
proposed.  Could you give it another pass, please?
I left the Constructors untouched for now, until we agreed
on something, ok?
Also, there is a question from me at the bottom of page 5:
"Note: Why not TimeOut, AuthorizationFailed, etc??"
For the discoverer constructor, I guess the Authentication exception
can be thrown if you don't pass in a session or if you pass in what
used to be a valid session that has since expired. How would you class
that one?
A session cannot expire - its lifetime is defined by the
lifetime of the objects which use that session.  If a
context (or credential) attached to that session expired,
you can expect to see a lot of AuthenticationFailed
exceptions, whenever that credential is being used.
Sorry, that is what I meant. The session has not expired but it is
inoperable because its credentials have expired.
...
...
Unfortunately quite a lot of the allocation of exceptions is
implementation dependent. It would be better to be able to just throw
a Security exception - but that is not an option as it is not in the
core spec. If you need to get authenticated this could result in a
timeout if contact with some remote server is required.
Well, then that is a TimeOut exception :-))  In general, it
is of course difficult to map backend exceptions to SAGA
exceptions in a meaningful way, but that cannot be avoided
really, if the end user is to stay away from middleware
exceptions.
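
A minimal sketch of the mapping discussed above, in Python. The exception class names follow the ones used in this thread; `BackendCredentialExpired` and `map_backend_error` are hypothetical names invented for illustration and do not come from any SAGA binding or middleware.

```python
# SAGA-style exception classes, named as in this thread (illustrative only).
class SagaError(Exception):
    pass

class TimeOut(SagaError):
    pass

class AuthenticationFailed(SagaError):
    pass

class NoSuccess(SagaError):
    pass

# Hypothetical middleware error, standing in for whatever the backend raises.
class BackendCredentialExpired(Exception):
    pass

def map_backend_error(exc):
    """Translate a backend exception into a SAGA-style exception."""
    if isinstance(exc, TimeoutError):
        return TimeOut(str(exc))
    if isinstance(exc, BackendCredentialExpired):
        return AuthenticationFailed(str(exc))
    # Anything without a clear counterpart falls back to NoSuccess.
    return NoSuccess(str(exc))
```

The fallback branch is where the implementation-dependent cases end up, which is why the choice of mapping matters for what the application can still distinguish.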
...
In our current implementation, list-services does most of the work and
so can throw timeouts and could throw security errors except that we
happen to connect to a non-secured information service. However the
subsequent calls (such as get_data) can throw the same exceptions as
they could choose to go back to the information service to pick up
information that was not originally needed when making the selection.
If we don't allow the possibility to throw these exceptions the
implementation is needlessly restricted. This is why I like the
NoSuccess everywhere. In other words I would use NoSuccess for those
conditions which are implementation dependent - which in this case is
quite a lot.
Good point, so NoSuccess may make sense in all calls.  I
expected all implementations to simply fetch complete
service descriptions, not only parts, but that may have been
too limiting.
But, is that really the way you expect implementations to
work: to fetch some part of the service description, but,
e.g., the name and URL just later on, on the get_attribute
calls?  That adds an awful latency overhead, which easily
kills any bandwidth savings, as long as you stay below x.000
returned service descriptions - which is the dominating use
case I dare to say.
Which approach is optimal depends upon the information system. However
there are advantages in making sure that list_services gets everything
loaded into memory because as you have pointed out it makes sure that
the subsequent get methods cannot fail.
...
Requiring the implementation to fetch complete descriptions
is a certain limitation, sure, but allows more tightly
defined semantics for the application and end user...
So, that is up to you (I do not know the implementations),
but please consider the argument.
Yes, I agree it would be best to get all the service info in one go.
Then it either succeeds or fails.
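
The "fetch everything in one go" approach could look roughly like the sketch below. It assumes a hypothetical `query_backend` callable that returns one dictionary per service; the class and method names mirror the ones discussed in this thread but are not taken from the spec text.

```python
class NoSuccess(Exception):
    pass

class ServiceDescription:
    """Holds a complete, local copy of one service record."""
    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def get_attribute(self, key):
        # Pure in-memory lookup: no remote call, so no timeout or
        # security error can occur here.
        return self._attributes[key]

class Discoverer:
    def __init__(self, query_backend):
        # query_backend is a hypothetical callable: filter -> iterable of dicts.
        self._query_backend = query_backend

    def list_services(self, service_filter):
        # The only place the information system is contacted.
        try:
            records = list(self._query_backend(service_filter))
        except Exception as exc:
            raise NoSuccess(str(exc)) from exc
        return [ServiceDescription(record) for record in records]
```

With this shape, all remote failures surface in list_services, and the later get calls only read from memory.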
...
...
For example you have added DoesNotExist for the
constructor - implying that the API would contact the underlying
information service. However there is no reason why it should do so at
this stage.
We have the same problem with the RPC package - there the
Constructor has the following note:
- according to the GridRPC specification, the constructor
   may or may not contact the RPC server; absence of an
   exception does not imply that following RPC calls will
   succeed, or that a remote function handle is in fact
   available.
and the call() method has the following one:
- according to the GridRPC specification, the RPC server
   might not be contacted before invoking call(). For this
   reason, all notes to the object constructor apply to the
   call() method as well.
Also, call() has all exceptions of the Constructor, to allow
for delayed throws.
These notes are, as noted, a result of an underspecification
of the GridRPC standard.  I would really like to avoid
sprinkling such notes all over the place, as it makes a tight
semantic definition of the API rather impossible.
I think, if you have no really hard constraints from an
underlying standard (as we had with RPC), or from an
absolutely dominating implementation, it is not too much to
ask from an implementation to ensure in the Constructor if
the given (or chosen) URL actually points to a service which
is alive and usable - what do you think?
This time I don't agree. For someone looking for one service it would
mean two calls to the info system rather than one, also even if you
find the service is available in the constructor it may have died by
the time you do list_services. So you need the exceptions in
list_services to cover lost credentials, timeouts/dead services, and
authz. I would be happy to assume that the constructor only contacts
another service as part of the session setup and does not contact the
info system.
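
To make the "two calls rather than one" point concrete, here is a small Python sketch. `InfoSystem`, `CheckingDiscoverer` and `LazyDiscoverer` are hypothetical names invented for this illustration; the counter simply stands in for round trips to the information service.

```python
class InfoSystem:
    """Hypothetical stand-in that counts how often it is contacted."""
    def __init__(self):
        self.contacts = 0

    def ping(self):
        self.contacts += 1

    def query(self, service_filter):
        self.contacts += 1
        return [{"name": "example", "filter": service_filter}]

class CheckingDiscoverer:
    """Constructor verifies that the information service is alive."""
    def __init__(self, info_system):
        self._info = info_system
        self._info.ping()  # first contact, only to check liveness

    def list_services(self, service_filter):
        return self._info.query(service_filter)  # second contact

class LazyDiscoverer:
    """Constructor makes no contact; errors can only surface later."""
    def __init__(self, info_system):
        self._info = info_system

    def list_services(self, service_filter):
        return self._info.query(service_filter)  # the only contact
```

In both variants list_services still has to be able to fail, since the service may disappear between the constructor and the query.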

Actually, I don't think that much use will be made of the url
parameter. Rather, I expect the sysadmin to make sure that the desired
set of adapters is installed, and each will probably have its own
configuration file.
...
Otherwise, indeed, you need to repeat all exceptions on all
calls (which makes catching and error recovery tedious for
the application), or map them all to NoSuccess later on
(which also gives the application no useful means to
recover, e.g. an application does not get the info that a
different URL or context may have helped).
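
The recovery argument can be sketched as follows, in Python. This assumes a hypothetical `list_services(url, context)` callable and the exception names used in this thread; it only illustrates what an application can do when the exception type still carries information.

```python
class TimeOut(Exception):
    pass

class AuthenticationFailed(Exception):
    pass

class NoSuccess(Exception):
    pass

def discover_with_fallback(list_services, urls, contexts):
    """Try each URL with each context until one combination works."""
    for url in urls:
        for context in contexts:
            try:
                return list_services(url, context)
            except AuthenticationFailed:
                # A different context may still work for the same URL.
                continue
            except TimeOut:
                # This URL looks unreachable; move on to the next one.
                break
    # A NoSuccess raised by list_services itself is not caught above:
    # it gives no hint about what to change, so it simply propagates.
    raise NoSuccess("no URL/context combination worked")
```

If every failure were reported as NoSuccess, neither of the two except branches could exist, and the application would have no basis for choosing what to retry.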