
Quoting [Mathijs den Burger] (Oct 19 2009):
Hi Andre,
Phew, these mails tend to get long. Here we go:
On Sat, 2009-10-17 at 22:25 -0600, Andre Merzky wrote:
1. Exception handling in engines with late binding is a pain.
Agree, it is painful. But what can you do? At best, the engine is able to employ some heuristics to extract the most relevant exception and push that to the top level. Your application should then only print that message to stderr, by default.
The only real 'solution' would be to disable late binding... Or do you see any other way?
Mainly: restrict the number of backends tried as much as possible (see below). Furthermore, catch generic errors in the engine instead of in each adaptor separately (e.g. illegal flags, negative port numbers, etc.), so the user gets one exception instead of 10 identical ones from all adaptors tried.
Yes, that is on our todo list. Right now, the C++ engine does not have the infrastructure for doing parameter checks on package level (instead of adaptor level). Is that implemented in Java?
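To make the idea above concrete, here is a minimal sketch of what engine-level (package-level) parameter checking could look like, so one exception is raised before any adaptor is tried. The function names and the flag mask are illustrative assumptions, not part of the SAGA spec or either implementation:

    #include <cassert>
    #include <stdexcept>
    #include <string>

    // Hypothetical engine-level checks, run once before late binding starts.
    void check_port (int port)
    {
      if ( port < 0 || port > 65535 )
        throw std::invalid_argument ("illegal port number: " + std::to_string (port));
    }

    const int valid_flags_mask = 0x0F;  // assumed mask of legal flag bits

    void check_flags (int flags)
    {
      if ( flags & ~valid_flags_mask )
        throw std::invalid_argument ("illegal flags");
    }

    int main ()
    {
      check_port (8080);               // fine: adaptors are tried next
      bool threw = false;
      try { check_port (-1); }         // one exception from the engine,
      catch ( const std::invalid_argument & ) { threw = true; }  // not ten from adaptors
      assert (threw);
      return 0;
    }

The point is only that the check runs exactly once, in the engine, instead of once per adaptor.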
The generic way is to create a session with those contexts (aka credentials) attached which you want to use. Say, you want to limit the set of active adaptors to the globus adaptors, do
saga::session s;
saga::context c ("globus");
s.add_context (c);

saga::filesystem::file f (s, url);
This should get you only the globus adaptor - all others will bail out, right? (sorry if my answer is a repetition from above)
Not really. The other adaptors will still throw an exception in their constructor. Say the Globus adaptor fails for some reason: the user then still has to wade through all the other exceptions to find the one that matters. That's confusing and annoying.
Fair enough.
3. Sessions with multiple contexts of the same type should be forbidden. Trying them all may have weird and unwanted side-effects (e.g. creating files as a different user, or a security lockout because you tried too many passwords). It confuses the user. This issue is related to point 2.
This is a tough one. The problem here is that a context type is not bound to a backend type. Like, both glite and globus use X509 certs. Both AWS and ssh use openssl keypairs. Both local and ftp use Username/Password, etc. I don't think this is something one can enforce.
We had the proposal to have the context types not bound to the backend *technology* (x509), but to the backend *name* (teragrid). This was declined as it makes it difficult to run your stuff on a different deployment using the same cert.
Hmm, in your adaptor-selecting example you do exactly that: using a context type specific to a single backend ("globus") to select a specific adaptor. If the context should have a type "x509", how do I then select only the Globus adaptor? And how do I differentiate between multiple Globus adaptors for different versions of Globus? There should be a better way of selecting adaptors...
There is, but not on API level. If you know your backends in advance, or know that specific backends are preferred on some host, then you should configure your SAGA accordingly, i.e. disable all other backends by default. Most people compile and install all adaptors, and leave all enabled by default - it should be the task of the admin (the installing person) to make a sensible choice here. Well, that is my/our approach to adaptor pre-selection anyway...
4. URL schemes are ill-defined. Right now, knowing which schemes to use is implementation-dependent voodoo (e.g. what is the scheme for running local jobs? Java SAGA uses 'local://', C++ SAGA used 'fork://'). There is no generic way of knowing these schemes other than 'read the documentation', which people don't do. Essentially, these schemes create an untyped dependency of a SAGA app to a SAGA implementation, causing SAGA apps not to be portable across implementations unless they all have the same adaptors that recognize the same schemes.
Correct. Scheme definition is not part of the spec. I argue it should not be either, as that can only be a restrictive specification, which would break use cases, too. The only solution right now is to create a registry - simply a web page which lists recommendations on what scheme to use for what backend. Would that make sense to you?
That would certainly help to bring the various SAGA implementations closer together.
However, the more general problem is that SAGA users should be able to limit the adaptors used in a late-binding implementation. The two main reasons are:
- speed (always trying 10 adaptors takes time)
- clarity (limit the number of exceptions)
The current two generic mechanisms are context types and URL schemes. Both are not very well suited. Each adaptor would have to recognize a unique context type and scheme to allow the selection of individual adaptors. Even then, selecting two adaptors is already hard: you cannot have two schemes in a URL, and using two contexts only works if both adaptors recognize a context in the first place.
A solution could be to add some extra functionality to a Session. A user should be able to specify which adaptors may be used, e.g. something similar to the Preferences object in JavaGAT. Ideally, you could also ask which adaptors are available. Specifying this in the API prevents each implementation from creating its own mechanism via config files, system properties, environment variables etc.
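For illustration, a minimal sketch of what such a JavaGAT-style preferences mechanism on a session might look like. This is a hypothetical API; none of these names exist in the SAGA spec or in either implementation:

    #include <cassert>
    #include <set>
    #include <string>

    // Hypothetical session with an adaptor allow-list (illustrative only).
    class session
    {
      std::set <std::string> allowed_;   // empty set == all adaptors allowed

    public:
      void allow_adaptor (const std::string & name) { allowed_.insert (name); }

      bool may_use (const std::string & name) const
      {
        return allowed_.empty () || allowed_.count (name) > 0;
      }
    };

    int main ()
    {
      session s;
      s.allow_adaptor ("globus");        // user limits late binding to one adaptor
      assert (  s.may_use ("globus") );
      assert (! s.may_use ("ssh")    );  // engine would skip this adaptor entirely
      return 0;
    }

With something like this, the engine would consult may_use() before invoking an adaptor, which addresses both the speed and the clarity concern at once.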
Yeah, I was expecting you to come up with JavaGAT preferences :-P I myself really don't think it's a good idea to add backend inspection/control to the SAGA API (backend meaning SAGA implementation in this case). Also, we already threw this out of the API a couple of times.

I see your point of having an implementation-independent mechanism. For C++, there are not too many implementations around (or expected to be around) to have a real problem here. Don't fix it if it ain't broken, right? So, we can try to make that more formal when we in fact have multiple implementations. For Java, you guys added properties already, and as far as I can see the exact properties which are available are undefined. I don't like this, to be honest, but that seems the Java way, right? So, do you already use that for adaptor pre-selection?

One thing I could see implemented universally is to require a context to be present for a backend to get activated at all: that would allow to get rid of the majority of exceptions, I think.
5. Bulk operations are hard to implement and clumsy to use. Better would be to include bulk operations directly in the API where they make sense. It's much simpler to implement adaptors for that, and much easier for users to use and comprehend.
Oops - bulk ops were designed to be easy to use! Hmmm...
About the hard to implement: true, but if they are easy to use, then that does not matter (to the SAGA API spec).
Why bulk ops were not explicitly added to the spec is obvious: it would (roughly) double the number of calls, and would lead to some pretty complex call signatures:
list <list <url> > listings = dir.bulk_list (list <url>);
list <int>         results  = file.bulk_read (list <buffer>, list <sizes>);
Further, this would lead to even more complex error semantics (what happens if one op out of a bulk of ops fails?).
This all is avoided by the current syntax
foreach url in ( list<url> )
{
  tc.add_task (dir.list <Async> (url));
}
tc.wait (All);
Not that difficult to use I believe?
First, how do I figure out which list came from which URL? The get_object() call of each task will only return the 'dir' object, but you need the 'url' parameter to make sense of the result.
Yes, you need to track tasks on API level - but you need to do the same in the other case as well, explicitely or implicitely, via some list index or map.
Doesn't this make the current bulk ops API useless for all methods that take parameters?
No, not really, as that is rather simple on API level (pseudocode):

foreach url in ( list<url> )
{
  saga::task t = dir.list <Async> (url);
  tc.add_task (t);
  task_map[t] = url;
}

while ( tc.size () )
{
  saga::task t = tc.wait (Any);
  cout << "list result for " << task_map[t] << " : "
       << t.get_result <list <url> > ();
}

The code for explicit bulk operations would not look much different, I assume.
Second, does each bulk operation require the creation of another task container? If I want to do dir.get_size(url) and dir.is_directory(url) for all entries in a directory, can I put all these tasks in one container, or should I create two separate containers? The programming model does not restrict me in any way. An engine will have a hard time analyzing such task containers and converting them to efficient adaptor calls...
Again, it is not about ease of engine implementation. Also, we did implement it, and as long as you have task inspection (on implementation level), that analysis step is not too hard:

foreach task in task_container
{
  task_operation_type_map[task.operation_type].push_back (task);
}

foreach task_operation_type in task_operation_type_map
{
  task_operation_type.call_adaptor_bulk_op (task_operation_type);
}

If an adaptor can't do the complete bulk op, it returns (in our implementation) those tasks it cannot handle, so the next adaptor can try (IIRC). If all adaptors fail, the individual ops are done one-by-one. If the adaptor does not have a bulk interface, the ops are done one-by-one anyway. So, it's actually like (sorry for the long names, but you Java guys like that, don't you? ;-) :

while ( ! task_operation_type_map.empty () )
{
  // try bulk ops for each adaptor
  foreach task_operation_type in task_operation_type_map
  {
    foreach adaptor in adaptor_list
    {
      task_container todo     = task_operation_type_map[task_operation_type];
      task_container not_done = adaptor.bulk_op (todo);

      task_operation_type_map[task_operation_type] = not_done;
    }
  }

  // handle all not_dones
  foreach task_operation_type in task_operation_type_map
  {
    task_container todo = task_operation_type_map[task_operation_type];

    foreach task in todo
    {
      foreach adaptor in adaptor_list
      {
        adaptor.serial_op (task) && break;
      }
    }
  }

  // all tasks are done, or cannot be done at all
}

So, that is really it (modulo technical decorations, which can always be non-trivial, of course). Supporting a complete set of bulk ops on implementation and adaptor level is not really a much simpler solution, I think, and gives you less flexibility.

Cheers, Andre.

--
Nothing is ever easy.
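The core of the dispatch loop above - each adaptor gets the whole bulk and hands back the tasks it could not do, leftovers fall through to the next adaptor - can be sketched as a small runnable mock. The adaptor struct, the URL-prefix matching, and the task strings are all illustrative assumptions, not the actual engine code:

    #include <cassert>
    #include <string>
    #include <vector>

    // Mock adaptor: "handles" a task if its URL starts with a given scheme prefix.
    struct adaptor
    {
      std::string handles_prefix;   // e.g. "gridftp://" (illustrative)

      // Returns the tasks this adaptor could NOT handle as a bulk.
      std::vector <std::string> bulk_op (const std::vector <std::string> & tasks) const
      {
        std::vector <std::string> not_done;
        for ( const auto & t : tasks )
          if ( t.rfind (handles_prefix, 0) != 0 )
            not_done.push_back (t);
        return not_done;
      }
    };

    int main ()
    {
      std::vector <adaptor>     adaptors = { { "gridftp://" }, { "file://" } };
      std::vector <std::string> todo     = { "gridftp://x", "file://y", "http://z" };

      // Try bulk ops per adaptor; each pass shrinks the todo list.
      for ( const auto & a : adaptors )
        todo = a.bulk_op (todo);

      // Only the task no adaptor handled remains; it would now go through
      // the one-by-one serial_op fallback, or fail for good.
      assert (todo == std::vector <std::string> { "http://z" });
      return 0;
    }

This mirrors the not_done handoff in Andre's pseudocode, without the per-operation-type grouping.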