
Appended below is feedback on the API from six Use Case authors, along with some SAGA counter-comments, some of which were discussed in the SAGA-RG session on Mon 03 Oct. Use case authors were asked the following:

======
Subject: SAGA API specification feedback

Hi,

This mail reaches you because you are listed as contact for a Use Case [1] submitted to the SAGA Research Group [2] at GGF [3].

The SAGA API v0.2 specification, which is based on the submitted use cases (and other input), has stabilized sufficiently over the past few months and is now rapidly converging toward a GGF submission.

We would be grateful if you, as potential 'clients' of the SAGA API, could review the current draft and verify that it indeed serves your use cases. Please also let us know your frank opinion on whether the current spec satisfies the all-important "S" -- Simple -- in SAGA. Any feedback on what you might like to see done differently will also be very useful.

Also, we would like to invite you to a dedicated session at GGF15 [4] (exact time and date to be announced) to discuss the mappings of the API to the use cases.

The Use Case collection (including your use case) can be found at [5]. The SAGA API draft is available at [6]; a short version of the spec containing the API only can be found at [7]. For general information on the SAGA group, please check [8] and [9].

With best regards,

  SAGA-RG.
[1] http://wiki.cct.lsu.edu/saga/space/start/use-cases.pdf
[2] http://forge.gridforum.org/projects/saga-rg/
[3] http://www.ggf.org/
[4] http://www.ggf.org/ggf_events_ggf15.htm
[5] http://wiki.cct.lsu.edu/saga/space/start/use-cases.pdf
[6] http://wiki.cct.lsu.edu/saga/space/start/strawman-api-v0.2.pdf
[7] http://wiki.cct.lsu.edu/saga/space/start/strawman-api-v0.2.short.pdf
[8] http://forge.gridforum.org/projects/saga-rg/
[9] http://wiki.cct.lsu.edu/saga/space/start

====

And here are their responses:

Grid SuperScalar:
-----------------

The SAGA API version 0.2 is useful for our requirements, but we find some things missing. We comment on each part of the API in more detail:

- Session and Context: We don't have any special requirements in this area.

- Error: No information given in the specification.

- Task: As it is supposed to be the asynchronous version of each SAGA API method, we think this may also be useful for us for asynchronous job submission, like that achieved with globus_gram_client_register_job_request in contrast with globus_gram_client_job_request.

- Attributes and Name Spaces: No special comments.

- Files and Logical Files: These two interfaces are very good if you want to access a remote file (in the first case) or to work with a replica location system (in the second case), but nothing is provided to copy files from one machine to another if you don't want to use a replica system. In our use case, our run-time is aware of the location of files, so an easy mechanism to copy files between machines must be provided. This is not easy to do with the API as it stands. A possible solution is to include copy, move, or erase methods for files in the File API.

<<< Comment Andre:
--------------
I think there is a misunderstanding. The file package inherits from the NameSpace package. That is, a saga::file implements the saga::name_space_entry interface, and a saga::directory implements the saga::name_space_directory interface.
All operations which are agnostic to the content of the file (such as create, copy, move, open, rename, list, ...) are defined in the namespace interface. That interface is also inherited by the logical_file package, so the same methods are available there. Hence, the logical_file and file packages only define those methods which distinguish them from simple name spaces.
- Jobs: It covers our needs for describing a job (also in terms of adding restrictions to a job), and we think the job state diagram is complete. The only thing we find missing is a call for waiting until a notification arrives (whatever the notification is). We submit several jobs at the same time, and we need to receive notifications of the states of those jobs in order to take actions in our run-time. So we follow a notification model instead of a polling model (we wait for notifications to arrive, instead of polling for state changes from our run-time). In our opinion the API would be more complete if it included both models for job state control (polling and notification), as this gives more freedom to the API user.

- Streams: This API is also useful for us, to achieve communication between the workers and the master in an easy way, exchanging a reduced amount of information.

======

Pascal Kleijer:
---------------

Here are some pointers on what can be done for future revisions.

- Naming. The different APIs do not all respect the same naming pattern. Some, like files, are a direct UNIX-style command line translation (i.e. "ls" should be "list"). I would recommend uniform method, attribute, and constant naming. If you use OO design, then stick with the OO paradigm. Use full names instead of acronyms or abbreviations, unless they are commonly known and used, like URL, HTTP, CPU, etc. SAGA_NumCpus should be SAGA_NumberCpus or SAGA_CPUCount. This makes the source code much easier to read than cryptic names.

- The use of all upper case or all lower case in naming is subject to discussion. But by habit all constants are in upper case, and attributes and methods in lower case unless they are composite names.

- Use of "_" in names is C-style programming. In OO it is only used if upper/lower case mixed naming cannot be used, for example in a constant. So "byte_written" becomes "byteWritten", and "SAGA_JobCmd" would become "SAGA_JOB_COMMAND".
Depending on who writes each API, you can see the writer's main coding language influence. For more information about code conventions, Sun Microsystems has a good tutorial: http://java.sun.com/docs/codeconv/. Yes, it is for Java, but it can be applied to any OO-based or procedural language.

- Typos: well, there are a number of typos to be removed. OK, it is still a v0.2 ;)

- In the stream API, for the "write" and "read" methods: why not add an 'offset' attribute to the calls? This might be language specific, but in Java, for example, you cannot just shift the initial pointer as in C/C++, so the data always has to start at 0. Forcing buffers to be used at index 0 all the time might not be welcome, and additional programming overhead will be necessary to use the API.

======

GridLab: Application Migration
------------------------------

+-----------------------------------------------------------------+
The SAGA API allows any job it can handle with the job class to be migrated, using the migrate method. That provides an easy solution for the GridLab migration use case, if supported by the implementation/middleware/backend:

--------------------------------------------------------------
#include <saga.hpp>

#include <vector>
#include <string>

using namespace std;

int main ()
{
  saga::job_server js;
  saga::job j = js.run_job ("remote.host.net", "my_app");

  saga::job_definition jd = j.get_job_definition ();

  vector <string> hosts;
  vector <string> files;

  hosts.push_back (string ("near.host.net"));
  files.push_back (string ("http://remote.host.net/file > "
                           "http://near.host.net/file"));

  jd.set_vector_attribute ("SAGA_HostList",     hosts);
  jd.set_vector_attribute ("SAGA_FileTransfer", files);

  j.migrate (jd);

  cout << "Heureka!" << endl;

  return (0);
}
--------------------------------------------------------------

(Question: does the SAGA migrate call move checkpoint files automatically, or do they need to be specified in the new job description as above?)
However, for the complete use case to be implemented at the application level, a number of steps cannot be implemented in SAGA. The call sequence would be:

In the application instance which performs the migration on the other job:
  - trigger migration for the remote job
  - discover a new resource
  + move checkpoint data to the new resource
  + schedule the application on the new resource
  + continue computation (and discontinue the old job)

In the application instance which gets migrated:
  - get triggered for checkpointing
  = perform application level checkpointing
  - report checkpoint file location(s)

Items marked with
  + are possible to implement in SAGA
  - are impossible to implement in SAGA
  = are (currently) not related to SAGA.

For the complete implementation of the use case, SAGA misses:

1) means to communicate with the remote application instance
2) means to discover new resources

Notes:

1) Means of communication are actually given, but not per se usable for this use case. E.g. streams are definite overkill for signalling checkpointing requests. Signals (as in job.signal (int signal)) would work, but only if the remote job uses signal handling as a checkpoint trigger. That also might be difficult to use if the job is running in a wrapper script, or in a virtual machine, etc. -- that might not be transparent to SAGA, and would require direct communication. Also, the signalling method misses feedback about the success of the operation, and cannot return information such as the location of checkpoint files.

2) The current SAGA API covers job submission to specific hosts, or lets the middleware choose a suitable host for submission. However, the brokering result is not exposed at the API level, as would be necessary for this specific use case, and possibly for other dynamically active Grid applications. One way to implement that is to provide a direct interface to Grid information systems, and in that way expose information about available resources. That would actually be more flexible, as it e.g. also allows the discovery of specific services, but would also require additional semantic knowledge at the application level.
+-----------------------------------------------------------------+

======

Univ of Vienna:
Rainer Schmidt:
---------------

I had a look at the short version of the SAGA spec and send you some comments: I think the first tier covers all (and even more) of the general aspects of our use cases. Especially the file and job interfaces could be (partially) mapped well against our middleware. We require basic remote file handling, job submission, monitoring, and state inquiry. I'm not yet sure if I got Session and Context right, but I think it would be feasible to integrate with our API, and very helpful. I also like the idea of having an asynchronous API and the task interface. Of course, we would need to extend the API for accessing our VGE middleware specific services, e.g. service discovery, QoS negotiation, or resource reservation. Hope this helps a little!

<<< use GridRPC?
=====

SCOOP [LSU]:
------------

We really like the Session API and Logical Files API... they would help a lot. Everything looks OK and should be extremely valuable to SCOOP.

For SCOOP v2 our job submission chain was GRAM job manager -> Condor Master -> Condor Pool. Reason: the Portal was limited to GRAM submissions only. One of the main difficulties we faced was the interpretation of error codes. It was very confusing what the exact error code meant, since there were layers of RMs. I wonder if SAGA can help resolve such complexities. :)

<<< Comment Andre:
--------------
Yes and no -- if a SAGA implementation reports errors badly (maybe because it does not get good error messages from the middleware), we cannot do much apart from saying "Uh, bad!". However, at least you will have one, and only one, consistent and reliable way of error reporting at the application level. The submission chain will be completely hidden. However, if your grid requires the chain, you still need to implement it _somewhere_, be it in a SAGA adaptor, or behind a gatekeeper...
Feedback from Anonymous:
------------------------

We will not be using the SAGA API for our use case. SAGA evolved into something very different from what we originally found potentially useful for our use case (three calls: authorize(), copy_file(), and run_job()). We aren't interested in POSIX-like file semantics, threads, or the complexity of the API.

<<< Comment Andre:
--------------
Hmm, I checked their use case. It is not trivial (e.g. it includes database access, steering, and visualization, at least to some extent). Also, a 'three call auth' we have (actually, we have a zero call minimum and a 5 call maximum). run_job and copy_file we have as well; both need exactly 2 calls in total:

  {
    saga::directory dir;
    dir.copy (src, target);

    saga::job_server server;
    server.run_job ("remote.host.net", "/bin/myjob");
  }

So, it might be that they perceive SAGA as too complex, and that should be kept in mind, but the criticism is at least not well formulated, I think.
======