[SAGA-RG] document updates, phone call cadence, and more

28 Apr 2010

Hi all,

Quoting [Andre Merzky] (Apr 04 2010):
...
> From: Andre Merzky <andre@merzky.net>
> To: Thilo Kielmann <kielmann@cs.vu.nl>
> Cc: SAGA RG <saga-rg@ogf.org>
> Subject: Re: [SAGA-RG] notes from the OGF28 session on 15/03, 16:00-17:30
>
> attached is another revision of the SAGA Core API Experience
> document, which contains changes as discussed at OGF28.  I hope the
> changes reflect the discussion points.

I just wanted to let you know that both the advert API extension and
the Core experience document have been submitted to the OGF editor,
and both docs should be entering public comment sometime soon.  That
means that the Core API (including errata) is now definitely frozen,
unless the public comments require additional changes.  The
submitted documents can be found in

  https://svn.cct.lsu.edu/repos/saga-ogf/trunk/documents/saga-package-advert/t...
  https://svn.cct.lsu.edu/repos/saga-ogf/trunk/documents/saga-core-experience/...
  https://svn.cct.lsu.edu/repos/saga-ogf/trunk/documents/saga-core/tags/v1.1rc...
...
> So, a couple of additional errata from the Naregi group have been
> applied to the Core API - hopefully the last ones.  However, one
> item remains unresolved:
> apparently we never considered adding a flush() method to the
> saga::file class.  As is, our API implies that all writes are
> immediately flushed.  While that is certainly valid, the question
> remains whether we should consider an explicit flush() method, which
> would, amongst others, allow implementations to perform client side
> caching of write operations.  Iff that is considered useful, one
> could further discuss whether it should be introduced on namespace
> level, so that other namespace derived packages (replica, advert,
> etc) can also benefit from flush().  FWIW, a close() should always
> imply a flush() IMHO.
> So, please voice your opinion!

There was not much feedback on this item, so I added it to the list
of open items for SAGA 2.0.  As of now, caching behaviour on write
remains undefined, and the safest assumption (for SAGA implementors)
is to always flush after write, even if that is costly in terms of
performance.
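To make the trade-off concrete, here is a minimal, hypothetical C++ sketch of client side write caching with an explicit flush().  The names `remote_store` and `cached_file` are invented for this example and are not part of the SAGA Core API (which, as said, has no flush() at this point); the "remote" backend is just simulated with a string.  The sketch assumes, as suggested above, that close() always implies flush().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for remote storage; a real implementation
// would call into a middleware adaptor here.
struct remote_store {
    std::string data;
    int round_trips = 0;  // number of (simulated) remote write operations

    void write(const std::string& chunk) {
        data += chunk;
        ++round_trips;
    }
};

// Hypothetical file class (not SAGA API) that caches writes on the
// client side and only pushes them to the backend on flush() or close().
class cached_file {
public:
    explicit cached_file(remote_store& store) : store_(store) {}
    cached_file(const cached_file&) = delete;
    cached_file& operator=(const cached_file&) = delete;

    // Append to the local cache; no remote I/O happens here.
    std::size_t write(const std::string& buf) {
        if (closed_) return 0;
        cache_ += buf;
        return buf.size();
    }

    // Push all cached data to the backend in a single round trip.
    void flush() {
        if (cache_.empty()) return;
        store_.write(cache_);
        cache_.clear();
    }

    // close() always implies flush(), so no cached data is lost.
    void close() {
        if (closed_) return;
        flush();
        assert(cache_.empty());  // postcondition: nothing left in the cache
        closed_ = true;
    }

    ~cached_file() { close(); }

private:
    remote_store& store_;
    std::string cache_;
    bool closed_ = false;
};
```

The cost of the "always flush after write" default shows up in `round_trips`: without caching, every write() would be one remote operation, while with an explicit flush() several writes can be batched into one.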

That raises the question of when, if at all, we should start to
discuss the next version of the core API.  FWIW, I append the current
list of open issues to this mail.

We have not had a phone call since OGF28.  There are a number of
open TODO items, however, and I am not sure that any calls are useful
at this point, beyond reiterating that those items need to be dealt
with :-P

So, I suggest suspending the calls until at least some of these
items are handled:

  - CPR package needs to be finalized
  - message API examples need to be rendered in different versions,
    to come to a conclusion on the general design approach.
  - Python bindings need to be shown to be functional on both Java
    and C++
  - the SAGA rendering of GridRPC.v2 needs to be synced with the
    final version of GridRPC.v2

If anybody has other items to discuss, please let me know, and I'll
schedule the calls.  Also, the above items are obviously open for
input from all of you, so, please feel free to contribute in any
form.

Finally, the conversion of our CVS repository to SVN is complete.
CCT support did not manage to make the CVS repository read-only, but
please don't commit there anymore.  The new SVN URL is, as you
probably guessed from above,

  https://svn.cct.lsu.edu/repos/saga-ogf/trunk

That repository should be world-readable.  Please let me know if you
would like to have write permissions.

Best, Andre.

Current known open issues for SAGA Core v2.0
--------------------------------------------

  - file / stream server / rpc could have state (Unknown, New,
    Open, Closed).

  - task: get_task_description

    just like job desc, would give you information about what
    the task does, e.g.
      - "method"   = "copy"
      - "args"     = "internet.txt" "internet.bak" (vector attrib (type??))
      - "started"  = "11:35pm 12/22/2006"
      - "finished" = "11:35pm 12/22/2007"

    inspection would be useful to get type and return type of
    task after getting it from a task_container.

  - I/O tasks could have a get_buffer() method, to free
    application from keeping/tracking I/O buffers.  That would
    return a shallow copy of the buffer object which was given
    as an inout parameter.  The method would need to be templatized for
    the different buffer classes we have in the spec (or limited
    to the buffer base class)

  - make state transitions less prone to race conditions.  E.g.,
    allow suspend() also on jobs in Suspended state, and cancel()
    on jobs in a final state (state remains the same).  Needs
    some thought...

  - what error is thrown on incorrectly formatted attributes,
    and when?

  - wait() to also report on other state changes, like 
    suspend/resume (see DRMAA-II).  

  - add inspection: job.list_interfaces ()
      - monitorable
      - attributable
      - steerable?
      - checkpointable?  
    to provide seamless integration of extensions, which then
    can define additional interfaces for core classes (see cpr).

  - add resource assignment to job description, e.g.:
      //   name:  CPUID
      //   desc:  CPU id to assign the process thread to
      //   mode:  ReadWrite, optional
      //   type:  Int
      //   value: '1'
      //   notes: - if supported, the process is guaranteed to 
      //            run on the CPU identified by the id.
      //          - id starts at 1
      //          - not supported by JSDL, DRMAA.v1
      //           
      //   name:  CPUCoreID
      //   desc:  CPU core id to assign the process thread to
      //   mode:  ReadWrite, optional
      //   type:  Int
      //   value: '1'
      //   notes: - if supported, the process is guaranteed to 
      //            run on the CPU core identified by the id.
      //          - id starts at 1
      //          - not supported by JSDL, DRMAA.v1

    This could also go into a resource management package,
    obviously, together with the 'queue' attribute btw (see the mailing
    list discussion with Sylvain, and the discussion about DRMAA.v2).

  - session.list_contexts (string type = "");
    returns all contexts of that type.  Also works on default 
    session!  If no type is given, all contexts are returned.

  - trigger metrics should have a value of 0 or 1, to allow
    polling for triggers.  So, in fact Trigger metrics should be
    Boolean.

  - we have mtime for ns entries - there is no reason not to
    have ctime, or even atime, even if that is not widely
    supported.  So what.
    Now we have to add ctime to the cpr package...  Messy.

  - properties which are available via get_xyz() and is_abc()
    should generally also be expressed as attributes (see
    get_size(), get_mtime(), but also is_file() etc)

  - attributes and metrics should be unified.  Either a metric
    IS-A attribute, or even better, callbacks can be added to
    attributes - no metric needed anymore.

  - file::dir should inherit file::entry *and* ns::dir.  Makes
    in particular sense for the advert and cpr ns derivatives, which
    then don't need to duplicate methods anymore.  Language
    bindings may not allow/encourage multiple inheritance, but
    it would make the spec (IDL) simpler.

  - we are not sticking to SIDL syntax anyway, so probably
    should remove references to it, and define our own *blush*.
    See attributes, metrics, c'tors, multiple inheritance, etc.

  - reconsider splitting the core into Look & Feel and API packages :-/

  - reconsider file.get_fd(), for example for checkpoint
    writing/reading, where apps often have their own native IO
    routines.  But of course, if they get a saga::fs::file, they
    can just close() it, and reopen the location natively...

  - file.flush is missing :-(  Same for replica etc.  Not sure
    if it makes sense on the ns::entry though.
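
For the race-condition item, a minimal, hypothetical C++ sketch of what "tolerant" transitions could look like.  `job_sketch` and `backend_finished()` are invented for illustration, and `std::logic_error` merely stands in for SAGA's IncorrectState exception; none of this is existing SAGA API.  suspend() on a Suspended job and cancel() on a job in a final state leave the state unchanged instead of raising an error, and each check-and-transition happens under a single lock.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Job states as named in the SAGA Core spec (Unknown omitted).
enum class job_state { New, Running, Suspended, Done, Canceled, Failed };

inline bool is_final(job_state s) {
    return s == job_state::Done || s == job_state::Canceled ||
           s == job_state::Failed;
}

// Hypothetical job class, for illustration only (not SAGA API).
class job_sketch {
public:
    job_state get_state() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return state_;
    }

    void run() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (state_ != job_state::New)
            throw std::logic_error("run(): job is not in New state");
        state_ = job_state::Running;
    }

    // Hypothetical hook: the (simulated) backend reports job termination.
    void backend_finished(job_state final_state) {
        assert(is_final(final_state));  // precondition
        std::lock_guard<std::mutex> lock(mtx_);
        if (!is_final(state_)) state_ = final_state;
    }

    // Tolerant: suspending an already Suspended job is a no-op.
    void suspend() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (state_ == job_state::Suspended) return;
        if (state_ != job_state::Running)
            throw std::logic_error("suspend(): job is not Running");
        state_ = job_state::Suspended;
    }

    // Tolerant: cancel() on a job in a final state is a no-op, so a
    // cancel() racing with job completion does not raise an error.
    void cancel() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (is_final(state_)) return;
        state_ = job_state::Canceled;
    }

private:
    mutable std::mutex mtx_;
    job_state state_ = job_state::New;
};
```

With this behaviour, an application calling cancel() just as its job completes sees no spurious error; it can call get_state() afterwards to find out whether the job was actually canceled or had already finished.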

-- 
Nothing is ever easy.