Hi,
these are notes from the ad-hoc meeting at SC-05 about a
message-oriented communication API. The API might be
considered for inclusion into the GGF SAGA API spec at
some point; for now it is only meant to support coherent
discussion and development in the interested groups.
As a reminder, material about SAGA can be found at:
http://wiki.cct.lsu.edu/saga/
Meeting Participants:
---------------------
- Jason Leigh (EVL)
- Venkatram Vishwanath (EVL)
- Andrei Hutany (LSU)
- John Shalf (LBNL)
- Andre Merzky (VU/LSU)
Definition:
-----------
Message: chunk of data which is potentially larger than a
network packet.
Several independent sets of property flags have been
identified:
Reliability Requirements:
- - - - - - - - - - - - -
- Reliable      all messages are received exactly once. If
                received, messages are complete
- Unreliable    messages are either received or not. If
                received, messages are complete
- AtLeastOnce   optional; as for Reliable, but messages
                can be received more than once
Correctness Requirements:
- - - - - - - - - - - - -
- ByteErrors    received messages MAY contain byte errors
- NoByteErrors  received messages MUST NOT contain byte errors
Ordering Requirement:
- - - - - - - - - - -
- Ordered       messages MUST be received in order
- NotOrdered    messages MAY be received out of order
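The three property sets above are independent, and the proposal below asks for properties to be given as flags that can change at runtime. One possible encoding, sketched here as an assumption (the property names follow the notes, but the `msg` namespace, the bit values and the `valid` helper are hypothetical), is one bit per property plus a check that exactly one value is chosen from each set:

```cpp
#include <cstdint>

// Hypothetical encoding: the notes name the properties, not the bit values.
namespace msg {

enum flags : std::uint32_t {
  // Reliability requirements
  Reliable     = 1u << 0,
  Unreliable   = 1u << 1,
  AtLeastOnce  = 1u << 2,  // optional in the notes
  // Correctness requirements
  ByteErrors   = 1u << 3,
  NoByteErrors = 1u << 4,
  // Ordering requirements
  Ordered      = 1u << 5,
  NotOrdered   = 1u << 6
};

constexpr std::uint32_t reliability_mask = Reliable | Unreliable | AtLeastOnce;
constexpr std::uint32_t correctness_mask = ByteErrors | NoByteErrors;
constexpr std::uint32_t ordering_mask    = Ordered | NotOrdered;

// True if exactly one bit of 'mask' is set in 'f'.
constexpr bool exactly_one(std::uint32_t f, std::uint32_t mask) {
  const std::uint32_t v = f & mask;
  return v != 0 && (v & (v - 1)) == 0;
}

// Valid: exactly one choice from each independent set, and no unknown bits.
constexpr bool valid(std::uint32_t f) {
  return exactly_one(f, reliability_mask) &&
         exactly_one(f, correctness_mask) &&
         exactly_one(f, ordering_mask) &&
         (f & ~(reliability_mask | correctness_mask | ordering_mask)) == 0;
}

}  // namespace msg
```

Changing one property at runtime then means clearing that set's mask and setting the new bit, e.g. `f = (f & ~msg::ordering_mask) | msg::NotOrdered;`, which leaves the other two sets untouched.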
API considerations:
-------------------
- it was felt that a BSD-like connection setup is most
  useful
- asynchronous receiving of complete messages is needed
  (see use cases!)
- striping/multicasting on the application level is not
  considered for now (multiple senders/receivers)
API proposal:
-------------
- establish connection:
  - BSD-like: listen/accept/connect
  - port range should be specifiable
  - properties should be specified as flags (changeable at
    runtime)
- write:
    write (buffer, size, BLOCK | NO_BLOCK)
  the message is written completely or not at all (if possible)
- read: two-step mechanism:
    (int handle, int size) = query_size ();
    (char* buffer)         = read (handle, size);
  - handle can be zero if size is known (one-step read)
  - buffer needs to be allocated by the application
  - if size is zero, the buffer is allocated by the
    implementation and returned, to be freed by the
    application (one-step read)
  - if size is smaller than the real message size, the message
    gets truncated and the remainder is lost (reading again is
    not possible)
  - if size is larger, the buffer is padded with 0 bytes
- asynchronous method calls:
  - as in the SAGA task model, with callbacks
- connection shutdown:
  - as in BSD (close ())
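The read and write rules above can be checked against a small in-memory model. This is a sketch under assumptions, not a proposed implementation: `endpoint` is hypothetical, no networking or BLOCK/NO_BLOCK handling is modelled, and the "size is zero" case is expressed as a separate `read (handle)` overload returning a `std::vector<char>` rather than a raw `char*`:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <deque>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory model of the proposed semantics (no networking).
class endpoint {
 public:
  // write: the message is queued as a whole, never split.
  void write(const char* buffer, std::size_t size) {
    queue_.emplace_back(buffer, buffer + size);
  }

  // Step 1: reserve the next message, return (handle, size).
  std::pair<int, std::size_t> query_size() {
    if (queue_.empty()) throw std::runtime_error("no message");
    const int handle = next_handle_++;
    const std::size_t size = queue_.front().size();
    pending_[handle] = std::move(queue_.front());
    queue_.pop_front();
    return {handle, size};
  }

  // Step 2: copy into an application-allocated buffer of 'size' bytes.
  // handle == 0 reads the next queued message directly (one-step read).
  // A shorter buffer truncates the message; the remainder is discarded.
  // A longer buffer is padded with zeros.
  void read(int handle, char* buffer, std::size_t size) {
    const std::vector<char> msg = take(handle);
    const std::size_t n = std::min(size, msg.size());
    if (n > 0) std::memcpy(buffer, msg.data(), n);
    if (size > n) std::memset(buffer + n, 0, size - n);
  }

  // The notes' "size is zero" case: the implementation allocates.
  std::vector<char> read(int handle) { return take(handle); }

 private:
  // Removes and returns the message for 'handle' (0 = next in queue).
  std::vector<char> take(int handle) {
    std::vector<char> msg;
    if (handle == 0) {
      if (queue_.empty()) throw std::runtime_error("no message");
      msg = std::move(queue_.front());
      queue_.pop_front();
    } else {
      auto it = pending_.find(handle);
      if (it == pending_.end()) throw std::runtime_error("unknown handle");
      msg = std::move(it->second);
      pending_.erase(it);
    }
    return msg;
  }

  std::deque<std::vector<char>> queue_;
  std::map<int, std::vector<char>> pending_;
  int next_handle_ = 1;  // 0 is reserved for one-step reads
};
```

Returning a vector keeps ownership explicit in C++; a C binding would instead hand back a `char*` for the application to free, as the notes describe.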
Please feel free to send corrections, comments etc.
Thanks, Andre.
--
+-----------------------------------------------------------------+
| Andre Merzky | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science | mail: merzky(a)cs.vu.nl |
| De Boelelaan 1083a | www: http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands | |
+-----------------------------------------------------------------+
Hi Pascal,
there were more models under discussion in Boston, true
enough. However, I can't really figure out which model you
are referring to. Some example code would be helpful :-)
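For illustration only, here is one possible reading of the abstract-class model described in the quoted message below; it is a guess, not Pascal's code. `directory`, `sync_directory`, `async_directory`, `logging_directory` and `do_mkdir` are hypothetical names, and the assumption is that every variant returns a `std::future<void>` so the call syntax is the same whichever subclass is used:

```cpp
#include <exception>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical: one abstract interface, one method name; the execution
// model is chosen by the concrete subclass, not by the call syntax.
class directory {
 public:
  virtual ~directory() = default;
  // Every variant returns a future; a synchronous one returns it ready.
  virtual std::future<void> mkdir(const std::string& path) = 0;

  std::vector<std::string> created() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return created_;
  }

 protected:
  // Placeholder for the real (remote) mkdir.
  void do_mkdir(const std::string& path) {
    std::lock_guard<std::mutex> lock(mutex_);
    created_.push_back(path);
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::string> created_;
};

// Synchronous variant: the work is finished before mkdir returns.
class sync_directory : public directory {
 public:
  std::future<void> mkdir(const std::string& path) override {
    std::promise<void> done;
    try {
      do_mkdir(path);
      done.set_value();
    } catch (...) {
      done.set_exception(std::current_exception());
    }
    return done.get_future();
  }
};

// Asynchronous variant: the work runs on another thread.
class async_directory : public directory {
 public:
  std::future<void> mkdir(const std::string& path) override {
    return std::async(std::launch::async, [this, path] { do_mkdir(path); });
  }
};

// Subclassing for debugging: log every call, reuse the base behaviour.
class logging_directory : public sync_directory {
 public:
  std::future<void> mkdir(const std::string& path) override {
    log.push_back("mkdir " + path);
    return sync_directory::mkdir(path);
  }
  std::vector<std::string> log;
};
```

Calling code holds a `directory&` (or pointer) and writes `d.mkdir ("test/")` whichever subclass it was given; `.get()` on the result waits for completion. The cost of this reading is that even synchronous calls return a future.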
Cheers, Andre.
Quoting [pascal.kleijer(a)knd.biglobe.ne.jp] (Nov 03 2005):
>
> Hello all,
>
> Let's enter the arena, if it is not too late; sorry for that.
> I do agree with many things that have been discussed here.
> Being a POP (pattern oriented programming) and OOP
> (object oriented programming) guy myself, I tend to favor
> nicely encapsulated models.
>
> As mentioned by Tom, templates are possible in Java as
> well, but they require at least Java 5.0 (aka 1.5). This
> isn't recommended, since many programs still run with 1.4.
> Using them would probably create very different bindings in
> Java, C++, or Python, which would end up as two different SAGA
> implementations/usages: one for OO languages and one for
> old-fashioned procedural ones.
>
> As to exposing the different implementations to the user, I
> think this could be avoided in OO languages, or at least it
> should not be as obvious as in 4a2. During the GGF meeting
> there was a last-minute discussion about a `g' solution,
> which is pretty much a pure OO design with a simple method
> call whatever model is used. I think this is a good choice
> from an OO point of view. You can derive specialized classes
> with synchronous or asynchronous implementations from an
> abstract class. This would even open the door to subclassing
> it to add custom code, for example for debugging or other
> treatment, without affecting the initial code.
> Depending on how the programming is tackled, factories aren't
> necessary. The problem might come when the implementation is
> done in procedural languages: it might be necessary to add
> additional parameters to the constructor or the calls.
>
> I don't have the time this morning to make a sample
> code. But I hope you got the idea.
>
> Best Regards,
> Pascal Kleijer
>
> >
> > My vote is for 4a1. (Which was option D in our original discussion I
> > believe.)
> >
> > Cheers,
> >
> > Tom
> >
> > On Thu, 3 Nov 2005, Andre Merzky wrote:
> >
> > >
> > > Hi John, Thilo, Group,
> > >
> > > Hartmut and I would like to argue about the C++ bindings.
> > > We would like to come to an agreement soon, as our
> > > implementation is dealing with the async part right now.
> > >
> > > As reminder:
> > >
> > > ---------------------------------------------------------------------
> > > Example 4a: more versions in C++
> > >
> > > 4a1: d.mkdir ("test/");
> > > d.mkdir_sync ("test/"); // same
> > > saga::task t_1 = d.mkdir_async ("test/");
> > > saga::task t_2 = d.mkdir_task ("test/");
> > >
> > > 4a2: d.mkdir ("test/");
> > > d.sync .mkdir ("test/"); // same
> > > saga::task t_1 = d.async.mkdir ("test/");
> > > saga::task t_2 = d.task .mkdir ("test/");
> > >
> > > 4a3: d.mkdir ("test/");
> > > d.mkdir <sync> ("test/"); // same
> > > saga::task t_1 = d.mkdir <async> ("test/");
> > > saga::task t_2 = d.mkdir <task> ("test/");
> > > ---------------------------------------------------------------------
> > >
> > >
> > > Quoting [John Shalf] (Nov 02 2005):
> > >>
> > >>> Q5) Any comments to 4a1, 4a2 or 4a3? (not part of the Strawman!)
> > >>
> > >> I prefer 4a1 because it is more readable and the implementation would
> > >> be quite straightforward.
> > >
> > > You favour 4a1. We think that implementation is very
> > > straightforward for all three versions. Readability seems not much
> > > different to me, and might largely be a matter of taste.
> > >
> > >
> > >> It is also a familiar paradigm for any MPI
> > >> programmers and anyone who has played with various proprietary Async
> > >> I/O implementations. (it's a very familiar and conventional approach)
> > >
> > > Well, these are C APIs. Is there a C++ binding for MPI,
> > > and does it look the same?
> > >
> > >
> > >> I kind of like 4a2 as well from the standpoint of a C++ programmer
> > >> (even with Andre's syntax corrections). However, the resulting
> > >> bindings will not be very consistent with the approach we would take
> > >> for Fortran or C bindings (eg. those would likely look more like
> > >> 4a1).
> > >
> > > But, well, that is the idea of the language binding! I agree
> > > that C and Fortran would look more like 4a1, of course. But that
> > > is no reason that C++ should look like that as well, or Java,
> > > Perl etc.
> > >
> > >
> > >> It is not really much more readable than 4a1. Therefore, I'm
> > >> uncertain if it is worth fragmenting our approach to bindings in
> > >> different languages when there is not a clear benefit in terms of
> > >> readability or implementation complexity.
> > >
> > > I think that 4a2/4a3 actually allow nicer implementations, as they
> > > allow the different async parts to be kept somewhat separate from
> > > the sync parts. We think it's nicer :-)
> > >
> > >
> > >> I do a lot of C++ programming, but I find the 4a3 option a bit
> > >> obscure both in terms of readability and any advantages it might
> > >> confer in terms of implementation.
> > >
> > > Hehe - I thought the same :-) Hartmut likes that version very
> > > much. To me it appealed after pondering over it for a couple of
> > > days. Now I think it is cute, and quite expressive.
> > >
> > >
> > >> It would certainly be easier to
> > >> create multi-language bindings for a single code base if we stick
> > >> with something more conventional like 4a1.
> > >
> > > I think that C and Fortran bindings for this part are
> > > straightforward anyway; there is no need to reflect that in C++...
> > >
> > >
> > >> Each approach is equally readable to me (less so for 4a3). I'm
> > >> certainly open to any additional information on how the 4a2 and 4a3
> > >> approaches could simplify implementation.
> > >
> > > The main point really is that the object itself has only the sync
> > > method calls, and the async calls can easily be put into separate
> > > headers/sources, and built (by default) on top of the sync
> > > calls. Of course, you can do that with all three versions, but
> > > it's not as obvious in 4a1.
> > >
> > >
> > >> If the other approaches
> > >> can offer some implementation benefits, then maybe I'd give them
> > >> extra consideration, but otherwise, I would prefer a more
> > >> conventional approach like 4a1.
> > >
> > > I would vote for 4a2 or 4a3 (in that order), but 4a1 would be ok
> > > if the majority likes that most, of course. Basically it's
> > > a matter of taste, I think. I am happy that the general point
> > > seems acceptable to all so far: having sync, async, and task
> > > versions of the calls, w/o an explicit task factory.
> > >
> > >> The only implementation I'm outright
> > >> against is the 4b example.
> > >
> > > Good! :-)
> > >
> > > Thanks, Andre.
> > >
> > >
> > >> -john
> > >
> > >
> > >
> > >
> >
> >
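To make the implementation argument in this thread concrete, here is a rough C++17 sketch (not part of the original discussion) in which the core object has only the synchronous `mkdir`, and both the 4a3 form and the 4a2 proxies `d.sync`, `d.async`, `d.task` are layered on top of it. Assumptions: a task is modelled as `std::future<void>`; the tag types live in a hypothetical `mode` namespace, so 4a3 reads `d.mkdir<mode::async>` rather than `d.mkdir <async>`; and `std::launch::deferred` only roughly approximates a task that is created but not yet started.

```cpp
#include <future>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical sketch; a "task" is modelled as std::future<void>.
namespace mode {
struct sync {};
struct async {};
struct task {};  // approximated by std::launch::deferred (runs on get/wait)
}  // namespace mode

class directory;

// 4a2 proxy: d.async.mkdir(...) forwards to d.mkdir<mode::async>(...).
template <typename Mode>
struct flavour {
  directory& d;
  auto mkdir(const std::string& path);
};

class directory {
 public:
  directory() = default;
  directory(const directory&) = delete;             // the proxies below
  directory& operator=(const directory&) = delete;  // refer to *this

  // The core object: synchronous calls only (also 4a1's plain d.mkdir).
  void mkdir(const std::string& path) { created.push_back(path); }

  // 4a3: d.mkdir<mode::...>(...), layered on the synchronous call.
  template <typename Mode>
  auto mkdir(const std::string& path) {
    if constexpr (std::is_same_v<Mode, mode::sync>) {
      mkdir(path);
    } else {
      constexpr std::launch policy = std::is_same_v<Mode, mode::async>
                                         ? std::launch::async
                                         : std::launch::deferred;
      return std::async(policy, [this, path] { mkdir(path); });
    }
  }

  // 4a2: d.sync.mkdir(...), d.async.mkdir(...), d.task.mkdir(...)
  flavour<mode::sync> sync{*this};
  flavour<mode::async> async{*this};
  flavour<mode::task> task{*this};

  std::vector<std::string> created;  // completed calls (not thread-safe)
};

template <typename Mode>
auto flavour<Mode>::mkdir(const std::string& path) {
  return d.template mkdir<Mode>(path);
}
```

The 4a1 names (`mkdir_sync`, `mkdir_async`, `mkdir_task`) could forward to the same template just as easily, which matches the point above that all three versions can be layered this way; 4a2 and 4a3 merely make the separation visible in the call syntax.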
Hi Haresh,
sorry for the late answer...
Quoting [Haresh Bhatt] (Nov 14 2005):
>
> Hi Andre and All,
>
> Sorry for the delayed response; I was out of touch with my mailbox.
>
> I will get back to you on job states. I have some comments on your response
> with respect to task handling.
>
> >>For the task states: tasks in SAGA are just handles to
> >>asynchronous operations really, so fairly simple things.
>
> I feel it would be nice to have Suspend and Resume states for tasks
> (asynchronous operations). This will be useful for handling several
> issues, especially during fault recovery/tolerance, as well as for
> synchronization among asynchronous tasks. We have faced such problems in
> real-life environments and solved them using a suspend and resume mechanism.
Could you give us an example of such situations?
Also, I am somewhat unsure about the semantics of a task
suspend. Assume you do a remote seek operation
asynchronously. In terms of implementation that would
potentially look like this:
main thread:
01 saga::file f (url);
02 saga::task t = f.seek_async (100, saga::file::SEEK_SET);
03
04 sleep (2);
05 t.suspend ();
right?
Now the SAGA implementation would, at line 2, spawn a
separate thread. That thread would, for example, issue
a single SOAP call to a remote site to change the state of a
remote file handle (EPR or so). The thread would then
do a blocking wait for the SOAP answer message or so.
Now, what would a suspend do, e.g. if issued after sending
the request, and before receiving the answer?
It could keep the answer undelivered to the application
while in suspended state, but I don't see the value of that
(the application could just as well ignore the fact that the
task is finished).
Or it could refuse to accept the answer if in suspend state,
but that might well mean that the connection drops, and a
later resume would be impossible or very difficult.
The most useful semantics I could imagine is to talk to the
remote side again, and to request the remote seek to be
halted (that might make more sense for a remote file
transfer or so...). But assuming that arbitrary remote
operations are suspendable is very optimistic (apart from
file transfer, I could not think of any, really).
> >>For most implementations, an async call (i.e. task) will
> >>probably spawn a thread which performs an operation (e.g.
> >>remote file copy), and watch that thread. So a 'failed
> >>task' will mean that the file copy failed, not that the
> >>thread was killed -- so FAILED would always mean a
> >>user-handled error.
>
> Take this case: remote file copy does not support RESUME, and network
> connectivity fails. The fault tolerance mechanism (switching over to an
> alternate path) takes a little more time than the remote file copy time-out
> period. This may cause the remote file copy to fail and waste the complete
> time spent on the partial file transfer. In such a case, one would like to
> suspend the remote file copy thread and resume it when the network is
> restored. Thus Suspend and Resume states (as well as an API facility to
> suspend and resume) would help make the environment more effective and
> fault tolerant. These states would also help with proper time accounting.
I don't think that evaluation of network status belongs on
the application level. Instead, a clever saga
implementation could be able to handle network
drops/switches transparently.
So IMHO the task should continue to run on a network drop,
w/o the application seeing any problem really. So state
would stay Running. The implementation however would detect
the network drop, wait for the network replacement to come
up, and use a restart marker or such to continue operation.
Would that make sense to you?
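The transparent recovery proposed above can be sketched as a copy loop that keeps a restart marker (the number of bytes known to have arrived) and, on a network drop, reconnects and continues from that marker, so the task never leaves Running. Everything here is hypothetical: `flaky_link` simulates a transport that drops exactly once, data moves one byte at a time for clarity, and the retry limit is an assumed policy after which the task would become Failed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical transient failure raised by the transport layer.
struct network_drop : std::runtime_error {
  network_drop() : std::runtime_error("network drop") {}
};

// Hypothetical transport that drops the connection once, at 'drop_at'.
class flaky_link {
 public:
  explicit flaky_link(std::size_t drop_at) : drop_at_(drop_at) {}

  // Sends the byte at 'offset'; throws once when the drop point is hit.
  void send(const std::string& src, std::size_t offset, std::string& dst) {
    if (!dropped_ && offset == drop_at_) {
      dropped_ = true;
      throw network_drop();
    }
    dst.push_back(src[offset]);
  }

  void reconnect() { ++reconnects; }  // e.g. switch to the alternate path
  int reconnects = 0;

 private:
  std::size_t drop_at_;
  bool dropped_ = false;
};

// Copy that survives drops: the application only ever sees Running.
std::string resumable_copy(const std::string& src, flaky_link& link,
                           int max_retries) {
  std::string dst;
  std::size_t marker = 0;  // restart marker: bytes known to have arrived
  int retries = 0;
  while (marker < src.size()) {
    try {
      link.send(src, marker, dst);
      ++marker;
    } catch (const network_drop&) {
      if (++retries > max_retries) throw;  // only now would the task fail
      link.reconnect();                    // continue from 'marker'
    }
  }
  return dst;
}
```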
Cheers, Andre.