[damned, majordomo seems really broken - forward to the list again]
----- Forwarded message from Andre Merzky -----

Date: Sun, 16 Jul 2006 19:38:54 +0200
From: Andre Merzky
To: Thilo Kielmann
Cc: Andre Merzky
Subject: Re: Fwd (andre@merzky.net): Re: Fwd (andre@merzky.net): Re: [saga-rg] context problem

Quoting [Thilo Kielmann] (Jul 16 2006):
Merging 2 mails from Andre:
very good points, and indeed (1) seems cleanest. However, it has its own semantic pitfalls:
saga::file f (url);
saga::task t = f.write <saga::task::Task> ("hello world", ...);

f.seek (100, saga::file::SeekSet);

t.run ();
t.wait ();
If on task creation the file object gets copied over, the subsequent seek (sync) and write (async) work on different object copies. In particular, these copies will have different state - seek on one copy will have no effect on where the write will occur.
I cannot see a problem here: With object copying, you will simply have the same file open twice. And given the operations you do, this might even be the right thing... This example is very academic: can you show an example where the sharing of state between tasks is useful, actually?
The problem here is that I, as a user, would expect the write to happen at byte 100, but it will happen at byte 0: the seek happens on a different object copy than the write.
What might be a more obvious example, which goes wrong along the same lines:
f.write ("line 1\n");
f.write ("line 2\n");
f.write ("line 3\n");
That will result in a file
line 1
line 2
line 3
whereas the code
saga::task t1 = f.write ("line 1\n");
t1.run (); t1.wait ();

saga::task t2 = f.write ("line 2\n");
t2.run (); t2.wait ();

saga::task t3 = f.write ("line 3\n");
t3.run (); t3.wait ();
will result in a file
line 3
the last write will start at byte 0, as the previous writes operated on different file pointers. In general, you cannot execute any two tasks on a single object, at least not if any state is of concern, such as file pointer, pwd, replica name, stream server port, job id, ...
That is a no-go in my opinion, as it is counter-intuitive, breaks a large number of use cases, and is inconsistent with the synchronous method calls.
Yes, you can wreak havoc with state as well:
saga::task t1 = f.write ("line 1\n");
saga::task t2 = f.write ("line 2\n");

t1.run ();
t2.run ();

t1.wait ();
t2.wait ();
will likely result in
linline 2 e 1
or such - the user does need to think when doing multiple async ops at once. I don't see a way around that (and don't see a need for it either: we want to make the Grid stuff easy, but not revolutionize programming styles).
I should have added that I'd prefer 3:
3. when creating a task, all parameter objects are passed "by reference"
   + no enforced copying overhead
   - all objects are shared, lots of potential error conditions
The error conditions I could think of are:
- change state of object while a task is running, hence having the task doing something differently than intended
Change of state,
That is intentional - see above.
like destruction of objects
Well, that is what we discuss :-) 3 would delay destruction until it's safe (state is not needed anymore).
or change of objects.
What do you mean here?
Not to speak of synchronization conditions: suppose you have non-atomic write operations (which is everything that writes more than a single word to memory): do you then also enforce object locking? If not, one task can see inconsistent object state just because another task is halfway through writing the object... (all the classical problems of shared-memory communication apply)
See above. You are right, but I don't see a way around that, without causing more harm than good (child and bathtub come to my mind for some reason...).
BTW: the bulk optimization we have now assumes that tasks which run at the same time are, by their very definition, independent from each other, do not depend on any specific order of execution, and do not depend on each other with respect to object state. Those are the very points we are talking about here - I think it is a very sensible assumption. I get the same behaviour on a unix shell, BTW:
touch file
date >> file &
date >> file &
I would not be able to make assumptions about the file contents... (well, here I could make a safe bet, but you know what I mean).
- limited control over resource deallocation
this is the same thing as above
The problem really is that there is no "object lifecycle" defined. There is no way to define which task or thread might be responsible or even allowed to destroy objects or change objects. Is it???
Yes, that is what I mean with limited control.
We had a discussion on this list and in Tokyo about the semantics of cancel(), which touches the same problem: should task.cancel() block until resources are freed? As we might talk about remote resources, and Grids are unreliable, we might block forever. That does not make sense, at least not always.
The resolution we came up with is that cancel() is advisory, so non-blocking, but can also use a timeout parameter (with -1 meaning forever) to block until resources are freed.
Timeouts do not make sense on destructors I believe, but 'advisory destruction' does, IMHO.
The advantages I see:
- no copy overhead (but, as you say, that is of no concern really)
ok, but minor point.
right. Lets forget that from now on.
- simple, clearly defined semantics
no, it is the most dangerous of the three versions
Well, see above - I think its the most sensible semantics :-)
- tasks keep objects they operate on alive
- objects keep sessions they live in alive
- sessions keep contexts they use alive
what is the meaning of "alive" here??? Now that you have ruled out memory management...
see above: resources get freed if not needed anymore.
- sync and async operations operate on the same object instance.
Let's forget about "sync" here: it is the task that is running in the current thread, so multiple tasks share object instances.
Well, it would be nice to have same semantics for sync and async, don't you think? :-)
Either way (1, 2 or 3), we have to have the user of the API thinking while using it - none of them is idiot proof.
Well, we should strive to limit the mental load on the programmer as much as possible...
I think (2) is most problematic, if I understand your 'hand-over' correctly: that would mean you can't use the object again until the task has finished?
No, it means you will never ever again be allowed to use these objects. (hand over includes the hand over of the responsibility to clean up...)
Right. So you can never do an async read, then a sync seek, and then an async read again. At least not with sensible results.
Also, I need to create 100 file instances to do 100 reads? Remember that opening a file is a remote op in itself, potentially. Then we don't need the task model anymore.
That is broken IMHO.
Cheers, Andre.
Thilo
-- "So much time, so little to do..." -- Garfield
----- End forwarded message -----