Re: [saga-rg] proposal for extended file IO

13 Jun 2005

      ...
...
But this could take a *long* time, e.g. hours (you have to sort
through 1TB of data, which is on a disk).  How would a client be able
to tell what was going on?
Yes, that can take a long time.  However, tasks have a
state attached; each task is in one of these states:
  Pending
  Running
  Finished
  Cancelled
That state can be queried, so you at least know whether the task
is still alive.  I could imagine specific tasks providing more
detailed state or progress information, but that's not
specified in the strawman currently.  For example, we have
been discussing progress reporting for file transfers: it would be
nice if the task told you how much of the file has been transferred,
or even at what throughput.  But that falls more into the
domain of monitoring, which was intentionally left out of the
strawman for now.
Is that what you would expect in terms of feedback?  If not,
can you give an example?
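The polling described above can be sketched roughly as follows.  This is a minimal, local illustration only, not the strawman's API: the `Task` class, its `done()` progress counter and the `poll_until_done` helper are hypothetical, and a background thread stands in for the remote operation.  The client checks the state periodically and can report progress while the task is Running.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <iostream>
#include <ostream>
#include <thread>

// Hypothetical task states, mirroring the four listed in the strawman.
enum class TaskState { Pending, Running, Finished, Cancelled };

// Hypothetical task: performs `steps` units of work on a background thread
// and exposes its state plus a simple progress counter for polling.
class Task {
 public:
  explicit Task(int steps) : steps_(steps) {}
  ~Task() {
    cancel();                                // stop early if still running
    if (worker_.joinable()) worker_.join();
  }

  // Pending -> Running; ignored if the task was already started.
  void run() {
    if (state_ != TaskState::Pending) return;
    state_ = TaskState::Running;
    worker_ = std::thread([this] {
      for (int i = 0; i < steps_; ++i) {
        if (cancel_requested_) { state_ = TaskState::Cancelled; return; }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // one unit of "work"
        done_ = i + 1;
      }
      state_ = TaskState::Finished;
    });
  }

  void cancel() { cancel_requested_ = true; }
  TaskState state() const { return state_; }
  int done() const { return done_; }
  int steps() const { return steps_; }

 private:
  const int steps_;
  std::atomic<TaskState> state_{TaskState::Pending};
  std::atomic<int> done_{0};
  std::atomic<bool> cancel_requested_{false};
  std::thread worker_;
};

// Polls a started task until it leaves the Running state, logging progress
// along the way; returns the final state (Finished or Cancelled).
TaskState poll_until_done(const Task& task, std::ostream& log) {
  assert(task.state() != TaskState::Pending && "task must be started first");
  while (task.state() == TaskState::Running) {
    log << "progress: " << task.done() << "/" << task.steps() << '\n';
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
  }
  return task.state();
}
```

The progress counter goes slightly beyond the four states; it is included only to show where per-task progress information could plug in if it were ever specified.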
It's not a question about functionality; it's more a comment about
language design and semantics.  You are potentially hiding a large
amount of processing behind a file read, which I don't find
intuitive.  Should I put code around every eRead to allow for this?
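One way to "put code around" a potentially slow read is to run it asynchronously and wait only for a bounded time.  The sketch below assumes a hypothetical `slow_read` as a stand-in for a blocking eRead; `try_read_within` is likewise a hypothetical helper, not part of the strawman.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <optional>
#include <string>
#include <thread>

// Hypothetical stand-in for a blocking eRead: the server-side preparation
// may take milliseconds or hours before any data comes back.
std::string slow_read(std::chrono::milliseconds prepare_time) {
  std::this_thread::sleep_for(prepare_time);
  return "data";
}

// Waits at most `timeout` for a pending read.  Returns the data if it is
// ready; otherwise returns std::nullopt and leaves `pending` untouched, so
// the caller can retry later, do other work, or give up.
std::optional<std::string> try_read_within(std::future<std::string>& pending,
                                           std::chrono::milliseconds timeout) {
  assert(pending.valid() && "result already consumed");
  if (pending.wait_for(timeout) == std::future_status::ready) {
    return pending.get();  // consumes the shared state; valid() is false afterwards
  }
  return std::nullopt;
}
```

A caller would start the read with `std::async(std::launch::async, slow_read, ...)` and keep the returned future alive between attempts.  Note that a future obtained from `std::async` blocks in its destructor until the work completes, so this bounds how long each check waits, but it does not make the underlying hour of processing any cheaper.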

With an explicit prepare, I might send a message to a service to do
the prepare, then start/queue a batch job once the processing was
complete.  If I am sitting on a file read for an hour on a
supercomputer, it's expensive.  That's why I think the decoupling is
better.

But I suppose that I could implement the decoupled prepare/read  
outside of the SAGA API, which is maybe where it belongs.  And the  
API you have is certainly fine for smaller files.

Perhaps that is what you are suggesting at the end of your reply....
...
<snip>
If the first preparation takes an hour...?
Then again, middleware like DataCutter can benefit from
preprocessed data (e.g. indexing it in advance, or building an
octree structure in advance).  That could be done by first creating
a task which prepares the data, and then doing the read
afterwards.  Would that do what you need?
  // warning: Pseudo Pseudo Code...
  // run the preprocessing step as a separate job on host_A
  Job  job  ("host_A", "/bin/subsample /data/huge_file_A /tmp/small_file_B");
  // wait for job completion
  // read prepared data (first 100 bytes of the small file)
  File file ("gridftp://host_A//tmp/small_file_B");
  file.read (100, buffer, &out);
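For a purely local feel of the same prepare-then-read pattern, here is a sketch under several assumptions: a `std::async` thread stands in for the remote `Job`, local temporary files stand in for host_A and the gridftp URL, and the hypothetical `subsample` simply keeps every Nth byte (the real /bin/subsample is not specified).  `read_prefix` is a hypothetical helper that mirrors `file.read (100, buffer, &out)`.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <future>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical "subsample" step: keeps every `stride`-th byte of `src` and
// writes it to `dst`.  Returns the number of bytes written.
std::size_t subsample(const fs::path& src, const fs::path& dst, std::size_t stride) {
  assert(stride > 0);
  std::ifstream in(src, std::ios::binary);
  std::ofstream out(dst, std::ios::binary);
  std::size_t written = 0;
  char c;
  for (std::size_t i = 0; in.get(c); ++i) {
    if (i % stride == 0) { out.put(c); ++written; }
  }
  return written;  // `out` is flushed and closed when it goes out of scope
}

// Reads up to `n` bytes from the start of `path` into `buffer` and returns
// how many bytes were actually read (the role of `out` in the pseudocode).
std::size_t read_prefix(const fs::path& path, std::size_t n, std::vector<char>& buffer) {
  std::ifstream in(path, std::ios::binary);
  buffer.resize(n);
  in.read(buffer.data(), static_cast<std::streamsize>(n));
  const auto got = static_cast<std::size_t>(in.gcount());
  buffer.resize(got);
  return got;
}
```

Here `job.get()` on the future returned by `std::async(std::launch::async, subsample, ...)` plays the role of "wait for job completion".  Because the expensive step runs as its own job, the read afterwards only touches the small prepared file.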
I guess we are agreeing...

Jon.

Jon MacLaren