But this could take a *long* time, e.g. hours (you have to sort through 1TB of data, which is on a disk). How would a client be able to tell what was going on?
Yes, that can take a long time. However, the tasks have a state attached; they are one of:
Pending, Running, Finished, Cancelled
That state can be queried, so you know at least whether the task is still alive. I could imagine specific tasks giving more detailed state or progress information, but that's not specified in the strawman currently. For example, we have been discussing progress of file transfer: it would be nice if the task told you how much of the file has been transferred, or even at what throughput. But that falls more into the domain of monitoring, which was left out of the strawman intentionally, for now.
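To make the state query concrete, here is a minimal sketch of a client polling a task until it reaches a terminal state. Everything in it is an assumption for illustration: the `Task` class, `get_state`, `run`, `cancel` and `wait_for` are hypothetical names, not calls from the strawman; only the four state names come from the list above.

```python
import enum
import threading
import time


class State(enum.Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    CANCELLED = "Cancelled"


class Task:
    """Hypothetical stand-in for a strawman task (not the real API)."""

    def __init__(self, work):
        self._work = work
        self._state = State.PENDING
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        self._work()
        with self._lock:
            # Only a task that is still running may become Finished.
            if self._state is State.RUNNING:
                self._state = State.FINISHED

    def run(self):
        with self._lock:
            if self._state is not State.PENDING:
                return
            self._state = State.RUNNING
        self._thread.start()

    def cancel(self):
        # Simplification: this only marks the task as cancelled;
        # it does not interrupt the work already in progress.
        with self._lock:
            if self._state in (State.PENDING, State.RUNNING):
                self._state = State.CANCELLED

    def get_state(self):
        with self._lock:
            return self._state


def wait_for(task, poll_interval=0.01, timeout=5.0):
    """Poll the task state until it is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = task.get_state()
        if state in (State.FINISHED, State.CANCELLED):
            return state
        time.sleep(poll_interval)
    return task.get_state()
```

Note that polling like this only tells the client that the task is alive, not how far along it is; progress or throughput would need the kind of monitoring that the strawman leaves out for now.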
Is that what you would expect in terms of feedback? If not, can you give an example?
It's not a question about functionality. More a comment about language design, and semantics. You are potentially hiding a large amount of processing behind a file read. I don't find that intuitive. Should I put code around all eReads to allow for this? With the explicit prepare, I might send a message to a service to do the prepare, then start/queue a batch job once the processing was complete. If I am sitting on a file read for an hour on a supercomputer, it's expensive. That's why I think the decoupling is better. But I suppose that I could implement the decoupled prepare/read outside of the SAGA API, which is maybe where it belongs. And the API you have is certainly fine for smaller files. Perhaps that is what you are suggesting at the end of your reply....
<snip> If the first preparation takes an hour...?
Then again, middleware like data cutter can benefit from preprocessed data (e.g. an index or an octree structure built beforehand). That could be done by creating a task beforehand which prepares the data, and then doing the read afterwards. Would that do what you need?
    // warning: Pseudo Pseudo Code...
    Job job ("host_A", "/bin/subsample /data/huge_file_A /tmp/small_file_B");

    // wait for job completion

    // read prepared data
    File file ("gridftp://host_A//tmp/small_file_B");
    file.read (100, buffer, &out);
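For what it's worth, the same prepare-then-read pattern can be sketched as something runnable on local files. This is only an illustration under stated assumptions: `subsample` and `prepare_then_read` are hypothetical helpers, the thread pool stands in for a remote job, and plain local paths replace the gridftp URL; none of it is SAGA API.

```python
import concurrent.futures
import os
import tempfile


def subsample(src_path, dst_path, stride):
    """Keep every stride-th byte of src_path (stand-in for /bin/subsample).

    Reads the whole file into memory for brevity; a real tool would stream.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read()[::stride])
    return dst_path


def prepare_then_read(src_path, dst_path, stride, nbytes):
    """Run the preparation as a separate job, wait for it, then read."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        job = pool.submit(subsample, src_path, dst_path, stride)
        # The caller is free to do other work here instead of blocking
        # inside a read while the expensive preparation is running.
        prepared = job.result()  # wait for job completion
    # Read the prepared data: now a cheap read on a small file.
    with open(prepared, "rb") as f:
        return f.read(nbytes)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    big = os.path.join(workdir, "big_file_A")
    small = os.path.join(workdir, "small_file_B")
    with open(big, "wb") as f:
        f.write(bytes(range(256)) * 4)
    print(len(prepare_then_read(big, small, 4, 100)))
```

The point of the split is that the long-running step is an explicit job whose completion the client waits for, so the read itself never hides an hour of processing.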
I guess we are agreeing... Jon.