
Hi all,

below are some notes on mapping the current SAGA API spec to one of the
use cases, namely the GridLab use case for application migration. I
attach the use case for reference as well.

Cheers, Andre.

+-----------------------------------------------------------------+

The SAGA API allows any job that the job class can handle to be
migrated, using the migrate method. That provides an easy solution
for the GridLab migration use case, if supported by the
implementation/middleware/backend:

--------------------------------------------------------------
#include <saga.hpp>

#include <iostream>
#include <vector>
#include <string>

using namespace std;

int main ()
{
  // start the job on the remote host
  saga::job_server js;
  saga::job        j = js.run_job ("remote.host.net", "my_app");

  // reuse the job's own description as the migration target
  saga::job_definition jd = j.get_job_definition ();

  vector <string> hosts;
  vector <string> files;

  // new target host, and checkpoint file to be staged to it
  hosts.push_back (string ("near.host.net"));
  files.push_back (string ("http://remote.host.net/file > http://near.host.net/file"));

  jd.set_vector_attribute ("SAGA_HostList",     hosts);
  jd.set_vector_attribute ("SAGA_FileTransfer", files);

  j.migrate (jd);

  cout << "Heureka!" << endl;

  return (0);
}
--------------------------------------------------------------

(Question: does the SAGA migrate call move checkpoint files
automatically, or do they need to be specified in the new job
description as above?)

However, for the complete use case to be implemented on application
level, a number of steps cannot be implemented in SAGA. The call
sequence would be:

  In the application instance which performs the migration
  on the other job:

    - trigger migration for the remote job
    - discover new resource
    + move checkpoint data to new resource
    + schedule application on new resource
    + continue computation (and discontinue old job)

  In the application instance which gets migrated:

    - get triggered for checkpointing
    = perform application level checkpointing
    - report checkpoint file location(s)

Items marked with + are possible to implement in SAGA, items marked
with - aren't.
The item marked with '=' in the call sequence is (currently) not
related to SAGA.

For the complete implementation of the use case, SAGA misses:

  1) means to communicate with the remote application instance
  2) means to discover new resources

Notes:

  1) Means of communication are actually given, but not per se
     usable for this use case. E.g. streams are definite overkill
     for signalling checkpointing requests. Signals (as in
     job.signal (int signal)) would work, but only if the remote job
     uses signal handling as a checkpoint trigger. That also might
     be difficult to use if the job is running in a wrapper script,
     or in a virtual machine etc. - that might not be transparent to
     SAGA, and would require direct communication. Also, the
     signalling method provides no feedback about the success of the
     operation, and cannot return information such as the location
     of checkpoint files.

  2) The current SAGA API covers job submission to specific hosts,
     or lets the middleware choose a suitable host for submission.
     However, the brokering result is not exposed on API level, as
     would be necessary for this specific use case, and possibly for
     other dynamically active Grid applications.

     One way to implement that is to provide a direct interface to
     Grid information systems, and in that way expose information
     about available resources. That would actually be more
     flexible, as it e.g. also allows the discovery of specific
     services, but would also require additional semantic knowledge
     on application level.

+-----------------------------------------------------------------+

--
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www: http://www.merzky.net  |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+