SAGA Use Case Template:
=======================

Name of use case: Application Migration

Contact: Andre Merzky

1. General Information:
-----------------------

This section consists of check-boxes to provide some context in which
to evaluate the use case.

1.1 Which best describes your organisation:

    Industry [ ]
    Academic [x]
    Other    [ ]  Please specify: ................................

1.2 Application area:

    Astronomy         [ ]
    Particle physics  [ ]
    Bio-informatics   [ ]
    Environmental Sc. [ ]
    Image analysis    [ ]
    Other             [ ]  Please specify: astrophysics, but the use case is generic

1.3 Which of the following apply to or best describe this use case?
    Multiple selections are possible, please prioritize with numbers
    from 1 (low) to 5 (high):

    Database                   [ ]
    Remote steering            [3]
    Visualization              [1]
    Security                   [1]
    Resource discovery         [5]
    Resource scheduling        [5]
    Workflow                   [3]
    Data movement              [5]
    High Throughput Computing  [ ]
    High Performance Computing [1]
    Other                      [ ]  Please specify: ................................

1.4 Are you an:

    Application user            [ ]
    Application developer       [ ]
    System administrator        [ ]
    Service developer           [ ]
    Computer science researcher [ ]
    Other                       [ ]  Please specify: Middleware Developer (higher levels)

2. Introduction:
----------------

2.1 Provide a paragraph introduction to your use case. Background to
    the project is another alternative. (E.g. 100 words).

One of the major scenarios targeted by the GridLab project is the
ability to migrate a running application in a VO.
The migration process may get triggered by various means:

  - running out of time on the original resource
  - a more powerful resource becomes available
  - a resource with more memory or local disk space is needed
  - user prefers a different resource and triggers migration
  - migration as part of a larger workflow scenario

The migration includes the following well-defined steps:

  - trigger migration
  - discover new resource
  - perform application level checkpointing
  - move checkpoint data to new resource
  - schedule application on new resource
  - continue computation (and discontinue old job)

Several of these operations need to be done on application level - the
use case specifically describes those operations with respect to a
Grid API.

2.2 Is there a URL with more information about the project ?

    http://www.gridlab.org/

3. Use Case to Motivate Functionality Within a Simple API:
----------------------------------------------------------

Provide a scenario description to explain customers' needs. E.g. "move
a file from A to B," "start a job." Please include figures if
possible. If your use case requires multiple components of
functionality, please provide separate descriptions for each
component, bullet points of 50 words per functionality are acceptable.

Following the list from 2.1:

  - trigger migration

    If the application triggers the migration process itself, it needs
    a means to communicate with the resource management system it was
    started with, or with any other one which knows about its
    execution environment requirements (exe, input files, output
    files... -> job description). The request basically is:

      rms  = Grid.getResourceManagementSystem ();
      self = rms.getMyJobDescription ();
      // perform checkpoint
      // save state

    If the application migration gets triggered from outside the
    application, the application needs a means to get notified about
    this - it needs to know when to perform checkpointing and to shut
    down.
    There are many ways to do that - mechanisms similar to application
    steering seem the most convenient ones:

      sub mycallback (userdata)
      {
        // perform checkpoint
        // save state
      }

      result = Grid.announceCheckpointCallback (mycallback, userdata);

  - discover new resource

    If that operation is not performed by the resource management
    system itself, the application needs to discover new resources on
    which it can run. It needs to provide its own job description.

      host = GriResourceManager.discoverNewHost (self);

  - perform application level checkpointing

    The checkpointing process itself does not need Grid support per
    se, but the application needs to be able to announce the location
    of its checkpoint files. These could be put into a replica
    catalog, or onto a global file system - but the resource manager
    needs to know about them, in order to make them available on the
    new resource:

      app.checkpoint (filename);
      grid.replicaCatalog.addFile (replicaname, filename);
      rms.announceCheckpointFile (replicaname);

  - move checkpoint data to new resource

    If that operation is not performed by the resource management
    system itself, the application needs to be able to migrate its
    checkpoint files to the new resource:

      grid.copyFiles (filename, host);

    or

      grid.replicate (replicaname, host);

  - schedule application on new resource

    If that operation is not performed by the resource management
    system itself, the application needs to be able to start a copy of
    itself on the remote resource:

      copy = GriResourceManager.runJobOnHost (self, host);

  - continue computation (and discontinue old job)

    Both are straightforward.

4. Customers:
-------------

Describe customers of this use case and their needs. In particular,
where and how the use case occurs "in nature" and for whom it occurs.
E.g.
max 40 words

The customers of the use case are scientific communities with jobs

  a) running for a very long time (~weeks)
  b) with varying computing demands (peaks requiring a more powerful
     resource, or more disk space)
  c) which are part of larger dynamic systems

Grand Challenge Simulations are specific target applications for that
use case.

5. Involved Resources:
----------------------

5.1 List all the resources needed: e.g. what hardware, data, software
    might be involved.

    - compute resources
    - data storage systems
    - resource management systems
    - data replication/movement systems
    - remote steering or monitoring systems

5.2 Are these resources geographically distributed?

    potentially yes.

5.3 How many resources are involved in the use case? E.g. how many
    remote tasks are executing at the same time?

    minimum: 2, maximum: unlimited, only one compute resource at the
    same time.

5.4 Describe your codes and tools: what sort of license is available,
    e.g. open or closed source license; what sort of third party tools
    and libraries do you use, and what is their availability; do you
    regularly work from source code, or use pre-compiled applications;
    what languages are your applications developed in (if relevant),
    e.g. Fortran, C, C++, Java, Perl, or Python.

    Application: C/Fortran code, open source
                 http://www.cactuscode.org
    API:         C API binding to Grid Services, open source
                 http://www.gridlab.org/gat/
    Services:    C and Java Services, open source, mostly based on Globus
                 http://www.gridlab.org/

5.5 What information sources do you require, e.g. certificate
    authorities, or registries.

    Resource Discovery and state preservation (replica systems or
    similar) are the main requirements for information management.

5.6 Do you use any resources other than traditional compute or data
    resources, e.g. telescopes, microscopes, medical imaging
    instruments.

    No.

5.7 Please link all the above back to the functionalities described in
    the use case section where possible.

    ...
5.8 How often is your application used on the grid or grid-like
    systems?

    [ ] Exclusively
    [ ] Often (say 50-50)
    [x] Occasionally on the grid, but mostly stand-alone
    [ ] Not at all yet, but the plan is to.

    The application is actually used in Grids, but does not make full
    use of Grid capabilities (such as the one described here).

6. Environment:
---------------

Provide a description of the environment your scenario runs in, for
example the languages used, the tool-sets used, and the user
environments (e.g. shell, scripting language, or portal).

Users work mostly on shells, portals are under development.
Programmers work on open source solutions, unix only, C, C++, Fortran.

7. How the resources are selected:
----------------------------------

7.1 Which resources are selected by users, which are inherent in the
    application, and which are chosen by system administrators, or by
    other means? E.g. who is specifying the architecture and memory to
    run the remote tasks?

    Compute Resources are selected manually or automatically (job
    description by users).

7.2 How are the resources selected? E.g. by OS, by CPU power, by
    memory, don't care, by cost, frequency of availability of
    information, size of datasets?

    OS, Architecture, Memory, disk space, runtime (when, how long)

7.3 Are the resource requirements dynamic or static?

    Vary from run to run, but mostly static, sometimes dynamic. In the
    future more dynamic.

8. Security Considerations:
---------------------------

8.1 What things are sensitive in this scenario: executable code, data,
    computer hardware? I.e. at what level are security measures used
    to determine access, if any?

    Data should only be accessed by its owner or group. Resources are
    not to be compromised, of course.
    --> standard academic security requirements.

8.2 Do you have any existing security framework, e.g. Kerberos 5,
    Unicore, GSI, SSH, smartcards?

    GSI for all communication and resource access.
8.3 What are your security needs: authentication, authorisation,
    message protection, data protection, anonymisation, audit trail,
    or others?

    authentication, authorisation, basic data protection

8.4 What are the most important issues which would simplify your
    security solution? Simple API, simple deployment, integration with
    commodity technologies.

    simple deployment

9. Scalability:
---------------

What are the things which are important to scalability and to what
scale - compute resources, data, networks ?

The scenario is not bound by scalability (the application of course
is).

10. Performance Considerations:
-------------------------------

Explain any relevant performance considerations of the use case.

The full time to migrate to a better resource must result in a benefit
compared to having the computation simply continue on the old
resource. However, on occasions where simple continuation is not
possible, performance penalties are acceptable. In general:
performance requirements depend on the specific
application/simulation.

11. Grid Technologies currently used:
-------------------------------------

If you are currently using or developing this scenario, which grid
technologies are you using or considering?

  - Globus based services from the GridLab project
  - Grid Application Toolkit from the GridLab project

12. What Would You Like an API to Look Like?
--------------------------------------------

Suggest some functions and their prototypes which you would like in an
API which would support your scenario.

An example of a migration in GAT is included in the GAT release.

13. References:
---------------

List references for further reading.

    http://www.gridlab.org/gat/
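
As an illustration for sections 3 and 12, the migration steps that
follow the trigger could be strung together roughly as sketched below.
This is only a sketch in Python and is not the GAT API: every class
and function in it (JobDescription, Host, ReplicaCatalog,
ResourceManager, migrate) is a hypothetical in-memory stand-in,
loosely named after the pseudocode in section 3, and the host and
file names are made up.

```python
"""Sketch of the application-migration steps from section 3.

Assumption: all classes below are hypothetical in-memory stand-ins,
not GAT or any other real Grid middleware API.
"""
from dataclasses import dataclass, field


@dataclass
class JobDescription:
    executable: str
    min_memory_gb: int


@dataclass
class Host:
    name: str
    memory_gb: int


@dataclass
class ReplicaCatalog:
    # replica name -> set of host names holding a copy
    locations: dict = field(default_factory=dict)

    def add_file(self, replica_name, host_name):
        self.locations.setdefault(replica_name, set()).add(host_name)

    def replicate(self, replica_name, host_name):
        if replica_name not in self.locations:
            raise KeyError(f"unknown replica: {replica_name}")
        self.locations[replica_name].add(host_name)


@dataclass
class ResourceManager:
    hosts: list
    running: dict = field(default_factory=dict)  # host name -> job description

    def discover_new_host(self, job, exclude):
        """Return the first host other than `exclude` that satisfies the job."""
        for host in self.hosts:
            if host.name != exclude and host.memory_gb >= job.min_memory_gb:
                return host
        raise LookupError("no suitable host found")

    def run_job_on_host(self, job, host):
        self.running[host.name] = job

    def cancel_job(self, host_name):
        self.running.pop(host_name, None)


def migrate(rms, catalog, job, current_host, checkpoint):
    """Run the steps after the trigger, in the order given in section 3."""
    new_host = rms.discover_new_host(job, exclude=current_host)  # discover
    replica = checkpoint(current_host)         # application level checkpoint
    catalog.add_file(replica, current_host)    # announce checkpoint location
    catalog.replicate(replica, new_host.name)  # move checkpoint data
    rms.run_job_on_host(job, new_host)         # schedule on new resource
    rms.cancel_job(current_host)               # discontinue old job
    return new_host


if __name__ == "__main__":
    job = JobDescription(executable="simulation", min_memory_gb=64)
    rms = ResourceManager(hosts=[Host("small", 16), Host("big", 128)])
    rms.run_job_on_host(job, rms.hosts[0])
    catalog = ReplicaCatalog()
    # The checkpoint callable stands in for the application's own
    # checkpoint routine and returns the replica name it wrote.
    target = migrate(rms, catalog, job, "small", lambda host: "run.chkpt")
    print(target.name, sorted(catalog.locations["run.chkpt"]))
```

The sketch keeps the old job running until the new one has been
scheduled, which matches the order of the step list in section 2.1.
Whether discovery, data movement and scheduling are done by the
application or by the resource management system is left open in the
use case; here they are all driven from the application side.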