
Dear SAGA members,

Based on the API document currently in the public comment phase, we have implemented a very simple version of the Job Management API. Basically, we stripped out the underlying SAGA model and went directly to the Job Management API; see the attached UML diagram for details. The reason is that we did not have enough time to implement the full SAGA core to support the API, and our focus is on the NAREGI Super Scheduler (SS).

Considerations and simplifications:
- We do not support the suspended state at this time (see below for details).
- The wait method is reserved in Java for monitor synchronization and cannot be used in the same way as in the API; we did not implement a real wait method, since we are not interested in synchronizing jobs at the moment. The closest equivalent would be Thread.join().
- Metrics handling has not been added. It might come with a future incarnation of the package.
- job_self is not supported, although in Java we could pretend it is the same as job.
- The factory does not yet include all the methods of job_service; they might come in later incarnations, in which case the factory would be renamed to a service.
- Checkpoint and migrate are not supported for now. For two of our models they make no sense, and the SS does not seem to support them yet.
- Signal is not supported. Some implementations have it internally, but again the SS does not.
- Many attributes are not supported at the moment.
- The session and security model is ignored (the SS has its own model; the other incarnations do not care).
- A job description can take strings, collections and string arrays as arguments. Other formats are allowed if the caller knows how to manipulate them. Properties that are known to use string arrays have direct accessors to facilitate their use. All properties can be stored as a single string to allow serialization.

Since we are working in Java, we decided to go purely pattern-oriented: an application has access to the factory, the Job pattern and a job description. The concrete Job stub implementations are not supposed to be exposed (but they are accessible, since the classes are public). The design is made so that we can later hook the SAGA core classes in below the current API without breaking the code (too much), assuming we hold to our design pattern approach. A rough sketch of this shape follows the list of incarnations below.

We have three concrete implementations of Jobs: Local, SSH and Super Scheduler.
- The local job uses the Java Process object and handles a job on the same machine as the JVM (see the second sketch below). This job type does not support suspended mode at all. It is a fully synchronous job, since all actions are taken on the spot, unless you submit the job to a queuing system.
- The SSH job is a remote incarnation: the job can run on any machine that has an SSH daemon running. It can be a synchronous job, unless you submit the job to a queuing system. This job type does not support suspended mode for the moment, and only POSIX systems can be used to launch the job.
- The Super Scheduler job is NAREGI-specific and uses NAREGI's middleware. This is an asynchronous job. Suspended mode cannot be handled directly even though the state exists in the SS, so this is still pending. This job produces WSDL documents internally; the necessary methods are private, however.
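To make this concrete, here is a minimal sketch of the shape of the design. Every identifier in it (JobDescription, JobFactory, Job, setArguments and so on) is invented for this mail to illustrate the pattern; it is not the actual code of our package.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: a description whose properties are all stored as
    // single strings, so the whole object serializes trivially.
    class JobDescription {
        private final Map<String, String> properties = new HashMap<>();

        public void setProperty(String key, String value) {
            properties.put(key, value);
        }

        public String getProperty(String key) {
            return properties.get(key);
        }

        // Properties known to hold string arrays get direct accessors
        // on top of the generic storage (naive space-joining here).
        public void setArguments(String[] arguments) {
            properties.put("Arguments", String.join(" ", arguments));
        }

        public String[] getArguments() {
            String joined = properties.get("Arguments");
            return joined == null ? new String[0] : joined.split(" ");
        }
    }

    // The abstract Job pattern the application codes against.
    interface Job {
        void run() throws Exception;
    }

    // The factory is the application's entry point; the concrete
    // incarnations (Local, SSH, Super Scheduler) stay behind it, so
    // the SAGA core can later be hooked in underneath.
    class JobFactory {
        public static Job createJob(String type, JobDescription desc) {
            if ("local".equals(type)) {
                return new LocalJob(desc); // see the next sketch
            }
            throw new IllegalArgumentException("unknown job type: " + type);
        }
    }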
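The local incarnation can then be pictured as a thin wrapper around java.lang.Process (again, LocalJob and its methods are invented names, not our actual code). The sketch also shows why this job type is synchronous and why suspend cannot be supported: Process lets you wait for or destroy the child, but offers no suspend/resume.

    import java.io.IOException;

    // Sketch only: the local incarnation wraps java.lang.Process.
    class LocalJob implements Job {
        private final JobDescription desc;
        private Process process;

        LocalJob(JobDescription desc) {
            this.desc = desc;
        }

        // Launch the executable named in the descriptor; the job is
        // effectively "Running" once start() returns.
        public void run() throws IOException {
            String[] args = desc.getArguments();
            String[] cmd = new String[args.length + 1];
            cmd[0] = desc.getProperty("Executable");
            System.arraycopy(args, 0, cmd, 1, args.length);
            process = new ProcessBuilder(cmd).start();
        }

        // The closest Java analogue to the API's wait() is joining on
        // the process; a method literally named wait() would collide
        // with java.lang.Object.wait().
        public int waitFor() throws InterruptedException {
            return process.waitFor();
        }

        // cancel() maps onto Process.destroy(); nothing maps onto suspend.
        public void cancel() {
            process.destroy();
        }
    }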
The "Unknown" state is now reserved for the very short time the object is instantiated but we directly switch to "New" once the constructor is finished. The principle in OO programming is to have a stable object once you finish constructing it and calling method; if the constructor is not enough to have a stable object you need a factory. So when you get an object back it should be in a stable state, thus the "Unknown" state is superficial in our opinion. Some metrics or attributes or the Job are useless since they come directly from the descriptor, Example: "ExecutionHosts", "WorkingDirectory" or "CPUTimeLimit". Unless you consider that these values might be different from the job description. Or if the job description don't mention them the job can have this values assigned by the back-end. Either case the API documentation should clarify this. The run_job from the service will not follow the API contract if implemented. Only one parameter can be returned in java. Also the streams are available thought the Job pattern. In the document section 3.8.8 Examples the example at line 16 and 17 is wrong (or the method is overwritten). There should be no string argument. The host should be set in the descriptor. -- Best regards, Pascal Kleijer ---------------------------------------------------------------- HPC Marketing Promotion Division, NEC Corporation 1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan. Tel: +81-(0)42/333.6389 Fax: +81-(0)42/333.6382