Dear all,

The March 31st conference call decided upon the following strategy regarding the job state model extension:

--- snip
4. TERMINATED vs. FAILED state discussion:
http://www.ogf.org/pipermail/drmaa-wg/2009-March/001012.html
Option 2 from the original mail is now highly preferred. The TERMINATED state should express that an external entity (e.g. the user or the DRM system) stopped the job before it finished. For POSIX-aligned systems, this could be formulated as reception of a signal by "the job". In contrast, the FAILED state now expresses that the application stopped on its own before finishing. For POSIX-aligned systems, this could be formulated as reception of a signal "by the job's application process".

We ask for comments from PBS and LSF experts (FedStage?!?). Do these systems provide enough error information to distinguish between these two states? For SGE and Condor, Dan and Peter already agreed.
--- snip

Piotr from FedStage informed me that the proposed distinction does not seem to be implementable in PBS. One solution could be to detect the 'requested' termination only in the DRMAA library. Dan already pointed out that this would not reflect the original idea: an intentional job termination by another user would then lead to FAILED instead of TERMINATED.

Since we already rejected Options 1 and 3 in the last phone calls, we arrive at Option 4 as the last remaining solution: there will be no new TERMINATED state. The new job sub-state concept will allow job failure details to be expressed, but only in a DRM-specific way.

We will hold the final vote on this in the next call.

Best regards,
Peter.