Dear all,
> I'd suggest adding a simple test case for the drmaa_wifaborted() confusion I bumped into recently. The test could:
Great idea, I did that by extending two existing test cases (ST_SUBMIT_IN_HOLD_DELETE and ST_SUBMIT_KILL_SIG). The test suite version is therefore now 1.6.0.
> - submit a job in hold state,
> - drmaa_control(TERMINATE) and drmaa_wait(),
> - ensure that drmaa_wifaborted() == true, drmaa_wifexited() == drmaa_wifsignaled() == drmaa_wifcoredumped() == false,
>
> - submit a long job (e.g. /bin/sleep 3600),
> - wait (polling) for it to start,
> - drmaa_control(TERMINATE) and drmaa_wait(),
> - ensure that drmaa_wifsignaled() == true, drmaa_wifexited() == drmaa_wifaborted() == drmaa_wifcoredumped() == false,
wifexited() must be 0 for the first case, and != 0 for the second case. GFD.133 is (finally) very clear about that:

"Evaluates into 'exited' a non-zero value if stat was returned for a job that either failed after running or finished after running"

wifexited() should tell you whether the job has an exit code, which is only possible if it was ever executed.
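
To make the two expectations concrete, here is a minimal sketch of the flag checks under the reading above. It does not call the DRMAA library itself. The struct wait_flags and both function names are hypothetical and only stand in for the values reported by the drmaa_wif*() calls after drmaa_wait(). For the second case I kept the wifsignaled/wifaborted/wifcoredumped expectations from the proposal and changed only wifexited; that combination is my assumption, not something spelled out in the spec quote.

```c
#include <stdbool.h>

/* Hypothetical container for the values reported by drmaa_wifaborted(),
 * drmaa_wifexited(), drmaa_wifsignaled() and drmaa_wifcoredumped()
 * for one drmaa_wait() status. Non-zero means "true". */
struct wait_flags {
    int aborted;
    int exited;
    int signaled;
    int coredumped;
};

/* Case 1: job submitted in hold state and terminated before it ever ran.
 * It never executed, so it cannot have an exit code: exited must be 0. */
bool held_job_flags_ok(const struct wait_flags *f)
{
    return f->aborted != 0 && f->exited == 0 &&
           f->signaled == 0 && f->coredumped == 0;
}

/* Case 2: long-running job terminated after it started.
 * Assumption: it did run, so exited must be non-zero; the remaining
 * expectations are taken unchanged from the proposed test. */
bool running_job_flags_ok(const struct wait_flags *f)
{
    return f->exited != 0 && f->signaled != 0 &&
           f->aborted == 0 && f->coredumped == 0;
}
```

A test case would fill a struct wait_flags from the four drmaa_wif*() results and fail if the matching check returns false. An implementation that reports wifexited() != 0 for a job terminated while on hold would be rejected by held_job_flags_ok().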
> I'm not 100% sure, but I guess SGE fails the second test...
I tested the latest Condor for Windows and Linux. It now fails the first test, since it returns wifexited() != 0 even though the job was terminated before running. I will fix that for the next Condor release, if we agree on my interpretation of the spec. Somebody else (Andreas?) needs to check SGE.

Thanks,
Peter.