I checked the IDL recommendation, which is the latest revision of the DRMAA semantics. The reasoning there is basically the same: a job can only be "aborted" if it has never been running. During the "running" state, a job can stop working intentionally (leading to wif_exited=true), without a known reason (wif_exited=false), or due to a signal (wif_signalled=true). wif_aborted should be "false" in all of these cases.
I did a quick check of the Condor DRMAA sources. This implementation returns "wif_aborted=true" only if the job was rejected during submission. It therefore seems to work as expected.
Regards,
Peter.
Piotr Domagalski wrote:
Hi Roger,
On Thu, Aug 7, 2008 at 3:51 PM, Roger Brobst <rogerb@cadence.com> wrote:
Without commenting on any specific implementation, I feel comfortable that the section of the DRMAA spec pertaining to drmaa_wif{exited,signalled,aborted} is written as intended.
In particular, once a process is started, it should eventually end by exiting or by being signalled. In the former case, the exit value should be accessible. In the latter case, the signal should be accessible.
Once a process has been started by the DRM (and enters the 'running state') wifaborted should never be true for the job.
OK. I totally agree -- that does make sense (besides the fact that there's no equivalent of wifaborted in POSIX). I'm curious about others' opinions, especially Andreas', as he will probably know the details of the SGE implementation.
Once we reach consensus, I'll have to look into our LSF and PBS implementations to have this checked/fixed. As far as I remember, they behave like SGE, i.e. aborted = true for all terminated jobs, regardless of their state.
P.S. I thought it'd be appropriate to move this discussion back to drmaa-wg@ogf.org.
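For reference, the reading of the spec discussed in this thread can be sketched as a small Python model. This is only an illustration under stated assumptions, not the DRMAA API or any implementation's code: the names JobOutcome and wait_flags are hypothetical, and the "started" field stands in for "the job entered the running state".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobOutcome:
    """Hypothetical record of how a job ended (not part of any DRMAA binding)."""
    started: bool                      # did the job ever enter the running state?
    exit_status: Optional[int] = None  # set if the process exited on its own
    signal: Optional[str] = None       # set if the process was ended by a signal


def wait_flags(outcome: JobOutcome) -> dict:
    """Derive wif_* flags following the interpretation given in this thread."""
    if not outcome.started:
        # The job never ran, e.g. it was rejected during submission.
        return {"wif_aborted": True, "wif_exited": False, "wif_signalled": False}
    return {
        # Once a job has been running, it is never reported as aborted.
        "wif_aborted": False,
        "wif_exited": outcome.exit_status is not None,
        "wif_signalled": outcome.signal is not None,
    }
```

Under this model, wait_flags(JobOutcome(started=True, signal="SIGKILL")) reports wif_signalled and not wif_aborted, whereas the SGE-like behaviour described above would report aborted = true for the same job.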