On Tue, 8 Feb 2005, Roger Brobst wrote:
As described in http://forge.gridforum.org/tracker/?aid=309 it would be useful for drmaa to provide a mechanism that an application can use to determine why a job was aborted.
The existing proposal is to add a string output argument to drmaa_wait() that provides text information in case drmaa_wifaborted()==true.
My counter-proposal is to introduce a new function that can be called after drmaa_wifaborted()==true to obtain the reason the job was aborted (similar to how drmaa_wexitstatus() can be called after drmaa_wifexited()==true to obtain the actual exit status).
int drmaa_wabortreason(
    char*  abort_reason,      /*OUT*/
    size_t abort_reason_len,  /*IN*/
    int    stat,              /*IN*/
    char*  error_diagnosis,   /*OUT*/
    size_t error_diag_len     /*IN*/
)
We should discuss what the function should output if the drmaa implementation does not have any information about why the job was aborted.
I agree this would fix the #309 problem. However, I doubt it is possible to extract 'char* abort_reason' from an 'int stat'. I think we need to introduce a new opaque datatype to be returned by drmaa_wait(). The new datatype would serve as a wrapper for stat, rusage and abort_reason, and the drmaa_wif* functions would simply operate on the new datatype instead of 'stat'.

Regards,
Andreas
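
For illustration only, here is a minimal sketch of what such a wrapper type might look like in C. Every name in it (the sketch_ prefix, the struct fields, the 256-byte limit, the 0/-1 return codes) is an assumption made for this example and is not part of the DRMAA C binding; rusage is left out to keep it short. In a real header the struct would only be forward-declared so that it stays opaque to applications.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; none of this is part of the DRMAA C binding. */
#define SKETCH_REASON_MAX 256

/* Wrapper for what drmaa_wait() would hand back. A real header would expose
 * only the typedef; the fields are spelled out here so the sketch compiles. */
typedef struct sketch_wait_result {
    int  stat;                            /* raw status code, as in 'int stat' today */
    int  aborted;                         /* nonzero if the job was aborted */
    char abort_reason[SKETCH_REASON_MAX]; /* empty string when no reason is known */
} sketch_wait_result_t;

/* Fill in a result; 'reason' may be NULL when the DRM system gives none. */
void sketch_result_init(sketch_wait_result_t *r, int stat, int aborted,
                        const char *reason)
{
    r->stat = stat;
    r->aborted = aborted;
    snprintf(r->abort_reason, sizeof r->abort_reason, "%s",
             reason ? reason : "");
}

/* Counterpart of drmaa_wifaborted(), operating on the wrapper instead of stat. */
int sketch_wifaborted(const sketch_wait_result_t *r)
{
    return r->aborted != 0;
}

/* Copy the abort reason into buf, truncating if buf is too small.
 * Returns 0 on success, -1 on bad arguments or if the job was not aborted. */
int sketch_wabortreason(const sketch_wait_result_t *r, char *buf, size_t buf_len)
{
    if (r == NULL || buf == NULL || buf_len == 0 || !r->aborted)
        return -1;
    snprintf(buf, buf_len, "%s", r->abort_reason);
    return 0;
}
```

With this shape the abort reason travels with the result object, so the accessor does not have to recover a string from an integer status code. Returning an empty string when nothing is known is just one possible answer to the open question raised above.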