Hi Roger,

On Wed, Nov 5, 2008 at 10:41 PM, Roger Brobst <rogerb@cadence.com> wrote:
However if WIFSIGNALED output zero (false), calling WTERMSIG is undefined. [...] However, If WIFEXITED output zero (false), calling WEXITSTATUS is undefined.
Yes, I totally agree with that. I just oversimplified this a bit in my listings.
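For the record, the guarded form for a plain Unix process could be sketched as follows. This is only a minimal illustration, not code from any DRMAA implementation; `struct child_result`, `decode_status` and `run_child` are hypothetical names made up for this example.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper type: how a child ended, decoded from a wait status. */
struct child_result {
    int exited;     /* non-zero if the child terminated via exit() */
    int exit_code;  /* meaningful only when exited is non-zero */
    int signaled;   /* non-zero if the child was terminated by a signal */
    int term_sig;   /* meaningful only when signaled is non-zero */
};

/* Call WEXITSTATUS / WTERMSIG only after the matching predicate is true. */
static struct child_result decode_status(int status)
{
    struct child_result r = {0, -1, 0, -1};
    if (WIFEXITED(status)) {
        r.exited = 1;
        r.exit_code = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        r.signaled = 1;
        r.term_sig = WTERMSIG(status);
    }
    return r;
}

/* Fork a child that exits with `code` (sig == 0) or kills itself with `sig`. */
static struct child_result run_child(int code, int sig)
{
    struct child_result failed = {0, -1, 0, -1};
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return failed;
    if (pid == 0) {
        if (sig != 0) {
            /* Restore the default action in case the signal was ignored;
               this call fails harmlessly for SIGKILL. */
            signal(sig, SIG_DFL);
            raise(sig);
        }
        _exit(code);
    }
    if (waitpid(pid, &status, 0) != pid)
        return failed;
    return decode_status(status);
}
```

With this, `run_child(42, 0)` should report an exit with code 42, and `run_child(0, SIGKILL)` should report termination by SIGKILL with `exited` left at zero.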
Yes, if a DRM uses a shell to start the client-specified program and the shell uses the convention of conveying that the child was terminated by exiting with 128+sigNum, then the DRM may not be able to distinguish between a child exit(137) and being terminated by sigNum=9. This is a DRM implementation issue.
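The ambiguity described above can be reproduced without a real shell. The sketch below assumes a hypothetical wrapper process (`run_via_wrapper`, a name invented here) that stands in for the shell and follows the 128+sigNum convention; it is not how any particular DRM actually starts jobs.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The "job": exit with `code`, or die from `sig` when sig != 0. */
static void run_job(int code, int sig)
{
    if (sig != 0)
        raise(sig);
    _exit(code);
}

/*
 * Hypothetical wrapper standing in for the shell: it runs the job as its
 * own child and, if the job was killed by a signal, exits with 128+sigNum.
 * Returns the wrapper's wait status as seen by the caller, or -1 on error.
 */
static int run_via_wrapper(int code, int sig)
{
    int status = 0;
    pid_t wrapper = fork();
    if (wrapper < 0)
        return -1;
    if (wrapper == 0) {
        int job_status = 0;
        pid_t job = fork();
        if (job < 0)
            _exit(126);                 /* arbitrary "could not start" code */
        if (job == 0)
            run_job(code, sig);
        if (waitpid(job, &job_status, 0) != job)
            _exit(126);
        if (WIFSIGNALED(job_status))
            _exit(128 + WTERMSIG(job_status));   /* the shell convention */
        _exit(WIFEXITED(job_status) ? WEXITSTATUS(job_status) : 126);
    }
    if (waitpid(wrapper, &status, 0) != wrapper)
        return -1;
    return status;
}
```

Both `run_via_wrapper(128 + SIGKILL, 0)` and `run_via_wrapper(0, SIGKILL)` should come back with WIFEXITED true and the same exit status, so the caller has nothing left to tell the two cases apart.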
I was actually hoping for some discussion of what a DRMAA implementation should look like in this case, and also (that's mainly for Peter) what the test suite should look like. For example, it currently tests exit statuses 0..255, which would obviously fail if we wanted to assume that drmaa_wifexited is true only for 0..128 and use the remaining values for signal numbers.
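To illustrate the interpretation I mean, here is a rough sketch of such a mapping. `struct job_end` and `interpret_shell_code` are hypothetical names and not part of the DRMAA API; the 0..128 split is just the assumption mentioned above.

```c
/* Hypothetical decoded view of how a job ended. */
struct job_end {
    int exited;     /* non-zero: treat the code as a normal exit status */
    int exit_code;  /* meaningful only when exited is non-zero */
    int signaled;   /* non-zero: treat the code as 128+sigNum */
    int sig_num;    /* meaningful only when signaled is non-zero */
};

/*
 * Interpret a shell-reported exit code under the assumption that
 * 0..128 are genuine exit statuses and 129..255 encode 128+sigNum.
 */
static struct job_end interpret_shell_code(int code)
{
    struct job_end e = {0, 0, 0, 0};
    if (code > 128 && code <= 255) {
        e.signaled = 1;
        e.sig_num = code - 128;
    } else {
        e.exited = 1;
        e.exit_code = code;
    }
    return e;
}
```

Under this mapping a job that really called exit(137) would be reported as killed by signal 9, which is exactly why a test that walks exit statuses 0..255 and expects wifexited for each of them could not pass.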
Yes, since a given process cannot exit itself and be terminated, WIFEXITED and WIFSIGNALED should never both be true (non-zero).
Are you talking about a DRMAA job here or just a general Unix process? Because in the former case there seems to be a different assumption, probably because of the "failed after running" case in wifexited. The thing is that the current test suite (again, Peter?) tests whether a signalled DRMAA job was both wifsignaled and wifexited. That kind of puzzled me.
Historically, a zero exit status from a Unix process meant "exited successfully". I believe the "failed after running" clause in the below excerpt is intended to mean exited with a non-zero value.
"Evaluates into 'exited' a non-zero value if stat was returned for a job that either failed after running or finished after running"
The problem is that, as far as I understood Peter's intentions in the test suite, this "failed after running" clause is interpreted differently.

-- Piotr Domagalski