Since I won't make the meeting, here's my feedback.

Peter Tröger wrote:
2. Voting about "UNDETERMINED" job state - keep it as own job state ?
Yes. Undetermined stays as a state, but it is redefined to mean permanently undetermined. Trying again later will yield the same result.
- Means permanent or temporary problem ?
To represent the temporarily undetermined state, we expand the TryAgainLaterException to apply to drmaa_job_ps() as well.
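
To make the distinction concrete, here is a rough sketch of how a client might treat the two cases. It is not the DRMAA API: `TryAgainLaterError`, `poll_job_state`, and the `query_state` callable are names I made up for illustration, and the state strings are placeholders. The point is only that a temporary problem surfaces as an exception worth retrying, while UNDETERMINED comes back as an ordinary (and final) answer.

```python
import time


class TryAgainLaterError(Exception):
    """Hypothetical stand-in for TryAgainLaterException: state temporarily unknown."""


def poll_job_state(query_state, job_id, attempts=3, delay=0.0):
    """Call query_state(job_id), retrying only on TryAgainLaterError.

    Any returned state, including "UNDETERMINED", is passed straight back:
    under the proposal it is permanent, so retrying would not help.
    Re-raises TryAgainLaterError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return query_state(job_id)
        except TryAgainLaterError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller decide what to do
            time.sleep(delay)
```

A caller would pass in whatever function actually queries the DRM, and would stop polling as soon as it sees UNDETERMINED.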
3. Voting about separate "TERMINATED" vs. "FAILED" state - Semantics
A job that exits via the terminated state has the potential to succeed if resubmitted. It entered the terminated state due to an action taken by the job owner, an administrator, or the DRM system itself, possibly on behalf of the terminated job. A job that exits via the failed state is unlikely to succeed if resubmitted. It entered the failed state due to an error in the job or a misconfiguration of the machine on which it ran.

There is a problem with my clean could-succeed/won't-succeed division. What if a job failed because the machine it ran on was wonky? That is clearly a failure, not a termination, but if the job were resubmitted and landed on any other machine, it would succeed. In that case, do we actually care if there was a difference between failure and termination?
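
A small sketch of the resubmission rule as I described it, with the problem case made explicit. Everything here is hypothetical: `ExitState`, `worth_resubmitting`, and especially the `failed_on_bad_host` flag, which stands for information a DRMAA client does not currently get.

```python
from enum import Enum


class ExitState(Enum):
    """Hypothetical final job states, for illustration only."""
    DONE = "done"
    TERMINATED = "terminated"
    FAILED = "failed"


def worth_resubmitting(state, failed_on_bad_host=False):
    """Apply the could-succeed/won't-succeed rule from the text.

    failed_on_bad_host is an assumed extra hint; without it, a failure
    caused by a bad machine is indistinguishable from a broken job.
    """
    if state is ExitState.TERMINATED:
        return True  # ended by owner, admin, or DRM: may succeed next time
    if state is ExitState.FAILED:
        return failed_on_bad_host  # only worth retrying on another machine
    return False  # DONE: nothing to resubmit
```

With the flag left at its default, the bad-machine case is treated like any other failure, which is exactly the gap in the clean division.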
- Resulting new job state transitions
There's one more thing we may want to consider. In SGE, a job can exit one of four ways. It can succeed. It can fail, which includes termination. It can request to be rescheduled. And it can be set into error state. The first two are handled fine by drmaa_wait(). The third can be recognized by drmaa_job_ps(), but it's not ideal. The fourth is completely unknowable from DRMAA. To the DRMAA client, it will look like the job was requeued to be rescheduled, but is never actually scheduled to run again.

We might want to consider supporting some additional states, such as rescheduled or error, or maybe those states are something that the state/substate model would enable. I vote for making the substate as generic as possible. I think forcing it to be an integer is unnecessarily limiting. Taking some Java APIs as examples, sometimes the substates are really just text messages that explain what's going on. I think that's valid and something we should allow.
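
As an illustration of what a generic substate could look like, here is a minimal sketch. The names `JobState`, `JobStatus`, and `describe`, and the particular set of states, are my own assumptions and not part of any DRMAA draft. The substate is left untyped so a DRM can supply an integer code, a text message, or nothing at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class JobState(Enum):
    """Hypothetical coarse states; the real list is up for discussion."""
    QUEUED = 1
    RUNNING = 2
    DONE = 3
    FAILED = 4


@dataclass(frozen=True)
class JobStatus:
    state: JobState
    substate: Any = None  # DRM-specific detail: int code, text, or nothing


def describe(status):
    """Render a status for display without interpreting the substate."""
    if status.substate is None:
        return status.state.name
    return f"{status.state.name} ({status.substate})"


# The two SGE cases that look identical to a client today:
rescheduled = JobStatus(JobState.QUEUED, "rescheduled at job's request")
in_error = JobStatus(JobState.QUEUED, "error state, will not be scheduled")
# An integer substate still fits the same model:
coded = JobStatus(JobState.FAILED, 137)
```

A portable client would switch only on `state`, while a client that knows its DRM could look at `substate` to tell the rescheduled and error cases apart.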
4. Further DRMAA2 discussion
See the attached email from a few weeks ago.

Daniel