Re: [drmaa-wg] DRMAA TEST SUITE

21 Mar 2006

      ...
Sorry, I do not agree. In the DRMS context, the job life cycle comprises all
job execution stages from the moment the job enters the DRM system. In this
sense, every submitted job should eventually terminate, whether it actually
ran or not. For example, if you submit a job (qsub) and then kill it (qdel),
the job has obviously terminated abnormally (it was killed), although it
never entered the running state.
This is one possible interpretation, I agree. The DRMAA spec is aligned 
with POSIX semantics here: only something that was running (== executed) 
before can be terminated.
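
For reference, the POSIX semantics in question are those of the wait()
status macros, whose names the drmaa_wif...() calls mirror. The following
is only a minimal sketch and is not taken from the DRMAA spec or the test
suite; the helper names run_child_exit and run_child_killed are made up for
illustration. It shows that a wait status exists only for a process that
was actually created, and that the status then reports either a normal exit
or death by signal.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and store the
 * raw wait status in *status. Returns 0 on success, -1 on error. */
int run_child_exit(int code, int *status)
{
    assert(status != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);               /* child: normal termination */
    return waitpid(pid, status, 0) == pid ? 0 : -1;
}

/* Hypothetical helper: fork a child that only sleeps, kill it with `sig`
 * and store the raw wait status in *status. Returns 0 on success, -1 on
 * error. */
int run_child_killed(int sig, int *status)
{
    assert(status != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();               /* child: block until a signal arrives */
    }
    if (kill(pid, sig) < 0)
        return -1;
    return waitpid(pid, status, 0) == pid ? 0 : -1;
}
```

For the first child, WIFEXITED(status) is true and WEXITSTATUS(status)
yields the exit code. For a child killed with SIGKILL, WIFEXITED is false
and WIFSIGNALED is true. A job deleted while still queued has no direct
counterpart in this model, since no process was ever started for it.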
...
There is no relation between whether the job terminated normally and whether
there is further information from the DRM. In the previous example (a job
that has been killed), there may or may not be more information from the
DRMS. But in any case, it is clear that the job terminated abnormally.
The drmaa_wifexited description should concentrate on one aspect, since
there is no obvious (or general) relation between job termination and
getting further information from the DRM.
You are right. The main intention of drmaa_wifexited() is to tell you if 
additional information about the job execution ending is available. The 
final status of the job is provided by drmaa_job_ps(), and nothing else.

The confusion could possibly be resolved by slightly rewording the first 
sentences of the drmaa_wif...() descriptions so that they avoid the word 
"termination". This would not change the semantics.

I have no good proposal - DRMAA group ?
...
...
( Note: The testsuite assumes here that unusable input files are
detected by the DRM before the job starts. This seems to be realistic,
since file staging operations are usually not part of the job execution.)
I do not think so. Usually job preparation stages are part of the job 
execution, for example:
...
Therefore I suggest removing ST_INPUT_FILE_FAILURE, ST_OUTPUT_FILE_FAILURE
and ST_ERROR_FILE_FAILURE from the official test suite. In the DRMs
mentioned above at least, you can submit a job with /etc/passwd as output
file or with an unusable input file; the job is queued, runs and fails.
During the last phone call, the group went through the code. We agree with 
your impression that the 3 tests are currently not sufficient. The 
descriptions of the "input / output / error stream" job template parameters 
say that an invalid value should result in the job state 
DRMAA_PS_FAILED - and nothing more. There is no description of what that 
means for drmaa_wif...() calls, but the testsuite expects a particular 
behavior. DRMAA section 2.6 clearly shows that DRMAA_PS_FAILED is possible 
both for queued and running jobs.

Our proposal is to remove the call to drmaa_wifaborted() for 
ST_INPUT_FILE_FAILURE / ST_ERROR_FILE_FAILURE / ST_OUTPUT_FILE_FAILURE. 
The drmaa_wait() call does not hurt (since all submitted jobs must be 
waitable), but the crucial part is the check of the result of 
drmaa_synchronize(). After this change, I would expect the test cases to 
succeed on your system as well. For unusable input / output / error 
files, the DRMAA implementation would only be expected to report a 
job failure. This should work for all GridWay-supported systems, right? 
Could you accept this proposal?
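
To make the difference concrete, here is a small self-contained model of
the two checks. It is only a sketch under assumptions: it assumes that the
current tests require both the DRMAA_PS_FAILED state and an abort reported
by drmaa_wifaborted(), and it does not link against a DRMAA library. The
names model_ps, model_outcome, check_strict and check_relaxed are
hypothetical stand-ins, not part of drmaa.h or the test suite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the values a test would obtain from
 * drmaa_job_ps() and drmaa_wifaborted(); these are NOT the constants
 * defined in drmaa.h. */
enum model_ps { MODEL_PS_DONE, MODEL_PS_FAILED };

struct model_outcome {
    enum model_ps ps;   /* final job state, as drmaa_job_ps() would report */
    int aborted;        /* non-zero if the job ended before it ever ran    */
};

/* Assumed current check: the job must have failed AND be reported as
 * aborted, i.e. the bad file must be detected before the job starts. */
int check_strict(const struct model_outcome *o)
{
    assert(o != NULL);
    return o->ps == MODEL_PS_FAILED && o->aborted;
}

/* Proposed check: only the final job state is tested. */
int check_relaxed(const struct model_outcome *o)
{
    assert(o != NULL);
    return o->ps == MODEL_PS_FAILED;
}
```

A DRM that rejects the job while it is still queued passes both checks. A
DRM that queues the job, runs it and lets it fail passes only the relaxed
one, which is the case described for the GridWay-supported systems.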

BTW: Condor is one example of a system where the existence of input 
files is checked before the job is started. But at least your GRAM 
example convinced me that the opposite is also true ;-) ...
...
Sure. The problem is that the code is not clear either. From the DRMAA 1.0 C 
bindings example:
...
From this code it seems that, for a signaled job, wifexited should return a
zero exited value (as if the job did not terminate normally), contrary to
your comments in the previous mails and to the code in the DRMAA test suite.
You are right, as already said above. drmaa_wifexited() mainly indicates 
the availability of additional information.

Regards,
Peter.

Peter Tröger