Hi,
Sorry, no. Since the job never ran, there is also no termination - normal or abnormal. The ST_INPUT_FILE_FAILURE test case checks for a non-zero result from drmaa_wifaborted(), which indicates that the job was 'ended' before it entered the running state. I would expect that drmaa_wifexited() returns a zero value for "exited" in this case, since no further exit information can be available.
Sorry, I do not agree. In the DRMS context, the job life cycle comprises all the job execution stages from the moment the job enters the DRM system. In this sense, whenever a job is submitted there should be a termination (whether it actually ran or not). For example, if you submit a job (qsub) and then kill it (qdel), it is obvious that the job terminated abnormally (it was killed), although it never entered the running state. There is no relation between whether the job terminated normally and whether further information is available from the DRM. In the previous example (a job that has been killed) there may or may not be more information from the DRMS. But in any case, it is clear that the job terminated abnormally. The drmaa_wifexited description should concentrate on one aspect, since there is no obvious (or general) relation between job termination and getting further information from the DRM.
(Note: The testsuite assumes here that unusable input files are detected by the DRM before the job starts. This seems to be realistic, since file staging operations are usually not part of the job execution.)
I do not think so. Usually job preparation stages are part of the job execution, for example:

PBS: performs rcp or scp of input/output files. The existence of these files is not checked at submission (i.e. the job is queued). In the situation of the ST_ERROR_*_FAILURE tests, the jobs go through the following states: Q->R->E. That is, even if the input file does not exist, the job goes through the running state.

SGE: also does not check input or output file existence or permissions. In fact, in the situation of the ST_ERROR_*_FAILURE tests, the jobs go through the following states: qw->r->Eqw. That is, even if the input file does not exist, the job goes through the running state. In this case you can use the qalter command to redirect output paths.

Globus: includes stage-in/stage-out steps in the GRAM protocol; file existence is not checked at submission either.

Therefore I suggest removing the ST_ERROR_INPUT_FAILURE, ST_ERROR_FILE_FAILURE and ST_ERROR_FILE_FAILURE tests from the official test suite. In the previous DRMs at least, you can submit a job with output file /etc/passwd or an unusable input file; the job is queued, runs and fails. The assumption made by the test suite should be agreed upon explicitly, as it is not the default behavior of DRMs.
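To make the point about the state traces concrete, here is a minimal sketch in C. The helper `passed_running` is hypothetical (it is not part of DRMAA or of any DRM) and it assumes a trace is written exactly as above, with states separated by "->" and the running state spelled "R" (PBS) or "r" (SGE).

```c
#include <string.h>

/* Hypothetical helper, not part of DRMAA or of any DRM.
 * Returns 1 if the "->"-separated state trace contains a running
 * state ("R" as in PBS, "r" as in SGE), 0 otherwise. */
static int passed_running(const char *trace)
{
    const char *p = trace;

    while (*p != '\0') {
        const char *sep = strstr(p, "->");
        size_t len = sep ? (size_t)(sep - p) : strlen(p);

        /* A token matches only if it is exactly "R" or "r". */
        if (len == 1 && (p[0] == 'R' || p[0] == 'r'))
            return 1;
        if (sep == NULL)
            break;
        p = sep + 2; /* skip the "->" separator */
    }
    return 0;
}
```

With the traces quoted above, `passed_running("Q->R->E")` and `passed_running("qw->r->Eqw")` both return 1, so on these DRMs a test that expects the job to be reported as never having run would not match the observed behavior.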
The DRMAA job status monitoring functions are a little bit confusing, sometimes even for the group members ;-) ... The best thing is to look at the code example from the C binding, which should explain most of the intended use cases for DRMAA functions.
Sure. The problem is that the code is not clear either. From DRMAA 1.0 C bindings example:

    ...
    drmaa_wifexited(&exited, stat, NULL, 0);
    if (exited) {
        drmaa_wexitstatus(&exit_status, stat, NULL, 0);
        printf("job \"%s\" finished regularly with exit status %d\n", all_jobids[pos], exit_status);
    } else {
        drmaa_wifsignaled(&signaled, stat, NULL, 0);
        if (signaled) {
    ...

From this code it seems that a signaled job should end with a zero exited value from wifexited (as if it did not terminate normally), as opposed to your comments in the previous mails and the code in the DRMAA test suite.
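For comparison, the same exited/signaled split can be observed with the plain POSIX wait macros. The following is only a sketch, under the assumption that the drmaa_wif* calls are meant to mirror WIFEXITED/WIFSIGNALED; it does not use DRMAA at all, and the helpers `run_and_wait` and `describe` are hypothetical names introduced for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child and return its raw wait status,
 * or -1 on error. If kill_child is nonzero the parent kills the child
 * with SIGKILL; otherwise the child exits by itself with exit_code. */
static int run_and_wait(int kill_child, int exit_code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (kill_child)
            for (;;)
                pause();      /* wait here until the signal arrives */
        _exit(exit_code);     /* regular termination */
    }
    if (kill_child)
        kill(pid, SIGKILL);   /* plays the role of qdel in this analogy */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* Classify a wait status in the same order as the quoted example:
 * first "exited", then "signaled". */
static const char *describe(int status)
{
    if (WIFEXITED(status))
        return "exited";
    if (WIFSIGNALED(status))
        return "signaled";
    return "other";
}
```

For a child that was killed, WIFEXITED is zero and WIFSIGNALED is non-zero, so `describe` reports "signaled". This is the reading that the quoted example suggests for drmaa_wifexited and drmaa_wifsignaled.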
Best regards, Peter.
Best Regards,
Ruben
--
+-----------------------------------------------------------+
 Dr. Ruben Santiago Montero
 Assistant Professor
 Dpto. Arquitectura de Computadores y Automatica
 Facultad de Informatica
 Universidad Complutense   phone : +34 91 394 75 38
 28040 Madrid              fax   : +34 91 394 75 27
 Spain                     email : rubensm@dacya.ucm.es
                           http://asds.dacya.ucm.es/
+-----------------------------------------------------------+
 GridWay, The Way to Grid! http://www.gridway.org