[Fwd: [GE users] Drmaa_synchronise() and qsub-launched jobs]
Here's another bit that could be clearer in the spec. We don't actually specify what happens on drmaa_synchronize() if the jobs don't exist. In the SGE implementation, we return immediately, as though the jobs are all finished. The drmaa_wait() call only implies that it should fail if the job doesn't exist.

Both routine descriptions should have a very clear statement describing the behavior when the specified job(s) doesn't exist. We also should determine what the behavior for drmaa_synchronize() should be. I vote for the same as for drmaa_wait(), i.e. waiting for a list of jobs that contains invalid (or already waited for) job ids will fail. The exception would be waiting for all jobs, but we've already had that conversation.

Daniel

-------- Original Message --------
Subject: [GE users] Drmaa_synchronise() and qsub-launched jobs
Date: Tue, 12 Jul 2005 15:00:46 -0700
From: Anthony Metzidis <Anthony.Metzidis@pricegrabber.com>
Reply-To: users@gridengine.sunsource.net
To: users@gridengine.sunsource.net

Hi,

Does drmaa_synchronise() wait on jobs launched using qsub? I'm launching many jobs with qsub, and then later collecting the corresponding job ids and passing those to drmaa_synchronise(). Although drmaa_synchronise() returns without error, it returns before the jobs are complete.

Am I to assume that drmaa_synchronise() only works with jobs launched with drmaa_run_job()?

== Code Sample (PERL) ==

    # for writing to qsub
    use IPC::Open2;
    # for timing
    use Time::HiRes;
    use Schedule::DRMAAc qw/ :all /;

    my $job = "md5sum /etc/fstab";
    @alljobs = ();

    ( $error, $diagnosis ) = drmaa_init( undef );
    die drmaa_strerror( $error ) . "\n" . $diagnosis if $error;

    foreach $i(0..5){
        open(QSUB_IN);
        open(QSUB_OUT);
        my $pid = open2(\*QSUB_OUT, \*QSUB_IN, 'qsub', '-cwd', '-S', '/bin/bash', '-N', 'The job' );
        print QSUB_IN $job;
        close(QSUB_IN);
        my $out = <QSUB_OUT>;
        close(QSUB_OUT);
        ($job_id, $job_name) = ($out =~ /job\s+(\d+)\s+\("(.*)"\)/);
        push @alljobs, $job_id;
    }

    print 'JOBS: [', join '][', @alljobs, "]\n";

    ( $error, $diagnosis ) = drmaa_synchronize( \@alljobs, $DRMAA_ERRNO_EXIT_TIMEOUT, 0 );
    die drmaa_strerror( $error ) . "\n" . $diagnosis if $error;

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@gridengine.sunsource.net
For additional commands, e-mail: users-help@gridengine.sunsource.net

--
***************************************************
* Daniel Templeton   ERGB01 x60220                *
* Staff Engineer, Sun N1 Grid Engine              *
***************************************************
* "Roads? Where we're going we don't need roads." *
*               -Dr. Emmett Brown                 *
*           Back to the Future (1985)             *
***************************************************
Consistency is good, so I support your suggestion. Tracker item is here:

https://forge.gridforum.org/tracker/?aid=1568

Regards,
Peter.
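The behavior proposed in this thread (drmaa_synchronize() failing the same way drmaa_wait() does when the list contains an invalid or already waited for job id, with "all jobs" as the exception) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is written in Python so it runs without a Grid Engine installation, it is not the DRMAA API, and ToySession, InvalidJobError and their methods are hypothetical names introduced for illustration. It does not model actual waiting; every job is treated as already finished.

```python
# Toy model of the semantics proposed in this thread. NOT the DRMAA API:
# ToySession and InvalidJobError are hypothetical names for illustration.

# Marker for "all jobs in this session", named after the DRMAA constant.
JOB_IDS_SESSION_ALL = "DRMAA_JOB_IDS_SESSION_ALL"


class InvalidJobError(Exception):
    """Stands in for an 'invalid job' error code returned by the library."""


class ToySession:
    def __init__(self):
        self._next_id = 1
        # Job ids submitted through this session and not yet reaped.
        self._active = set()

    def run_job(self):
        """Pretend to submit a job and return its id as a string."""
        job_id = str(self._next_id)
        self._next_id += 1
        self._active.add(job_id)
        return job_id

    def wait(self, job_id):
        """Reap one job; fail if the id is unknown or already reaped."""
        if job_id not in self._active:
            raise InvalidJobError(job_id)
        self._active.discard(job_id)

    def synchronize(self, job_ids, dispose=True):
        """Proposed behavior: fail if any listed id is unknown to the session.

        The exception is the 'all jobs' marker, which never fails.
        Returns the sorted list of job ids that were synchronized on.
        """
        if JOB_IDS_SESSION_ALL in job_ids:
            targets = set(self._active)
        else:
            unknown = [j for j in job_ids if j not in self._active]
            if unknown:
                raise InvalidJobError(", ".join(unknown))
            targets = set(job_ids)
        if dispose:
            # Disposed jobs can no longer be waited for.
            self._active -= targets
        return sorted(targets)


if __name__ == "__main__":
    session = ToySession()
    known = session.run_job()
    try:
        # "4711" plays the role of a job id obtained from qsub,
        # which this session never submitted.
        session.synchronize([known, "4711"])
    except InvalidJobError as err:
        print("synchronize failed for unknown job id(s):", err)
```

Under this model, the job ids collected from qsub in the original question would produce an error instead of an immediate, silent return, which is the consistency with drmaa_wait() that the proposal asks for.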
participants (2)
- Daniel Templeton
- Peter Troeger