RE: [drmaa-wg] perl DRMAA, SGE and working directory
Hi,

If we look at the string /home/msarachu/.showdb.04.12.09:17.37.42 and consider the following expression (we need to add this syntax into the spec for the working directory as well)

[hostname]:file_path

then it is not surprising for the runtime to look for directory 17.37.42. Unfortunately, the second error, not being able to find host /home/msarachu/.showdb.04.12.09, is not displayed.

Hope this helps from the standard point of view.

Regards,
-Hrabri

-----Original Message-----
From: owner-drmaa-wg@ggf.org [mailto:owner-drmaa-wg@ggf.org] On Behalf Of Martín Sarachu
Sent: Tuesday, December 14, 2004 8:28 AM
To: drmaa-wg@ggf.org
Subject: [drmaa-wg] perl DRMAA, SGE and working directory

Dear list,

I'm using Schedule-DRMAAc-0.81 and SGE to be able to queue jobs from a web interface.

Here's my problem: When launching a job with something like /home/msarachu as the $DRMAA_WD it runs ok, but when using a directory like /home/msarachu/.showdb.04.12.09:17.37.42 as $DRMAA_WD the script does not run and the error reported by SGE is "28 : changing into working directory". I also passed the directory "escaped" (\.showdb\.04\.12\.09\:17\.37\.42) and got the same error, although passing the string "/home/msarachu/.showdb.04.12.09:17.37.42/job.sh" to the $DRMAA_REMOTE_COMMAND argument works fine because the job is sent to the queue. Is there any way to mask this directory so that the change into the working directory succeeds?

Below is an email from a failed job I tried to run with DRMAA_WD = /home/msarachu/wProjects/tope/.showdb.04.12.14:16.30.47
Look at the shepherd error: it is apparently truncating the dir just before the ':'.

If I submit the job from /home/msarachu/wProjects/tope/.showdb.04.12.14:16.30.47 with the command 'qsub -cwd job.sh' it works ok.
-----
Job 123 caused action: Job 123 set to ERROR
User = msarachu
Queue = all.q@pentiumIV.embnet-ar.org
Host = pentiumIV.embnet-ar.org
Start Time = <unknown>
End Time = <unknown>
failed changing into working directory:can't read usage file for job 123.1

Shepherd trace:
12/13/2004 16:31:04 [502:24622]: shepherd called with uid = 0, euid = 502
12/13/2004 16:31:04 [502:24622]: starting up 6.0u1
12/13/2004 16:31:04 [502:24622]: setpgid(24622, 24622) returned 0
12/13/2004 16:31:04 [502:24622]: no prolog script to start
12/13/2004 16:31:04 [502:24623]: pid=24623 pgrp=24623 sid=24623 old pgrp=24622 getlogin()=<no login set>
12/13/2004 16:31:04 [502:24623]: setosjobid: uid = 0, euid = 502
12/13/2004 16:31:04 [502:24623]: RLIMIT_CPU setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
12/13/2004 16:31:04 [502:24623]: RLIMIT_FSIZE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
12/13/2004 16:31:04 [502:24623]: RLIMIT_DATA setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
12/13/2004 16:31:04 [502:24623]: RLIMIT_STACK setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
12/13/2004 16:31:04 [502:24623]: RLIMIT_CORE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
12/13/2004 16:31:04 [502:24623]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
12/13/2004 16:31:04 [502:24623]: RLIMIT_RSS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
12/13/2004 16:31:04 [500:24623]: closing all filedescriptors
12/13/2004 16:31:04 [500:24623]: further messages are in "error" and "trace"
12/13/2004 16:31:04 [502:24622]: forked "job" with pid 24623
12/13/2004 16:31:04 [502:24622]: child: job - pid: 24623
12/13/2004 16:31:04 [502:24622]: wait3 returned 24623 (status: 7168; WIFSIGNALED: 0, WIFEXITED: 1, WEXITSTATUS: 28)
12/13/2004 16:31:04 [502:24622]: job exited with exit status 28
12/13/2004 16:31:04 [502:24622]: reaped "job" with pid 24623
12/13/2004 16:31:04 [502:24622]: job exited not due to signal
12/13/2004 16:31:04 [502:24622]: job exited with status 28
12/13/2004 16:31:04 [502:24622]: now sending signal KILL to pid -24623
12/13/2004 16:31:04 [502:24622]: no tasker to notify
12/13/2004 16:31:04 [502:24622]: failed starting job
12/13/2004 16:31:04 [502:24622]: no epilog script to start

Shepherd error:
12/13/2004 16:31:04 [500:24623]: error: can't chdir to :16.30.47: No such file or directory

Shepherd pe_hostfile:
pentiumIV.embnet-ar.org 1 all.q@pentiumIV.embnet-ar.org UNDEFINED
-----

I sent this same email to Tim and also SGE users list. Tim also suggested to send it to this list.

Thanks in advance.

Best regards,
Martin

--
Martín Sarachu
msarachu@biol.unlp.edu.ar
EMBnet Argentina
http://www.ar.embnet.org
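[Editor's note] The shepherd trace reports a raw wait status of 7168 alongside WEXITSTATUS 28, the same code SGE reports as "28 : changing into working directory". The sketch below shows how the two numbers relate. It assumes the conventional Linux wait-status layout (low 7 bits for the terminating signal, the next 8 bits for the exit code); the helper names are illustrative and are not Grid Engine functions.

```python
# Decode a POSIX-style wait status by hand.
# Assumption: conventional Linux layout, where the low 7 bits hold the
# terminating signal and bits 8-15 hold the exit code.
# The helper names mirror the C macros but are defined here for illustration.

def wifsignaled(status: int) -> bool:
    """True if the low 7 bits name a terminating signal."""
    sig = status & 0x7F
    return sig != 0 and sig != 0x7F

def wifexited(status: int) -> bool:
    """True if the process ended via a normal exit."""
    return (status & 0x7F) == 0

def wexitstatus(status: int) -> int:
    """Exit code stored in bits 8-15 of the status word."""
    return (status >> 8) & 0xFF

status = 7168  # from "wait3 returned 24623 (status: 7168; ...)"
print(wifsignaled(status), wifexited(status), wexitstatus(status))
```

Under that assumption 7168 is 28 * 256, so the job process exited normally with code 28 rather than being killed by a signal, which agrees with the WIFSIGNALED, WIFEXITED and WEXITSTATUS values printed in the trace.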
So you can also specify a host in the DRMAA_WD variable? Not only a directory?
Martin
-- Martín Sarachu msarachu@biol.unlp.edu.ar EMBnet Argentina http://www.ar.embnet.org
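[Editor's note] To make the `[hostname]:file_path` reading discussed in this thread concrete, here is a minimal sketch of a naive parser that treats everything before the first colon as a host name. It is an illustration only, not Grid Engine's code: `split_host_path` is a hypothetical helper, and the real implementation evidently differs in detail, since the shepherd error in the job report shows ":16.30.47" with the colon retained.

```python
from typing import Tuple

def split_host_path(value: str) -> Tuple[str, str]:
    """Naively split '[hostname]:file_path' at the first colon.

    Hypothetical helper for illustration; not part of DRMAA or SGE.
    """
    host, sep, path = value.partition(":")
    if not sep:
        # No colon at all: the whole value is a path and no host is given.
        return "", value
    return host, path

wd = "/home/msarachu/wProjects/tope/.showdb.04.12.14:16.30.47"
host, path = split_host_path(wd)
print(repr(host))  # the directory prefix is misread as a host name
print(repr(path))  # only the part after the colon is left as the path
```

With a plain path such as /home/msarachu there is no colon, so nothing is split off, which matches the report that such directories work.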
On Tue, 14 Dec 2004, Martín Sarachu wrote:
So you can also specify a host in the DRMAA_WD variable? Not only a directory?

No, you can't. I'd expect it's simply a bug in the Grid Engine DRMAA implementation that characters past colons in DRMAA_WD are cut away.

Andreas
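[Editor's note] Since the thread reports that a colon-free DRMAA_WD such as /home/msarachu works, one possible workaround on the submitting side is to generate the timestamped directory name without a colon. The sketch below assumes the original names follow a yy.mm.dd:HH.MM.SS pattern (inferred from the examples in the thread) and that the colon is the only character causing trouble; `showdb_dirname` is a hypothetical helper, not part of Schedule::DRMAAc or SGE.

```python
from datetime import datetime

def showdb_dirname(now: datetime) -> str:
    """Build a '.showdb.<date>_<time>' name that contains no colon.

    Hypothetical helper: it uses the same fields as names like
    '.showdb.04.12.14:16.30.47', with '_' in place of ':'.
    """
    return now.strftime(".showdb.%y.%m.%d_%H.%M.%S")

name = showdb_dirname(datetime(2004, 12, 14, 16, 30, 47))
print(name)
```

Whether renaming is acceptable depends on what else reads these directories; the underlying truncation would still need to be fixed in the DRMAA implementation.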
Participants (3): Andreas Haas; Martín Sarachu; Rajic, Hrabri