Re: Patch to python DRMAA wrapper for thread usage
Hi Chuck,

all I understand is that the GIL is a multi-threading related Python monitor concept. So I fear you're asking the wrong person ;-)

Yet it might be of interest to you that Grid Engine jobs get rescheduled automatically if they return exit status 99 (see 'FORBID_RESCHEDULE' in sge_conf(5)). Note, however, that using exit 99 makes your solution Grid Engine dependent.

Regards,
Andreas

On Mon, 10 Apr 2006, Chuck Fox wrote:
Hey Andreas & Enrico,

I am doing a project at school where I have a bunch of jobs that will run asynchronously, and then need to get restarted immediately upon exiting. The way I found to do this was to enclose each job in a Python thread and then just wait on it to return. Another method that might be a little more scalable would be to store the job IDs in a list and then do non-blocking polls over the list (or use the JOB_IDS_SESSION_ANY string in a non-blocking way, haven't tried that yet).

I'm a big fan of threads in general for self-contained items that don't require lots of interaction & locking with the rest of the program. Of course the GIL makes things a little more complicated, but in Python I keep threads as low-intensity as possible so they are still useful for getting stuff to run asynchronously.

Thanks for the heads up on the other thread safety issues in the wrapper. I had read about the underlying library and saw that it was threadsafe, but I definitely don't know enough about the wrapper and its thread issues. I'll let you & Enrico know more in case I run across anything else. I'm going to be using the interface pretty heavily so hopefully I'll find any outstanding issues.

Thanks for your help
-- Chuck
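The list-and-poll alternative described above could look roughly like the sketch below. It is deliberately not tied to any DRMAA binding: `poll` and `resubmit` are hypothetical callables invented for illustration. In a real program `poll` would presumably wrap a zero-timeout wait on one job ID, and `resubmit` would submit the job again and return the new ID.

```python
import time


def supervise(job_ids, poll, resubmit, rounds, interval=0.0):
    """Poll every job without blocking and restart the ones that exited.

    poll(job_id)     -> True if the job has finished (hypothetical; stands
                        in for a zero-timeout wait in a DRMAA binding).
    resubmit(job_id) -> ID of the replacement job (hypothetical).
    Returns the list of live job IDs after `rounds` polling passes.
    """
    active = list(job_ids)
    for _ in range(rounds):
        next_active = []
        for job_id in active:
            if poll(job_id):
                # Job exited: restart it and track the new ID instead.
                next_active.append(resubmit(job_id))
            else:
                next_active.append(job_id)
        active = next_active
        time.sleep(interval)
    return active


# Stand-in scheduler for demonstration only: odd-numbered jobs count as
# finished when polled, and replacement IDs are even so they keep running.
_next_id = [100]


def fake_poll(job_id):
    return job_id % 2 == 1


def fake_resubmit(job_id):
    _next_id[0] += 2
    return _next_id[0]


final_jobs = supervise([1, 2, 3], fake_poll, fake_resubmit, rounds=2)
```

Because nothing here blocks for longer than `interval`, a single thread can watch all jobs, which avoids both the one-thread-per-job overhead and any dependence on whether the wrapper releases the GIL.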
On 4/10/06, Andreas Haas <Andreas.Haas@sun.com> wrote:
Dear Chuck,
well, all I did was upload the wrapper, which stems from Enrico Sirola.
As far as the Grid Engine DRMAA library is concerned, I can confirm the patch will work, since the lib itself is MT-safe. For the same reason I would say that the patch could be applied to drmaa_synchronize() and any other DRMAA library call.
Yet, I would assume there is a need to do further modifications to the cDRMAA_wrap.c module. E.g. SWIG_Python_ConvertPtr() makes use of a static variable
static PyObject *SWIG_this = 0;
accessed through
if (!SWIG_this) SWIG_this = PyString_FromString("this");
That is a race condition that might crash the library.
To overcome this you could use a pthread_once() wrapping that does nothing but
if (!SWIG_this) SWIG_this = PyString_FromString("this");
If that wrapper is always called via pthread_once() before 'SWIG_this' is accessed, it would fix that problem.
Unfortunately I'm not familiar with the deep mysteries of SWIG-wrapping libraries, so I can't say whether that is actually sufficient.
Best regards, Andreas
On Sat, 8 Apr 2006, Chuck Fox wrote:
Hi Andreas,

My name is Chuck Fox and I am doing some work with Python & SGE and I've been using your DRMAA wrapper (thanks for writing it!). I am in a situation where I run different jobs in different threads in a Python program and I often have threads that are doing blocking waits on jobs that I submitted. I noticed that when I ran a blocking wait, all the other threads in my program would block until the submitted job would come to an end... not what we want with multiple threads.

I'm not the world's greatest expert on the Python C API but I know a little bit about the GIL and I tried wrapping your call to drmaa_wait in the cDRMAA_wrap.c file. I just added the macros Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS around the wrapper call to drmaa_wait and this solved my problem. If there are other blocking calls you can think of with DRMAA (like synchronize) you may want to use this technique on them too; I haven't tested them out yet.

I guess the biggest problem is that I modified the wrapper output of SWIG, which I know is not considered the best way to do things. If you know how to get the GIL release macros into the .i file itself, that might be the best method to use in the next release.
I have my simple patch below:
1753,1754c1753,1758
< result = (int)drmaa_wait((char const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8);
<
---
> /** Update: By Chuck: In case of a blocking wait a single wait can hang all of python due to the GIL, we need
>     to use Python's macros to allow other threads to run while
>     this blocking wait occurs **/
> Py_BEGIN_ALLOW_THREADS
> result = (int)drmaa_wait((char const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8);
> Py_END_ALLOW_THREADS
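
One way to check from the Python side whether a blocking call really releases the GIL is to count how often a background thread gets to run while the call is in progress. The sketch below is an illustration only: `ticks_during` is a hypothetical helper, and `time.sleep` (which does release the GIL) stands in for a patched blocking wait, since the wrapper's own Python-level API is not shown in this thread.

```python
import threading
import time


def ticks_during(blocking_call, tick=0.01):
    """Count how often a background thread gets to run while
    blocking_call() executes in the calling thread."""
    count = [0]
    stop = threading.Event()

    def ticker():
        while not stop.is_set():
            count[0] += 1
            time.sleep(tick)

    worker = threading.Thread(target=ticker)
    worker.start()
    try:
        blocking_call()
    finally:
        stop.set()
        worker.join()
    return count[0]


# time.sleep releases the GIL, so it stands in for a patched wait here.
ticks = ticks_during(lambda: time.sleep(0.2))
```

A count that grows with the duration of the call suggests the GIL is being released; a count stuck near zero or one suggests the call holds the GIL for its whole duration, which is the behaviour the patch is meant to remove.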