Re: [DRMAA-WG] drmaa python
Hello Cheng,
Thanks for getting an official python binding to drmaa! That's really great.
At some point I was using Enrico's python binding quite a lot, and wrote a wrapper on top of that to simplify some of the things that I did very often (attached). Basically I wanted a "generalized for loop", that would allow me to write more or less the same python code whether I was using the cluster or not.
The module uses the file system to transfer data between the submitter and the compute nodes, and it can also do local "parallel" computation using threading (however, I don't think threading is working particularly well at the moment).
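(Editorial sketch: the "generalized for loop" idea described above can be illustrated with the standard library's thread pool. The function name and the cluster placeholder below are invented for illustration; they are not from the attached module.)

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(func, items, use_cluster=False, max_workers=4):
    """Illustrative 'generalized for loop': the calling code stays the
    same whether work runs locally in threads or on a cluster."""
    if use_cluster:
        # Placeholder: a real version would submit one DRMAA job per
        # item and collect results via the shared file system.
        raise NotImplementedError("cluster back end not shown here")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))
```

For example, `parallel_map(lambda x: x * x, [1, 2, 3])` evaluates the squares in worker threads and returns `[1, 4, 9]`.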
Thanks for informing us about your efforts. In fact, we need every kind of feedback about implementations in the field. Adding a 'loopback mode' based on threads is of course a great idea for testing and debugging. Enrico has already agreed to make his updated implementation public. We thought about storing the sources on OGF resources. If this happens, you could simply jump in and add your extended features.
Also, I just recently moved institutions, so while my previous place was using gridengine, now my new place uses lsf (which unfortunately does not have a python binding to DRMAA yet). What do you think is the best strategy to implement a python binding? Is using swig on the C binding a good option?
Enrico's implementation will again use a given DRMAA C library, but in a more dynamic fashion without SWIG. This makes it a generic solution for every DRM providing a DRMAA C library. If you have enough detailed knowledge of LSF, you could also access LSF directly in a DRMAA Python implementation, either by command-line tools or some proprietary API. From a performance perspective, this might be a better solution than tunneling everything through DRMAA C. And it gives us a second independent implementation, which OGF would love to see.
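(Editorial sketch: dynamic binding to a DRMAA C library "without SWIG", as described above, can be done with `ctypes`. The `drmaa_init` signature follows the DRMAA 1.0 C binding; the loader function name is invented, and a full binding would declare every `drmaa_*` function.)

```python
import ctypes
import ctypes.util

def load_drmaa(path=None):
    """Dynamically bind a DRMAA C library at runtime (no SWIG needed).

    Works with any DRM that ships a DRMAA C library: just point `path`
    at the right shared object, or let find_library() locate one.
    """
    path = path or ctypes.util.find_library("drmaa")
    if path is None:
        raise OSError("no DRMAA C library found on this system")
    lib = ctypes.CDLL(path)
    # int drmaa_init(const char *contact, char *error, size_t error_len)
    lib.drmaa_init.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                               ctypes.c_size_t]
    lib.drmaa_init.restype = ctypes.c_int
    return lib
```

Because the library is resolved at runtime, the same Python code works against SGE's, Condor's, or FedStage's libdrmaa, which is what makes this approach generic.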
Anyway, keep up the great work!
Thanks. I put the DRMAA list on CC, since this is a community effort ;-) Best regards, Peter.
Hi all,
Also, I just recently moved institutions, so while my previous place was using gridengine, now my new place uses lsf (which unfortunately does not have a python binding to DRMAA yet). What do you think is the best strategy to implement a python binding? Is using swig on the C binding a good option?
Enrico's implementation will again use a given DRMAA C library, but in a more dynamic fashion without SWIG. This makes it a generic solution for every DRM providing a DRMAA C library.
If you have enough detailed knowledge of LSF, you could also access LSF directly in a DRMAA Python implementation, either by command-line tools or some proprietary API. From a performance perspective, this might be a better solution than tunneling everything through DRMAA C. And it gives us a second independent implementation, which OGF would love to see.
I don't have detailed knowledge of LSF, or for that matter of any cluster system implementation. I know that FedStage has released a C DRMAA implementation which works on our system here: https://www.fedstage.com/wiki/FedStage_DRMAA_Guide

I'll take a look at Enrico's python DRMAA implementation (when it is out) and see whether it will work with FedStage's C library.

The suggestion about doing a direct implementation is interesting. I'm not sure I know enough about LSF or our customizations here to be able to do that. But it raises an issue that I've recently run into. I imagine that many sites have some sort of custom default resource requests (we definitely do here). My naive pythonic solution would be to have a base class with the "basic" drmaa requests, and then a derived class which mirrors our local set of submit scripts, so that jobs submitted via drmaa have identical rights (and default settings) to jobs submitted on the command line. Would this be a good thing to do? I guess my question is really "How does one customize drmaa for a particular site?". Or is it bad to have the class implementing "basic" drmaa requests at all, because the cluster system admins probably would not like such jobs?

Cheng
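(Editorial sketch of the base-class/derived-class idea proposed above. All names and the site defaults are hypothetical; a real version would mirror whatever the local submit scripts set.)

```python
class BasicJobRequest:
    """Plain, site-neutral DRMAA-style request; names are illustrative."""
    def __init__(self):
        self.remote_command = None
        self.native_specification = ""   # extra DRM-specific flags

class SiteJobRequest(BasicJobRequest):
    """Derived class mirroring one site's submit-script defaults, so
    drmaa jobs get the same default settings as command-line jobs."""
    def __init__(self):
        super().__init__()
        # Hypothetical site policy: same queue/memory defaults as the
        # local submit scripts.
        self.native_specification = "-q normal -M 2048"
```

The worry in the question then becomes concrete: anyone can still instantiate `BasicJobRequest` directly and bypass the site defaults, which is exactly what admins might object to.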
Hi,
I don't have detailed knowledge of LSF, or for that matter any implementation of cluster systems. I know that FedStage has released a C DRMAA implementation which works on our system here. https://www.fedstage.com/wiki/FedStage_DRMAA_Guide
Good to know that the FedStage library has such an active user community.
But it raises an issue that I've recently run into. I imagine that many sites have some sort of custom default resource requests (we definitely do here). My naive pythonic solution would be to have a base class with the "basic" drmaa requests, and then a derived class which mirrors our local set of submit scripts, so that jobs submitted via drmaa have identical rights (and default settings) to jobs submitted on the command line. Would this be a good thing to do? I guess my question is really "How does one customize drmaa for a particular site?". Or is it bad to have the class implementing "basic" drmaa requests at all, because the cluster system admins probably would not like such jobs?
I think your problem is exactly solved by DRMAA job categories:

"8.9 jobCategory
This attribute allows an implementation-defined string specifying how to resolve site-specific resources and/or policies. Site administrators MAY create a job category suitable for an application to be dispatched by the DRMS; the associated category name SHALL be specified as a job submission attribute. The DRMAA implementation MAY then use the category name to manage site-specific resource and functional requirements of jobs in the category. Such requirements need to be configurable by the site operating a DRMS and deploying an application on top of it." [GFD.130]

The system-wide definition of available DRMAA job categories depends on the library implementation. In the Condor case, you can create a config file with category names and the according submit file entries to be added.

Best regards, Peter.
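(Editorial sketch: a toy illustration of what a DRMAA implementation might do with a `jobCategory` value. The category table and native options below are invented, standing in for the site-editable config file Peter mentions.)

```python
# Hypothetical site config: category name -> DRM-native submit options.
# A real implementation would read such a mapping from a file the site
# administrator maintains, per GFD.130 section 8.9.
SITE_CATEGORIES = {
    "short":  "-q short -l h_rt=0:30:0",
    "bigmem": "-q long -l mem=16G",
}

def resolve_category(job_category):
    """Resolve a DRMAA jobCategory attribute to site-specific options."""
    try:
        return SITE_CATEGORIES[job_category]
    except KeyError:
        raise ValueError("unknown job category: %s" % job_category)
```

The application only ever names a category ("short", "bigmem"); what that category means is decided by the site, which is the whole point of the attribute.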
Hi Cheng,

On 12 Oct 2008, at 15:58, Cheng Soon Ong wrote:
I imagine that many sites have some sort of custom default resource requests (we definitely do here). My naive pythonic solution would be to have a base class with the "basic" drmaa requests, and then a derived class which mirrors our local set of submit scripts, so that jobs submitted via drmaa have identical rights (and default settings) to jobs submitted on the command line. Would this be a good thing to do? I guess my question is really "How does one customize drmaa for a particular site?". Or is it bad to have the class implementing "basic" drmaa requests at all, because the cluster system admins probably would not like such jobs?
I guess here you would have a few solutions:

1. Perform the configuration work outside drmaa (e.g. configuring different default parameters for different users/groups at the DRM level).
2. Using the DRMAA library:
   - Inheritance (this is what you outlined): derive a new job type and override some parameters, creating different job types for different environments.
   - Composition: create one (and only one) job type, put the various job configurations somewhere else in another data structure (e.g. a dict), and pass it to the job in a standardized way (e.g. pass it a configuration dictionary at construction time).

Personally, I'd prefer 1 (zero development). If this is not feasible, I'd go for composition: you are not changing the job behaviour, just its configuration parameters, so I'd keep these parameter sets stored somewhere (text files/relational DB/dicts in a python module - your choice) separate from class attributes. Anyway, all three approaches are effective; it just depends on your use case.

Cheers, e.
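(Editorial sketch of the composition option described above: one job type, with parameter sets kept outside the class and injected at construction. All names and profile values are hypothetical.)

```python
# Parameter sets live outside the class (here a dict; a text file or
# relational DB would work the same way) ...
PROFILES = {
    "default": {"queue": "normal", "memory_mb": 1024},
    "gpu":     {"queue": "gpu",    "memory_mb": 8192},
}

class Job:
    """Single job type: behaviour is fixed, configuration is injected."""
    def __init__(self, command, config):
        self.command = command
        self.config = dict(config)   # copy, so shared profiles stay intact

# ... and are passed in at construction time:
job = Job("run_analysis.sh", PROFILES["gpu"])
```

Switching a job between environments is then a data change (pick another profile), not a code change, which is why composition avoids the subclass-per-site problem.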
Hi Peter,

On 9 Oct 2008, at 13:39, Peter Tröger wrote:
We thought about storing the sources on OGF resources. If this happens, you could simply jump in and add your extended features.
Do you have an svn/git repository I can use?

Cheers, e.
participants (3)
- Cheng Soon Ong
- Enrico Sirola
- Peter Tröger