Dear core members,

in order to have any chance of a finalized API by OGF28, we need to distribute the work now. I need volunteers to independently take care of the following topics. This is mainly about man page crawling and text snippet writing:

--- snip ----

New job template attributes.
This includes the continuation of the DRM system comparison we already started in the Google spreadsheet. It must also include the hunt for new job template placeholders, since this was one of the major wishes from the survey.

File staging.
We need an exhaustive description of how the new fileTransfers attribute should be used. Order of activities and security might be topics here. Ideally, somebody with a GridFTP background can also contribute. The initial idea was to copy from SAGA / LSF, so if you already know at least one of these systems, please volunteer.

Advanced reservation.
The API is in the IDL part, but we still need the detailed function descriptions. Mariusz, could you take care of this?

Job states.
There is a set of wishes regarding more job states and transitions. We also have a pending mapping to other people's job state models.

Thread safety.
Somebody with a strong DRMAA implementation background needs to scan his implementation for critical (and non-critical) code parts with respect to thread safety. All experiences should be persisted in the new spec.

C binding.
We spent some time in the OO world, so somebody needs to try a C language binding. GFD.143 shows what such a document can look like. Roger, maybe something for you?

--- snip ----

All information is in the Wiki. Start on this page, follow the links, and look for yellow boxes:
http://wikis.sun.com/display/DRMAAv2/DRMAAv2+API

I will continue to work as the integration / coordination point. Maybe I can also take care of one or the other specific problem, but definitely not all of them. Our deadline is March 15th, then we will present to the other groups.

Thanks,
Peter.
Hi Peter, all,

2010/2/15 Peter Tröger <peter@troeger.eu>:
Dear core members,
in order to have any chance for some kind of finalized API until OGF28, we need to distribute the work now. I need volunteers to independently take care of the following topics. We mainly talk about man page crawling and text snippet writing:
--- snip ----
New job template attributes.
This includes the continuation of DRM system comparison we already started in the Google spreadsheet. It must also include the hunt for new job template placeholders, since this was one of the major wishes from the survey.
File staging.
We need the exhaustive description of how the new fileTransfers attribute should be used. Order of activities and security might be topics here. Ideally, somebody with a GridFTP background can also contribute. The initial idea was to copy from SAGA / LSF, so if you already know at least one of these systems, please volunteer.
I did some "research" on the file staging capabilities of LSF, Torque and PBS Pro (you can find a short summary in the Google Spreadsheet, on the "File Staging" tab). Some thoughts:

- Most batch systems were designed to work with a network shared file system. However, one can imagine a situation where the home file system is for some reason not shared among the worker nodes (in my opinion this is probably most relevant for Condor, as it focuses on harvesting idle CPU cycles of workstations rather than managing a dedicated cluster). Most of the systems offer some simple file staging capabilities for this case (mostly based on rcp/scp). I guess that in Grid Engine this could also be implemented (if really needed) with proper prolog/epilog scripts (assuming that the list of files to be staged is provided as an environment variable, in a similar fashion to how stdin/out/err staging could be realized in DRMAA 1.0).
- I would prefer to keep this interface as simple as possible and thus handle only the case that cannot be handled without interaction with the DRMS: staging files from the submission host to the execution host, as the execution host is usually not known before a job starts.
- Staging files using other protocols (e.g. ftp, webdav) would require passing credentials explicitly (except for gridftp), which is "out of scope of the DRMAA spec". If needed, the user can first stage all files to the submission host using other tools, and then, using DRMAA, to the execution host.
- In order to keep the interface really simple, I would assume that file names (not necessarily the absolute paths) are the same on both the execution host and the submission host. (Again, if not, the user can easily work around this by copying/moving the file on the submission host.)
So my proposal for the DRMAA staging interface looks as follows: split the "fileTransfers" attribute into two attributes (also of the OrderedStringList type):

- stageInFiles
- stageOutFiles

which are simple lists of files to be staged in / staged out (no URLs, only paths). The paths can be relative (to the current working directory on the submission host, and to the job working directory on the execution host).

I don't want to be blinded by batch system use cases (there is one grid implementation of DRMAA 1.0), so if at least one person complains, I have nothing against staying with the fileTransfers attribute, which operates on full URLs.
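To make the proposal concrete, here is a minimal sketch of how the two lists could be interpreted. It is not part of any DRMAA binding: the class JobTemplateSketch, its field names, and the helper functions are hypothetical, and the only rules taken from the proposal are "paths only, no URLs", "same file name on both hosts", and "relative paths resolve against the submission working directory and the job working directory".

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Tuple


@dataclass
class JobTemplateSketch:
    """Hypothetical stand-in for a job template; not a real DRMAA class."""
    remote_command: str
    working_directory: str  # job working directory on the execution host
    stage_in_files: List[str] = field(default_factory=list)   # stageInFiles
    stage_out_files: List[str] = field(default_factory=list)  # stageOutFiles


def resolve(path: str, base: str) -> str:
    """Return path unchanged if absolute, otherwise resolve it against base."""
    p = PurePosixPath(path)
    return str(p if p.is_absolute() else PurePosixPath(base) / p)


def stage_in_plan(jt: JobTemplateSketch, submit_cwd: str) -> List[Tuple[str, str]]:
    """(source on submission host, destination on execution host) pairs."""
    return [(resolve(f, submit_cwd), resolve(f, jt.working_directory))
            for f in jt.stage_in_files]


def stage_out_plan(jt: JobTemplateSketch, submit_cwd: str) -> List[Tuple[str, str]]:
    """(source on execution host, destination on submission host) pairs."""
    return [(resolve(f, jt.working_directory), resolve(f, submit_cwd))
            for f in jt.stage_out_files]
```

Note that the file name is identical on both sides of every pair; only the directory it is resolved against differs.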
Advanced reservation.
The API is in the IDL part, but we still need the detailed functions descriptions. Mariusz, could you take care of this ?
ok, i will handle the advance reservations part of the DRMAA spec.
Job states.
There is a set of wishes regarding more job states and transitions. We also have a pending mapping to other people's job state models.
Thread safety.
Somebody with a strong DRMAA implementation background needs to scan his implementation for critical (and non-critical) code parts with respect to thread safety. All experiences should be persisted in the new spec.
C binding.
We spent some time in the OO world, so somebody needs to try a C language binding. GFD.143 shows what such a document can look like. Roger, maybe something for you?
--- snip ----
All information is in the Wiki. Start on this page, follow the links, and look for yellow boxes:
http://wikis.sun.com/display/DRMAAv2/DRMAAv2+API
I will continue to work as the integration / coordination point. Maybe I can also take care of one or the other specific problem, but definitely not for all of them. Our deadline is March 15th, then we will present to the other groups.
Thanks, Peter.
Cheers, -- Mariusz
Hi,
Some thoughts: - most of the batch systems were designed to work with network shared file system. However one can imagine a situation when the home file
- I would prefer to keep this interface as simple as possible and thus handle only the case that cannot be handled without interaction with the DRMS: staging files from the submission host to the execution host, as the execution host is usually not known before a job starts.
This brings the new file staging approach closer to what we had in DRMAAv1. The unknown execution host is a very valid argument, since we implicitly assume that staging happens before job start. Your research also shows that LSF can only copy from / to the submission host, which definitely kills the idea of free server transfers. I can live with that.
- In order to keep the interface really simple, I would assume that file names (not necessarily the absolute paths) are the same on both the execution host and the submission host. (Again, if not, the user can easily work around this by copying/moving the file on the submission host.)
Interesting. So the idea is to stage only whole directories? What if I only want to move the STDIN file, and nothing else?
So my proposition of the DRMAA staging interface looks as follows: split "fileTransfers" attribute into two attributes (also of the OrderedStringList type): - stageInFiles - stageOutFiles which are simple list of files to be staged-in/staged-out (no URLs, only paths). The paths can be relative (to current working directory on submission host, and job working directory on execution host).
I like that, despite the fact that I would like to see single file / wildcard support as discussed in the last phone call. Let's find some agreement in today's phone call.

Best,
Peter.
Hi,

2010/2/17 Peter Tröger <peter@troeger.eu>:
Hi,
Some thoughts: - most of the batch systems were designed to work with network shared file system. However one can imagine a situation when the home file
- I would prefer to keep this interface as simple as possible and thus handle only the case that cannot be handled without interaction with the DRMS: staging files from the submission host to the execution host, as the execution host is usually not known before a job starts.
This brings the new file staging approach closer to what we had in DRMAAv1. The unknown execution host is a very valid argument, since we implicitly assume that staging happens before job start. Your research also shows that LSF can only copy from / to the submission host, which definitely kills the idea of free server transfers. I can live with that.
- In order to keep the interface really simple, I would assume that file names (not necessarily the absolute paths) are the same on both the execution host and the submission host. (Again, if not, the user can easily work around this by copying/moving the file on the submission host.)
Interesting. So the idea is to stage only whole directories? What if I only want to move the STDIN file, and nothing else?
Sorry, I should have been clearer on this. I meant that we should support only the minimum functionality:

* staging single files (LSF and Torque, for example, do not support staging directories *by default*), so no directories and no wildcard expressions would be supported (they could be supported in the sense of the MAY [RFC 2119] keyword ;-).
* the file name must be the same on both the execution and the submission host (however, absolute paths may differ, e.g. if relative paths are given).

With this basic tool, the user can implement (if needed; let me remind you that in my opinion a non-shared file system is quite a rare case):

- wildcards on stage-in (by evaluating the wildcard before submission and giving an explicit list of files)
- directories on stage-in (by listing the directory on the submission host and giving an explicit list of files)
- different file names on the submission/execution host (e.g. if we want the file "foo" to be copied into the file "bar" on the execution host: move/copy the file prior to submission)
- wildcards/directories on stage-out (one possible solution is to zip all result files in the job's script)
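The stage-in workarounds listed above can be sketched on the submission host as follows. This is only an illustration under assumptions: the function names expand_stage_in and stage_under_new_name are made up here, and the resulting list is what a user would pass as the proposed stageInFiles attribute.

```python
import glob
import os
import shutil
from typing import Iterable, List


def expand_stage_in(patterns: Iterable[str], cwd: str = ".") -> List[str]:
    """Expand wildcards and directories into an explicit list of file paths
    relative to cwd (the working directory on the submission host)."""
    files: List[str] = []
    for pattern in patterns:
        # glob.escape protects special characters in cwd itself.
        for match in sorted(glob.glob(os.path.join(glob.escape(cwd), pattern))):
            if os.path.isdir(match):
                # Directory: list every regular file below it.
                for root, dirs, names in os.walk(match):
                    dirs.sort()  # deterministic traversal order
                    for name in sorted(names):
                        files.append(os.path.relpath(os.path.join(root, name), cwd))
            elif os.path.isfile(match):
                files.append(os.path.relpath(match, cwd))
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(files))


def stage_under_new_name(src: str, new_name: str) -> str:
    """Copy src next to itself under new_name before submission, so that the
    file name is the same on the submission and the execution host."""
    dst = os.path.join(os.path.dirname(src), new_name)
    shutil.copy2(src, dst)
    return dst
```

For stage-out, the equivalent trick is done inside the job script (e.g. archiving all result files into one file whose name is known in advance), so nothing needs to be expanded on the submission host.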
So my proposition of the DRMAA staging interface looks as follows: split "fileTransfers" attribute into two attributes (also of the OrderedStringList type): - stageInFiles - stageOutFiles which are simple list of files to be staged-in/staged-out (no URLs, only paths). The paths can be relative (to current working directory on submission host, and job working directory on execution host).
I like that, despite the fact that I would like to see single file / wildcard support as discussed last time in the phone call.
see above.
Let's find some agreement in today's phone call.
Best, Peter.
Cheers, -- Mariusz
participants (2): Mariusz Mamoński, Peter Tröger