Hi,

2010/2/17 Peter Tröger <peter@troeger.eu>:
Hi,
Some thoughts:
- most batch systems were designed to work with a network shared file system. However, one can imagine a situation when the home file
- I would prefer to keep this interface as simple as possible and thus handle only the case that cannot be handled without interaction with the DRMS: staging files from the submission host to the execution host, as the execution host is usually not known before a job starts.
This brings the new file staging approach closer to what we had in DRMAAv1. The unknown execution host is a very valid argument, since we implicitly assume that staging happens before job start. Your research also shows that LSF can only copy from / to the submission host, which definitely kills the idea of free server transfers. I can live with that.
- In order to keep the interface really simple, I would assume that file names (not necessarily the absolute paths) are the same on both the execution host and the submission host. (Again, if not, the user can easily work around this by copying/moving the file on the submission host.)
Interesting. So the idea is to stage only whole directories? What if I only want to move the STDIN file, and nothing else?
Sorry, I should have been clearer on this. I meant that we should support only the minimum functionality:
* staging single files (LSF and Torque, for example, do not support staging directories *by default*), so no directories and no wildcard expressions would be supported (they could be supported in the sense of the MAY [RFC 2119] keyword ;-).
* the file name must be the same on both the execution and the submission host (however, absolute paths may differ, e.g. if relative paths are given)

With this basic tool, the user can implement the following if needed (let me remind you that, in my opinion, a non-shared file system is quite a rare case):
- wildcards on stage-in (by evaluating the wildcard before submission and giving an explicit list of files)
- directories on stage-in (by listing the directory on the submission host and giving an explicit list of files)
- different file names on the submission/execution host (e.g. if we want the file "foo" to be copied to the execution host as "bar": move/copy the file prior to submission)
- wildcards/directories on stage-out (one possible solution is to zip all result files in the job's script)
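To make the stage-in workarounds concrete, here is a minimal sketch of how a user could evaluate wildcards and directories on the submission host before submitting. The helper name expand_stage_in is hypothetical and not part of any DRMAA binding; it only builds the explicit file list that would then be handed to the staging attribute.

```python
import glob
import os


def expand_stage_in(patterns):
    """Expand wildcards and directories into an explicit, ordered file list.

    Hypothetical helper, meant to run on the submission host before the
    job is submitted. It is not part of DRMAA.
    """
    files = []
    for pattern in patterns:
        # Evaluate the wildcard locally; a plain existing path matches itself.
        for path in sorted(glob.glob(pattern)):
            if os.path.isdir(path):
                # Directory: list its regular files (non-recursive).
                for name in sorted(os.listdir(path)):
                    candidate = os.path.join(path, name)
                    if os.path.isfile(candidate):
                        files.append(candidate)
            elif os.path.isfile(path):
                files.append(path)
    # Drop duplicates while keeping the first occurrence of each path.
    return list(dict.fromkeys(files))
```

The directory listing is deliberately non-recursive here; nested directories would need os.walk, or the zip approach mentioned for stage-out.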
So my proposal for the DRMAA staging interface looks as follows: split the "fileTransfers" attribute into two attributes (also of the OrderedStringList type):
- stageInFiles
- stageOutFiles
which are simple lists of files to be staged in/staged out (no URLs, only paths). The paths can be relative (to the current working directory on the submission host, and to the job working directory on the execution host).
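A rough sketch of how the two attributes could be interpreted, just to illustrate the relative-path rule. Only the attribute names stageInFiles and stageOutFiles come from the proposal; the JobTemplateSketch class, the resolve and plan_transfers helpers, and the example directories are assumptions made for illustration, not an existing DRMAA API.

```python
import posixpath
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class JobTemplateSketch:
    """Hypothetical stand-in for a job template with the two proposed lists."""
    stageInFiles: List[str] = field(default_factory=list)
    stageOutFiles: List[str] = field(default_factory=list)


def resolve(path: str, base_dir: str) -> str:
    # Absolute paths are kept; relative paths are taken against base_dir.
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(base_dir, path))


def plan_transfers(
    jt: JobTemplateSketch, submit_cwd: str, job_wd: str
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (source, destination) pairs for stage-in and stage-out.

    The same path string is resolved on both hosts, so the file name is
    identical on both sides while the absolute paths may differ.
    """
    stage_in = [(resolve(p, submit_cwd), resolve(p, job_wd)) for p in jt.stageInFiles]
    stage_out = [(resolve(p, job_wd), resolve(p, submit_cwd)) for p in jt.stageOutFiles]
    return stage_in, stage_out


jt = JobTemplateSketch(stageInFiles=["input/data.txt"], stageOutFiles=["result.zip"])
stage_in, stage_out = plan_transfers(jt, "/home/alice/run1", "/scratch/job42")
```

With these assumed directories, "input/data.txt" is copied from below /home/alice/run1 on the submission host to below /scratch/job42 on the execution host, and "result.zip" travels the other way.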
I like that, despite the fact that I would like to see single file / wildcard support as discussed last time in the phone call.
see above.
Let's find some agreement in today's phone call.
Best, Peter.
Cheers, -- Mariusz