
Donal,

Donal K. Fellows wrote:
Gargi B Dasgupta wrote:
My question is: in the case of a workflow with multiple job steps, and data transferred in and out between these steps, what is the natural home for the data staging elements?
Is it in the workflow or in the individual JSDL elements? Will the DataStaging elements remain in JSDL, with the transfers needed in the workflow inferred from these JSDLs? Or will the staging be explicitly defined in the flow?
Excellent question.
Ideally, I'd imagine that they'd be in the flow, especially as the transfer of a large dataset could itself take a long time (and considerable resources). JSDL deliberately doesn't do this, because it doesn't assume the presence of an outer flow language (some use-cases specifically require data transfers in order for even a simple job to run, so the staging elements have to be there). If you've got such an outer language, do put the transfers there (though you could also use a JSDL document containing just a DataStaging section to describe the transfers, resulting in the flow language just joining up a bunch of JSDL docs).
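To make the "inferred from these JSDLs" option concrete, here is a minimal sketch of how a flow layer might derive transfers from per-step staging declarations. It is not part of JSDL or of any existing tool: the `Staging` record, the `infer_transfers` function, the step names and the URIs are all hypothetical, and the matching rule (a stage-out URI of one step equal to a stage-in URI of another) is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Staging:
    """Hypothetical stand-in for one DataStaging entry of a job step."""
    step: str                      # name of the job step (assumed identifier)
    file_name: str                 # local file name within the job
    source: Optional[str] = None   # URI the file is staged in from, if any
    target: Optional[str] = None   # URI the file is staged out to, if any


def infer_transfers(stagings: List[Staging]) -> List[Tuple[str, str, str]]:
    """Return (producer_step, consumer_step, uri) for every matching URI pair."""
    # Index every stage-out URI by the steps that write to it.
    producers: Dict[str, List[str]] = {}
    for s in stagings:
        if s.target is not None:
            producers.setdefault(s.target, []).append(s.step)

    # A stage-in whose URI matches some other step's stage-out is a dependency.
    edges: List[Tuple[str, str, str]] = []
    for s in stagings:
        if s.source is not None:
            for producer in producers.get(s.source, []):
                if producer != s.step:
                    edges.append((producer, s.step, s.source))
    return edges


# Placeholder URIs; example.org is used only for illustration.
OUT = "gsiftp://example.org/scratch/out.dat"
RESULT = "gsiftp://example.org/scratch/result.dat"

example = [
    Staging("prepare", "raw.dat", source="http://example.org/raw.dat"),
    Staging("prepare", "out.dat", target=OUT),
    Staging("simulate", "in.dat", source=OUT),
    Staging("simulate", "result.dat", target=RESULT),
    Staging("archive", "result.dat", source=RESULT),
]

transfers = infer_transfers(example)
```

In this sketch the external input `raw.dat` produces no edge because no step stages out to its URI, while the two intermediate files yield one dependency each. Declaring the transfers explicitly in the flow would replace this matching step with edges written by hand.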
Yes, but handling the staging completely outside impairs the possibility of using JSDL filesystems, which is a very interesting feature, especially for POSIX jobs in combination with environment variables. Maybe it makes sense to handle file naming and referencing in JSDL while keeping track of workflow I/O dependencies outside, probably at the cost of duplicating some information.

Regards,
--
Dipl.-Inform. Alexander Papaspyrou | 44221 Dortmund, NRW (Germany)
Robotics Research Institute        | phone : +49(231)755-5058
Information Technology Section     | fax   : +49(231)755-3251
Dortmund University                | web   : http://it.irf.de/~alexp