
Gargi B Dasgupta wrote:
My question is: in the case of a workflow with multiple job steps, with data transferred in and out between these steps, what is the natural home for these data staging elements?
Is it in the workflow or in the individual JSDL elements? Will the DataStaging elements remain in JSDL, with the transfers needed in the workflow inferred from these JSDL documents? Or will the staging be explicitly defined in the flow?
Excellent question. Ideally, I'd imagine that they'd be in the flow, especially as transferring a large dataset can take a long time (and consume considerable resources) in its own right. JSDL deliberately doesn't take that approach because it doesn't assume the presence of an outer flow language: some use cases require data transfers before even a simple job can run, so the staging elements have to be available in JSDL itself. If you've got such an outer language, do put the transfers there. (You could also use a JSDL document containing just a DataStaging section to describe the transfers, so that the flow language simply joins up a set of JSDL documents.)
Donal Fellows.
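A minimal Python sketch of the second option above: a JSDL document whose JobDescription contains only a DataStaging section, placed as an explicit transfer step in a flow between two ordinary job steps. It assumes the JSDL 1.0 namespace and element names (JobDefinition, JobDescription, DataStaging, FileName, CreationFlag, DeleteOnTermination, Source, Target, URI). The helper `staging_only_jsdl`, the list-based `flow`, the step names and the example.org URIs are all hypothetical illustrations, not part of JSDL or of any particular flow language.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace; prefix registered so the output uses "jsdl:".
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)


def q(tag):
    """Qualify a tag name with the JSDL namespace."""
    return f"{{{JSDL_NS}}}{tag}"


def staging_only_jsdl(file_name, source_uri=None, target_uri=None):
    """Hypothetical helper: build a JSDL document whose JobDescription
    holds a single DataStaging section and nothing else."""
    root = ET.Element(q("JobDefinition"))
    desc = ET.SubElement(root, q("JobDescription"))
    staging = ET.SubElement(desc, q("DataStaging"))
    # Children follow the JSDL 1.0 sequence: FileName, CreationFlag,
    # DeleteOnTermination, Source, Target.
    ET.SubElement(staging, q("FileName")).text = file_name
    ET.SubElement(staging, q("CreationFlag")).text = "overwrite"
    ET.SubElement(staging, q("DeleteOnTermination")).text = "true"
    if source_uri:
        src = ET.SubElement(staging, q("Source"))
        ET.SubElement(src, q("URI")).text = source_uri
    if target_uri:
        tgt = ET.SubElement(staging, q("Target"))
        ET.SubElement(tgt, q("URI")).text = target_uri
    return ET.tostring(root, encoding="unicode")


# Hypothetical flow: an ordered list of (step name, JSDL document) pairs.
# The transfer between the two jobs is an explicit step of its own rather
# than being inferred from the jobs' own DataStaging sections.
flow = [
    ("simulate", "<!-- JSDL for the simulation job -->"),
    ("move-results", staging_only_jsdl(
        "results.dat",
        source_uri="gsiftp://siteA.example.org/scratch/results.dat",
        target_uri="gsiftp://siteB.example.org/scratch/results.dat")),
    ("analyse", "<!-- JSDL for the analysis job -->"),
]

if __name__ == "__main__":
    for name, doc in flow:
        print(name, doc)
```

Keeping the transfer as its own step lets a flow engine schedule, monitor, or retry a long-running data movement independently of the jobs on either side, which is the reason given above for preferring the flow as the home for staging.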