Regarding Data Staging Elements in JSDL

Hi all,
I have a question related to the Data Staging element in JSDL.
From my understanding, JSDL captures the scheduling requirements of individual jobs, and the Data Staging elements in JSDL specify the input and output data needed or created by an application.
Since JSDL does not capture inter-job dependencies, I would use a flow language to define a workflow. My question is: in a workflow with multiple job steps, and data being transferred in and out between these steps, what is the natural home for these data staging elements? Is it in the workflow or in the individual JSDL elements? Will the Data Staging elements remain in JSDL, with the transfers needed in the workflow inferred from these JSDLs? Or will the staging be explicitly defined in the flow?
I would like to know the group's thoughts on this.
Thanks & Regards,
Gargi
Gargi B Dasgupta
IBM India Research Lab
Plot No. 4, Phase II, Block C
ISID Institutional Area
Vasant Kunj, New Delhi 110070
Email: gdasgupt@in.ibm.com
Phone: +91 11 51292192

Gargi B Dasgupta wrote:
My question is: In case of a workflow, with multiple job steps, and data transferring in and out between these steps, what is the natural home for these data staging elements.
Is it in the workflow or in the individual JSDL elements. Will the Data Staging elements remain in JSDL and then transfers needed in the workflow be inferred from these JSDLs. Or will the staging be explicitly defined in the flow.
Excellent question.
Ideally, I'd imagine that they'd be in the flow, especially as the transfer of a large dataset could itself take a long time (and considerable resources). JSDL deliberately doesn't do this because it doesn't assume the presence of an outer flow language (some use cases specifically require data transfers for even a simple job to run, so the staging elements have to be there). If you've got such an outer language, do put the transfers there (though you could also use a JSDL document containing just a DataStaging section to describe the transfers, so that the flow language just joins up a bunch of JSDL docs).
Donal Fellows.
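A minimal sketch of the "JSDL document containing just a DataStaging section" idea, assuming the JSDL 1.0 namespace and the usual child elements of DataStaging (FileName, CreationFlag, DeleteOnTermination, Source/URI, Target/URI). The helper name `staging_only_jsdl`, the file name, and the `gsiftp://storage.example.org/...` URI are made up for illustration; check the element order and defaults against the JSDL schema before relying on it.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace (assumed here; verify against the schema you validate with).
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)


def q(tag):
    """Qualify a tag name with the JSDL namespace."""
    return f"{{{JSDL_NS}}}{tag}"


def staging_only_jsdl(file_name, source_uri=None, target_uri=None):
    """Build a JobDefinition whose JobDescription holds only one DataStaging element.

    Hypothetical helper: a flow language could then treat each such document
    as one explicit transfer step.
    """
    job_def = ET.Element(q("JobDefinition"))
    job_desc = ET.SubElement(job_def, q("JobDescription"))
    staging = ET.SubElement(job_desc, q("DataStaging"))
    ET.SubElement(staging, q("FileName")).text = file_name
    ET.SubElement(staging, q("CreationFlag")).text = "overwrite"
    ET.SubElement(staging, q("DeleteOnTermination")).text = "false"
    if source_uri is not None:
        # Source: where the file is fetched from before the step runs.
        source = ET.SubElement(staging, q("Source"))
        ET.SubElement(source, q("URI")).text = source_uri
    if target_uri is not None:
        # Target: where the file is delivered once the step is done.
        target = ET.SubElement(staging, q("Target"))
        ET.SubElement(target, q("URI")).text = target_uri
    return job_def


doc = staging_only_jsdl(
    "input.dat",
    source_uri="gsiftp://storage.example.org/data/input.dat",  # made-up URI
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

With documents like this, the flow layer only needs to order whole JSDL documents; it does not have to understand the transfer details itself.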

Hi Gargi,
If you've got a workflow framework to use, use it for the data staging (either in or out, or both). If you're using JSDL for atomic job submissions and not wrapping a bunch of them with a workflow, then use the data staging elements provided by JSDL.
You could also use the JSDL data staging elements if you want to hide atomic job related data movement from the workflow layer.
Cheers and take care,
Ali
--
jsdl-wg mailing list
jsdl-wg@ogf.org
http://www.ogf.org/mailman/listinfo/jsdl-wg
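The original question asks whether the transfers needed in a workflow could be inferred from the individual JSDLs. One way this might look, as a rough sketch only: if each job's staging URIs have already been extracted from its JSDL, a dependency exists wherever one job's stage-out target equals another job's stage-in source. The `JobStaging` class, the job names, and the URIs below are hypothetical, and exact URI equality is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class JobStaging:
    """Staging summary of one job step (hypothetical; assumed extracted from its JSDL)."""
    name: str
    stage_in: set = field(default_factory=set)   # Source URIs fetched before the job runs
    stage_out: set = field(default_factory=set)  # Target URIs written after the job ends


def infer_dependencies(jobs):
    """Return (producer, consumer, uri) edges where an output URI matches an input URI."""
    edges = []
    for producer in jobs:
        for consumer in jobs:
            if producer is consumer:
                continue
            # Shared URIs mean the consumer must wait for the producer.
            for uri in sorted(producer.stage_out & consumer.stage_in):
                edges.append((producer.name, consumer.name, uri))
    return edges


jobs = [
    JobStaging(
        "preprocess",
        stage_in={"gsiftp://a.example.org/raw.dat"},
        stage_out={"gsiftp://b.example.org/clean.dat"},
    ),
    JobStaging(
        "analyse",
        stage_in={"gsiftp://b.example.org/clean.dat"},
        stage_out={"gsiftp://c.example.org/result.dat"},
    ),
]
edges = infer_dependencies(jobs)
print(edges)
```

This keeps the data movement hidden inside each JSDL, as suggested above, while still letting a workflow layer derive an ordering. It does not cover transfers that the flow itself has to schedule as separate steps.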

Thanks for both the responses. So if I have an atomic job, I can use the JSDL staging elements. For a workflow, I have the choice of doing it in either place.
Is the JSDL-wg also looking at the data requirements of jobs for data other than files? For example, is there scope for representing a database connection required by a job in JSDL?
Thanks,
Gargi
Ali Anjomshoaa <aanjomshoaa@mobability.co.uk>
04/11/2007 02:26 PM
To: "Donal K. Fellows" <donal.k.fellows@manchester.ac.uk>
cc: Gargi B Dasgupta/India/IBM@IBMIN, jsdl-wg@ogf.org
Subject: Re: [jsdl-wg] Regarding Data Staging Elements in JSDL
Hi Gargi,
if you've got a workflow framework to use, use it for the data staging (either in or out, or both). If you're using JSDL for atomic job submissions and not wrapping a bunch of them with a workflow, then use the data staging elements provided by JSDL.
You could also use the JSDL data staging elements if you want to hide atomic job related data movement from the workflow layer.
Cheers and take care,
Ali

Gargi,
Is the JSDL-wg also looking at data requirements of jobs for other than file type data. For ex. is there scope of representing a database connection required by a job in JSDL.
As far as I'm aware, there are no plans for explicit data resource requirement specifications, such as you suggest, in JSDL. The JSDL group are looking at the Resource Model development work in OGF for future resource requirement specifications, which may include data resource requirements.
Ali
--
Ali Anjomshoaa PhD
Managing Director
Mobability Limited
Edinburgh, Scotland
www.mobability.co.uk

Gargi,
Gargi B Dasgupta wrote:
Thanks for both the responses. So if I have a atomic job, I could use the JSDL staging elements. For a workflow, I have the choice of doing it at either place.
FYI: you are not alone ;) We are tackling the same problem in a German Grid project (http://www.c3grid.de), but haven't decided yet.
Is the JSDL-wg also looking at data requirements of jobs for other than file type data. For ex. is there scope of representing a database connection required by a job in JSDL.
I wouldn't see this as much of a responsibility of JSDL; I would rather cover it by encoding different types of data sources using distinct URLs. This is rather clumsy and not what I'd call human-readable, but it works for a sufficient number of use cases.
Regards,
--
Dipl.-Inform. Alexander Papaspyrou | 44221 Dortmund, NRW (Germany)
Robotics Research Institute | phone : +49(231)755-5058
Information Technology Section | fax : +49(231)755-3251
Dortmund University | web : http://it.irf.de/~alexp
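To illustrate the "distinct URLs" approach, here is a small sketch in which a consumer of the staging URIs decides what kind of data source it is dealing with from the URI scheme alone. The scheme table, the function name `classify_source`, and the example URIs are assumptions for illustration; JSDL itself does not define such a mapping.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URI scheme to kind of data source.
SCHEME_KINDS = {
    "file": "file",
    "ftp": "file",
    "gsiftp": "file",
    "http": "file",
    "https": "file",
    "postgresql": "database",
    "mysql": "database",
}


def classify_source(uri):
    """Classify a staging URI as 'file', 'database', or 'unknown' by its scheme."""
    scheme = urlsplit(uri).scheme.lower()
    return SCHEME_KINDS.get(scheme, "unknown")


for uri in (
    "gsiftp://storage.example.org/data/input.dat",         # made-up file source
    "postgresql://db.example.org:5432/climate?table=runs",  # made-up database source
    "srb://archive.example.org/collection/item",            # scheme not in the table
):
    print(uri, "->", classify_source(uri))
```

The clumsiness mentioned above shows up quickly: everything about the connection (database, table, credentials handling) has to be squeezed into one string whose meaning only the consumer knows.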

Donal,
Donal K. Fellows wrote:
Gargi B Dasgupta wrote:
My question is: In case of a workflow, with multiple job steps, and data transferring in and out between these steps, what is the natural home for these data staging elements.
Is it in the workflow or in the individual JSDL elements. Will the Data Staging elements remain in JSDL and then transfers needed in the workflow be inferred from these JSDLs. Or will the staging be explicitly defined in the flow.
Excellent question.
Ideally, I'd imagine that they'd be in the flow, especially as transfer of a large dataset could take a long time (and considerable resources) in itself. JSDL deliberately doesn't do this, and this is because it doesn't assume the presence of an outer flow language (some use-cases specifically require data transfers in order for even a simple job to run, so they have to be there). If you've got such an outer language, do put the transfers there (though you could also use a JSDL document containing just a DataStaging section to describe the transfers, resulting in the flow language just joining up a bunch of JSDL docs).
Yes, but handling staging completely outside JSDL makes it harder to use JSDL filesystems, which are a very interesting feature, especially for POSIX jobs in combination with environment variables. Maybe it makes sense to handle file naming and referencing in JSDL while keeping track of workflow I/O dependencies outside, probably at the cost of duplicating some information.
Regards,
--
Dipl.-Inform. Alexander Papaspyrou | 44221 Dortmund, NRW (Germany)
Robotics Research Institute | phone : +49(231)755-5058
Information Technology Section | fax : +49(231)755-3251
Dortmund University | web : http://it.irf.de/~alexp
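A sketch of what is at stake with JSDL filesystems, under the assumption (to be checked against the JSDL 1.0 specification and its POSIX extension) that a DataStaging element can name a filesystem via FilesystemName and that a POSIX Environment element can refer to the same filesystem through a `filesystemName` attribute. The executable path, the variable name `DATA_DIR`, and the source URI are made up.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed from JSDL 1.0 and its POSIX application extension.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def j(tag):
    """Qualify a tag with the JSDL namespace."""
    return f"{{{JSDL_NS}}}{tag}"


def p(tag):
    """Qualify a tag with the JSDL POSIX namespace."""
    return f"{{{POSIX_NS}}}{tag}"


def posix_job_with_filesystem(fs_name, file_name, source_uri):
    """Build a job whose staged file and environment variable share one named filesystem."""
    job_def = ET.Element(j("JobDefinition"))
    job_desc = ET.SubElement(job_def, j("JobDescription"))

    app = ET.SubElement(job_desc, j("Application"))
    posix = ET.SubElement(app, p("POSIXApplication"))
    ET.SubElement(posix, p("Executable")).text = "/usr/local/bin/analyse"  # hypothetical
    # The variable is meant to resolve to the filesystem's mount point at run time.
    ET.SubElement(posix, p("Environment"), {"name": "DATA_DIR", "filesystemName": fs_name})

    resources = ET.SubElement(job_desc, j("Resources"))
    ET.SubElement(resources, j("FileSystem"), {"name": fs_name})

    staging = ET.SubElement(job_desc, j("DataStaging"))
    ET.SubElement(staging, j("FileName")).text = file_name
    ET.SubElement(staging, j("FilesystemName")).text = fs_name  # file lands on this filesystem
    ET.SubElement(staging, j("CreationFlag")).text = "overwrite"
    source = ET.SubElement(staging, j("Source"))
    ET.SubElement(source, j("URI")).text = source_uri
    return job_def


job = posix_job_with_filesystem(
    "SCRATCH", "input.dat", "gsiftp://storage.example.org/data/input.dat"
)
print(ET.tostring(job, encoding="unicode"))
```

If the staging moved entirely into the flow language, the link between the staged file and the filesystem name would have to be re-established some other way, which is the duplication concern raised above.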
participants (4)
- Alexander Papaspyrou
- Ali Anjomshoaa
- Donal K. Fellows
- Gargi B Dasgupta