DataStaging concerns

Peter G.Lane

19 May 2005

8:45 p.m.

Hi all,

Forgive me if I'm reiterating on a topic. I've only been reading up on JSDL since yesterday. I have a few concerns about the DataStaging section. Primarily, I'm wondering if it really makes sense to have it as part of the core schema. I think it would be better to have extensions like POSIXApplication for more specific DRM configurations. Here are some of my thoughts:

1) There's still controversy over whether staging should or should not be integrated into a DRM. As far as I can tell, for example, the BES doesn't have any plans to implement staging. DRMAA makes this optional. If BES ends up using JSDL, wouldn't this be a violation of the spec which requires each element to be supported in some way?

2) There's no distinction between a stage-in or stage-out flavor of the staging directives. I guess it's up to the service to decipher this so that it can perform the staging at the appropriate point in the life cycle.

3) I don't particularly like that the DataStaging sections include an option to remove the file at the end of the job. If I'm staging out data then this doesn't make a whole lot of sense. I'd much prefer a separate section which explicitly lists all the files that are to be removed from the submission machine after the job has been completed. This would also cover the case of data that is created rather than staged in but still needs to be removed after job completion.

4) Based on the current GRAM incarnation, it would be nice to let RFT's transfer request description extend a base staging schema and then use that in the JSDL document rather than adding a bunch of extensions to DataStaging. This is similar to how I'd want to go about using POSIXApplication.

Thanks in advance for any responses...

Peter


Michel Drescher

20 May

9:32 a.m.

Hi Peter,

thanks for your suggestions. Here are some answers to some of your issues:

On 19 May 2005, at 21:45, Peter G.Lane wrote:

...

Hi all,

Forgive me if I'm reiterating on a topic. I've only been reading up on JSDL since yesterday. I have a few concerns about the DataStaging section. Primarily, I'm wondering if it really makes sense to have it as part of the core schema. I think it would be better to have extensions like POSIXApplication for more specific DRM configurations.

I think this is one of the candidates to ignite a religious war. No pun intended, but if the standard JSDL DataStaging doesn't fit your needs, then don't use it - it's optional anyway - and introduce your own extension for your needs, as you suggested.

Besides the war of syntactic sugar (SCNR), DataStaging is part of JobDescription to emphasize two things:

a) A job, as JSDL understands it, comprises the phases stage in, run application, and stage out.

b) More completely, again as JSDL understands it, a job submitted to a job execution system comprises data to operate on (or to produce), an application that executes these operations, and a set of resources that are used by the application.

...

Here are some of my thoughts:

1) There's still controversy over whether staging should or should not be integrated into a DRM. As far as I can tell, for example, the BES doesn't have any plans to implement staging. DRMAA makes this optional. If BES ends up using JSDL, wouldn't this be a violation of the spec which requires each element to be supported in some way?

You got a point there, and this will be addressed in the coming days (having F2F meetings of OGSA-BES *and* OGSA-JSDL in London at Imperial College).

...

2) There's no distinction between a stage-in or stage-out flavor of the staging directives. I guess it's up to the service to decipher this so that it can perform the staging at the appropriate point in the life cycle.

It is not really up to the service. A DataStaging element has either a Source child element, a Target child element, or both. A Source element being present clearly tags a DataStaging element to be processed in the stage in phase of a JSDL job. A Target child element, respectively, requires its DataStaging parent to be processed in the stage out phase of the JSDL job. Having both Source and Target elements indicates that the containing DataStaging element needs processing in both stage in and stage out phases of the job execution process.

So in the end, it's rather indirectly encoded in DataStaging itself.

Cheers,
Michel
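To make the Source/Target encoding concrete, here is a minimal sketch with one DataStaging element per case. It assumes the element names and namespace URI of the JSDL 1.0 specification as later published, which may differ from the draft discussed in this thread. The file names and URLs are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: namespace and element names assume published JSDL 1.0,
     not necessarily draft-18. File names and URLs are hypothetical. -->
<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:JobDescription>

    <!-- Source only: processed in the stage in phase. -->
    <jsdl:DataStaging>
      <jsdl:FileName>input.dat</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:Source>
        <jsdl:URI>http://example.org/data/input.dat</jsdl:URI>
      </jsdl:Source>
    </jsdl:DataStaging>

    <!-- Target only: processed in the stage out phase. -->
    <jsdl:DataStaging>
      <jsdl:FileName>result.dat</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:Target>
        <jsdl:URI>http://example.org/results/result.dat</jsdl:URI>
      </jsdl:Target>
    </jsdl:DataStaging>

    <!-- Source and Target: fetched before the run, stored back afterwards. -->
    <jsdl:DataStaging>
      <jsdl:FileName>state.db</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:Source>
        <jsdl:URI>http://example.org/state/state.db</jsdl:URI>
      </jsdl:Source>
      <jsdl:Target>
        <jsdl:URI>http://example.org/state/state.db</jsdl:URI>
      </jsdl:Target>
    </jsdl:DataStaging>

  </jsdl:JobDescription>
</jsdl:JobDefinition>
```

Note that nothing in the document says "stage in" or "stage out" explicitly; the phase follows purely from which child elements are present.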

Peter G.Lane

3:16 p.m.

...

...
2) There's no distinction between a stage-in or stage-out flavor of the staging directives. I guess it's up to the service to decipher this so that it can perform the staging at the appropriate point in the life cycle.

It is not really up to the service. A DataStaging element has either a Source child element, a Target child element, or both. A Source element being present clearly tags a DataStaging element to be processed in the stage in phase of a JSDL job. A Target child element, respectively, requires its DataStaging parent to be processed in the stage out phase of the JSDL job. Having both Source and Target elements indicates that the containing DataStaging element needs processing in both stage in and stage out phases of the job execution process. So in the end, it's rather indirectly encoded in DataStaging itself.

Ah, I was assuming Source and Target meant something like "from URL" and "to URL", so that they would be specified in each DataStaging section. In that way I saw no difference between stage-in and stage-out. So I would argue that the element names are confusing, but unfortunately I don't have a better suggestion at the moment.

...

Cheers, Michel

Michel Drescher

3:28 p.m.

...

...
It is not really up to the service. A DataStaging element has either a Source child element, a Target child element, or both. A Source element being present clearly tags a DataStaging element to be processed in the stage in phase of a JSDL job. A Target child element, respectively, requires its DataStaging parent to be processed in the stage out phase of the JSDL job. Having both Source and Target elements indicates that the containing DataStaging element needs processing in both stage in and stage out phases of the job execution process. So in the end, it's rather indirectly encoded in DataStaging itself.

Ah, I was assuming Source and Target meant something like "from URL" and "to URL", so that they would be specified in each DataStaging section. In that way I saw no difference between stage-in and stage-out. So I would argue that the element names are confusing, but unfortunately I don't have a better suggestion at the moment.

Actually, Source and Target *are* specified in each DataStaging element, if applicable. From the viewpoint of the consuming service, the Source child of a DataStaging means something like "from URL" (to fetch from that URL in stage in), and the Target child of a DataStaging respectively means indeed something like "to URL" (to store at).

In fact, the element names could be something different (i.e. "FetchFrom" and "StoreAt"), but then they may confuse other people. That's what standards are like, I presume. :o)

Cheers,
Michel

William Lee

21 May

7:59 a.m.

...

Actually, Source and Target *are* specified in each DataStaging element, if applicable. From the viewpoint of the consuming service, the Source child of a DataStaging means something like "from URL" (to fetch from that URL in stage in), and the Target child of a DataStaging respectively means indeed something like "to URL" (to store at). In fact, the element names could be something different (i.e. "FetchFrom" and "StoreAt"), but then they may confuse other people. That's what standards are like, I presume. :o)

Cheers, Michel

Taking the view from the thread, I'll try to make the meaning of <Source/> and <Target/> more precise.

The consuming system will ensure that the file identified by <FileName/> has its content originate from the source URL. It is not necessarily fetched; it might be mounted / copied / cached. For <Target/>, the post-condition after the job is finished is that the file resource at the target URL will be identical to (or appended from) the file content on the execution system. The file might be staged out *after* the job, or streamed during the job; the only guarantee is that it will be available after the job has finished.

I think the dilemma in pinning down the semantics of DataStaging arises because JSDL does not imply the consuming system must be a job execution system. Even talking about "after the job is finished" might assume too much in the context of JSDL.

William

--- William Lee @ London e-Science Centre, Imperial College London ---
--- Software Coordinator ---
A: Room 380, Department of Computing, Imperial College London, Huxley Building, South Kensington campus, London SW7 2AZ, UK
E: wwhl'@'doc.ic.ac.uk | william'@'imageunion.com
W: www.lesc.ic.ac.uk | www.imageunion.com
P: +44(0)20 7594 8251
F: +44(0)20 7581 8024
--- Projects ---
GridSAM: http://www.lesc.ic.ac.uk/gridsam
Markets: http://www.lesc.ic.ac.uk/markets
ICENI: http://www.lesc.ic.ac.uk/iceni

Donal K. Fellows

20 May

10:26 a.m.

Peter G.Lane wrote:

...

Forgive me if I'm reiterating on a topic. I've only been reading up on JSDL since yesterday. I have a few concerns about the DataStaging section. Primarily, I'm wondering if it really makes sense to have it as part of the core schema. I think it would be better to have extensions like POSIXApplication for more specific DRM configurations.

We suspect that there's going to be quite a bit of extension in that area, and welcome feedback for post-1.0 (we're very very unlikely to change anything for JSDL 1.0 now; it doesn't do everything, but it does a useful fraction and too many people need something - anything! - now).

...

1) There's still controversy over whether staging should or should not be integrated into a DRM. As far as I can tell, for example, the BES doesn't have any plans to implement staging. DRMAA makes this optional. If BES ends up using JSDL, wouldn't this be a violation of the spec which requires each element to be supported in some way?

"Supported" has a very particular meaning within a JSDL context, and the effective meaning could include a definite response "I don't know how to do data staging, man!"

We discussed data staging quite a few times (around a year ago IIRC) and what we came up with is a minimum to allow processing of jobs on a number of different systems including domains like cross-cluster deployment where everything has to be shipped in first.

If we'd had a proper workflow language too, we'd have done data staging differently. But there wasn't something suitable already existing (BPEL does something else) and if we'd have had to develop our own, we'd still be arguing about it now.

...

3) I don't particularly like that the DataStaging sections include an option to remove the file at the end of the job. If I'm staging out data then this doesn't make a whole lot of sense. I'd much prefer a separate section which explicitly lists all the files that are to be removed from the submission machine after the job has been completed. This would also cover the case of data that is created rather than staged in but still needs to be removed after job completion.

The "remove data at the end of the job" applies after staging it out (it'd be a bit silly otherwise). It is also the case that you can list a file in the data staging section and not have it staged in or out, but just deleted at end-of-job.
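The two cleanup cases described here might look as follows. This is a sketch that assumes the DeleteOnTermination element and the namespace URI from the published JSDL 1.0 specification; draft-18 may spell the option differently. The file names and the URL are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: DeleteOnTermination and the namespace assume published
     JSDL 1.0. File names and the URL are hypothetical. -->
<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:JobDescription>

    <!-- Staged out first, then removed from the execution system. -->
    <jsdl:DataStaging>
      <jsdl:FileName>result.dat</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Target>
        <jsdl:URI>http://example.org/results/result.dat</jsdl:URI>
      </jsdl:Target>
    </jsdl:DataStaging>

    <!-- Neither Source nor Target: a file the job creates itself,
         listed only so that it is deleted at the end of the job. -->
    <jsdl:DataStaging>
      <jsdl:FileName>scratch.tmp</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
    </jsdl:DataStaging>

  </jsdl:JobDescription>
</jsdl:JobDefinition>
```

The second element covers the case raised in point 3: data that is created by the job rather than staged in, but that still needs to be removed after completion.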

...

4) Based on the current GRAM incarnation, it would be nice to let RFT's transfer request description extend a base staging schema and then use that in the JSDL document rather than adding a bunch of extensions to DataStaging. This is similar to how I'd want to go about using POSIXApplication.

I don't fully grasp what you mean here. The following is legal (modulo namespaces) according to draft-18:

    <jsdl:DataStaging>
      <jsdl:FileName>example</jsdl:FileName>
      <jsdl:CreationFlag>jsdl:overwrite</jsdl:CreationFlag>
      <jsdl:Source>
        <rft:SynchFileWithSomewhere>...</rft:SynchFileWithSomewhere>
      </jsdl:Source>
    </jsdl:DataStaging>

(Hmm, the non-normative examples in the data-staging section of d18 seem out of step. Bother.)

Given that the above is legal, what's the problem?

Donal.

Karl Czajkowski

11:29 a.m.

Donal: did we lose the xsd:any child of the top-level JSDL document element?

If we decided to have a WS-GRAM dialect of JSDL where we just transliterated some of the biggies like our staging clause, I would expect there to be, for example, a single (or three) "RFT element" at the top-level as a peer to the POSIX application and resource sections and zero jsdl:FileStaging elements.

The problem Peter is having, I think, is that it would be awkward to try to break the RFT element into separate pieces for placement inside of specific separate FileStaging elements per file. He would like to feel comfortable completely ignoring the JSDL staging model and putting in the WS-GRAM one instead. There are sufficient differences in how WS-GRAM and JSDL understand the filesystem(s) of the computing host and the responsibilities of the submitter and job system such that supporting a "WS-GRAM file management extension" may not be consistent with supporting the "JSDL staging model".

I certainly thought this would be possible. Note, this is a technical question in my mind. I realize there is an entirely orthogonal concern about when one "should" use the extension mechanisms in a particular way... in these hypothetical discussions we all bring a lot of assumptions about how other standards will appear. For example, I assume BES will not include staging nor define how BES services respond to FileStaging elements. A BES client expecting interop would not use the JSDL staging nor any other non-BES extensions. Therefore, I do not see a BES + WS-GRAM staging extension as described above to really be more or less interoperable than one that tries to use the JSDL staging syntax. It would be specifically for transitional/legacy use by applications not content to use the interop profile(s).

karl

--
Karl Czajkowski
karlcz@univa.com
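The arrangement Karl describes could be sketched roughly as below: a single extension element sits beside the application and resource sections, and no JSDL staging elements appear at all. The jsdl and jsdl-posix namespaces assume the published JSDL 1.0 specification. The rft namespace and the rft:TransferRequest element are purely hypothetical placeholders and do not reflect the real RFT schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. The jsdl and jsdl-posix namespaces assume published JSDL 1.0.
     The rft namespace and rft:TransferRequest are hypothetical placeholders,
     not the real RFT schema. -->
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
    xmlns:rft="urn:example:rft">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/cat</jsdl-posix:Executable>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
    <jsdl:Resources/>
    <!-- No jsdl:DataStaging elements. One extension element carries
         all stage in, stage out, and cleanup directives together. -->
    <rft:TransferRequest>
      <!-- transfer description in the extension's own vocabulary -->
    </rft:TransferRequest>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
```

A consumer that does not understand the extension namespace would see a job with no staging at all, which is the interoperability trade-off Karl raises.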

Donal K. Fellows

1:46 p.m.

Karl Czajkowski wrote:

...

Donal: did we lose the xsd:any child of the top-level JSDL document element?

Not from rev 18 of the spec, and having xsd:any##other everywhere (except in RangeValueType of course) is certainly our intention.

...

If we decided to have a WS-GRAM dialect of JSDL where we just transliterated some of the biggies like our staging clause, I would expect there to be, for example, a single (or three) "RFT element" at the top-level as a peer to the POSIX application and resource sections and zero jsdl:FileStaging elements.

Sounds fine to me actually. I'm personally intending to use JSDL in an overall workflow document where the datastaging bits are peers to the JSDL document-lets. Obviously this is a scope way outside the classic scope of JSDL, but that doesn't bother me in the slightest. :^) [much very sensible stuff elided]

...

I certainly thought this would be possible. Note, this is a technical question in my mind. I realize there is an entirely orthogonal concern about when one "should" use the extension mechanisms in a particular way... in these hypothetical discussions we all bring a lot of assumptions about how other standards will appear. For example, I assume BES will not include staging nor define how BES services respond to FileStaging elements. A BES client expecting interop would not use the JSDL staging nor any other non-BES extensions. Therefore, I do not see a BES + WS-GRAM staging extension as described above to really be more or less interoperable than one that tries to use the JSDL staging syntax. It would be specifically for transitional/legacy use by applications not content to use the interop profile(s).

I suspect that JSDL 1.0's very simple data staging stuff is not going to be anything like the end of the story. The problem is that I think doing anything better at the common standard level (as opposed to in some system-specific extension) will require us to take on the tangle of workflow. Anyone up for rechartering? :^)

To cut a long story short, do something sensible with data staging. Getting the standard to a point where it won't be necessary in most situations will require enough effort that an interim solution is a recommended strategy. Well, in my opinion anyway.

Donal.

Peter G.Lane

4:24 p.m.

On May 20, 2005, at 7:46 AM, Donal K. Fellows wrote:

...

Karl Czajkowski wrote:

...
Donal: did we lose the xsd:any child of the top-level JSDL document element?

Not from rev 18 of the spec, and having xsd:any##other everywhere (except in RangeValueType of course) is certainly our intention.

...
If we decided to have a WS-GRAM dialect of JSDL where we just transliterated some of the biggies like our staging clause, I would expect there to be, for example, a single (or three) "RFT element" at the top-level as a peer to the POSIX application and resource sections and zero jsdl:FileStaging elements.

Sounds fine to me actually. I'm personally intending to use JSDL in an overall workflow document where the datastaging bits are peers to the JSDL document-lets. Obviously this is a scope way outside the classic scope of JSDL, but that doesn't bother me in the slightest. :^)

Now I believe I understand better what Karl meant when he mentioned to me that JSDL would be most useful in GRAM if it were used in conjunction with something like the BES. Semantics, as I read and as you allude to here, are out of scope of the JSDL spec. So the only way to get real interop between DRMs which use JSDL is to also have standard semantics behind it.

Anyway, I appreciate everyone's time in explaining the finer points.

Thanks!
Peter

Peter G.Lane

4:12 p.m.

...

...
1) There's still controversy over whether staging should or should not be integrated into a DRM. As far as I can tell, for example, the BES doesn't have any plans to implement staging. DRMAA makes this optional. If BES ends up using JSDL, wouldn't this be a violation of the spec which requires each element to be supported in some way?

"Supported" has a very particular meaning within a JSDL context, and the effective meaning could include a definite response "I don't know how to do data staging, man!"

Great, thanks for that explanation. I had kind of assumed something of the sort, but I wasn't sure.

...

We discussed data staging quite a few times (around a year ago IIRC) and what we came up with is a minimum to allow processing of jobs on a number of different systems including domains like cross-cluster deployment where everything has to be shipped in first.

If we'd had a proper workflow language too, we'd have done data staging differently. But there wasn't something suitable already existing (BPEL does something else) and if we'd have had to develop our own, we'd still be arguing about it now.

...
3) I don't particularly like that the DataStaging sections include an option to remove the file at the end of the job. If I'm staging out data then this doesn't make a whole lot of sense. I'd much prefer a separate section which explicitly lists all the files that are to be removed from the submission machine after the job has been completed. This would also cover the case of data that is created rather than staged in but still needs to be removed after job completion.

The "remove data at the end of the job" applies after staging it out (it'd be a bit silly otherwise).

Right. I was thinking that it would mean "remove from target". I understand the semantics better now, so this makes sense.

...

It is also the case that you can list a file in the data staging section and not have it staged in or out, but just deleted at end-of-job.

Great! I hadn't considered this possibility.

...

...
4) Based on the current GRAM incarnation, it would be nice to let RFT's transfer request description extend a base staging schema and then use that in the JSDL document rather than adding a bunch of extensions to DataStaging. This is similar to how I'd want to go about using POSIXApplication.

I don't fully grasp what you mean here. The following is legal (modulo namespaces) according to draft-18:

    <jsdl:DataStaging>
      <jsdl:FileName>example</jsdl:FileName>
      <jsdl:CreationFlag>jsdl:overwrite</jsdl:CreationFlag>
      <jsdl:Source>
        <rft:SynchFileWithSomewhere>...</rft:SynchFileWithSomewhere>
      </jsdl:Source>
    </jsdl:DataStaging>

(Hmm, the non-normative examples in the data-staging section of d18 seem out of step. Bother.)

Given that the above is legal, what's the problem?

Actually, this clarifies things for me. This misunderstanding was also due to my incorrect interpretation of what Source and Target meant.
