Dear Andre,

Many thanks for your reply. I think the SAGA mailing list may be the best place to discuss this issue.

Currently the SAGA specification has minimal support for workflows. However, this is not ideal for users whose applications exhibit some kind of workflow. Most compute-intensive applications (especially high-throughput applications) require workflow functionality in the SAGA specification, for the following reasons:

1. It is not an easy task to submit the individual tasks of a workflow as single jobs. There may be hundreds of tasks in a workflow, running on dozens of datasets, which would require a user to generate hundreds of jobs. Workflow tools facilitate this: users create a workflow and submit it as a single entity, and a resource management system interprets the workflow and distributes it across distributed resources.

If there is no workflow support in the SAGA specification, the creation and submission of jobs (workflows, bulk jobs, etc.) may be a nightmare for users.

2. Performance: If jobs are submitted one by one (as is the case in SAGA), the associated latencies may be very high. For example, if a particular neuroimaging workflow has 50 tasks and these run on 50 datasets, we require 2,500 jobs to do the required processing. If a single job takes 2 minutes (it could be 45 minutes with some middleware) before it gets a slot for execution, and submissions are serialized, we face a minimum delay of around 83 hours (2,500 x 2 minutes) before all the jobs have even started execution. This is not tolerable where users are concerned about performance and throughput.

3. Management and monitoring: Once users submit jobs, they will need to monitor their outcomes. There may be job failures, unresolved dependencies, or jobs that fail to produce the required output because of other errors. Managing and monitoring hundreds of jobs is not trivial.

On the other hand, if a workflow is submitted as a single entity, the user manages just one job, and the underlying resource management system (such as the gLite WMS or Grid Gateway) may split the workflow into tasks, provide monitoring information to the user, and present a single result.

 
There may be a number of other scenarios where workflow support in SAGA is needed. I am making the case here that SAGA could gain wide-scale adoption in a number of scientific and business communities if workflow support were made available.

There may be a number of ways to support workflows in SAGA. Ideally, the SAGA API would generate a JDL/JSDL description from a workflow, which can then be interpreted by the underlying Grid middleware.
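
For concreteness, a DAG of this kind expressed in gLite WMS JDL might look roughly as follows. This is only a sketch written from memory: the node names, executables and arguments are made up, and the exact attribute spellings should be checked against the WMS JDL documentation.

[
  Type = "dag";
  nodes = [
    preprocess = [
      description = [
        Executable = "/usr/local/bin/preprocess.sh";
        Arguments  = "subject01";
        StdOutput  = "preprocess.out";
        StdError   = "preprocess.err";
      ];
    ];
    analyse = [
      description = [
        Executable = "/usr/local/bin/analyse.sh";
        Arguments  = "subject01";
        StdOutput  = "analyse.out";
        StdError   = "analyse.err";
      ];
    ];
  ];
  // analyse starts only after preprocess has completed
  dependencies = { { preprocess, analyse } };
]

A description like this is handed to the WMS in a single submission, so the submission overhead is paid once for the whole workflow rather than once per task.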

On the three approaches you outlined:

1. Large jobs can execute whole workflows, but we would be forcing the entire workflow to be scheduled on a single site. This may create more problems, since we would be asking the job to transfer all of its datasets to a single site. It may also overload some sites and would virtually minimize the role of meta-schedulers, which were created for multi-site scheduling.

2. Pilot jobs may help with performance optimization, but it is not clear how they can execute a whole workflow, especially one that is distributed over more than one site. If a workflow is distributed, the pilot jobs have no mechanism by which they can communicate and coordinate across sites.

3. A DAG enactor could be interesting, but how it would coordinate with the underlying Grid resources remains to be investigated.

We may discuss the possibility of a workflow adapter that is abstract in nature, so that the resulting job descriptions can be executed in different enactment environments. SAGA could generate generic workflow descriptions (JSDL/JDL), and the adapters could be extended to support the enactment functionality. Alternatively, SAGA could provide a partial enactment engine, although how that would be executed needs open debate. Yet another scenario would be to translate workflow descriptions into calls on the existing SAGA API, which are then dispatched automatically to the underlying adapters; a rough sketch of this last option follows below.
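
To illustrate only that last option, something along the following lines could be attempted today on top of the standard JavaSAGA job API (class and method names are quoted from memory and should be checked against the JavaSAGA javadoc; the DAG, node names, executable and service URL are invented for the example). Crucially, every node still goes through a separate submission, so the per-job latency problem described above is not solved by this route.

import java.util.*;
import org.ogf.saga.job.Job;
import org.ogf.saga.job.JobDescription;
import org.ogf.saga.job.JobFactory;
import org.ogf.saga.job.JobService;
import org.ogf.saga.task.State;
import org.ogf.saga.url.URLFactory;

public class NaiveDagDispatch {
    public static void main(String[] args) throws Exception {
        // node -> nodes it depends on, listed in a valid topological order
        Map<String, List<String>> dag = new LinkedHashMap<String, List<String>>();
        dag.put("preprocess", Collections.<String>emptyList());
        dag.put("analyse", Arrays.asList("preprocess"));

        // "any://" leaves adaptor selection to the SAGA engine (placeholder host)
        JobService js = JobFactory.createJobService(
                URLFactory.createURL("any://example.org"));

        Map<String, Job> submitted = new HashMap<String, Job>();
        for (Map.Entry<String, List<String>> node : dag.entrySet()) {
            // block until every parent of this node has finished successfully
            for (String parent : node.getValue()) {
                Job p = submitted.get(parent);
                p.waitFor();
                if (p.getState() != State.DONE)
                    throw new RuntimeException("node " + parent + " failed");
            }
            JobDescription jd = JobFactory.createJobDescription();
            jd.setAttribute(JobDescription.EXECUTABLE, "/bin/echo");
            jd.setVectorAttribute(JobDescription.ARGUMENTS,
                                  new String[] { node.getKey() });
            Job job = js.createJob(jd);
            job.run();   // one submission (and one latency penalty) per node
            submitted.put(node.getKey(), job);
        }
    }
}

This keeps the adapters untouched, but it turns the client into the enactment engine and pays the submission latency for every single node, which is exactly why a description-level hand-over (JDL/JSDL) to the middleware looks preferable for large workflows.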

We do not want to break the standard by suggesting an adapter to support workflows; however, we need some mechanism by which SAGA implementations can support DAGs and other workflows.

Best Regards

Ashiq Anjum

 



From: Andre Merzky <andre@merzky.net>
Date: 8 October 2009 19:10:02 GMT+01:00
To: Irfan Habib <irfan.habib@cern.ch>
Cc: Andre Merzky <andre@merzky.net>, <gat-devel@cct.lsu.edu>
Bcc: Andre Merzky <andre@merzky.net>
Subject: Re: [Gat-devel] Proposal for changes in the glite-adaptor
Reply-To: Andre Merzky <andre@merzky.net>

Hi Irfan,

BTW: nice to hear from you again! :-)  Greetings to the other guys
in Bristol!

Quoting [Irfan Habib] (Oct 08 2009):

Dear Andre,

I completely understand your motivation; however, to me the
glite-adaptor is not equal to the other adaptors, because it does not
work at the "same level" as the other adaptors that are part of the
SAGA and JavaGAT packages. All the other adaptors work at the site
level, where the user selects a specific site gateway to submit a job
to (SGE, Condor and GridSAM are all middleware for site-based
clusters). If the glite-adaptor submitted jobs to a glite-CE rather
than the glite-WMS, we could consider creating our own workflow
enactor on top of SAGA, and that would be a SAGA-compliant solution.

However, the glite-adaptor interacts with a Grid-level service
(glite-WMS), and hence a single job submission incurs a significant
amount of overhead. The test Grid environment I have access to incurs
a submission overhead of 2-4 minutes, and the production EGEE
environment ranges from 2 to 45 minutes. If I were to enact a
workflow of mutually dependent tasks on the Grid, all the accumulated
job latencies would severely impact the workflow turnaround time.
To address this issue, glite-WMS enables a user to define a workflow
in a JDL: you can author a JDL which includes a DAG workflow and
submit it to glite-WMS. This sends a single request to glite-WMS, so
submission overheads are incurred only once rather than repeatedly
for large workflows. Of course, users may want to test the workflow
on a local SGE or Condor cluster; this can easily be supported
through the use of adaptors and a SAGA-based workflow enactor.
Our motivation is purely performance oriented.

FWIW, there are other backends which expose similar latencies.  Our
EC2 adaptor for example submits jobs to a Virtual Machine instance -
if that instance needs to be newly created, submission has a latency
of up to 5 minutes.  We also try to enact workflows on EC2 (actually
Teragrid + EC2), so we face quite similar problems as you do, I
expect.

We have three different solutions for the problem:

(i) We try to use large jobs, for which the startup time becomes
relatively small.  That is of course not always possible.

(ii) We built a PilotJob infrastructure on top of SAGA, which
requires one pilot job per requested resource to be submitted once
(with the latency penalty, but you can of course overlap that
bootstrapping across all resources). That pilot job then acts as a
kind of private gateway, which executes the real workflow nodes for
us.

I am not sure if our pilot job implementation is of any use for you,
as it's implemented in Python on top of C++, so probably well outside
of your tool chain; but let me know if you want to check it out, and
I'll put you in contact with the developer.

(iii) Not so much a solution as an approach: we started to implement
a DAG enactor in SAGA, which basically acts as a service and so
replaces your glite-WMS. The DAG is then specified in an additional
Workflow Extension to SAGA (which is yet to be defined). As the DAG
enactor can access resources from within the system (in a
pilot-job-like manner), it avoids most submission-related latencies.

We do have a prototype for that - again in C++, and most likely not
functional enough to be of any use to you. But if you (or others on
this list) are interested in pursuing that route, in particular with
respect to defining a SAGA extension package, please let me know,
and I'll keep you posted.


[...] and that would be a SAGA-compliant solution.

Actually, it would not. I absolutely understand your issues, and
agree that you need a solution to be able to use JavaSAGA/JavaGAT
sensibly, but exposing backend details at the SAGA API level breaks
SAGA compliance, by definition - sorry...


To the GAT list admins: please let us know if this discussion is off
topic for the list, and we'll move it elsewhere...

Cheers, and thanks, Andre.


Best Regards,
Irfan Habib


On 8 Oct 2009, at 16:11, Andre Merzky wrote:

Hi Irfan,

FWIW, I agree with Max here: if you can't express your job with the
means SAGA provides, then that may be a deficiency in SAGA (or GAT) -
but adding backend-specific methods or properties at the API level
defeats the purpose of a generic API...

Out of curiosity: what parts of the JDS specifically do you have
trouble with?

Cheers, Andre.



Quoting [Max Berger] (Oct 08 2009):

Hi Irfan Habib,

two things:

- If you write and submit the patch (or send it to me), I'll review
and apply it. (I'm the current 'maintainer' of the gLite adaptor).

BUT

this would defeat the purpose of the JavaGAT layer. There are
already too many gLite-specific settings to use, and this would mean
adding even more peculiarities.

So, what I'd prefer is a solution which uses JavaGAT and maps the job
structure there to the actual job structure. The features you've
requested are called "CoScheduleJob" in JavaGAT:

http://www.cs.vu.nl/ibis/javadoc/javagat/org/gridlab/gat/resources/CoScheduleJob.html

Please consider this alternative. It would make portability to other
Grid environments much easier in the future.


Max

Irfan Habib wrote:
Hi,

I have been able to submit jobs through JavaSAGA with the
glite-adaptor. However, the kinds of jobs that the adaptor currently
supports are fairly simple atomic executables. For our project we
require more complex JDL jobs (JDL DAG workflows, JDL parametric
jobs, etc.).
One way forward for us is to extend the glite-adaptor to accept JDL
jobs from users. For instance, one way to implement this would be to
introduce another attribute in the Preferences context. If that
attribute is defined, the glite-adaptor would, instead of generating
a JDL from scratch, use the JDL that has been passed in, for instance
via JobDescription.EXECUTABLE.
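
To make the proposal concrete, client code might then look roughly
as follows. The preference name ("glite.useNativeJdl"), the
"preferences" context type string and the service URL are purely
illustrative placeholders, and the pass-through behaviour is what is
being proposed here, not something the adaptor does today (API names
quoted from memory, to be checked against the JavaSAGA javadoc):

import org.ogf.saga.context.Context;
import org.ogf.saga.context.ContextFactory;
import org.ogf.saga.job.Job;
import org.ogf.saga.job.JobDescription;
import org.ogf.saga.job.JobFactory;
import org.ogf.saga.job.JobService;
import org.ogf.saga.session.Session;
import org.ogf.saga.session.SessionFactory;
import org.ogf.saga.url.URLFactory;

public class RawJdlSubmit {
    public static void main(String[] args) throws Exception {
        Session session = SessionFactory.createSession(true);

        // proposed switch: ask the glite-adaptor to use the JDL verbatim
        Context prefs = ContextFactory.createContext("preferences");
        prefs.setAttribute("glite.useNativeJdl", "true");  // invented name
        session.addContext(prefs);

        // the complete JDL (e.g. a DAG) is handed over unmodified;
        // abbreviated here for readability
        String jdl = "[ Type = \"dag\"; /* nodes and dependencies */ ]";

        JobDescription jd = JobFactory.createJobDescription();
        jd.setAttribute(JobDescription.EXECUTABLE, jdl);

        JobService js = JobFactory.createJobService(
                session, URLFactory.createURL("glite://wms.example.org"));
        Job job = js.createJob(jd);
        job.run();
        job.waitFor();
    }
}
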
Such changes, in our opinion, add to the capabilities of the
gLite-adaptor and enable it to be used for more complex,
gLite-specific jobs.

Would such changes be acceptable to the JavaGAT glite-adaptor
developers? Can such changes be included in the trunk?

Best Regards,
Irfan


On 7 Oct 2009, at 22:53, Ceriel Jacobs wrote:

Irfan Habib wrote:
Hi,

Well, according to the svn log the changes have been committed;
however, the directory attribute is still being set.
I hope to have fixed it now. The attribute was still being set
because the WORKINGDIRECTORY entry of JobDescription has a default
value, which I forgot about. SAVE_STATE should also be fixed now.

Best wishes,

Ceriel