Dear Andre,
There may be a number of other scenarios where workflow
support in SAGA is needed. I am making the case here that SAGA could gain
wide-scale adoption in a number of scientific and business communities if
workflow support were made available.
Ashiq Anjum
From: Andre Merzky <andre@merzky.net>
Date: 8 October 2009 19:10:02 GMT+01:00
To: Irfan Habib <irfan.habib@cern.ch>
Cc: Andre Merzky <andre@merzky.net>, <gat-devel@cct.lsu.edu>
Bcc: Andre Merzky <andre@merzky.net>
Subject: Re: [Gat-devel] Proposal for changes in the glite-adaptor
Reply-To: Andre Merzky <andre@merzky.net>
Hi Irfan,
BTW: nice to hear from you again! :-) Greetings to the other guys
in Bristol!
Quoting [Irfan Habib] (Oct 08 2009):
Dear Andre,
I completely understand your motivation; however, to me the glite-adaptor
is not equivalent to the other adaptors, because it does not work at the
"same level" as the other adaptors that are part of the SAGA and JavaGAT
packages. All other adaptors work at the site level, where the user selects
a specific site gateway to submit a job to (SGE, Condor and GridSAM are all
middleware for site-based clusters). If the glite-adaptor submitted jobs to
a glite-CE rather than to the glite-WMS, we could consider creating our own
workflow enactor on top of SAGA, and that would be a SAGA compliant solution.
However, the glite-adaptor interacts with a Grid-level service (the
glite-WMS), and hence a single job submission incurs a significant amount
of overhead. The test Grid environment I have access to incurs a
submission overhead of 2 to 4 minutes, and the production EGEE environment
2 to 45 minutes. If I were to enact a workflow of mutually dependent tasks
on the Grid, these per-job latencies would accumulate and severely impact
the workflow turnaround time.
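To make the accumulation concrete, here is a rough model (an illustrative assumption, not a measurement): in a chain of N mutually dependent tasks, each task is submitted only after its predecessor finishes and pays the full submission overhead, whereas a DAG submitted once pays that overhead a single time. The 5-minute task runtime and the per-node dispatch cost inside the WMS (node_overhead_min, zero by default) are hypothetical parameters.

```python
def chain_turnaround(n_tasks: int, overhead_min: float, runtime_min: float) -> float:
    """Each task is submitted separately, only after its predecessor has finished,
    so every task pays the full submission overhead."""
    return n_tasks * (overhead_min + runtime_min)


def dag_turnaround(n_tasks: int, overhead_min: float, runtime_min: float,
                   node_overhead_min: float = 0.0) -> float:
    """The whole chain is submitted once as a DAG, so the submission overhead is paid once.
    node_overhead_min is a hypothetical per-node dispatch cost inside the WMS."""
    return overhead_min + n_tasks * (runtime_min + node_overhead_min)


if __name__ == "__main__":
    # 10 chained tasks of (assumed) 5 minutes each, at both ends of the quoted 2-45 minute range.
    for overhead in (2, 45):
        print(overhead, chain_turnaround(10, overhead, 5), dag_turnaround(10, overhead, 5))
```

Under these assumptions, a 45-minute overhead gives 500 minutes for node-by-node submission versus 95 minutes for a single DAG submission; at 2 minutes the figures are 70 versus 52.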
To address this issue, the glite-WMS enables a user to define a workflow in
a JDL. You can author a JDL that includes a DAG workflow and submit it to
the glite-WMS; this sends a single request to the glite-WMS, so the
submission overhead is incurred only once rather than repeatedly for large
workflows. Of course, users may want to test the workflow on a local SGE or
Condor cluster; this can easily be supported through the use of adaptors
and a SAGA-based workflow enactor.
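For illustration, a small generator for such a DAG JDL, built from a backend-neutral description (node name mapped to executable, plus parent/child pairs). The layout (Type = "dag", a nodes block with inline description entries, and a dependencies list of {parent, child} pairs) follows the gLite WMS DAG JDL format as we understand it; treat the exact attribute names as an assumption to check against the gLite JDL documentation, and the function name dag_jdl as hypothetical.

```python
from typing import Dict, List, Tuple


def dag_jdl(nodes: Dict[str, str], dependencies: List[Tuple[str, str]]) -> str:
    """Render a DAG JDL: one inline description per node, plus a dependencies list
    in which each {parent, child} pair means 'child runs after parent'."""
    # Refuse dependencies that point at nodes which were never defined.
    unknown = {n for pair in dependencies for n in pair} - set(nodes)
    if unknown:
        raise ValueError(f"dependencies reference undefined nodes: {sorted(unknown)}")
    node_lines = [
        f'    {name} = [ description = [ JobType = "Normal"; Executable = "{exe}"; ]; ];'
        for name, exe in nodes.items()
    ]
    deps = ", ".join(f"{{{parent}, {child}}}" for parent, child in dependencies)
    return "\n".join(["[", '  Type = "dag";', "  nodes = [", *node_lines, "  ];",
                      f"  dependencies = {{{deps}}};", "]"])
```

A chain A, B, C would be dag_jdl({"A": ..., "B": ..., "C": ...}, [("A", "B"), ("B", "C")]); the resulting text is what would be handed to the WMS in a single submission.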
Our motivation is purely performance oriented.
FWIW, there are other backends which expose similar latencies. Our
EC2 adaptor, for example, submits jobs to a Virtual Machine instance: if
that instance needs to be newly created, submission has a latency
of up to 5 minutes. We also try to enact workflows on EC2 (actually
TeraGrid + EC2), so I expect we face problems quite similar to yours.
We have three different solutions for the problem:
(i) We try to use large jobs, for which the startup time becomes
relatively small. That is of course not always possible.
(ii) We built a PilotJob infrastructure on top of SAGA, which
requires one pilot job per requested resource to be submitted once
(with the latency penalty, but you can of course overlap that
bootstrapping for all resources). That pilot job is then some kind
of private gateway, which executes the real workflow nodes for us.
I am not sure whether our pilot job implementation is of any use to you,
as it's implemented in Python on top of C++, so probably well outside
of your tool chain, but let me know if you want to check it out, and
I'll put you in contact with the developer.
(iii) not so much a solution, but rather an approach: we started to
implement a DAG enactor in SAGA, which basically acts as a service,
and so replaces your glite-WMS. The DAG is then specified in an
additional Workflow Extension to SAGA (which is yet to be defined).
As the DAG enactor can access resources from within the system (in a
pilot-job-like manner), it avoids most submission-related latencies.
We do have a prototype for that (again in C++, and most likely not
functional enough to be of any use to you). But if you (or others
on this list) are interested in pursuing that route, in particular
with respect to defining a SAGA extension package, please let me know,
and I'll keep you posted.
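Approaches (ii) and (iii) combine naturally: a DAG enactor releases each node as soon as its parents have finished and hands it to workers that were started once up front. The sketch below is a local stand-in under that assumption: Python's graphlib.TopologicalSorter (Python 3.9+) tracks which nodes are ready, and a ThreadPoolExecutor's threads play the role of already-bootstrapped pilot jobs. It makes no SAGA calls, and enact_dag and run_node are hypothetical names, not part of the prototype mentioned above.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, Hashable, List, Set


def enact_dag(dependencies: Dict[Hashable, Set[Hashable]],
              run_node: Callable[[Hashable], object],
              n_pilots: int = 4) -> List[Hashable]:
    """Run every node after all of its predecessors and return the completion order.

    dependencies maps node -> set of predecessor nodes. The executor's worker threads
    stand in for pilot jobs: they are started once and reused for every node, so no
    per-node submission latency is modelled here.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow is not a DAG
    completed = []
    with ThreadPoolExecutor(max_workers=n_pilots) as pilots:
        running = {}
        while sorter.is_active():
            # Dispatch every node whose predecessors have all completed.
            for node in sorter.get_ready():
                running[pilots.submit(run_node, node)] = node
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                node = running.pop(future)
                future.result()  # re-raise a node failure instead of continuing silently
                completed.append(node)
                sorter.done(node)  # may make successors ready on the next iteration
    return completed
```

With a real backend, run_node would hand the node to an already-running pilot rather than call a local function; the structure of the loop stays the same.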
[...] and that would be a SAGA compliant solution.
Actually, it would not. I absolutely understand your issues, and
agree that you need a solution to be able to use JavaSAGA/JavaGAT
sensibly, but exposing backend details at the SAGA API level breaks
SAGA compliance by definition - sorry...
To the GAT list admins: please let us know if this discussion is off
topic for the list, and we will move it elsewhere...
Cheers, and thanks, Andre.
Best Regards,
Irfan Habib
On 8 Oct 2009, at 16:11, Andre Merzky wrote:
Hi Irfan,
FWIW, I agree with Max here: if you can't express your job by
SAGA means, then that may be a deficiency in SAGA (or GAT), but
adding backend-specific methods or properties at API level defeats
the purpose of a generic API...
Out of curiosity: what parts of the JDS specifically do you have
trouble with?
Cheers, Andre.
Quoting [Max Berger] (Oct 08 2009):
Hi Irfan Habib,
two things:
- If you write and submit the patch (or send it to me), I'll review and
apply it. (I'm the current 'maintainer' for the gLite adaptor.)
BUT
this would defeat the purpose of the JavaGAT layer. There are already
too many gLite-specific settings to use, and this would mean adding even
more peculiarities.
So, what I'd prefer is a solution which uses JavaGAT and maps the job
structure there to the actual job structure. The features you've
requested are called "CoScheduleJob" in JavaGAT:
http://www.cs.vu.nl/ibis/javadoc/javagat/org/gridlab/gat/resources/CoScheduleJob.html
Please consider this alternative. It would make portability to other
Grid environments much easier in the future.
Max
Irfan Habib wrote:
Hi,
I have been able to submit jobs through JavaSAGA with the glite-adaptor.
However, the kinds of jobs the adaptor can currently support are fairly
simple atomic executables. For our project we require more complex JDL
jobs (JDL DAG workflows, JDL parametric jobs, etc.).
One way forward for us is to extend the glite-adaptor to accept JDL
jobs from the users. For instance, one way to implement this would be
to introduce another attribute in the Preference context. If that
attribute is defined, the glite-adaptor, instead of generating a JDL
from scratch, uses the JDL that has been passed in, for instance via
JobDescription.EXECUTABLE.
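A sketch of the adaptor-side branch this proposal implies, under stated assumptions: the preference key glite.jdl.passthrough is hypothetical (the message does not name the attribute), preferences and the job description are modelled as plain dictionaries, and the fallback generator is deliberately minimal rather than what the glite-adaptor actually emits.

```python
from typing import Dict, List, Union

# Hypothetical preference name; the proposal does not specify one.
PASSTHROUGH_KEY = "glite.jdl.passthrough"


def build_jdl(preferences: Dict[str, str],
              job_description: Dict[str, Union[str, List[str]]]) -> str:
    """Return the JDL to submit to the glite-WMS for this job."""
    if str(preferences.get(PASSTHROUGH_KEY, "false")).lower() == "true":
        # Proposed behaviour: EXECUTABLE carries a complete user-authored JDL
        # (e.g. a DAG or parametric job), which is forwarded unchanged.
        return str(job_description["EXECUTABLE"])
    # Default behaviour: generate a (here deliberately minimal) JDL from scratch.
    lines = ["[", f'  Executable = "{job_description["EXECUTABLE"]}";']
    arguments = job_description.get("ARGUMENTS")
    if arguments:
        lines.append(f'  Arguments = "{" ".join(arguments)}";')
    lines.append("]")
    return "\n".join(lines)
```

In this design the same EXECUTABLE field means two different things depending on a backend-specific preference, which is the portability concern raised in the replies.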
Such changes, in our opinion, add to the capabilities of the
gLite-adaptor and enable it to be used for more complex gLite-specific
jobs.
Would such changes be acceptable to the JavaGAT glite-adaptor
developers? Can such changes be included in the trunk?
Best Regards,
Irfan
On 7 Oct 2009, at 22:53, Ceriel Jacobs wrote:
Irfan Habib wrote:
Hi,
Well, according to the svn log, the changes have been committed;
however, the directory attribute is still being set.
I hope to have fixed it now. The attribute was still being set
because the WORKINGDIRECTORY entry of JobDescription has a default
value, which I forgot about. SAVE_STATE should also be fixed now.
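The explanation above describes a common pattern: if JDL generation only checks whether WORKINGDIRECTORY has a value, a default value makes that check always succeed, so the attribute is emitted even when the user never set it. One way such a fix can look is sketched below, contrasting the two checks; the class, the default value ".", and the Directory attribute name are hypothetical stand-ins, not JavaGAT's actual code.

```python
# Hypothetical defaults table; JavaGAT's real defaults are not reproduced here.
DEFAULTS = {"WORKINGDIRECTORY": "."}


class JobDescription:
    """Toy job description that remembers which entries the user set explicitly."""

    def __init__(self, **explicit):
        self._explicit = dict(explicit)

    def get(self, key):
        # Falls back to the default, so this is never None for WORKINGDIRECTORY.
        return self._explicit.get(key, DEFAULTS.get(key))

    def is_set(self, key):
        return key in self._explicit


def jdl_attributes_buggy(jd):
    attrs = {}
    if jd.get("WORKINGDIRECTORY") is not None:  # always true because of the default
        attrs["Directory"] = jd.get("WORKINGDIRECTORY")
    return attrs


def jdl_attributes_fixed(jd):
    attrs = {}
    if jd.is_set("WORKINGDIRECTORY"):  # only emit what the user actually asked for
        attrs["Directory"] = jd.get("WORKINGDIRECTORY")
    return attrs
```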
Best wishes,
Ceriel
_______________________________________________
Gat-devel mailing list
Gat-devel@cct.lsu.edu
https://mail.cct.lsu.edu/mailman/listinfo/gat-devel
--
Nothing is ever easy.