
Hi Malcolm! :-) Quoting [Malcolm Illingworth] (Jun 19 2008):
Hi,
I think trying to define a workflow description language from scratch would be very tricky to get right. It would have to be sufficiently general not to be domain-specific, but at the same time verbose enough to actually be useful.
Yes, certainly, I agree we should stay away from this.
I do like the idea of having a Workflow class defined in the API. At least from the Java perspective, I think this would be easy to do. I can't remember offhand if a TaskContainer is itself a Task, but it would be extremely powerful if the Workflow class could accept TaskContainers as well as Tasks.
It's not the case at the moment, but would absolutely be very useful.
This would make dependencies easy to manage; for example you could have tasks at multiple sites, with a TaskContainer representing each site, and a single Workflow managing the set of TaskContainer instances.
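A minimal C++ sketch of that arrangement, assuming a common base type shared by tasks and containers. As noted above, a SAGA task container is not a task at the moment, so the runnable interface and all names below are hypothetical and not part of the SAGA API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common interface; SAGA does not define this.
struct runnable {
    virtual ~runnable() = default;
    virtual std::size_t run() = 0;   // returns the number of tasks started
};

// Stand-in for a single task; a real one would start remote work in run().
struct task : runnable {
    std::string name;
    explicit task(std::string n) : name(std::move(n)) {}
    std::size_t run() override { return 1; }
};

// A container runs all of its members, which may themselves be containers.
struct task_container : runnable {
    std::vector<std::shared_ptr<runnable>> members;
    void add(std::shared_ptr<runnable> r) { members.push_back(std::move(r)); }
    std::size_t run() override {
        std::size_t started = 0;
        for (auto& m : members) started += m->run();
        return started;
    }
};

// In this sketch a workflow is simply the top-level container.
struct workflow : task_container {};

// Usage: one container per site, one workflow managing the containers.
void example() {
    auto site_a = std::make_shared<task_container>();
    site_a->add(std::make_shared<task>("a1"));
    site_a->add(std::make_shared<task>("a2"));
    auto site_b = std::make_shared<task_container>();
    site_b->add(std::make_shared<task>("b1"));

    workflow wf;
    wf.add(site_a);
    wf.add(site_b);
    wf.add(std::make_shared<task>("standalone"));
    assert(wf.run() == 4);
}
```

The sketch only covers grouping; ordering between the groups would still need some notion of dependency.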
Yes, true, but it would make other WF concepts (like loops, if-this-do-that etc) more difficult. Question is what expressiveness we want. Another way to add dependencies might be:

  task_2.depends_upon (task_1);
  task_2.depends_upon (task_container_1);

etc. Then running any of the involved tasks could start the complete workflow, as the SAGA implementation would be able to reconstruct the graph from that. It would also allow for other constructs, like

  task_2.on_error_run   (task_3);
  task_2.on_success_run (task_4);
  task_2.on_timeout_run (task_5, 100.0); // 100 seconds

It's more verbose than your suggestion though.
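How an implementation might reconstruct the graph from depends_upon calls, whichever task is started, can be sketched roughly as below. This is an illustration under assumptions, not SAGA code: the task struct, its dependents back-links and workflow_order are hypothetical names, the graph is assumed to be acyclic, and the on_error/on_success/on_timeout hooks are left out.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SAGA task.
struct task {
    std::string name;
    std::vector<task*> deps;        // tasks that must finish before this one
    std::vector<task*> dependents;  // tasks that wait for this one

    explicit task(std::string n) : name(std::move(n)) {}

    // Record the edge in both directions so the graph can be walked
    // starting from any task.
    void depends_upon(task& other) {
        deps.push_back(&other);
        other.dependents.push_back(this);
    }
};

// Gather every task connected to t, following edges in both directions.
void gather(task& t, std::set<task*>& component) {
    if (!component.insert(&t).second) return;   // already seen
    for (task* d : t.deps) gather(*d, component);
    for (task* d : t.dependents) gather(*d, component);
}

// Depth-first post-order: a task is emitted only after its dependencies.
void visit(task& t, std::set<task*>& done, std::vector<std::string>& order) {
    if (!done.insert(&t).second) return;
    for (task* d : t.deps) visit(*d, done, order);
    order.push_back(t.name);
}

// Names of all tasks in the workflow containing start, dependencies first.
std::vector<std::string> workflow_order(task& start) {
    std::set<task*> component;
    gather(start, component);
    std::set<task*> done;
    std::vector<std::string> order;
    for (task* t : component) visit(*t, done, order);
    return order;
}

// Usage: a three-step chain yields the same plan from either end.
void example() {
    task task_1("task_1"), task_2("task_2"), task_3("task_3");
    task_2.depends_upon(task_1);
    task_3.depends_upon(task_2);
    const std::vector<std::string> plan{"task_1", "task_2", "task_3"};
    assert(workflow_order(task_1) == plan);
    assert(workflow_order(task_3) == plan);
}
```

The relative order of tasks that do not depend on each other is unspecified in this sketch; only the dependency constraints are guaranteed.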
(Within DESHL we took a simple but effective approach. Since we are command-line based, some of my colleagues at EPCC have developed a workflow application for ensemble computing based on bash scripts issuing DESHL commands.)
:-)) I also think it's actually simple to implement. One problem though when you implement the enactor in the SAGA library: you need to keep your client application (the enactor) alive as long as your workflow is running. Not sure if that is a serious limitation...
Regards, Malcolm.
Thanks, Andre.
-----Original Message-----
From: saga-rg-bounces@ogf.org [mailto:saga-rg-bounces@ogf.org] On Behalf Of Andre Merzky
Sent: 18 June 2008 21:46
To: SAGA RG
Subject: Re: [SAGA-RG] Workflow support in SAGA

Just a small comment for now: I'd like to adjust this statement of mine:
That is correct. What is missing in SAGA for basic workflow support are job dependencies.
There are actually two ways to add workflow support in SAGA (not mutually exclusive):
(a) add task dependencies, and thus allow a workflow to be described in terms of saga jobs and saga tasks;
(b) add the ability to submit a workflow description (usually some XML document or such) to SAGA, which will (directly or indirectly) take steps to execute that workflow.
IMHO, (a) would be really cool, and very powerful. (b) OTOH would be simple to implement:
  class saga::workflow : public saga::async
  {
    workflow (std::string workflow_description);
    ~workflow (void);

    void run (void);
  };
Problems only start once we start to look inside that description... - as Ashiq said, it's unlikely that there will be a single workflow description language to make all people happy...
Best, Andre.
Quoting [Andre Merzky] (Jun 18 2008):
[forwarded reply from Ashiq:]
Dear Andre,
Thanks for your detailed reply.
I believe applications really need workflow support in the SAGA specification (and implementations). Such a feature will definitely help applications to adopt SAGA quickly; without it, they are restricted in exploiting the full potential of Grids. We will be happy to talk to you more on the subject and will help wherever required.
I do not think that different communities will ever agree on a single workflow language. The biomedical community predominantly uses SCUFL, the HEP community has long been using DAGs, and the business community is quickly adopting BPEL. Some other groups are using pi-calculus and Petri nets for a more abstract formalism. I believe different communities should have the freedom to use a language of their choice, and SAGA should provide a mechanism through which different languages and enactment engines can be supported. This may be a wish list, but I do not know how difficult it is to achieve this goal.
Moreover, what are your views on the workflow support in JSAGA? Is it following any standard, or do they plan to update the SAGA specification retrospectively?
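One way to picture such a mechanism is a registry that maps a workflow language name to an enactment engine. The C++ sketch below is purely illustrative: the enactor type, the enactor_registry class and the status strings are assumptions made for this example, and nothing like it is defined by SAGA.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical: an enactor takes a workflow description and returns a status.
using enactor = std::function<std::string(const std::string& description)>;

// Maps a workflow language name (e.g. "DAG", "BPEL", "SCUFL") to an enactor.
class enactor_registry {
public:
    void add(const std::string& language, enactor e) {
        enactors_[language] = std::move(e);
    }

    bool supports(const std::string& language) const {
        return enactors_.count(language) != 0;
    }

    // Hand the description to the enactor registered for its language.
    std::string run(const std::string& language,
                    const std::string& description) const {
        auto it = enactors_.find(language);
        if (it == enactors_.end())
            throw std::invalid_argument("no enactor registered for " + language);
        return it->second(description);
    }

private:
    std::map<std::string, enactor> enactors_;
};

// Usage: register a toy DAG enactor and dispatch a description to it.
void example() {
    enactor_registry registry;
    registry.add("DAG", [](const std::string& d) { return "dag:" + d; });
    assert(registry.supports("DAG"));
    assert(!registry.supports("BPEL"));
    assert(registry.run("DAG", "a -> b") == "dag:a -> b");
}
```

With this shape the API itself stays language-neutral, and each community would supply the engine for its own language.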
Quoting [Andre Merzky] (Jun 18 2008):
Date: Wed, 18 Jun 2008 22:33:26 +0200
From: Andre Merzky <andre@merzky.net>
To: SAGA RG <saga-rg@ogf.org>
Bcc: Andre Merzky <andre@merzky.net>
Subject: Workflow support in SAGA
Hi folx,
we recently started a discussion thread about workflow support in SAGA, off this list. This mail tries to move that thread onto this list, in order to gauge interest and feedback from the wider community. So, please feel free to jump in!
Below are excerpts from the thread (re-sorted, reformatted, shortened), as a starting point.
Best regards,
Andre.
>>> On Fri, Jun 13, 2008, yasir mehmood wrote:
>>>
>>> A colleague of mine (Irfan Habib) is writing a pipeline
>>> service as a client to the glueing service which passes it
>>> (or plan to pass) a workflow instead of a job. He is
>>> concerned about the workflow support in Saga Java
>>> Implementations. Could you please confirm this ?
>>>
>>>
>> Thilo Kielmann wrote:
>>
>> Dear Yasir,
>>
>> The SAGA API (and thus also its implementation in
>> Java) does not deal with workflows, only with individual jobs.
>> You have to construct workflows yourself, using SAGA jobs,
>> containers, etc.
>
>
> Quoting [Irfan Habib] (Jun 13 2008):
>
> Dear Thilo,
>
> Thank you for your kind reply, may I just ask one more
> question.
>
> As you have stated that the SAGA API does not deal with
> workflows only individual jobs, so in that case if I want to
> execute a workflow through SAGA, I have to externally handle
> the execution of the workflow and invoke SAGA only for
> execution of individual jobs?
>
> -----Original Message----- From: Andre Merzky [mailto:andre@merzky.net]
That is correct. What is missing in SAGA for basic workflow support are job dependencies. That is what you need to maintain manually.
Please have a look at the saga::task::container class - that class is supposed to make it easier to manage large numbers of tasks/jobs.
If you are interested in that topic, we could discuss drafting a saga package which adds WF-like dependencies to tasks... This is on our roadmap anyway, but so far with low priority, due to lack of interest and use cases.
Best, Andre.
Quoting [Ashiq Anjum] (Jun 17 2008):
Dear Andre,
If this is the case then SAGA introduces a large overhead: the applications (for example the Pipeline service and other services we intend to use) would need a middle component which would get the workflow, break it into constituent parts, and invoke the API for each job separately. This component would in effect be an enactment engine. In this scenario the Grid middleware would at any given time be scheduling one single job, rather than an entire workflow.
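The middle component described here can be pictured as a small client-side loop. The C++ sketch below is a toy model under assumptions: the workflow type is a hypothetical map from job name to prerequisite names, and the submit callback stands in for a blocking run of one job through the API; neither is part of SAGA.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical workflow description: job name -> names of jobs it waits for.
using workflow = std::map<std::string, std::vector<std::string>>;

// Submit each job once all of its prerequisites have completed. The submit
// callback is assumed to block until the job is done. Returns the number of
// jobs submitted; jobs in a cycle or with unknown prerequisites never run.
std::size_t enact(const workflow& wf,
                  const std::function<void(const std::string&)>& submit) {
    std::map<std::string, bool> done;
    for (const auto& entry : wf) done[entry.first] = false;

    std::size_t submitted = 0;
    bool progress = true;
    while (progress) {                 // repeat until no job became ready
        progress = false;
        for (const auto& entry : wf) {
            if (done[entry.first]) continue;
            bool ready = true;
            for (const auto& dep : entry.second)
                if (done.count(dep) == 0 || !done[dep]) ready = false;
            if (!ready) continue;
            submit(entry.first);       // one job at a time reaches the middleware
            done[entry.first] = true;
            ++submitted;
            progress = true;
        }
    }
    return submitted;
}

// Usage: fetch, then two splits, then merge.
void example() {
    workflow wf;
    wf["fetch"] = std::vector<std::string>{};
    wf["split_a"] = std::vector<std::string>{"fetch"};
    wf["split_b"] = std::vector<std::string>{"fetch"};
    wf["merge"] = std::vector<std::string>{"split_a", "split_b"};

    std::vector<std::string> log;
    std::size_t n = enact(wf, [&log](const std::string& j) { log.push_back(j); });
    assert(n == 4);
    assert(log.front() == "fetch");
    assert(log.back() == "merge");
}
```

The sketch makes the concern concrete: the scheduler only ever sees individual jobs, and the loop has to keep running in the client for the whole lifetime of the workflow.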
I think this is a bottleneck, and users cannot do fine-grained scheduling of compute- and data-intensive jobs without this feature. The purpose behind SAGA was to help users by enabling them to access the Grid transparently. But if users have to code the workflow dependencies themselves, this will be much more complex for them and will go against the 20:80 slogan for SAGA. I think it would be very helpful to include this feature in future releases.
I was also under the impression that the glite adapter is already available. Do you have any idea how far it is from a stable release?
Moreover, has anyone submitted jobs (using the Java implementation of SAGA) to a real execution environment (e.g. Condor, SGE)? It looks like most of the testing has been done using fork-based execution, but submitting jobs to Condor/SGE will require a range of JSDL parameters to be sent from the client application through the SAGA API. Can we create JSDLs for complex jobs to be executed on remote execution environments (e.g. Condor, SGE)?
Thanks,
Best Regards Ashiq Anjum
Dear Ashiq,
we are aware (painfully so) of the fact that SAGA does not yet cover the full scope of programming models used in Grid environments. And yes, workflows are an obvious gap, but so are others: p2p, messaging, checkpoint/recovery, transactions, resource discovery, service discovery, ...
The list is long, and we are working on a number of those items. As for workflows: we simply did not have enough input from the workflow community in order to make an informed decision on what a workflow-oriented job API should look like. It might be enough to simply accept a workflow description and to submit it to an enactor - but which language to use? At the time, it was _very_ unclear if and what workflow language would dominate the scene (I have no idea how the situation is now - is BPEL 'the one'?). And ideally we would also like to express workflows programmatically, i.e. as dependencies between saga::job instances.
I have two more comments, slightly less defensive and more constructive I hope :-)
First, writing a workflow enactor in SAGA would be cool, and potentially useful for a range of people and projects! I understand that this is not on your / Yasir's roadmap, and probably very distracting and time consuming. But you may want to check with other projects if they would be interested in joining forces.
Second, this might be good timing to actually add workflow support to the SAGA standard! We would very much appreciate your input on that topic (although I would like to move this thread to the saga-rg@ogf.org mailing list then). Is that something you would consider spending time on?
These are my thoughts on the topic so far - I'd be happy to get your feedback.
Thanks, Andre.
-- Nothing is ever easy.