Fwd (s.newhouse@omii.ac.uk): Re: SAGA Strawman API

Hi Steven, thanks for your comments; I am forwarding them to the list - hope you don't mind. Cheers, Andre.

----- Forwarded message from Steven Newhouse <s.newhouse@omii.ac.uk> -----
Date: Sun, 29 May 2005 14:47:40 +0100
From: Steven Newhouse <s.newhouse@omii.ac.uk>
To: Shantenu Jha <s.jha@ucl.ac.uk>
CC: Tom Goodale <goodale@cct.lsu.edu>, Andre Merzky <andre@merzky.net>
Subject: Re: SAGA Strawman API
Hi Shantenu,
In order to advance the SAGA strawman API, we urgently need some feedback from outside the authors' group.
I have been carrying it around for over a month and have only just (late, I'm afraid!) got round to looking at it. A few general comments...
1. OO vs. functional: I can understand specifying in one language - no problem. But you need to show the read-across to different languages and systems.
2. Task: I didn't understand this at all! Are you proposing to support 5 different systems or expecting one to be selected from these 5?
3. Security: Inconsistent use of security - it appears in the stream section but nowhere else. I think this is very wrong. Not specifying a security system makes sense, but I think all of these methods should have a security token passed into them, or it should be an argument in the relevant object constructor.
4. URL (p. 10): I assume the URL can support a number of protocols - http/https/file/gridftp. I see no way to register plugins in the interface (maybe this is an 'internal' interface as opposed to a user function). Maybe there needs to be a set of interfaces to help the developers, e.g. to register protocol plugins.
5. Languages: These examples (with the OO background) obviously look very Java. I think you need a section (but not one for each area) showing how the interfaces get rendered in different languages. This will help adoption and buy-in from the Perl, Python, F77, C and other communities. How are the exceptions handled in different programming languages? I'd almost be tempted to return all error codes from the function as the return value.
6. Focus: The large number of APIs makes the document appear unfocused. You mention a focus on tier 1 interfaces (which is good), but is this the whole document or a subset of the current draft? If it's the whole document, I think you need to drop more! Have this document as a framework, but take forward a smaller document for standardisation. Small is better!
7. Partial Implementations: How are you going to handle partial functionality implementations? Should _all_ operations support a NotImplementedException? Should there be a standard static method in each section to discover what is implemented, e.g. supported protocols and supported methods?
8. SAGA File Transfer: You have an 'LSF' schema for >>, >, < and <<. But in the file/directory area you use a series of attribute flags. I think you should carry these attribute flags forward into the job definition area - either way, I think you should have one model, not two!
General Thoughts: The document needs to be broken up IMHO, either into sections or individual documents:
  I. General Model: language mapping, standard attribute model, security, tasks
  II. Tier 1 Interfaces
  III. Tier 2 and above Interfaces
This would make it clearer what you are doing now & the constructs & concepts that you are using to build the model.
Hope that helps! Happy to clarify any of the above...
Regards,
Steven

--
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www: http://www.merzky.net  |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+

... and I also want to comment on some of them :-) text inserted below. Andre. Quoting [Andre Merzky] (May 30 2005):
----- Forwarded message from Steven Newhouse <s.newhouse@omii.ac.uk> -----
1. OO vs. functional: I can understand specifying in one language - no problem. But you need to show the read-across to different languages and systems.
The spec for now is supposed to be language independent - we will define language bindings as soon as that spec stabilizes.
2. Task: I didn't understand this at all! Are you proposing to support 5 different systems or expecting one to be selected from these 5?
Oh - I wonder if we sent you an old version? It's definitely only one system - this one:

    package SAGA version 0.1 {
      package TaskSubsystem {
        enum State { Pending = 0, Running = 1, Finished = 2, Cancelled = 3 };

        interface Task {
          run ();
          wait (in double timeout, out boolean finished);
          cancel ();
          getState (out State state);
        }

        class TaskContainer {
          addTask (in Task task);
          removeTask (in Task task);
          wait (in double timeout, out array<Task,1> finished);
          listTasks (out array<Task,1> tasks);
        }
      }
    }

Example:

    Directory dir;
    Job job;
    DirectoryTaskFactory dtf = dir.createTaskFactory ();
    JobTaskFactory jtf = job.createTaskFactory ();

    Task t1 = dtf.ls (result);
    Task t2 = dtf.copy (source,target);
    Task t3 = dtf.move (source,target);
    Task t4 = jtf.checkpoint ();
    Task t5 = jtf.signal (USR);

    t1.run ();
    t2.run ();
    ...

    TaskContainer tc;
    tc.addTask (t1);
    tc.addTask (t2);
    ...

    Array finished;
    tc.wait (timeout, finished);

    Array tasks;
    tc.listTasks (tasks);
    tc.removeTask (t5);
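A minimal Python sketch of how the Task and TaskContainer interfaces above might behave at run time. Everything beyond the interface names is an assumption: the snake_case method names, running each task on its own thread, a cooperative cancel() (the worker thread is not killed, its result is simply discarded), and a container wait() that returns as soon as at least one task is done or the timeout expires.

```python
import enum
import threading
import time
from typing import Callable, List, Optional


class State(enum.Enum):
    # Numeric values mirror the strawman enum.
    PENDING = 0
    RUNNING = 1
    FINISHED = 2
    CANCELLED = 3


class Task:
    """Wraps one asynchronous operation, e.g. the body of dtf.copy(source, target)."""

    def __init__(self, operation: Callable[[], object]) -> None:
        self._operation = operation
        self._state = State.PENDING
        self._lock = threading.Lock()
        self._done = threading.Event()
        self.result: Optional[object] = None

    def run(self) -> None:
        with self._lock:
            if self._state is not State.PENDING:
                return  # only a pending task can be started
            self._state = State.RUNNING
        threading.Thread(target=self._execute, daemon=True).start()

    def _execute(self) -> None:
        value = self._operation()
        with self._lock:
            if self._state is State.RUNNING:  # a cancel() may have arrived meanwhile
                self.result = value
                self._state = State.FINISHED
        self._done.set()

    def wait(self, timeout: float) -> bool:
        """Mirrors wait(in double timeout, out boolean finished): True once done or cancelled."""
        return self._done.wait(timeout)

    def cancel(self) -> None:
        # Cooperative: the worker keeps running, but its result will be ignored.
        with self._lock:
            if self._state in (State.PENDING, State.RUNNING):
                self._state = State.CANCELLED
                self._done.set()

    def get_state(self) -> State:
        with self._lock:
            return self._state


class TaskContainer:
    def __init__(self) -> None:
        self._tasks: List[Task] = []

    def add_task(self, task: Task) -> None:
        self._tasks.append(task)

    def remove_task(self, task: Task) -> None:
        self._tasks.remove(task)

    def wait(self, timeout: float) -> List[Task]:
        """Mirrors wait(in double timeout, out array<Task,1> finished) by polling states."""
        deadline = time.monotonic() + timeout
        while True:
            done = [t for t in self._tasks
                    if t.get_state() in (State.FINISHED, State.CANCELLED)]
            if done or time.monotonic() >= deadline:
                return done
            time.sleep(0.01)

    def list_tasks(self) -> List[Task]:
        return list(self._tasks)
```

Polling in TaskContainer.wait() keeps the sketch short; a real implementation would more likely block on a shared condition variable.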
3. Security: Inconsistent use of security - it appears in the stream section but nowhere else. I think this is very wrong. Not specifying a security system makes sense, but I think all of these methods should have a security token passed into them, or it should be an argument in the relevant object constructor.
We have to flesh out a session-handle-like thing, which would also encapsulate security. You are right - a huge missing piece...
4. URL (p. 10): I assume the URL can support a number of protocols - http/https/file/gridftp. I see no way to register plugins in the interface (maybe this is an 'internal' interface as opposed to a user function). Maybe there needs to be a set of interfaces to help the developers, e.g. to register protocol plugins.
That is an implementation issue, really. We could imagine SAGA implementations which can only handle gsi and gridftp, and other implementations which have a plugin mechanism and can handle all types of protocols. The latter is better we think, but the API spec does not specify the implementation and architecture.
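The split described here (plugins as an implementation detail behind an unchanged user API) could look roughly like the following Python sketch. register_adaptor, copy_file and supported_schemes are hypothetical names; the registry is internal to one imagined implementation, and supported_schemes() hints at one possible answer to the discovery question raised later in the thread.

```python
from typing import Callable, Dict, List
from urllib.parse import urlparse

# Internal to one hypothetical implementation; NOT part of the user-facing API.
Adaptor = Callable[[str, str], str]
_ADAPTORS: Dict[str, Adaptor] = {}


def register_adaptor(scheme: str, adaptor: Adaptor) -> None:
    """Implementation-side hook: a middleware provider plugs in a protocol."""
    _ADAPTORS[scheme] = adaptor


def supported_schemes() -> List[str]:
    """Hypothetical discovery call: which URL schemes this implementation handles."""
    return sorted(_ADAPTORS)


def copy_file(source: str, target: str) -> str:
    """User-facing call: dispatches on the scheme of the source URL."""
    scheme = urlparse(source).scheme
    adaptor = _ADAPTORS.get(scheme)
    if adaptor is None:
        # An implementation without a matching plugin fails explicitly.
        raise NotImplementedError(f"no adaptor for '{scheme}://'")
    return adaptor(source, target)
```

An implementation restricted to GridFTP would simply ship with one hard-wired adaptor; the user-facing copy_file() stays the same either way.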
6. Focus: The large number of APIs makes the document appear unfocused. You mention a focus on tier 1 interfaces (which is good), but is this the whole document or a subset of the current draft? If it's the whole document, I think you need to drop more! Have this document as a framework, but take forward a smaller document for standardisation. Small is better!
Smaller is better for standardization, but it also makes the API less useful and less consistent across its various parts. We have been pondering this for quite a while. What would be your suggestion for a better document structure?
7. Partial Implementations: How are you going to handle partial functionality implementations? Should _all_ operations support a NotImplementedException? Should there be a standard static method in each section to discover what is implemented, e.g. supported protocols and supported methods?
Good one, we should document this: we think all implementations should provide all methods (so there are no compile problems), but can throw a NotImplemented exception. That is necessary in particular if you have a plugin-enabled implementation...
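A Python sketch of the "all methods present, some raise NotImplemented" policy, combined with the static discovery method Steven suggests. NotImplementedException, Directory, LocalDirectory and supported_methods() are hypothetical; comparing each method with the base-class stub is just one way to implement the discovery call.

```python
from typing import Iterable, List, Set


class NotImplementedException(Exception):
    """Stands in for the NotImplemented exception discussed in the thread."""


class Directory:
    """Every method of the spec exists, so client code always compiles/loads."""

    def ls(self) -> List[str]:
        raise NotImplementedException("Directory.ls")

    def copy(self, source: str, target: str) -> None:
        raise NotImplementedException("Directory.copy")

    def move(self, source: str, target: str) -> None:
        raise NotImplementedException("Directory.move")

    @classmethod
    def supported_methods(cls) -> Set[str]:
        """Hypothetical static discovery call: methods that override the stubs."""
        names = ("ls", "copy", "move")
        return {n for n in names if getattr(cls, n) is not getattr(Directory, n)}


class LocalDirectory(Directory):
    """A partial implementation: only ls() is provided."""

    def __init__(self, entries: Iterable[str]) -> None:
        self._entries = list(entries)

    def ls(self) -> List[str]:
        return sorted(self._entries)
```

Callers can either probe with supported_methods() up front or catch NotImplementedException at call time; the sketch supports both styles.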
8. SAGA File Transfer: You have an 'LSF' schema for >>, >, < and <<. But in the file/directory area you use a series of attribute flags. I think you should carry these attribute flags forward into the job definition area - either way, I think you should have one model, not two!
Right, we should check this.
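A Python sketch of the single model Steven asks for: each of the four operators is turned into the same kind of attribute flags a file/directory API might use. The flag names and the reading of the operators ('>' copies local to remote before the job, '>>' appends instead of overwriting, '<' copies remote to local after the job, '<<' appends instead) are assumptions for illustration, as is the requirement that the operator be surrounded by single spaces.

```python
from dataclasses import dataclass
from enum import Flag, auto


class TransferFlags(Flag):
    STAGE_IN = auto()    # before the job starts: local -> remote
    STAGE_OUT = auto()   # after the job finishes: remote -> local
    OVERWRITE = auto()
    APPEND = auto()


# Assumed reading of the four operators, expressed as flag combinations.
_OPERATORS = {
    ">>": TransferFlags.STAGE_IN | TransferFlags.APPEND,
    ">": TransferFlags.STAGE_IN | TransferFlags.OVERWRITE,
    "<<": TransferFlags.STAGE_OUT | TransferFlags.APPEND,
    "<": TransferFlags.STAGE_OUT | TransferFlags.OVERWRITE,
}


@dataclass(frozen=True)
class FileTransfer:
    local: str
    remote: str
    flags: TransferFlags


def parse_directive(directive: str) -> FileTransfer:
    """Parse 'local OP remote', e.g. 'input.dat > /scratch/input.dat'."""
    for op, flags in _OPERATORS.items():
        # The operator must be surrounded by single spaces, so '>>' never matches ' > '.
        local, sep, remote = directive.partition(f" {op} ")
        if sep:
            return FileTransfer(local.strip(), remote.strip(), flags)
    raise ValueError(f"no transfer operator in {directive!r}")
```

With this, a job description and a file copy could carry the same TransferFlags value rather than two separate notations.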
General Thoughts: The document needs to be broken up IMHO, either into sections or individual documents:
  I. General Model: language mapping, standard attribute model, security, tasks
  II. Tier 1 Interfaces
  III. Tier 2 and above Interfaces
Ah, here is the proposed structure :-) Language mappings go into specific documents anyway. Separating attributes, security and tasks seems a good approach - they are common to all classes. I am not sure if I understand your distinction between Tier 1 and Tier 2 - could you give an example please?

Andre,
The spec for now is supposed to be language independent - we will define language bindings as soon as that spec stabilizes.
Sure. Some indication in this document may help people understand the type of interfaces that will be produced before the complete spec binding is available.
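As an example of the kind of indication meant here, a Python sketch of one operation rendered in both styles discussed in item 5: exception style, and a C/F77-like style where the error code is the return value. ErrorCode, its numeric values, the exception classes and file_size are hypothetical.

```python
from enum import IntEnum
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class ErrorCode(IntEnum):
    # Hypothetical values: the strawman does not define numeric error codes.
    OK = 0
    NOT_IMPLEMENTED = 1
    DOES_NOT_EXIST = 2
    OTHER = 99


class SagaError(Exception):
    """Root of a hypothetical exception hierarchy; each class carries its code."""
    code = ErrorCode.OTHER


class NotImplementedSagaError(SagaError):
    code = ErrorCode.NOT_IMPLEMENTED


class DoesNotExistError(SagaError):
    code = ErrorCode.DOES_NOT_EXIST


def file_size(path: str) -> int:
    """Exception-style rendering (Java, C++, Python): failure raises."""
    sizes = {"/tmp/test.dat": 42}  # hypothetical stand-in for a remote file
    if path not in sizes:
        raise DoesNotExistError(path)
    return sizes[path]


def as_error_code(call: Callable[[], T]) -> Tuple[ErrorCode, Optional[T]]:
    """C/F77-style rendering: the error code is returned instead of raised."""
    try:
        return ErrorCode.OK, call()
    except SagaError as err:
        return err.code, None
```

Because every exception class carries its code, one mechanical wrapper can derive the error-code binding from the exception binding.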
Oh - I wonder if we sent you an old version?
Probably... it dates back to GGF 13 or soon after.
We have to flesh out a session-handle-like thing, which would also encapsulate security. You are right - a huge missing piece...
I think this needs to be addressed before going much further.
That is an implementation issue, really. We could imagine SAGA implementations which can only handle gsi and gridftp, and other implementations which have a plugin mechanism and can handle all types of protocols.
But how do I as a user discover which protocols are supported by the implementation? Do I discover this by calls failing?
The latter is better we think, but the API spec does not specify the implementation and architecture.
If you want middleware providers to support the interface, specifying how protocol plugins are added is as important as specifying how users will expect to use the APIs.
I am not sure if I understand your distinction between Tier 1 and Tier 2 - could you give an example please?
Tier 1 - Stuff you are going to do now. Generic task & error framework and support for file movement and job submission.
Tier 2 - Things for later (V2 of the spec?) - streams, logical file catalogues, etc.

Steven

--
Dr Steven Newhouse                                   Tel: +44 (0)2380 598789
Deputy Director, Open Middleware Infrastructure Institute (OMII)
Suite 6005, Faraday Building (B21), Highfield Campus,
Southampton University, Highfield, Southampton, SO17 1BJ, UK

Quoting [Steven Newhouse] (May 30 2005):
Oh - I wonder if we sent you an old version?
Probably... it dates back to GGF 13 or soon after.
An up to date version is on the SAGA wiki front page: http://wiki.cct.lsu.edu/saga/ The task stuff is better described there.
That is an implementation issue, really. We could imagine SAGA implementations which can only handle gsi and gridftp, and other implementations which have a plugin mechanism and can handle all types of protocols.
But how do I as a user discover which protocols are supported by the implementation? Do I discover this by calls failing?
Uhm, that question pops up wherever we look - and there is no good answer in sight. In GAT, we allow any:// as protocol, meaning that GAT can choose whatever it finds. But that has drawbacks. Consider the following URLs:

  ftp://my.remote.host:1234//tmp/test.dat
  ftp://my.remote.host//tmp/test.dat
  gridftp://my.remote.host//tmp/test.dat
  http://my.remote.host//tmp/test.dat

They may all refer to the same physical location - or not! This all depends on the service setup. So any:// leaves that pretty much open to wild guessing. As do the above URLs, really: the user probably does not know the http server root settings on all remote hosts. If you think about it that way, the stage-in/stage-out settings of most job description languages are equally flawed. It's a very general problem. I would be happy if somebody in the group came up with a good approach to it. I can think of only two, which both have flaws:

  1) ALWAYS use a replica system/grid file system.
     Flaw: that needs to get populated, and the user needs to be able to navigate it globally.

  2) Provide a URL translation service, either for the user or to be used by the implementation:

       url = URLTransLate ("ftp://my.remote.host//tmp/test.dat", "http://");
       // url is set to http://my.remote.host//servermount/tmp/test.dat

     Flaw: the server would need to know about local configurations, needs to be kept in sync, requires a remote operation for each action on any URL, etc.

Conclusion: we don't know a good answer, at least not at the API level...
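A client-side Python sketch of option 2, to make its flaw concrete. url_translate and the MOUNTS table are hypothetical; the table stands in for the translation service, and having to keep it in sync with each server's configuration by hand is exactly the problem described above. Following the example URLs, the path after the host is taken to be "/" plus an absolute file system path, and the source port is dropped because it belongs to the source service.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# Hypothetical, hand-maintained table: (host, scheme) -> path prefix under which
# that service exposes the host's file system root.
MOUNTS: Dict[Tuple[str, str], str] = {
    ("my.remote.host", "ftp"): "",
    ("my.remote.host", "gridftp"): "",
    ("my.remote.host", "http"): "/servermount",
}


def url_translate(url: str, scheme: str) -> str:
    """Rewrite url so that it reaches the same file via another protocol."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        src_prefix = MOUNTS[(host, parts.scheme)]
        dst_prefix = MOUNTS[(host, scheme)]
    except KeyError:
        raise LookupError(f"no mount information for {host!r}") from None
    # Strip the URL separator: "//tmp/test.dat" -> "/tmp/test.dat".
    fs_path = parts.path[1:]
    if not fs_path.startswith(src_prefix):
        raise LookupError(f"{url} is not below the exported prefix {src_prefix!r}")
    fs_path = fs_path[len(src_prefix):]
    return f"{scheme}://{host}/{dst_prefix}{fs_path}"
```

For instance, translating ftp://my.remote.host//tmp/test.dat to http yields the http://my.remote.host//servermount/tmp/test.dat result from the example, but only because the table happens to say so.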
The latter is better we think, but the API spec does not specify the implementation and architecture.
If you want middleware providers to support the interface, specifying how protocol plugins are added is as important as specifying how users will expect to use the APIs.
You might be right - but I am not sure. If every middleware provider _can_ implement its own SAGA version in whatever way he wants, he might actually do that. If at some point there is a SAGA implementation which allows well-defined plugins, the middleware providers might use that instead, or _still_ want to implement it their own way. To be sure: we want to have a pluggable implementation (and in fact we are working on one), but that plugin specification should, in our opinion, be totally distinct from the SAGA API spec. What do others think about this issue?
I am not sure if I understand your distinction between Tier 1 and Tier 2 - could you give an example please?
Tier 1 - Stuff you are going to do now. Generic task & error framework and support for file movement and job submission.
Tier 2 - Things for later (V2 of the spec?) - streams, logical file catalogues, etc.
I see, ok.

Tier 1:
  - session handle
  - errors
  - attributes
  - tasks
  - files
  - logical files
  - job submission, brokering
  - streams
So basically what is in the API right now.

Tier 2:
  - steering and monitoring
  - possibly combining logical/physical files (read on logical files)
  - task dependencies (simple workflows and batches)
  - extensions to Tier 1 classes
There is no good and explicit roadmap for Tier 2 right now.

Best regards, Andre.
Steven

Hi,

On Mon, 30 May 2005, Andre Merzky wrote:
If you want middleware providers to support the interface, specifying how protocol plugins are added is as important as specifying how users will expect to use the APIs.
You might be right - but I am not sure. If every middleware provider _can_ implement its own SAGA version in whatever way he wants, he might actually do that.
If at some point there is a SAGA implementation which allows well-defined plugins, the middleware providers might use that instead, or _still_ want to implement it their own way.
To be sure: we want to have a pluggable implementation (and in fact we are working on one), but that plugin specification should, in our opinion, be totally distinct from the SAGA API spec.
What do others think about this issue?
SAGA aims to provide an API at the application developer level, not at the level where middleware may be plugged in. We are concentrating on the API at that level and are not specifying architecture, so specifying an API to add protocol plug-ins would be out of scope. We want to keep this API small and focussed on application developers, giving SAGA implementors maximum freedom within that. I would hope that any plug-in based implementation would provide suitable documentation on its use, and possibly at a future date a working group could be set up to standardise such interfaces, but I think that is out of scope for the current group, and probably premature to try. Cheers, Tom
participants (3): Andre Merzky, Steven Newhouse, Tom Goodale