
Hi Ole,

On Tue, Nov 13, 2012 at 4:54 PM, Ole Weidner <ole.weidner@rutgers.edu> wrote:
Hi Andre,
here are some quick comments w.r.t. the resource package. They are more for the record - we can talk about this in detail when we discuss the resource package on the 28th (journal club).
Sorry this took so long, I am trying to catch up with things now...
I fundamentally disagree with 'Compute' inheriting from 'saga.job.Service' and 'Storage' inheriting from 'saga.filesystem.Directory', because this:
1. breaks SAGA's horizontally independent package model: it would mean that I can't implement a resource-package only implementation of SAGA. I would have to implement the Job and the Filesystem package as well!
A compute resource is useless if you can't submit jobs, and a storage resource is useless if you can't store files. That does not change if you replace inheritance with get_job_service / get_filesystem -- in both cases, you will need to implement the job package / file package, too. While we do try to avoid too many cross dependencies between functional API packages, we have them in some places, most notably for the namespace derivatives.

FWIW, another reason why the compute resource inherits from job.service is that we intended to fix some shortcomings of the job service; in particular, we wanted to add the ability to submit JSDL directly. Inheritance provides a very simple means to do so. I agree though that this should not be the foremost concern for API design - but anyway.

Another point I want to make, though: I don't like the idea of having a job service, which is not stateful, depend on the state of a compute resource (and the same for filesystem / storage resource) -- on the API level, there is no means to infer whether the job service is valid for job submission at any point in time (you can't get a resource handle from a job service instance) - so it always boils down to trial and error. We have so far managed to avoid those implicit state dependencies, and I would like to keep it that way. [Yes, a decoupling maps better to the Pilot API, but I would rather fix that in the Pilot API ;-)]

Don't get me wrong: I understand that inheritance is a pretty strong coupling, and it does not necessarily reflect how DCIs are architected internally -- from the end user perspective though, I find this rendering simple, intuitive, and easy to use...
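To make the two renderings discussed above concrete, here is a minimal sketch in plain Python. All classes are stand-ins invented for illustration (they are not the actual saga.job / saga.resource classes), and the 'active' flag is an assumed placeholder for whatever state a compute resource carries.

```python
class Service:
    """Stand-in for a stateless job service (hypothetical, not saga.job.Service)."""

    def __init__(self):
        self.jobs = []

    def run_job(self, command):
        self.jobs.append(command)
        return len(self.jobs) - 1  # job id is the index into self.jobs


# Option A: inheritance. The compute resource *is* a job service, so the
# resource state lives on the very object the user submits jobs to.
class InheritedCompute(Service):
    def __init__(self):
        super().__init__()
        self.active = True  # assumed stand-in for the resource state

    def run_job(self, command):
        if not self.active:
            raise RuntimeError("resource is not active")
        return super().run_job(command)


# Option B: composition. The user obtains a separate job service which
# depends on the resource state but exposes no handle back to it.
class DetachedService(Service):
    def __init__(self, resource):
        super().__init__()
        self._resource = resource  # private: not part of the API surface

    def run_job(self, command):
        if not self._resource.active:
            raise RuntimeError("resource is not active")
        return super().run_job(command)


class ComposedCompute:
    def __init__(self):
        self.active = True

    def get_job_service(self):
        return DetachedService(self)


compute = InheritedCompute()
compute.run_job("/bin/date")     # state and submission on one object

resource = ComposedCompute()
js = resource.get_job_service()
resource.active = False          # the resource goes away ...
# ... and 'js' offers no public way to find out before submitting.
```

In option B, code that only holds 'js' can learn about the resource state only by attempting a submission, which is the implicit state dependency described above.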
2. it mixes separate concerns: resource management and job submission!
I kind of agree, but think that this is offset by ease of use: get a compute resource, submit jobs to it - bang. This is, by far, the dominant use case, so I would like to see this rendered exceedingly simple.
3. I don't think that 'Directory' necessarily provides the right abstraction for all 'Storage' types. Certainly for most, but not for all. It's unnecessarily confining.
Yes, that is a limitation -- but unless we have a compelling use case for other storage abstractions, and those use cases do not imply an overly complicated approach to storage resources, that is the best abstraction we have, right? Even if the backend storage resources have a limited / constrained namespace (think Amazon S3), the filesystem abstraction still holds up nicely IMHO. Also, I am not concerned about provisioning of databases etc. -- we don't have decent (or any) abstractions for those in SAGA, nor do we have use cases that I know of -- so that would be out of scope for now.
Furthermore,
- class manager -> Manager
fixed, thanks.
- what does manager.describe_resource() do? why can't it be manager.resources[x].get_description()
Hmm, probably right - but while that works nicely in Python, in other language bindings you would have manager.get_resource (id).get_description (), and chaining is something we have not promoted in the API so far. Thus, I would like to keep the method in the API, but I agree that your version is (in Python) the more intuitive one.
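As a rough illustration of the two call styles, the sketch below uses made-up stand-in classes (not the real resource package); the resource ID and the description contents are arbitrary examples.

```python
class Resource:
    """Hypothetical resource handle holding a description."""

    def __init__(self, rid, description):
        self.id = rid
        self._description = description

    def get_description(self):
        return self._description


class Manager:
    """Hypothetical manager keeping resources by ID."""

    def __init__(self, resources):
        self._resources = {r.id: r for r in resources}

    def get_resource(self, rid):
        return self._resources[rid]

    def describe_resource(self, rid):
        # Convenience call: same result as the chained form, in one step.
        return self.get_resource(rid).get_description()


manager = Manager([Resource("compute-1", {"cores": 8})])

chained = manager.get_resource("compute-1").get_description()
direct = manager.describe_resource("compute-1")
assert chained == direct
```

Both forms return the same description; describe_resource() simply avoids the chained call for bindings where chaining is discouraged.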
- speaking of resources[x] - there's no 'non-property' version, i.e., get_resource()
thanks, I'll fix that.
- I would prefer explicit list/get_compute(), list/get_storage(), list/get_network() and so on, so that we don't have to do type checking all over the place.
The list / get calls have a 'type' parameter, so you can filter for specific resource types:

    compute_resources = manager.list_resources (saga.resource.Compute)
    storage_resources = manager.list_resources (saga.resource.Storage)

The default is 'Any' though, which obviously gives you all types.
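One way such a type filter could work is sketched below. The classes are stand-ins invented for this example, and using the common base class as the 'Any' default is an assumption, not necessarily how the resource package does it.

```python
class Resource:
    """Hypothetical base class; doubles as the 'Any' filter in this sketch."""

    def __init__(self, rid):
        self.id = rid


class Compute(Resource):
    pass


class Storage(Resource):
    pass


class Manager:
    def __init__(self, resources):
        self._resources = list(resources)

    def list_resources(self, rtype=Resource):
        # isinstance() against the base class matches every resource,
        # so the default behaves like 'Any'.
        return [r.id for r in self._resources if isinstance(r, rtype)]


manager = Manager([Compute("c1"), Storage("s1"), Compute("c2")])

compute_resources = manager.list_resources(Compute)
storage_resources = manager.list_resources(Storage)
all_resources = manager.list_resources()
```

With this shape, the caller never has to type-check the returned entries: the filter argument already decides which kinds come back.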
- why are there two Pool.add() methods? Why do we want to be able to add resources as strings?
Alas, we have that in a few places in the API. There was a very long discussion, a long time ago, where people argued that using only handles would have too much of a performance impact (you'd always need to create handles, which is *at least* one round-trip), and that using only IDs would be too unwieldy in many cases. While I agree with the first point, I do not think the second one is very valid. That is one item I would like to clean up across SAGA in an eventual API revision (if that ever happens). So, for now that is in the resource API as well, for consistency, but I personally do not care much about it. If we limit that, then I would be in favor of the ID version.

Cheers, Andre.
Cheers! Ole
-- Nothing is really difficult...