Thoughts on extensions mechanisms for the HPC profile work

Hi;

This email is intended to describe my views of the set of extension mechanisms that are both necessary and sufficient to implement the common cases that we have identified for the HPC profile work (see the document "HPC Use Cases - Base Case and Common Cases", a preliminary draft of which I sent out to the ogsa-wg mailing list several weeks ago). These views are in large part derived from ongoing discussions that Chris Smith and I have been having about the subject of interoperable job scheduling designs. This email is intended to start a discussion about extension mechanisms rather than define the "answer" to this topic. So please do reply with suggestions for any changes and extensions (:-)) you feel are needed.

Marvin.

Additive vs. modifying extensions

At a high level, there are two types of extensions that one might consider:

* Purely additive extensions.
* Extensions that modify the semantics of the underlying base-level design.

Purely additive extensions that, for example, add strictly new functionality to an interface or that define additional resource types that clients and schedulers can refer to, seem fairly straightforward to support. Modifying extensions fall into two categories:

* Base case semantics remain unchanged for parties operating at the base (i.e. un-extended) level.
* Base case semantics change for parties operating at the base level.

Modifying extensions that leave the base-level semantics unchanged are straightforward to incorporate. An example is adding at-most-once semantics to interface requests. These operations now have more tightly defined failure semantics, but their functional semantics remain unchanged and base-level clients can safely ignore the extended semantics.

Extensions that change base-level semantics should be disallowed since they violate the fundamental premise of base-level interoperability. An example of such an extension would be having the creation of jobs at a particular (extended) scheduler require that the client issue an additional explicit resource deallocation request once a job has terminated. Base-level clients would not know to do this and the result would be an incorrectly functioning system.

Types of extensions

I believe the following types of extensions are both necessary and sufficient to meet the needs of the HPC profile work:

* Addition of new WSDL operations.
  * This is needed to support additional new functionality, such as the addition of suspend/resume operations. As long as base-level semantics aren't modified, this form of extension seems to be straightforward.
* Addition of parameters to existing WSDL operations.
  * As long as base-level semantics are maintained, this form of extension is also straightforward. An example is adding a notification callback parameter to job creation requests. However, it is not clear whether all tooling can readily handle this form of "operation overloading". It may be better - from a pragmatic point of view - to define new WSDL operations (with appropriately defined names) that achieve the same effect.
* Support for array operations and other forms of batching.
  * When thousands of jobs are involved, the efficiency gains of employing array operations for things like queries or abort requests are too significant to ignore. Hence a model in which every job must be interacted with on a strictly individual basis via an EPR is arguably unacceptable.
  * One approach would be to simply add array operations alongside the corresponding individual operations, so that one can selectively interact with jobs (as well as things like data files) in either an "object-oriented" fashion or a "bulk-array" fashion. One could observe that the array operations subsume the corresponding individual operations as a trivial special case, but this would arguably violate the principle of defining a minimalist base case and then employing only extensions (rather than replacements). (A rough sketch of the bulk-array style appears at the end of this message.)
  * Array operations are an example of a service-oriented rather than a resource-oriented form of interaction: clients send a single request to a job scheduler (service) that refers to an array of many resources, such as jobs. This raises the question of whether things like jobs should be referred to via EPRs or via unique "abstract" names that are independent of any given service's contact address. At a high level, the choice is unimportant since the client submitting an array operation request is simply using either one as a unique (and opaque) identifier for the relevant resource. On a pragmatic level one might argue that abstract names are easier and more efficient to deal with than EPRs, since the receiving scheduler will need to parse EPRs to extract what is essentially the abstract name for each resource. (Using arrays of abstract names rather than arrays of EPRs is also more efficient from a size point of view.)
  * If abstract names are used in array operations then it will be necessary that individual operations return the abstract name and not just an EPR for a given resource, such as a job. If this approach is chosen then the base case design and implementation must return abstract names and not just EPRs for things like jobs.
* Extensions to state diagrams.
  * Chris Smith is in the process of writing up this topic.
* Standardized extensions to things like resource definitions and other declarative definitions (e.g. about provisioning).
  * The base use case assumes a small, fixed set of "standard" resources and other concepts (e.g. working directory) that may be described/requested. The simplest extension approach is to define additional specific "standard sets" that clients and services can refer to by their global name (e.g. the POSIX resource description set or the Windows resource description set) and of which they pick exactly one to use for any given interaction.
  * The problem with this simplest form of extension is that it provides only a very crude form of extensibility, with no notion of composition or incremental extension of existing definition sets. This is sufficient for very coarse-grained characterizations, such as "Windows environment" versus "POSIX environment", but not for finer-grained resource extensions. An alternative is to define composable sets that cover specific "subjects" (e.g. GPUs). In the extreme, these sets could be of size 1. This implies that clients and services need to be able to deal with the power set of all possible meaningful combinations of these sets. As long as individual definitions are independent of each other (i.e. the semantics of specifying A is unchanged by specifying B in the same description) this isn't a big problem. Allowing the presence of different items in a description to affect each other's semantics is arguably a variation on modifying the base-level semantics of a design via an extension, and hence should be disallowed.
  * If resource descriptions are used only for "matchmaking" against other resource descriptions then another approach is to allow arbitrary resource types whose semantics are not understood by the HPC infrastructure, which deals with them only as abstract entities whose names can be compared textually and whose associated values can be compared textually or numerically depending on their data type. It is important to understand that, whereas the "mechanical" aspects of an HPC infrastructure can mostly be built without having to know the semantics of these abstract resource types, their semantics must still be standardized and well known at the level of the human beings using and programming the system. Both the descriptions of available computational resources and the client requests for reserving and using such resources must be specified in a manner that will cause the underlying HPC "matchmaking" infrastructure to do the right thing. This matchmaking approach is exemplified by systems such as Condor's ClassAds system.
  * It should be noted that a generalized matchmaking system is not a trivial thing to implement efficiently, and hence one can reasonably imagine extensions based on any of the above approaches to extending resource (and other) definitions.
* Hierarchical and extended representations of information.
  * XML infosets provide a very convenient way to represent extended descriptions of a particular piece of information.
  * Another form of hierarchical information representation shows up when multi-level scheduling systems are involved. In this case it may be desirable to represent information either in a form that hides the scheduling hierarchy or in a form that reflects it. Consider how to represent the list of compute nodes for a job running across multiple clusters: a flat view might list all compute nodes in an undifferentiated list, while a hierarchical view might provide a list of clusters, each of which describes information about a cluster, including a list of the compute nodes in that cluster that the job is running on. Both views have their uses. XML infosets are convenient for encoding the syntax of either view, but an extension supporting information representation in these sorts of systems will also have to define the semantics of all allowed hierarchies.
* Decomposition of functionality into "micro" protocols.
  * Micro protocols should reflect things that must occur at different times (e.g. resource reservation/allocation vs. resource use/job execution) or that can be employed in a stand-alone manner (e.g. job execution vs. data transfer). The decomposition that seems relevant for the HPC use cases (i.e. visible to clients) is the following:
  * The base case involves interaction between a client and a scheduler for the purpose of executing a job.
  * A client may wish to independently reserve, or pre-allocate, resources for later and/or guaranteed use. Note that this is different from simply submitting a job for execution to a scheduler that then queues the job for later execution - perhaps at a specific time requested by the client. For example, a meta-scheduler might wish to reserve resources so that it can make informed scheduling decisions about which "subsidiary" scheduler to send various jobs to. Similarly, a client might wish to reserve resources so as to run two separate jobs in succession, with one job writing output to a scratch storage system and the second job reading that output as its input, without having to worry that the data might have vanished during the interval between the execution of the two jobs.
  * A client may wish to query a scheduler to learn what resources might be available to it, without actually laying claim to any resources as part of the query (let alone executing anything using those resources). Scheduling candidate set generators or matchmaking services such as Condor would want this functionality.
  * A client may need to transfer specific data objects (e.g. files) to and from a system that is under the control of a job scheduling service.
  * Micro protocols may have relationships to each other. For example, job execution will need to be able to accept a handle of some sort to resources that have already been allocated to the requesting client.
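Finally, to make the bulk-array style mentioned above concrete, here is a rough sketch of what a batched status query might look like. All element names and the namespace below are hypothetical; they are meant only to illustrate the shape of such an operation, not to propose concrete syntax:

    <hpc:GetJobStatuses xmlns:hpc="http://example.org/hpc-profile/sketch">
      <!-- Abstract names are opaque identifiers, independent of any
           particular scheduler endpoint (EPR). -->
      <hpc:JobName>job-0001</hpc:JobName>
      <hpc:JobName>job-0002</hpc:JobName>
      <hpc:JobName>job-0997</hpc:JobName>
    </hpc:GetJobStatuses>

    <hpc:GetJobStatusesResponse xmlns:hpc="http://example.org/hpc-profile/sketch">
      <hpc:JobStatus name="job-0001">Running</hpc:JobStatus>
      <hpc:JobStatus name="job-0002">Queued</hpc:JobStatus>
      <hpc:JobStatus name="job-0997">Failed</hpc:JobStatus>
    </hpc:GetJobStatusesResponse>

The corresponding individual operation then becomes the one-element special case of the same message, and the names carried in the array are the same opaque abstract names (or EPRs) that were returned when the jobs were created.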

Marvin and all: One additional comment I would add for consideration with extensible content (operations, resource models, etc.) is that there is a practical need for two complementary mechanisms that are often overlooked:

1. A runtime meta-language for marking the criticality of extended content, e.g. marking an extension field as "OK to ignore" or "MUST be understood", so that a service in a heterogeneous environment can decide whether to proceed when it encounters some newfangled extension that is not implemented in the service. I would argue that there is no default policy that is appropriate for a majority of environments. Making the wrong choices on an extension-by-extension basis can cause faulty behavior and/or waste. I think there is a tendency to use undisciplined "xsd:any" syntax in GGF documents lately, and I think it is a mistake. Please see the createAgreement operation extensibility of recent WS-Agreement drafts for my take on what is needed at minimum. We define an "OK to ignore" wrapper so that the service can disambiguate required versus optional extension fields in the input message. Unwrapped extensions are assumed to be mandatory/critical.

2. Discovery mechanisms for the extensions supported by services. This obviously should complement whatever other discovery mechanisms are under discussion for job management. This is what will enable efficient brokering/routing of requests in a heterogeneous environment.

The runtime disambiguation in (1) is more important if we have a general "aspect oriented" extension mechanism where, as you mentioned, there is a power set of possible job descriptions. With a more limited profile/dialect approach, there would be a much smaller set of defined combinations. The art is probably finding the right hybrid of some "major" dialects with "minor" aspects so that major contradictory dialects cannot be mixed by accident, but simple minor extensions are not forced into this extend-by-replacement methodology. karl -- Karl Czajkowski karlcz@univa.com
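To illustrate the kind of wrapping Karl describes, here is a minimal sketch; the element names and namespaces below are hypothetical and do not reproduce the actual WS-Agreement syntax:

    <hpc:CreateJob xmlns:hpc="http://example.org/hpc-profile/sketch"
                   xmlns:ext="http://example.org/hpc-extensions/sketch">
      <hpc:Executable>/usr/local/bin/blast</hpc:Executable>
      <!-- Unwrapped extension: the service MUST understand this or fault. -->
      <ext:AdvanceReservationId>res-42</ext:AdvanceReservationId>
      <!-- Wrapped extension: the service MAY ignore the contents if it does
           not recognise them. -->
      <hpc:OptionalExtensions>
        <ext:EmailNotification>user@example.org</ext:EmailNotification>
      </hpc:OptionalExtensions>
    </hpc:CreateJob>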

Hi; You're absolutely right that we require some sort of discovery mechanism for determining which extensions are supported by a given service. I would argue that this is a general problem where we should be following the lead of the broader web services community. That said, I don't think that that community has settled on anything yet -- people please correct me if I'm wrong on this -- and that we may well need to define our own mechanism in the interim. We should make sure that we design things so that we can easily migrate to whatever the industry standardizes on whenever that becomes available.

Regarding your suggestion for having a runtime meta-language for marking content as "ok to ignore" or "must be understood", I have several questions/requests:

* When you say "meta-language" are you implying something richer than these two choices? I can imagine at least two answers to this question:
  * "Simple" (and hence also efficient) resource matchmaking typically involves (mostly) exact matches. Adding a simple binary notion of an optional resource requirement adds a powerful descriptive capability without substantially complicating the matchmaking system.
  * You want a much more expressive resource description/matchmaking language that lets you specify all kinds of complicated concepts, such as prioritization of optional alternatives.
* It would be great if you could provide a variety of example use cases. I personally agree with your view that having a small set of major dialects with minor aspect extensions seems like the most likely approach to be successful. Having a concrete set of examples will make the design conversations much more focused.
* Without wanting to comment on the specifics of GGF documents, I think of the use of xsd:any as being a marker for extensibility in specific protocols. Profiles then define and constrain how those xsd:any fields may be turned into more concrete (extension) specifications.

Marvin.
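For reference, the schema-level extensibility point under discussion looks roughly like the following sketch; the type and element names are hypothetical, while xsd:any with a processContents setting is the standard XML Schema mechanism that a profile would then constrain:

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://example.org/hpc-profile/sketch">
      <xsd:complexType name="JobDescription_Type">
        <xsd:sequence>
          <xsd:element name="Executable" type="xsd:string"/>
          <!-- Open content: a profile then constrains which namespaces and
               elements may actually appear here. -->
          <xsd:any namespace="##other" processContents="lax"
                   minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>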

On May 01, Marvin Theimer modulated:
Hi;
You're absolutely right that we require some sort of discovery mechanism for determining which extensions are supported by a given service. I would argue that this is a general problem where we should be following the lead of the broader web services community. That said, I don't think that that community has settled on anything yet -- people please correct me if I'm wrong on this -- and that we may well need to define our own mechanism in the interim. We should make sure that we design things so that we can easily migrate to whatever the industry standardizes on whenever that becomes available.
Right, I haven't seen anything adequate from the WS community either. It is funny that simple things like critical/non-critical protocol extensions, which existed even in LDAP, are not carried forward here...
Regarding your suggestion for having a runtime meta-language for marking content as "ok to ignore" or "must be understood", I have several questions/requests:
• When you say "meta-language" are you implying something richer than these two choices? I can imagine at least two answers to this question: □ "Simple" (and hence also efficient) resource matchmaking typically involves (mostly) exact matches. Adding a simple binary notion of an optional resource requirement adds a powerful descriptive capability without substantially complicating the matchmaking system.
I wanted to raise the general issue in case others have requirements/opinions. I think a binary model is a good one for a basic interface.
□ You want a much more expressive resource description/matchmaking language that lets you specify all kinds of complicated concepts, such as prioritization of optional alternatives.
WS-Agreement has a much more elaborate meta-language which can capture prioritization and even cost/optimization models for whole combinations of domain-specific constraints (in some sense, every service description is an "extension" in WS-Agreement). I completely agree that this is a hard problem, and by the time you want to support this you are probably better off going to a protocol like WS-Agreement that has designed it in from the start. It is not only the runtime meta-language but also the discovery model used to expose these options that becomes more complex.
• It would be great if you could provide a variety of example use cases. I personally agree with your view that having a small set of major dialects with minor aspect extensions seems like the most likely approach to be successful. Having a concrete set of examples will make the design conversations much more focused.
I think it will be easier to do this after a few of the high-priority (or low-hanging) problem domains have been identified. I don't want to go off into a never-ending modeling exercise... I would like to help define job-description models which can serve equally well in a "basic HPC job protocol" as well as in WS-Agreement descriptions of the same job. In other words, a product of this effort needs to be the standardized job ontology that is the basis for these protocols. (There, I said it. ;-)
• Without wanting to comment on the specifics of GGF documents, I think of the use of xsd:any as being a marker for extensibility in specific protocols. Profiles then define and constrain how those xsd:any fields may be turned into more concrete (extension) specifications.
Right, the problem is that without the runtime distinction of critical/non-critical nor an appropriate discovery system, this approach leads to extremely fragile systems. Basically, there is no good way to find out who understands your dialect and everyone has to be paranoid and reject any dialect or jargon they do not understand. The two mechanisms are complementary in getting "good enough" processing to occur in a heterogeneous and evolving environment.
Marvin.
karl -- Karl Czajkowski karlcz@univa.com

Marvin Theimer wrote:
Regarding your suggestion for having a runtime meta-language for marking content as "ok to ignore" or "must be understood", I have several questions/requests:
* When you say "meta-language" are you implying something richer than these two choices? I can imagine at least two answers to this question: o "Simple" (and hence also efficient) resource matchmaking typically involves (mostly) exact matches. Adding a simple binary notion of an optional resource requirement adds a powerful descriptive capability without substantially complicating the matchmaking system.
It would be so nice if that was true. Simple matchmaking comes in two varieties according to the basic type of the resource being matched. Capabilities (like the ability to run a particular application) are straight matched as described, but capacities are typically matched according to the scheme where a user wants "at least this much" and the provider has "at most that much" so it's really testing for inequality satisfiability or set overlap. What's more, alternatives are another one of these things that seems to be distinctly confusing, especially as it turns out to be very difficult for users to really understand the space of potential alternatives open to them. A better approach seems to be for users to specify their *real* requirements, and for some kind of intermediate agent to translate from those into terms understood by the resource providers.
o You want a much more expressive resource description/matchmaking language that lets you specify all kinds of complicated concepts, such as prioritization of optional alternatives.
Personally, I think that prioritization sucks. Scoring (which is sort-of but not quite the same thing) works better as it is far more flexible. It's also easier to apply to things other than the initial job request; far better to say "I prefer cheapest/quickest" after getting the tenders than to try to figure out what the space of tenders is going to look like before soliciting for them. Donal (I suspect I'm not being clear enough...)
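To make the capability/capacity distinction concrete, here is a JSDL-flavoured sketch of a request mixing both kinds of terms; the element names and namespace are approximations used for illustration rather than exact JSDL syntax:

    <jsdl:Resources xmlns:jsdl="http://example.org/jsdl-sketch">
      <!-- Capability: matched by (near-)exact comparison. -->
      <jsdl:OperatingSystemName>LINUX</jsdl:OperatingSystemName>
      <!-- Capacities: the request states a lower bound, the provider
           advertises what it has, and the match is an inequality test. -->
      <jsdl:IndividualCPUCount>
        <jsdl:LowerBoundedRange>4</jsdl:LowerBoundedRange>
      </jsdl:IndividualCPUCount>
      <jsdl:TotalPhysicalMemory>
        <!-- bytes: 8 GiB -->
        <jsdl:LowerBoundedRange>8589934592</jsdl:LowerBoundedRange>
      </jsdl:TotalPhysicalMemory>
    </jsdl:Resources>

A capability term either matches or it doesn't; a capacity term matches whenever the provider's advertised amount satisfies the requested bound.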

Hi; I mis-spoke when referring to resource matchmaking systems as being mostly about exact matches. What I had in mind was exactly what you described, and your description is much better than mine. Many thanks for the correction and improved description! I agree with you that specifying alternatives and prioritizations is difficult. Personally, I'm not convinced that specifying specific scores is much easier (having had to use such a system once). In general, I would argue that keeping things simple is most effective and that we should make sure that specifying simple common cases is both easy and efficient to do. The beauty of an evolutionary, extensions-based design is that we can start with simple approaches, layer more complex alternatives on top as desired, and then let user experience decide which extensions are actually useful. Along those lines, I think Karl's suggestion of having a binary "ignore/must-support" flag represents a relatively simple extension, whereas anything beyond that would represent a much more complicated extension. Marvin.

Marvin Theimer wrote:
I mis-spoke when referring to resource matchmaking systems as being mostly about exact matches. What I had in mind was exactly what you described and your description is much better than mine. Many thanks for the correction and improved description!
Resource matching code is what I've been working on for a few years now. It's really why I started engaging fully with GGF; so I could have fewer resource description languages to have to work with. :-)
I agree with you that specifying alternatives and prioritizations is difficult. Personally, I'm not convinced that specifying specific scores is much easier (having had to use such a system once). In general, I would argue that keeping things simple is most effective and that we should make sure that specifying simple common cases is both easy and efficient to do.
I know scoring is hard too, though it's not too bad if you're keeping the scoring of things separate from a filtering stage. The difficulty is really if you try to understand the score values themselves; my experience is that they're pretty arbitrary and often with quite large magnitudes.
The beauty of an evolutionary, extensions-based design is that we can start with simple approaches, layer more complex alternatives on top as desired, and then let user experience decide which extensions are actually useful. Along those lines, I think Karl's suggestion of having a binary "ignore/must-support" flag represents a relatively simple extension, whereas anything beyond would represent a much more complicated extension.
I'd prefer a "mayIgnore" flag. :-) OK, the reason for this is that I believe that if someone is asking for a feature they should get it or get a definite early failure by default, and if they want to specify that they have optional resource requirements they'll have to do extra work for it. I suppose it's trying to make the default case (which I always think of as "false" for boolean flags; I think that's a notion that's programmed into the brains of many people other than me too) be the common one.

On the other hand, I question the extent to which an optional resource requirement has any real meaning anyway. The only times I can see a use are for when you're really trying to capture some other sense of resource by proxy, such as asking for certain operating systems with an explicit executable path instead of asking for the abstract application name that is installed at that location on those systems. But I feel that people should not ask for such proxies for what they desire; they should say what they really need (Blast, Gaussian, etc.) and let the middleware take the strain. Since checking for optional resources makes writing resource checkers much more complex and yet only gives you a feature that I think shouldn't be used, what's the point of putting it in? It just makes life harder, including for us as spec writers.

To elaborate on the last point, I'd like to say that many revisions of JSDL had a complex language for resource composition, but we threw it out. This was in part because we were having problems coming up with good use-cases for it (we could construct silly examples, but nothing that we'd actually want to use in practice; even the complex cases were far better served by the introduction of some resource virtualization than trying to use the several composition schemes we tried) but was also due to the fact that it proved really hard to write a good spec of what happens when a resource is optional. What do you do if a resource is missing? How do you communicate to the application what resources were actually allocated? Should there be some kind of preferencing system? What is the semantics of the composition system? What does negation/complement mean? How do you nail the schema down hard enough so that stupidities don't slip through? Throwing that whole lot out was one of the best things we did as a working group. We'd still be at it now otherwise. :-) It did take several major rounds of revisions to throw it all out though, and it's arguable that we missed a little bit of it (i.e. JSDL's somewhat peculiar outer structure). That's probably going to turn out useful though; the world is funny that way... Donal.
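Donal's "proxy resource" point can be sketched as follows (all element names and the namespace are hypothetical): the first request pins down an operating system and an install path as a stand-in for what is really wanted, while the second states the real requirement and leaves the mapping to the middleware:

    <!-- Proxy request: brittle, only matches sites laid out exactly this way. -->
    <hpc:Requirements xmlns:hpc="http://example.org/hpc-profile/sketch">
      <hpc:OperatingSystem>LINUX</hpc:OperatingSystem>
      <hpc:Executable>/opt/blast/bin/blastall</hpc:Executable>
    </hpc:Requirements>

    <!-- Real requirement: any site advertising the application can match. -->
    <hpc:Requirements xmlns:hpc="http://example.org/hpc-profile/sketch">
      <hpc:Application name="BLAST" version="2.2"/>
    </hpc:Requirements>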

On May 03, Donal K. Fellows modulated: ...
On the other hand, I question the extent to which an optional resource requirement has any real meaning anyway. The only times I can see a use are for when you're really trying to capture some other sense of resource by proxy, such as asking for certain operating systems with an explicit executable path instead of asking for the abstract application name that is installed at that location on those systems. But I feel that people should not ask for such proxies for what they desire; they should say what they really need (Blast, Gaussian, etc.) and let the middleware take the strain.
But aren't we defining standards for the middleware to follow (and not necessarily the people)? I think it is a HUGE mistake to keep talking about these standards efforts as if they are human interfaces. That will not give us the robust machine-to-machine communication we require, including the ability to evolve in place etc. Human information consumers are too flexible and adaptable to be the evaluation criteria for whether a protocol is going to be robust between heterogeneous software agents... The point of the "optional extension" is to allow software components to behave with graceful degradation in a heterogeneous environment. If there are basic and extended ways to describe what the client software wants/needs and the extended one is "better" but not critical to function, this allows the single message exchange to express all of that and get the best available behavior. So, I think "may ignore, but should try not to" is about the right semantics for this flag. :-) I agree that critical handling is the appropriate default for extensions. My main point is that the criticality of the extension is instance-specific to the use and NOT usually a characteristic of the extended concept itself. karl -- Karl Czajkowski karlcz@univa.com
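Karl's point that criticality attaches to a particular use rather than to the extension concept itself can be sketched with the same hypothetical extension element (names invented for illustration) appearing once as ignorable and once as critical inside a job-submission message:

    <!-- Interactive submission: notification is nice to have, so it is
         wrapped as ignorable. -->
    <hpc:OptionalExtensions xmlns:hpc="http://example.org/hpc-profile/sketch"
                            xmlns:ext="http://example.org/hpc-extensions/sketch">
      <ext:EmailNotification>user@example.org</ext:EmailNotification>
    </hpc:OptionalExtensions>

    <!-- Unattended submission: the same extension is sent unwrapped, so a
         service that does not understand it must reject the request. -->
    <ext:EmailNotification xmlns:ext="http://example.org/hpc-extensions/sketch">oncall@example.org</ext:EmailNotification>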

Hi; The beauty of a "may ignore" flag is that it can be ignored. :-) So, whereas I side with Donal in believing that complex optional resource descriptions rarely work well, I side with Karl in believing that the flag should be available at the protocol level (as an extension case since I'm trying to be hard-nosed about not letting anything creep into the base case that isn't absolutely necessary for minimal interop). It's worth noting that the concept of "optionally understood" parameters is something that many consider to be one of the bigger success stories of the way the Web works (as compared to Web services). Marvin.

Marvin Theimer wrote:
Hi;
The beauty of a "may ignore" flag is that it can be ignored. :-) So, whereas I side with Donal in believing that complex optional resource descriptions rarely work well, I side with Karl in believing that the flag should be available at the protocol level (as an extension case since I'm trying to be hard-nosed about not letting anything creep into the base case that isn't absolutely necessary for minimal interop).
It's worth noting that the concept of "optionally understood" parameters is something that many consider to be one of the bigger success stories of the way the Web works (as compared to Web services).
Marvin.
Sometimes. But then there is the way that you have to comment out stuff in script tags in case a legacy browser renders your javascript, which strikes me as a failure mode of the HTML extension system: <script><!-- function something() { .. } --></script>

In SOAP 1.1 the mustUnderstand logic is simple and pretty much all you need. SOAP 1.2's MU logic is way more convoluted, and that makes it a dog to test - which, in a TDD process, means a dog to implement, and will inevitably be a source of problems for years to come. One issue with mustUnderstand logic is what does "understand" mean? Does it mean "recognise", or does it mean "process in 100% compliance with the official specification of what this soap header required"? Axis 1.1 shipped with the check for all headers being understood taking place *after* the message was delivered. While following the letter of SOAP 1.1, and passing the limited (stateless) tests of SOAPBuilders, it violated the spirit of the spec quite blatantly.

It's testing those optional bits that really hurts. Unless the clients implement all possible bits of optional behaviour, you cannot test that the endpoints implement it properly. This essentially makes it impossible to make any declaration about the quality of implementation of any optional part of a spec - you have to just hope for the best.

Summary: whenever you mark something as optional, you create an interop problem. -Steve
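For comparison, this is roughly the SOAP 1.1 mechanism being referred to; the header block itself is a hypothetical example, while the mustUnderstand attribute and envelope namespace are standard SOAP 1.1:

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                   xmlns:ext="http://example.org/hpc-extensions/sketch">
      <soap:Header>
        <!-- mustUnderstand="1": a receiver that does not recognise this
             header block must fault rather than silently ignore it. -->
        <ext:AdvanceReservationId soap:mustUnderstand="1">res-42</ext:AdvanceReservationId>
      </soap:Header>
      <soap:Body>
        <!-- ... job submission payload ... -->
      </soap:Body>
    </soap:Envelope>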

Hi; I agree with you that we should try to keep to as simple a means of specifying the concept of "must understand" or "may ignore" as possible. I haven't looked at the difference between how SOAP 1.1 and SOAP 1.2 do things and will go off and do my homework. One thing that I believe is happening in this conversation is that two separable issues are being conflated. One issue is whether or not to allow optional feature specifications. The other has to do with the main concern that I heard in your email, namely that "optional" features are often ill-defined in terms of their semantics. I would argue that the semantics of optional features need to be precisely defined -- including what it means if the feature is optional. I would argue that ANY feature that isn't precisely defined in a spec has the potential to cause interop problems. So I would argue that "must understand" features must be implemented precisely and completely. Alternatively, "may ignore" features must be either completely and precisely implemented or not at all (meaning the semantics provided must be as if the feature had not been mentioned). Marvin.

Marvin, I think this is a good start. I did find some areas missing, such as workflow and support for job dependencies, as well as extensions for MPI/UPC programs. I like the "object-oriented" approach and agree with Dave that being able to specify more complex expressions is important and is in my opinion a requirement for ease of use. If you launch 1000 jobs you want to be able to query groups of jobs without having to specify the individual jobs. Regards, Susanne

Hi; You're right that I wasn't thinking about the most general cases that can occur in workflow and parallel/distributed programs, such as MPI or UPC programs. In particular, I was only thinking about things that can be declaratively described and don't require the provision of code that needs to run "inside" the scheduling infrastructure. Let's consider workflow and job dependencies first. Static workflows and job dependencies can be declaratively described in the form of XML infosets using standardized terminology. For these I claim that the extension mechanisms I've described are sufficient. That is, supporting static workflows and job dependencies is a matter of defining the appropriate standardized description syntax and semantics and supporting extended versions of such is a matter of agreeing on how extensions to the descriptions can be made (which are covered by the mechanisms I've already included). Dynamic workflows and job dependencies require that a client supply application-specific code that can be run inside the scheduling infrastructure in order to supply dynamically computed decisions. This requires an additional extension mechanism beyond the ones that I listed. Note that whether the client describes the decision making code in terms of something like BPEL or in terms of something like a Java servlet that the scheduling infrastructure runs is a second-order issue. In both cases the client is supplying code that gets run inside the scheduling infrastructure. So, you are right that we need the ability to run client-supplied code inside the scheduling infrastructure for some of the extensions we might contemplate. That said, I would argue that we should save these kinds of extensions for later in our deliberations since they are far more complicated to get right than the ones I listed. But we should definitely keep them in mind. Regarding MPI/UPC and other forms of parallel/distributed programs (e.g. PVM): There is a declarative aspect that is visible to clients and an internal, implementation aspect that I would argue should not be visible in the interface between clients and schedulers. Let's consider an MPI program that is based on the MPICH infrastructure consisting of SMPD daemons running on each compute node used by the program. A client will specify the MPI program to run, which MPI infrastructure it expects, and the relevant MPI-related arguments to supply (as well as other arguments, environment variables, etc.). This can all be described and encoded as an XML infoset. The scheduling infrastructure will internally need to implement the MPICH SMPD daemon-based infrastructure, but the details of that aren't visible in the job scheduling interface. An interesting question is whether MPICH's implementation aspects need to become visible when we consider scheduler-scheduler interactions. If an MPI program can span multiple clusters then the relevant SMPD daemons from multiple clusters need to be put in touch with each other. In the case of MPICH, I believe the main thing needed is that the root SMPD daemon receive a list of the IP addresses of all the compute nodes that will participate in a given MPI program. In that case, the server-server aspects of scheduling an MPI program mainly have to do with allocating the appropriate number of compute nodes - and getting their names - from appropriate compute clusters that have indicated that they support the MPICH SMPD infrastructure. 
So I hypothesize that support for parallel/distributed programs is mainly a matter of defining the appropriate declarative standards and doesn't require any additional extension mechanisms beyond those I've already described. I would, of course, be very interested to learn of examples where this is not enough. Marvin.
* If resource descriptions are used only for "matchmaking" against other resource descriptions, then another approach to extending resource (and other) definitions is to allow arbitrary resource types whose semantics are not understood by the HPC infrastructure, which deals with them only as abstract entities whose names can be compared textually and whose associated values can be compared textually or numerically depending on their data type. It is important to understand that, whereas the "mechanical" aspects of an HPC infrastructure can mostly be built without having to know the semantics of these abstract resource types, their semantics must still be standardized and well-known at the level of the human beings using and programming the system. Both the descriptions of available computational resources and the client requests for reserving and using such resources must be specified in a manner that will cause the underlying HPC "matchmaking" infrastructure to do the right thing. This matchmaking approach is exemplified by systems such as Condor's ClassAds system.
* It should be noted that a generalized matchmaking system is not a trivial thing to implement efficiently, and hence one can reasonably imagine extensions based on any of the above approaches to extending resource (and other) definitions.
* Hierarchical and extended representations of information.
* XML infosets provide a very convenient way to represent extended descriptions of a particular piece of information.
* Another form of hierarchical information display shows up when multi-level scheduling systems are involved. In this case it may be desirable to represent information either in a form that hides the scheduling hierarchy or in a form that reflects it. Consider how to represent the list of compute nodes for a job running across multiple clusters: a flat view might list all compute nodes in an undifferentiated list, while a hierarchical view might provide a list of clusters, each of which describes information about a cluster, including a list of the compute nodes in that cluster that the job is running on. Both views have their uses. XML infosets are convenient for encoding the syntax of either view, but an extension supporting information representation in these sorts of systems will also have to define the semantics of all allowed hierarchies.
* Decomposition of functionality into "micro" protocols.
* Micro protocols should reflect things that must occur at different times (e.g. resource reservation/allocation vs. resource use/job-execution) or that can be employed in a stand-alone manner (e.g. job execution vs. data transfer). The decomposition that seems relevant for the HPC use cases (i.e. the parts that are visible to clients) is the following:
* The base case involves interaction between a client and a scheduler for purposes of executing a job.
* A client may wish to independently reserve, or pre-allocate, resources for later and/or guaranteed use. Note that this is different from simply submitting a job for execution to a scheduler that then queues the job for later execution - perhaps at a specific time requested by the client. For example, a meta-scheduler might wish to reserve resources so that it can make informed scheduling decisions about which "subsidiary" scheduler to send various jobs to. Similarly, a client might wish to reserve resources so as to run two separate jobs in succession, with one job writing output to a scratch storage system and the second job reading that output as its input, without having to worry that the data might have vanished during the interval between the execution of the two jobs.
* A client may wish to query a scheduler to learn what resources might be available to it, without actually laying claim to any resources as part of the query (let alone executing anything using those resources). Scheduling candidate-set generators or matchmaking services such as Condor would want this functionality.
* A client may need to transfer specific data objects (e.g. files) to and from a system that is under the control of a job scheduling service.
* Micro protocols may have relationships to each other. For example, job execution will need to be able to accept a handle of some sort to resources that have already been allocated to the requesting client; a rough sketch of how such a decomposition might look is given below.
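To make the relationships among these micro protocols a bit more concrete, here is a rough sketch of how the decomposition might be surfaced to clients. All of the names are purely illustrative - nothing here is drawn from an existing or proposed spec - and the only design point it tries to show is that reservation, query, execution, and data transfer can each stand alone, while execution can optionally consume a handle produced by an earlier reservation.

    # Illustrative sketch only: none of these class or method names come from a spec.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class ReservationHandle:
        token: str  # opaque token returned by the reservation micro protocol

    class ReservationProtocol:
        def reserve(self, resource_description: dict) -> ReservationHandle:
            # Pre-allocate resources for later and/or guaranteed use.
            raise NotImplementedError

    class QueryProtocol:
        def available_resources(self, constraints: dict) -> List[dict]:
            # Ask what could be had, without laying claim to anything.
            raise NotImplementedError

    class ExecutionProtocol:
        def submit(self, job_description: dict,
                   reservation: Optional[ReservationHandle] = None) -> str:
            # Run a job, optionally against resources reserved earlier.
            raise NotImplementedError

    class DataTransferProtocol:
        def stage(self, source: str, destination: str) -> None:
            # Move data to/from storage controlled by the scheduling service.
            raise NotImplementedError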

Balle, Susanne wrote:
I think this is a good start. I did find some areas missing, such as Workflow and support for Job Dependencies, as well as extensions for MPI/UPC programs.
None of those are base cases; they're just very useful extensions. :-)

The JSDL-WG is currently working on defining a spec for describing parallel (e.g. MPI) job requests, an area chosen because there was considerable community interest: several different groups stated that if we didn't define it, they would go off and do their own lash-ups, which indicates that the need is urgent. With that in place, I'd imagine that the main remaining requirement would be a "parallel-aware" BES container, and that it would be the responsibility of such a system to act as a facade and hide the details from the largely uninterested users.

Donal.
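As a very rough illustration of that facade idea (all names below are made up for the purpose of the sketch; none are taken from the BES or JSDL drafts), the container would accept a single parallel job request and keep the per-node fan-out entirely internal:

    # Illustrative sketch only: not an actual BES interface.
    class ParallelAwareContainer:
        def __init__(self, node_launcher):
            # node_launcher is whatever site-internal mechanism starts one
            # process on one compute node (e.g. an SMPD-style daemon).
            self._launch = node_launcher
            self._jobs = {}

        def create_activity(self, job_description: dict) -> str:
            # The client sees a single activity, regardless of how wide it is.
            nodes = self._select_nodes(job_description.get("process_count", 1))
            handles = [self._launch(node, job_description) for node in nodes]
            job_id = "job-%d" % (len(self._jobs) + 1)
            self._jobs[job_id] = handles
            return job_id

        def get_status(self, job_id: str) -> str:
            # Summarize the many per-node statuses as one job-level status.
            return "Running" if self._jobs[job_id] else "Failed"

        def _select_nodes(self, count: int):
            # Placeholder for the scheduler's real node-selection logic.
            return ["node%d" % i for i in range(count)]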