BES Last Call

Mark Morgan

2 Feb 2007

6:43 p.m.

Ladies and Gentlemen,

As per consensus at the BES session yesterday, 1 February 2007, at 6pm at OGF19, I have made all noted modifications to the BES document. It is uploaded to the Gridforge website and is available as ogsa-bes-draft-v30.

If you feel that you deserve acknowledgements for work done, or to be listed as an author, and you are not currently indicated properly in the document, please speak up.

Once again, this is the last call for the OGSA BES specification before we send it off for public comment.

-Mark

--
Mark Morgan
Research Scientist
Department of Computer Science
University of Virginia
http://www.cs.virginia.edu
mmm2a@virginia.edu
(434) 982-2047


Stephen M Pickles

6 Feb

2:20 a.m.

Dear Mark,

I notice only now that my name appears to have crept into the authors' list. While I would be happy to be included in such company, I honestly feel that authorship overstates my contribution. Please relegate me to the list of contributors (section 12) or the acknowledgments (section 13), as you see fit.

Best regards,
Stephen

...

-----Original Message-----
From: ogsa-bes-wg-bounces@ogf.org [mailto:ogsa-bes-wg-bounces@ogf.org] On Behalf Of Mark Morgan
Sent: 02 February 2007 18:43
To: ogsa-bes-wg@ggf.org
Subject: [OGSA-BES-WG] BES Last Call

Ladies and Gentlemen,

As per consensus at the BES session yesterday, 1 February 2007, at 6pm at OGF19, I have made all noted modifications to the BES document. It is uploaded to the Gridforge website and is available as ogsa-bes-draft-v30.

If you feel that you deserve acknowledgements for work done, or to be listed as an author, and you are not currently indicated properly in the document, please speak up.

Once again, this is the last call for OGSA BES specification before we send it off for public comment.

-Mark

-- Mark Morgan Research Scientist Department of Computer Science University of Virginia http://www.cs.virginia.edu mmm2a@virginia.edu (434) 982-2047


Andreas Savva

7:58 a.m.

Mark,

Sorry if this is an FAQ, but glancing through the document I'm not clear on the distinction between the definitions of "UnsupportedFeatureFault" and "InvalidRequestMessageFault".

For example, "UnsupportedFeatureFault" says "... well-formed, supported JSDL document input element containing a sub-element that is not implemented by this BES implementation."

and "InvalidRequestMessageFault" says "An element in the request message is not recognized. ... This does not mean that the element itself is in error, but rather that it specifies a syntactically correct value which does not in fact make sense."

Suppose that the jsdl 'other' value is used to provide extra XML content in order to specify an operating system not in the OperatingSystem enumeration. If a BES container does not 'support' this operating system which of the two faults should be returned?

Btw, the CPU example is clear but it would not have been an example I would have thought of given the normative definition ("..not recognized") of the InvalidRequestMessageFault.

Andreas

Mark Morgan wrote:

...

Ladies and Gentlemen,

As per consensus at the BES session yesterday, 1 February 2007, at 6pm at OGF19, I have made all noted modifications to the BES document. It is uploaded to the Gridforge website and is available as ogsa-bes-draft-v30.

If you feel that you deserve acknowledgements for work done, or to be listed as an author, and you are not currently indicated properly in the document, please speak up.

Once again, this is the last call for OGSA BES specification before we send it off for public comment.

-- Andreas Savva Fujitsu Laboratories Ltd

Chris Smith

8:03 p.m.

UnsupportedFeatureFault indicates that a particular element or attribute contained within the JSDL document is either not supported, or (for extension content) not supported or recognized.

InvalidRequestMessageFault indicates that the value of some element is invalid input. For example, if TotalCPUCount in JSDL was given as -10.

-- Chris

On 05/2/07 23:58, "Andreas Savva" <andreas.savva@jp.fujitsu.com> wrote:

...

Mark,

Sorry if this is an FAQ, but glancing through the document I'm not clear on the distinction between the definitions of "UnsupportedFeatureFault" and "InvalidRequestMessageFault".

For example, "UnsupportedFeatureFault" says "... well-formed, supported JSDL document input element containing a sub-element that is not implemented by this BES implementation."

and "InvalidRequestMessageFault" says "An element in the request message is not recognized. ... This does not mean that the element itself is in error, but rather that it specifies a syntactically correct value which does not in fact make sense."

Suppose that the jsdl 'other' value is used to provide extra XML content in order to specify an operating system not in the OperatingSystem enumeration. If a BES container does not 'support' this operating system which of the two faults should be returned?

Btw, the CPU example is clear but it would not have been an example I would have thought of given the normative definition ("..not recognized") of the InvalidRequestMessageFault.

Andreas

Mark Morgan wrote:

...
Ladies and Gentlemen,

As per consensus at the BES session yesterday, 1 February 2007, at 6pm at OGF19, I have made all noted modifications to the BES document. It is uploaded to the Gridforge website and is available as ogsa-bes-draft-v30.

If you feel that you deserve acknowledgements for work done, or to be listed as an author, and you are not currently indicated properly in the document, please speak up.

Once again, this is the last call for OGSA BES specification before we send it off for public comment.

Andreas Savva

7 Feb

3:41 a.m.

Chris,

Chris Smith wrote:

...

UnsupportedFeatureFault indicates that a particular element or attribute contained within the JSDL document is either not supported, or (for extension content) not supported or recognized.

InvalidRequestMessageFault indicates that the value of some element is invalid input. For example, if TotalCPUCount in JSDL was given as -10.

This is nice text and I hope it is included in the BES spec. "...not recognized" is not correct. Also given the above, HPC Profile sections 3.9 and 3.10 specify the wrong value for the returned fault. For example in 3.9 it says

...

If the consuming system does not provide the requested operating system, or if the JSDL special token “other” is used as the content of the jsdl:OperatingSystemName sub-element, and if the consuming system does not understand the provided extension content, then the consuming system MAY return the BES InvalidRequestMessageFault to the requester.

It should be UnsupportedFeatureFault. (And why is the fault returned a MAY and not a MUST for the profile?)

Andreas

...

-- Chris

On 05/2/07 23:58, "Andreas Savva" <andreas.savva@jp.fujitsu.com> wrote:

...
Mark,

Sorry if this is an FAQ, but glancing through the document I'm not clear on the distinction between the definitions of "UnsupportedFeatureFault" and "InvalidRequestMessageFault".

For example, "UnsupportedFeatureFault" says "... well-formed, supported JSDL document input element containing a sub-element that is not implemented by this BES implementation."

and "InvalidRequestMessageFault" says "An element in the request message is not recognized. ... This does not mean that the element itself is in error, but rather that it specifies a syntactically correct value which does not in fact make sense."

Suppose that the jsdl 'other' value is used to provide extra XML content in order to specify an operating system not in the OperatingSystem enumeration. If a BES container does not 'support' this operating system which of the two faults should be returned?

Btw, the CPU example is clear but it would not have been an example I would have thought of given the normative definition ("..not recognized") of the InvalidRequestMessageFault.

-- Andreas Savva Fujitsu Laboratories Ltd

Christopher Smith

8 Feb

7:02 p.m.

On 06/2/07 19:41, "Andreas Savva" <andreas.savva@jp.fujitsu.com> wrote:

...

Chris,

Chris Smith wrote:

...
UnsupportedFeatureFault indicates that a particular element or attribute contained within the JSDL document is either not supported, or (for extension content) not supported or recognized.

InvalidRequestMessageFault indicates that the value of some element is invalid input. For example, if TotalCPUCount in JSDL was given as -10.

This is nice text and I hope it is included in the BES spec. "...not recognized" is not correct.

The recognized part referred to extension content where the element might not be known to the consuming system (as opposed to being known, but unsupported). I have no problems dropping it.

...

Also given the above, HPC Profile sections 3.9 and 3.10 specify the wrong value for the returned fault. For example in 3.9 it says

...
If the consuming system does not provide the requested operating system, or if the JSDL special token “other” is used as the content of the jsdl:OperatingSystemName sub-element, and if the consuming system does not understand the provided extension content, then the consuming system MAY return the BES InvalidRequestMessageFault to the requester.

It should be UnsupportedFeatureFault. (And why is the fault returned a MAY and not a MUST for the profile?)

Ahh ... it is InvalidRequestMessage ... that's because the element (OperatingSystemName) is recognized and supported by the system, but the "value" of OperatingSystemName is not recognized. I know this seems in conflict with the statements about UnsupportedFeatureFault above, but in this case, the extension elements are the "value" of the OperatingSystemName element (if that makes sense).

The reason for the MAY is in the phrase "If the consuming system does not provide the requested operating system....". Some systems may choose to accept the JSDL as is, and might just have an activity whose resource requirements can never be satisfied (unless an operating system of that type is configured in the system and made available to the BES). This would be the case for my BES implementation on top of LSF, which allows one to specify resources that may never be satisfied.

-- Chris
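The distinction Chris draws in this message can be sketched in a few lines of Python. This is only an illustration of his reading, not text from the BES or HPC Profile documents: the function name `check_element`, the `strict` flag, and the two capability tables are hypothetical, and the fault classes are plain exceptions standing in for the SOAP faults.

```python
class UnsupportedFeatureFault(Exception):
    """The element or attribute itself is not supported by this container."""


class InvalidRequestMessageFault(Exception):
    """The element is supported, but the value it carries is not acceptable."""


# Hypothetical capability tables for an imaginary BES container.
SUPPORTED_ELEMENTS = {"TotalCPUCount", "OperatingSystemName"}
PROVIDED_OPERATING_SYSTEMS = {"LINUX", "WINNT"}


def check_element(name, value, strict=True):
    """Classify one JSDL element following Chris's reading of the two faults."""
    if name not in SUPPORTED_ELEMENTS:
        # The element is unknown or unimplemented: a feature problem.
        raise UnsupportedFeatureFault(name)
    if name == "TotalCPUCount" and value < 0:
        # Supported element, nonsensical value (the -10 example).
        raise InvalidRequestMessageFault(f"{name}={value}")
    if name == "OperatingSystemName" and value not in PROVIDED_OPERATING_SYSTEMS:
        # The fault is a MAY: a lenient container can accept the request and
        # leave the activity waiting for a resource that may never appear.
        if strict:
            raise InvalidRequestMessageFault(f"{name}={value}")
    return True
```

With `strict=False` the sketch behaves like the LSF-backed container Chris describes: `check_element("OperatingSystemName", "other", strict=False)` is accepted rather than faulted.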

Joseph Bester

9:11 p.m.

Sorry for coming in so late to this discussion. As I wasn't too involved in the earlier drafts, I'm not sure how much of this has been covered already. Below are my questions and comments on the latest draft. Each concern is separated by a -- and tries to refer to the nearest heading or subheading title in the document.

joe

Naming Activities: Endpoint References

...
A BES implementation MUST always present an XML anyURI to clients and this single anyURI MUST be one of a well-defined set of URIs that represent the types of endpoints that the BES implementation deals with. Two URIs are defined:
http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing
http://schemas.ggf.org/bes/2006/08/bes/naming/WS-Naming

I think this is referring to the NamingProfile BES-Factory attribute, but it is quite unclear from the context. Is it strictly required to have only those two URIs hard-coded and not have extensibility here?

--

Defining Valid Specializations

...
BESs MUST not implement such specialization profiles. More specifically, any specialization profile that a BES implements MUST obey the following rules regarding sub-state definitions and allowed state transitions:

1. A specialization can introduce sub-states only by replacing a state in the state model that it is specializing (which itself may be a specialization of some other state model) with a graph of sub-states and state transitions among those sub-states.
2. A state transition from any sub-state in the specialization to another state, S, in the unspecialized state model may only occur if a corresponding state transition already existed in the unspecialized state model from the state that has been replaced, R, to that state S.

The wording of #1 is a little bit ambiguous: it might be interpreted as stating that specialized states must only have transitions within the new substate graph. It might also clarify things to explicitly allow transitions to substates of S in #2.

--

Composition of Specializations [1] and Specialized Fault Responses [2]

...
[1] This case raises the question of what to tell a client about their request, and when. The client may wish to receive a response to its request immediately, telling it that the request will eventually be applied once the relevant activity has progressed to a suitable sub-state. Alternatively, the client may wish to receive a response to its request only when the requested change has actually occurred. To support both response cases requires that clients can specify in a request whether they wish to receive an immediate response back or whether they wish to only receive a response once the request has actually been acted upon. In the former case a client must be prepared to receive back a fault response indicating that their request will eventually be applied.

[2] The OperationWillBeAppliedEventuallyFault is used by the BES to indicate to the client that the requested state operation is allowed, but that the operation can not be applied immediately given the current Activity state. By throwing this fault, the BES indicates to the client that it will apply the requested operation when the Activity state allows. For example, an Activity in Running:Migrating sub-state can not be put into Running:Suspended, until the Activity has completed the migration operation, and is back in the Running:On-resource sub-state.

I'm not terribly happy with this particular fault. It seems like this is a gap in the state transition diagram for the substates---the server is moving the job into some intermediate state (between migrating and suspended in this example) rather than to the desired state. Since the server intends to move to the suspended state, it seems like a poor candidate for a fault. Also, this fault isn't actually used in the description of the only client-initialized state transition (TerminateActivities)

--

Representing sub-states

There isn't a clear XML example of how a union state would look. The reference to section 4.3 is wrong---that part is about specialized fault responses.

--

BES-Management Port-type (Attributes and Operations)

Port-type vs Port-Type in subsequent sections. No Attributes are described in this section despite the heading.

--

StopAcceptingNewActivities, StartAcceptingNewActivities

...
Output(s)
- None. The response message will be sent once the BES has stopped accepting new activity creation requests.
- None. The response message will be sent once the BES has started accepting new activity creation requests

Is it necessary to allow the service to block the client for an indeterminate time? Other state transitions described in the doc allow for a BES implementation to intend to make the state change without guaranteeing that it is done before the reply is made.

--

BES-Factory attributes

General note: not all attributes in the table are listed in the schema (I noticed TotalNumberOfActivities and TotalNumberofContainedResources were missing)

...
ContainedResource anyType

Is it intended to use an anyType holding a BasicResourceAttributesDocument or FactoryResourceAttributesDocument or is it intended to have the ContainedResource element be of type BasicResourceAttributesDocumentType or FactoryResourceAttributesDocumentType?

To clarify what I mean:

<ContainedResource>
  <FactoryResourceAttributesDocumentType>
    <IsAcceptingNewActivities>true</IsAcceptingNewActivities>
  </FactoryResourceAttributesDocumentType>
</ContainedResource>

or

<ContainedResource xsi:type="FactoryResourceAttributesDocumentType">
  <IsAcceptingNewActivities>true</IsAcceptingNewActivities>
</ContainedResource>

--

BES-Factory attributes

...
LocalResourceManagerType URI

Should there be a specific URI defined for the case where the BES acts as a facade for one or more other BESes? (one of the use cases described in this doc).

--

BES-Factory Attributes: TotalNumberOfActivities
BES-Factory Attributes: TotalNumberOfContainedResources

These are listed as mandatory attributes. Are they essential to the operation of the BES factory?

--

BES-Factory Attributes: ActivityReference

The description says *all* of the currently contained activities should be listed. I would expect in a Grid environment sometimes it would make sense to restrict the ActivityReference list to those activities which the requester is authorized to view or manipulate.

--

BESExtension

The list of valid values might be better listed as a list of example values. The names should be closer to the extension definitions.

--

BES-Factory Operations

...
If a request fails for some reason that applies to all the specified activities—e.g., due to an authorization fault—then the BES MUST respond with an appropriate fault response message.

Is there a reason to require this different fault processing on the service as opposed to folding this into the other fault handling case?

...
If a request can succeed for one or more of the specified activities then the BES MUST respond with a vector of response elements, where each element corresponds to the equivalent activity designated in the input EPR vector. Each response element MUST be either an element describing the results of the request, as applied to the designated activity, or a SOAP-1.1 fault element describing why the request could not be applied to the designated activity (e.g., because the EPR could not be resolved to any known activity within the BES).

I'm a little nervous about explicitly referring to SOAP 1.1 Fault types in the schema instead of including only BES-specific faults here. Also, the "either" part of the response is not enforced in the XML schema---probably simplest as a choice, though there are many ways to do so in a schema doc.

--

CreateActivity Output(s)

...
CreateActivityResponseType Response: On success, Response contains an ActivityIdentifier (EPR) identifying the requested activity. An ActivityDocument element MAY be included representing the current representation of the requested activity. This operation may include a NotAuthorizedFault SOAP fault in the result element indicating that while the front-end (i.e. BES web service) was able to validate the incoming user credentials, the back-end would not permit the operation given the credentials supplied. This can happen for example when a BES implements its activities by fork/exec’ing those applications using su.

Is this fault text correct? It seems odd to describe a fault in the response section (and it doesn't seem to appear in the schema for the service).

--

CreateActivity Fault(s)

...
InvalidRequestMessageFault: An element in the request message is not recognized. The elements that are not recognized are described in the body of the fault. This does not mean that the element itself is in error, but rather that it specifies a syntactically correct value which does not in fact make sense. For example, the number of CPUs is represented by a double, so fractional values are syntactically correct, but would cause this fault to be thrown as they do not make sense in the context given.

This seems confusing when compared to the UnsupportedFeatureFault. Perhaps this could be renamed to something containing the word "unsatisfiable" to match more closely the JSDL notational conventions. This would also handle the multitude of fault specializations which could reasonably be generated by BES implementations (too many cpus requested, insufficient memory, invalid host, etc)

--

GetActivityStatuses / TerminateActivities

...
EPR[] ActivityIdentifier: A vector of zero or more EPRs (obtained from previous CreateActivity operations)

Zero doesn't make sense here, does it?

--

GetActivityStatuses

...
Since the BES specification allows for extensible activity state diagrams, it is possible that not all states within the state diagram will be relevant/meaningful to a particular client. BES requires that all legal state transitions are transitioned even if they are not relevant to a particular client. For instance, if an empty JSDL document is submitted to the BES then all the states from New to Finished will be transitioned through even though there is no underlying specified activity.

Does this text belong in this section? The New state is not mentioned elsewhere.

--

GetActivityStatuses / TerminateActivities InvalidRequestMessageFault

Does this fault make sense for these operations?

--

...
TerminateActivityResponseType[] Response: A vector detailing the responses of the BES to the termination requests. The Terminated element is a boolean value indicating whether the BES successfully (true) terminated the activity or not (false). If true is returned, then the associated activity is now in the Terminated state. If false is returned then the activity MAY eventually transition to the Terminated state. If an activity specified in the input cannot be located or cannot be terminated for some reason then the TerminateResponse MUST contain a SOAP-1.1 fault element instead of a Terminated element.

The interface for this might be slightly more useful if it were to return the activity status instead of the boolean value.

--

GetActivityDocuments

Missing heading + same concerns as in BES-Factory Operations above

--

GetAttributesDocument

The wsdl for this operation doesn't seem to match the description

--

BES-Activity Port-type

...
The BES-Activity port-type defines operations for monitoring and managing individual activities. These operations are intended to be applied to EPRs returned by previous CreateActivity operations. Returning an EPR that supports the BES-Activity Port-type is optional in the CreateActivity operation of the BES-Factory Port-type.

Actually, this port type is not defined in this document at all. This section is a definition for an activity document which can be used in some messaging by the BES Factory port type.

--

Optional Extensions

Probably helpful to add the extension URIs in the section defining the relevant extensions.

--

Idempotent Execution Semantics

...
Should a BES receive a second CreateActivity request that includes the same identifier as a previously received request, the BES MUST not create the requested activity a second time if it already created the activity for the first request

This should probably read "Should a BES receive a subsequent CreateActivity request"...

--

Subscription to Notification Events

...
A BES that allows its clients to subscribe for messages concerning activity state changes MUST do so using either the WS-Eventing or WS-Notification protocols.

Change this to be a MUST only if referring to a BES advertising the SupportsSubscriptions extension. Should additionally list the Topic (for wsnt---not sure about WS-Eventing) which should be used for the subscription request. wsnt prefix isn't defined in the Namespace table in section 1.2

--

Authors:

Should refer to Argonne National Laboratory not Argonne National Labs

Since Peter has moved on from Globus/ANL, I'm not sure if Peter Lane should be listed as author based on the criteria discussed in emails:

...
Greg also takes the view (and the GFSG supports this) that anyone listed as an author must individually be willing to take full responsibility for the document, now and forever. This will go into the revised GFD #1, too. It's the important part, in Greg's view.

Andreas Savva

9 Feb

5:54 a.m.

If I may, I have an additional comment on the following point:

...

Naming Activities: Endpoint References

...
A BES implementation MUST always present an XML anyURI to clients and this single anyURI MUST be one of a well-defined set of URIs that represent the types of endpoints that the BES implementation deals with. Two URIs are defined:
http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing
http://schemas.ggf.org/bes/2006/08/bes/naming/WS-Naming

I think this is referring to the NamingProfile BES-Factory attribute, but it is quite unclear from the context. Is it strictly required to have only those two URIs hard-coded and not have extensibility here?

At GGF18 during a BES session and when reviewing BES draft 28 (I think) we agreed to remove these BES WS-Naming URIs because WS-Naming was defining a bunch of them already. There is no reason to have duplicate definitions.

Also because WS-Naming defines separate claim URIs for each of its sub-profiles (and because an implementation is free to claim support for any set of those sub-profiles) we agreed that a BES implementation may present "one or more" naming URIs to indicate which set of naming conventions it supports. (Hopefully Mark and Andrew should remember that discussion.)

Andreas

Joseph Bester wrote:

...

Sorry for coming in so late to this discussion. As I wasn't too involved in the earlier drafts, I'm not sure how much of this has been covered already. Below are my questions and comments on the latest draft. Each concern is separated by a -- and tries to refer to the nearest heading or subheading title in the document.

joe

Naming Activities: Endpoint References

...
A BES implementation MUST always present an XML anyURI to clients and this single anyURI MUST be one of a well-defined set of URIs that represent the types of endpoints that the BES implementation deals with. Two URIs are defined:
http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing
http://schemas.ggf.org/bes/2006/08/bes/naming/WS-Naming

I think this is referring to the NamingProfile BES-Factory attribute, but it is quite unclear from the context. Is it strictly required to have only those two URIs hard-coded and not have extensibility here?

--

Defining Valid Specializations

...
BESs MUST not implement such specialization profiles. More specifically, any specialization profile that a BES implements MUST obey the following rules regarding sub-state definitions and allowed state transitions:

1. A specialization can introduce sub-states only by replacing a state in the state model that it is specializing (which itself may be a specialization of some other state model) with a graph of sub-states and state transitions among those sub-states.
2. A state transition from any sub-state in the specialization to another state, S, in the unspecialized state model may only occur if a corresponding state transition already existed in the unspecialized state model from the state that has been replaced, R, to that state S.

The wording of #1 is a little bit ambiguous: it might be interpreted as stating that specialized states must only have transitions within the new substate graph. It might also clarify things to explicitly allow transitions to substates of S in #2.
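One way to read the two quoted rules is as a check over the transitions that leave the new sub-states. The Python sketch below is an assumption-laden illustration, not part of the draft: the function name `invalid_transitions` and the tiny base model in the usage note are hypothetical, and transitions entering a sub-state from outside are deliberately not examined because the quoted rules do not address them.

```python
def invalid_transitions(base_transitions, replaced, sub_states, new_transitions):
    """Return the sub-state transitions that break the two quoted rules.

    base_transitions: set of (from_state, to_state) pairs in the
                      unspecialized model
    replaced:         the state R that the specialization replaces
    sub_states:       the sub-states introduced in place of R
    new_transitions:  iterable of (from_state, to_state) pairs proposed by
                      the specialization
    """
    bad = []
    for src, dst in new_transitions:
        if src not in sub_states:
            continue  # not covered by the quoted rules
        if dst in sub_states:
            continue  # rule 1: transitions among the new sub-states
        if (replaced, dst) in base_transitions:
            continue  # rule 2: R -> S already existed in the base model
        bad.append((src, dst))
    return bad
```

For instance, with a simplified base model of Pending -> Running, Running -> Finished and Running -> Terminated, a proposed transition from Running:Suspended back to Pending would be reported, because Running -> Pending does not exist in the base model.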

--

Composition of Specializations [1] and Specialized Fault Responses [2]

...
[1] This case raises the question of what to tell a client about their request, and when. The client may wish to receive a response to its request immediately, telling it that the request will eventually be applied once the relevant activity has progressed to a suitable sub-state. Alternatively, the client may wish to receive a response to its request only when the requested change has actually occurred. To support both response cases requires that clients can specify in a request whether they wish to receive an immediate response back or whether they wish to only receive a response once the request has actually been acted upon. In the former case a client must be prepared to receive back a fault response indicating that their request will eventually be applied.

[2] The OperationWillBeAppliedEventuallyFault is used by the BES to indicate to the client that the requested state operation is allowed, but that the operation can not be applied immediately given the current Activity state. By throwing this fault, the BES indicates to the client that it will apply the requested operation when the Activity state allows. For example, an Activity in Running:Migrating sub-state can not be put into Running:Suspended, until the Activity has completed the migration operation, and is back in the Running:On-resource sub-state.

I'm not terribly happy with this particular fault. It seems like this is a gap in the state transition diagram for the substates---the server is moving the job into some intermediate state (between migrating and suspended in this example) rather than to the desired state. Since the server intends to move to the suspended state, it seems like a poor candidate for a fault. Also, this fault isn't actually used in the description of the only client-initialized state transition (TerminateActivities)
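For concreteness, the behaviour the quoted draft text describes (and that this comment objects to) might look like the following Python sketch. The `Activity` class, its method names and the `pending` field are hypothetical, and a plain exception stands in for the SOAP fault.

```python
class OperationWillBeAppliedEventuallyFault(Exception):
    """The request is allowed but cannot be applied in the current sub-state."""


class Activity:
    def __init__(self, state):
        self.state = state
        self.pending = None  # operation deferred until the state allows it

    def request_suspend(self):
        if self.state == "Running:Migrating":
            # Remember the request and report that it is deferred.
            self.pending = "Running:Suspended"
            raise OperationWillBeAppliedEventuallyFault(self.pending)
        self.state = "Running:Suspended"

    def migration_finished(self):
        # The activity returns to Running:On-resource, then any deferred
        # operation is applied.
        self.state = "Running:On-resource"
        if self.pending is not None:
            self.state = self.pending
            self.pending = None
```

In this reading the client sees a fault even though the request will succeed later, which is the mismatch the comment above points out.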

--

Representing sub-states

There isn't a clear XML example of how a union state would look. The reference to section 4.3 is wrong---that part is about specialized fault responses.

--

BES-Management Port-type (Attributes and Operations)

Port-type vs Port-Type in subsequent sections. No Attributes are described in this section despite the heading.

--

StopAcceptingNewActivities, StartAcceptingNewActivities

...
Output(s)
- None. The response message will be sent once the BES has stopped accepting new activity creation requests.
- None. The response message will be sent once the BES has started accepting new activity creation requests

Is it necessary to allow the service to block the client for an indeterminate time? Other state transitions described in the doc allow for a BES implementation to intend to make the state change without guaranteeing that it is done before the reply is made.

--

BES-Factory attributes

General note: not all attributes in the table are listed in the schema (I noticed TotalNumberOfActivities and TotalNumberofContainedResources were missing)

...
ContainedResource anyType

Is it intended to use an anyType holding a BasicResourceAttributesDocument or FactoryResourceAttributesDocument or is it intended to have the ContainedResource element be of type BasicResourceAttributesDocumentType or FactoryResourceAttributesDocumentType?

To clarify what I mean:

<ContainedResource>
  <FactoryResourceAttributesDocumentType>
    <IsAcceptingNewActivities>true</IsAcceptingNewActivities>
  </FactoryResourceAttributesDocumentType>
</ContainedResource>

or

<ContainedResource xsi:type="FactoryResourceAttributesDocumentType">
  <IsAcceptingNewActivities>true</IsAcceptingNewActivities>
</ContainedResource>

--

BES-Factory attributes

...
LocalResourceManagerType URI

Should there be a specific URI defined for the case where the BES acts as a facade for one or more other BESes? (one of the use cases described in this doc).

--

BES-Factory Attributes: TotalNumberOfActivities
BES-Factory Attributes: TotalNumberOfContainedResources

These are listed as mandatory attributes. Are they essential to the operation of the BES factory?

--

BES-Factory Attributes: ActivityReference

The description says *all* of the currently contained activities should be listed. I would expect in a Grid environment sometimes it would make sense to restrict the ActivityReference list to those activities which the requester is authorized to view or manipulate.

--

BESExtension

The list of valid values might be better listed as a list of example values. The names should be closer to the extension definitions.

--

BES-Factory Operations

...
If a request fails for some reason that applies to all the specified activities—e.g., due to an authorization fault—then the BES MUST respond with an appropriate fault response message.

Is there a reason to require this different fault processing on the service as opposed to folding this into the other fault handling case?

...
If a request can succeed for one or more of the specified activities then the BES MUST respond with a vector of response elements, where each element corresponds to the equivalent activity designated in the input EPR vector. Each response element MUST be either an element describing the results of the request, as applied to the designated activity, or a SOAP-1.1 fault element describing why the request could not be applied to the designated activity (e.g., because the EPR could not be resolved to any known activity within the BES).

I'm a little nervous about explicitly referring to SOAP 1.1 Fault types in the schema instead of including only BES-specific faults here. Also, the "either" part of the response is not enforced in the XML schema---probably simplest as a choice, though there are many ways to do so in a schema doc.

--

CreateActivity Output(s)

...
CreateActivityResponseType Response: On success, Response contains an ActivityIdentifier (EPR) identifying the requested activity. An ActivityDocument element MAY be included representing the current representation of the requested activity. This operation may include a NotAuthorizedFault SOAP fault in the result element indicating that while the front-end (i.e. BES web service) was able to validate the incoming user credentials, the back-end would not permit the operation given the credentials supplied. This can happen for example when a BES implements its activities by fork/exec’ing those applications using su.

Is this fault text correct? It seems odd to describe a fault in the response section (and it doesn't seem to appear in the schema for the service).

--

CreateActivity Fault(s)

...
InvalidRequestMessageFault: An element in the request message is not recognized. The elements that are not recognized are described in the body of the fault. This does not mean that the element itself is in error, but rather that it specifies a syntactically correct value which does not in fact make sense. For example, the number of CPUs is represented by a double, so fractional values are syntactically correct, but would cause this fault to be thrown as they do not make sense in the context given.

This seems confusing when compared to the UnsupportedFeatureFault. Perhaps this could be renamed to something containing the word "unsatisfiable" to match more closely the JSDL notational conventions. This would also handle the multitude of fault specializations which could reasonably be generated by BES implementations (too many CPUs requested, insufficient memory, invalid host, etc.)

--

GetActivityStatuses / TerminateActivities

...
EPR[] ActivityIdentifier: A vector of zero or more EPRs (obtained from previous CreateActivity operations)

Zero doesn't make sense here, does it?

--

GetActivityStatuses

...
Since the BES specification allows for extensible activity state diagrams, it is possible that not all states within the state diagram will be relevant/meaningful to a particular client. BES requires that all legal state transitions are transitioned even if they are not relevant to a particular client. For instance, if an empty JSDL document is submitted to the BES then all the states from New to Finished will be transitioned through even though there is no underlying specified activity.

Does this text belong in this section? The New state is not mentioned elsewhere.

--

GetActivityStatuses / TerminateActivities

InvalidRequestMessageFault

Does this fault make sense for these operations?

--

...
TerminateActivityResponseType[] Response: A vector detailing the responses of the BES to the termination requests. The Terminated element is a boolean value indicating whether the BES successfully (true) terminated the activity or not (false). If true is returned, then the associated activity is now in the Terminated state. If false is returned then the activity MAY eventually transition to the Terminated state. If an activity specified in the input cannot be located or cannot be terminated for some reason then the TerminateResponse MUST contain a SOAP-1.1 fault element instead of a Terminated element.

The interface for this might be slightly more useful if it were to return the activity status instead of the boolean value.

--

GetActivityDocuments

Missing heading + same concerns as in BES-Factory Operations above

--

GetAttributesDocument

The wsdl for this operation doesn't seem to match the description

--

BES-Activity Port-type

...
The BES-Activity port-type defines operations for monitoring and managing individual activities. These operations are intended to be applied to EPRs returned by previous CreateActivity operations. Returning an EPR that supports the BES-Activity Port-type is optional in the CreateActivity operation of the BES-Factory Port-type.

Actually, this port type is not defined in this document at all. This section is a definition for an activity document which can be used in some messaging by the BES Factory port type.

--

Optional Extensions

Probably helpful to add the extension URIs in the section defining the relevant extensions.

--

Idempotent Execution Semantics

...
Should a BES receive a second CreateActivity request that includes the same identifier as a previously received request, the BES MUST not create the requested activity a second time if it already created the activity for the first request

This should probably read "Should a BES receive a subsequent CreateActivity request"...
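The "subsequent request" wording can be pictured with a small sketch. This is only an illustration under assumptions: the class name, the `request_id` parameter, and the in-memory table are hypothetical and are not taken from the BES draft. It shows a factory that creates an activity once per identifier and answers every later request carrying that identifier with the same activity.

```python
import uuid


class IdempotentFactory:
    """Toy factory (hypothetical): repeated identifiers return the first activity."""

    def __init__(self):
        self._by_request_id = {}  # request identifier -> activity identifier
        self.created = 0          # number of activities actually created

    def create_activity(self, request_id, jsdl):
        # Any subsequent request with a known identifier is answered from
        # the table instead of creating the activity a second time.
        if request_id in self._by_request_id:
            return self._by_request_id[request_id]
        activity_id = f"activity-{uuid.uuid4()}"
        self._by_request_id[request_id] = activity_id
        self.created += 1
        return activity_id


factory = IdempotentFactory()
first = factory.create_activity("req-1", "<JobDefinition/>")
second = factory.create_activity("req-1", "<JobDefinition/>")
third = factory.create_activity("req-1", "<JobDefinition/>")
```

Here the third call behaves exactly like the second, which is the point of saying "subsequent" rather than "second".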

--

Subscription to Notification Events

...
A BES that allows its clients to subscribe for messages concerning activity state changes MUST do so using either the WS-Eventing or WS-Notification protocols.

Change this to be a MUST only if referring to a BES advertising the SupportsSubscriptions extension.

Should additionally list the Topic (for wsnt---not sure about WS-Eventing) which should be used for the subscription request.

wsnt prefix isn't defined in the Namespace table in section 1.2

--

Authors: Should refer to Argonne National Laboratory not Argonne National Labs

Since Peter has moved on from Globus/ANL, I'm not sure if Peter Lane should be listed as author based on the criteria discussed in emails:

...
Greg also takes the view (and the GFSG supports this) that anyone listed as an author must individually be willing to take full responsibility for the document, now and forever. This will go into the revised GFD #1, too. It's the important part, in Greg's view.

-- ogsa-bes-wg mailing list ogsa-bes-wg@ogf.org http://www.ogf.org/mailman/listinfo/ogsa-bes-wg

-- Andreas Savva Fujitsu Laboratories Ltd

Christopher Smith

6:11 p.m.

Comments for some of the issues inline ... not sure I have answers for all of them. -- Chris On 08/2/07 13:11, "Joseph Bester" <bester@mcs.anl.gov> wrote:

...

Sorry for coming in so late to this discussion. As I wasn't too involved in the earlier drafts, I'm not sure how much of this has been covered already. Below are my questions and comments on the latest draft. Each concern is separated by a -- and tries to refer to the nearest heading or subheading title in the document.

joe

Naming Activities: Endpoint References

...
A BES implementation MUST always present an XML anyURI to clients and this single anyURI MUST be one of a well-defined set of URIs that represent the types of endpoints that the BES implementation deals with. Two URIs are defined:

http://schemas.ggf.org/bes/2006/08/bes/naming/BasicWSAddressing
http://schemas.ggf.org/bes/2006/08/bes/naming/WS-Naming

I think this is referring to the NamingProfile BES-Factory attribute, but it is quite unclear from the context. Is it strictly required to have only those two URIs hard-coded and not have extensibility here?

It is intended to be extensible. The context should be clarified by referring to the factory attribute.

...

--

Defining Valid Specializations

...
BESs MUST not implement such specialization profiles. More specifically, any specialization profile that a BES implements MUST obey the following rules regarding sub-state definitions and allowed state transitions:

1. A specialization can introduce sub-states only by replacing a state in the state model that it is specializing (which itself may be a specialization of some other state model) with a graph of sub-states and state transitions among those sub-states. 2. A state transition from any sub-state in the specialization to another state, S, in the unspecialized state model may only occur if a corresponding state transition already existed in the unspecialized state model from the state that has been replaced, R, to that state S.

The wording of #1 is a little bit ambiguous: it might be interpreted as stating that specialized states must only have transitions within the new substate graph. It might also clarify things to explicitly allow transitions to substates of S in #2.

So I actually found the description quite clear. The text says that the specialization must support both 1 and 2. 1 says that you can define whatever state diagram you want within the confines of the specialized state, and 2 says how your sub-states transition into the rest of the unspecialized state diagram. I can't explicitly allow transitions to substates of S (i.e. other specializations) because my specialization only has explicit knowledge of the state it's specializing and the unspecialized state diagram that is described in BES.

...

--

...
Composition of Specializations [1] and Specialized Fault Responses [2]

[1] This case raises the question of what to tell a client about their request, and when. The client may wish to receive a response to its request immediately, telling it that the request will eventually be applied once the relevant activity has progressed to a suitable sub-state. Alternatively, the client may wish to receive a response to its request only when the requested change has actually occurred. To support both response cases requires that clients can specify in a request whether they wish to receive an immediate response back or whether they wish to only receive a response once the request has actually been acted upon. In the former case a client must be prepared to receive back a fault response indicating that their request will eventually be applied.

[2] The OperationWillBeAppliedEventuallyFault is used by the BES to indicate to the client that the requested state operation is allowed, but that the operation can not be applied immediately given the current Activity state. By throwing this fault, the BES indicates to the client that it will apply the requested operation when the Activity state allows. For example, an Activity in Running:Migrating sub-state can not be put into Running:Suspended, until the Activity has completed the migration operation, and is back in the Running:On-resource sub-state.

I'm not terribly happy with this particular fault. It seems like this is a gap in the state transition diagram for the substates---the server is moving the job into some intermediate state (between migrating and suspended in this example) rather than to the desired state. Since the server intends to move to the suspended state, it seems like a poor candidate for a fault. Also, this fault isn't actually used in the description of the only client-initialized state transition (TerminateActivities)

Well ... the BES may have any number of internal states that it uses to implement its functionality. The point is that the client sees the published state diagram, and any specializations that it chooses to understand. Good point on TerminateActivities ... it should be listed as one of the faults.

...

--

Representing sub-states

There isn't a clear XML example of how a union state would look. The reference to section 4.3 is wrong---that part is about specialized fault responses.

Yes ... a non-normative example would be useful.

...

--

BES-Management Port-type (Attributes and Operations)

Port-type vs Port-Type in subsequent sections. No Attributes are described in this section despite the heading.

It was a placeholder, but no attributes were defined. The bits in the parentheses should be removed.

...

--

BES-Factory attributes

General note: not all attributes in the table are listed in the schema (I noticed TotalNumberOfActivities and TotalNumberofContainedResources were missing)

I apologize ... must have been some kind of cut and paste error, as these elements are definitely in my schema.

...

...
ContainedResource anyType

Is it intended to use an anyType holding a BasicResourceAttributesDocument or FactoryResourceAttributesDocument or is it intended to have the ContainedResource element be of type BasicResourceAttributesDocumentType or FactoryResourceAttributesDocumentType?

To clarify what I mean:

 <ContainedResource> <FactoryResourceAttributesDocumentType> <IsAcceptingNewActivities>true</IsAcceptingNewActivities> </FactoryResourceAttributesDocumentType> </ContainedResource>

or

 <ContainedResource xsi:type="FactoryResourceAttributesDocumentType"> <IsAcceptingNewActivities>true</IsAcceptingNewActivities> </ContainedResource>

It is intended to be the second. It is stated in section 6.1.7 that this is the case, but I guess it's not clear enough. Can you suggest some alternative text?
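To make the second form concrete, here is a minimal sketch of how a client might read the xsi:type attribute from such an element. It assumes the element carries no namespace (as in the example above, which is a simplification; a real BES document would be namespace-qualified) and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Second form from the example above, with the xsi prefix declared so the
# fragment parses on its own. Element namespaces are omitted (assumption).
doc = """
<ContainedResource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:type="FactoryResourceAttributesDocumentType">
  <IsAcceptingNewActivities>true</IsAcceptingNewActivities>
</ContainedResource>
""".strip()


def describe_contained_resource(xml_text):
    """Return (declared xsi:type, accepting flag) for a ContainedResource."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes under "{namespace}localname".
    declared_type = root.get(f"{{{XSI}}}type")
    accepting = root.findtext("IsAcceptingNewActivities") == "true"
    return declared_type, accepting


result = describe_contained_resource(doc)
```

With this form the attribute children sit directly under ContainedResource, and the declared type tells the client which attributes document it is looking at.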

...

--

BES-Factory attributes

...
LocalResourceManagerType URI

Should there be a specific URI defined for the case where the BES acts as a facade for one or more other BESes? (one of the use cases described in this doc).

Sounds like a decent idea, but I foresee multiple implementations of even this use case, so I think it's better to leave this up to the implementation.

...

--

BES-Factory Attributes: TotalNumberOfActivities
BES-Factory Attributes: TotalNumberOfContainedResources

These are listed as mandatory attributes. Are they essential to the operation of the BES factory?

Yes. The text does not mandate that the BES must return the list of activity references or contained resources (e.g. maybe it would choose not to return details due to authorization constraints). Having these counters available means that a client can distinguish between no activities in the BES, and activities that it isn't being shown.
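As a rough illustration of that argument, a client could compare the counter with the length of the reference list it was actually given. The function name and the error handling are assumptions made for this sketch only.

```python
def hidden_activity_count(total_number_of_activities, activity_references):
    """Number of activities the BES reports but did not list for this client."""
    visible = len(activity_references)
    if visible > total_number_of_activities:
        # A list longer than the counter would mean the attributes disagree.
        raise ValueError("more references listed than activities counted")
    return total_number_of_activities - visible


empty_bes = hidden_activity_count(0, [])                  # nothing running at all
all_hidden = hidden_activity_count(5, [])                 # five exist, none shown
partly_hidden = hidden_activity_count(5, ["epr-a", "epr-b"])
```

Without the counter, the first two cases would look identical to the client, since both return an empty list.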

...

--

BES-Factory Attributes: ActivityReference

The description says *all* of the currently contained activities should be listed. I would expect that in a Grid environment it would sometimes make sense to restrict the ActivityReference list to those activities which the requester is authorized to view or manipulate.

Right ... I agree that the text should be clarified.

...

--

BESExtension

The list of valid values might be better listed as list of example values. The names should be closer to the extension definitions.

Yes ... the text should be clarified.

...

--

BES-Factory Operations

...
If a request fails for some reason that applies to all the specified activities, e.g., due to an authorization fault, then the BES MUST respond with an appropriate fault response message.

Is there a reason to require this different fault processing on the service as opposed to folding this into the other fault handling case?

Can you clarify "other fault handling case"? Do you mean the operation returning a fault? If so, the reason is that there is a vector of inputs, and a subset of the vector might succeed, so that the operation can succeed, but individual activities need to be flagged for faults.
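The two cases can be sketched as follows. This is a simplified model, not the BES message format: the class and function names are hypothetical, faults are plain strings, and "unknown activity" stands in for whatever per-activity fault the spec defines. A failure that applies to the whole request is raised as an operation-level fault; otherwise each input EPR gets its own response element holding either a result or a fault.

```python
from dataclasses import dataclass
from typing import Optional


class OperationFault(Exception):
    """Fault that applies to the whole request (e.g. not authorized)."""


@dataclass
class ResponseElement:
    activity: str
    status: Optional[str] = None  # set when the request succeeded for this EPR
    fault: Optional[str] = None   # set when it failed for this EPR only


def get_activity_statuses(known, authorized, eprs):
    if not authorized:
        # Applies to every activity in the vector: fail the operation itself.
        raise OperationFault("NotAuthorizedFault")
    responses = []
    for epr in eprs:  # one response element per input EPR, in input order
        if epr in known:
            responses.append(ResponseElement(epr, status=known[epr]))
        else:
            responses.append(ResponseElement(epr, fault="unknown activity"))
    return responses


responses = get_activity_statuses({"epr-a": "Running"}, True, ["epr-a", "epr-b"])
```

The operation succeeds even though "epr-b" could not be resolved; only that element carries a fault.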

...

...
If a request can succeed for one or more of the specified activities then the BES MUST respond with a vector of response elements, where each element corresponds to the equivalent activity designated in the input EPR vector. Each response element MUST be either an element describing the results of the request, as applied to the designated activity, or a SOAP-1.1 fault element describing why the request could not be applied to the designated activity (e.g., because the EPR could not be resolved to any known activity within the BES).

I'm a little nervous about explicitly referring to SOAP 1.1 Fault types in the schema instead of including only BES-specific faults here.

A little nervous, or nervous enough to suggest an alternative. :-) It's just a rendering question.

...

Also, the "either" part of the response is not enforced in the XML schema---probably simplest as a choice, though there are many ways to do so in a schema doc.

Some feedback received from Microsoft's Web Services team indicated we should avoid 'choice'. Can you provide an example of how you would like to see this rendered?

...

--

CreateActivity Output(s)

...
CreateActivityResponseType Response: On success, Response contains an ActivityIdentifier (EPR) identifying the requested activity. An ActivityDocument element MAY be included representing the current representation of the requested activity. This operation may include a NotAuthorizedFault SOAP fault in the result element indicating that while the front-end (i.e. BES web service) was able to validate the incoming user credentials, the back-end would not permit the operation given the credentials supplied. This can happen for example when a BES implements its activities by fork/exec’ing those applications using su.

Is this fault text correct? It seems odd to describe a fault in the response section (and it doesn't seem to appear in the schema for the service).

The text should be moved to the fault section.

...

--

CreateActivity Fault(s)

...
InvalidRequestMessageFault: An element in the request message is not recognized. The elements that are not recognized are described in the body of the fault. This does not mean that the element itself is in error, but rather that it specifies a syntactically correct value which does not in fact make sense. For example, the number of CPUs is represented by a double, so fractional values are syntactically correct, but would cause this fault to be thrown as they do not make sense in the context given.

This seems confusing when compared to the UnsupportedFeatureFault. Perhaps this could be renamed to something containing the word "unsatisfiable" to match more closely the JSDL notational conventions. This would also handle the multitude of fault specializations which could reasonably be generated by BES implementations (too many CPUs requested, insufficient memory, invalid host, etc.)

Seems like a nice clarification. I'd be ok with it.

...

--

GetActivityStatuses / TerminateActivities

...
EPR[] ActivityIdentifier: A vector of zero or more EPRs (obtained from previous CreateActivity operations)

Zero doesn't make sense here, does it?

True ... I would make minOccurs="1".

...

--

GetActivityStatuses

...
Since the BES specification allows for extensible activity state diagrams, it is possible that not all states within the state diagram will be relevant/meaningful to a particular client. BES requires that all legal state transitions are transitioned even if they are not relevant to a particular client. For instance, if an empty JSDL document is submitted to the BES then all the states from New to Finished will be transitioned through even though there is no underlying specified activity.

Does this text belong in this section? The New state is not mentioned elsewhere.

An artifact from an older version. I would vote for the removal of this text.

...

--

GetActivityStatuses / TerminateActivities

InvalidRequestMessageFault

Does this fault make sense for these operations?

Yes, since all messages allow for extension content. Maybe it also needs unsupported feature?

...

--

...
TerminateActivityResponseType[] Response: A vector detailing the responses of the BES to the termination requests. The Terminated element is a boolean value indicating whether the BES successfully (true) terminated the activity or not (false). If true is returned, then the associated activity is now in the Terminated state. If false is returned then the activity MAY eventually transition to the Terminated state. If an activity specified in the input cannot be located or cannot be terminated for some reason then the TerminateResponse MUST contain a SOAP-1.1 fault element instead of a Terminated element.

The interface for this might be slightly more useful if it were to return the activity status instead of the boolean value.

I agree.

...

BES-Activity Port-type

...
The BES-Activity port-type defines operations for monitoring and managing individual activities. These operations are intended to be applied to EPRs returned by previous CreateActivity operations. Returning an EPR that supports the BES-Activity Port-type is optional in the CreateActivity operation of the BES-Factory Port-type.

Actually, this port type is not defined in this document at all. This section is a definition for an activity document which can be used in some messaging by the BES Factory port type.

A placeholder ... should be cleaned up.

Donal K. Fellows

13 Feb

10:20 a.m.

Christopher Smith wrote:

...

So I actually found the description quite clear. The text says that the specialization must support both 1 and 2. 1 says that you can define whatever state diagram you want within the confines of the specialized state, and 2 says how your sub-states transition into the rest of the unspecialized state diagram. I can't explicitly allow transitions to substates of S (i.e. other specializations) because my specialization only has explicit knowledge of the state it's specializing and the unspecialized state diagram that is described in BES. [...] Well ... the BES may have any number of internal states that it uses to implement its functionality. The point is that the client sees the published state diagram, and any specializations that it chooses to understand.

If memory serves, there's actually a formal name for this sort of thing. The specialization must be Weakly Similar to the general state diagram. This means that every state in the specialization must be mappable to a general state, and that every transition between states in the specialization must be either mapped to a transition between equivalent states in the general diagram, or that the two states must be mapped to the same general state and that the transition between the two must be not observable using just the general transitions. Or at least I think that's Weak Simulation (I know we don't want bisimulation; that's too strong) and I think I've got it the right way round. Too long since I last worked with these things in detail. My real point though is that someone's already formalized the notion we want to use; it captures exactly what we want. Donal (this formal CS stuff is occasionally useful :-)).
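The condition as described in this message can be checked mechanically. The sketch below is an illustration only: the function name is hypothetical, the general transition set is a made-up fragment rather than the BES state model, and it tests the mapping condition as worded above without claiming to be the textbook definition of weak simulation.

```python
def violations(general_transitions, mapping, specialized_transitions):
    """Return specialized transitions that break the mapping condition."""
    bad = []
    for src, dst in specialized_transitions:
        g_src, g_dst = mapping[src], mapping[dst]
        if g_src == g_dst:
            # Both ends map to the same general state: the move is internal
            # and not observable at the general level.
            continue
        if (g_src, g_dst) not in general_transitions:
            bad.append((src, dst))
    return bad


# Illustrative general diagram (assumption, not copied from the spec).
general = {("Pending", "Running"), ("Running", "Finished")}

# Every specialized state maps to exactly one general state.
mapping = {
    "Pending": "Pending",
    "Running:On-resource": "Running",
    "Running:Migrating": "Running",
    "Running:Suspended": "Running",
    "Finished": "Finished",
}

good = {
    ("Pending", "Running:On-resource"),
    ("Running:On-resource", "Running:Migrating"),
    ("Running:Migrating", "Running:On-resource"),
    ("Running:On-resource", "Running:Suspended"),
    ("Running:Suspended", "Running:On-resource"),
    ("Running:On-resource", "Finished"),
}

ok_result = violations(general, mapping, good)
bad_result = violations(general, mapping, good | {("Running:Suspended", "Pending")})
```

The extra transition from Running:Suspended back to Pending is rejected because the general diagram used here has no Running to Pending transition.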

Karl Czajkowski

9 Feb

3:01 a.m.

On Feb 08, Christopher Smith modulated: ...

...

The reason for the MAY is in the phrase "If the consuming system does not provide the requested operating system....". Some systems may choose to accept the JSDL as is, and might just have an activity whose resource requirements can never be satisfied (unless an operating system of that type is configured in the system and made available to the BES). This would be the case for my BES implementation on top of LSF, which allows one to specify resources that may never be satisfied.

Chris, I think you are trying to get at some awfully subtle distinctions here. Where do you draw the line between:

1. XML unparseable (malformed)
2. XML unvalidatable (extension schema not known)
3. semantically inconsistent (schema allows meaningless expressions)
4. not supported by implementation
5. resources "never" available (not configured in cluster, etc.)
6. resources "temporarily" unavailable (due to load, etc.)
7. resources unavailable for given user role (due to policy, etc.)
...

It would seem to me that you want to be careful in making sure faults are also interoperable if you are going to model them. Can a programmatic client behavior take a meaningfully different fault response action based on the fault type here? Or does this boil down to the fault indicating "the service did not like it" and there being some hints that a human might find useful during fault determination?

This leads me to wonder if there is any real information captured in these very similar fault messages. If a system MAY report something as invalid when it might really be extensible value content not understood by the system, and if a system MAY accept something that is contradictory, what exactly is being communicated?

It would seem to me that you need a fault "lattice" where the most generic fault at the bottom can be generated and understood by any sensible implementation, while the more precise faults should have more stringent requirements on when they can be signalled. I don't think you want to model subtle faults and then allow implementations to be liberal in what they generate here, as that would fail to encode real information for further processing decisions.

karl

-- Karl Czajkowski karlcz@univa.com
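One way to picture the fault "lattice" idea is a class hierarchy in which every precise fault is also an instance of the generic one. The sketch below is only that, a picture: all class names and client actions are hypothetical and none of them come from the BES or JSDL documents.

```python
class GenericFault(Exception):
    """Bottom of the lattice: 'the service did not like it'."""


class UnsupportedFeature(GenericFault):
    """More precise: the construct is not implemented by this service."""


class InvalidRequest(GenericFault):
    """More precise: the construct is known but its value makes no sense."""


class NegativeCpuCount(InvalidRequest):
    """Most precise: only signalled when a CPU count is below zero."""


def client_action(fault):
    # A programmatic client branches only on the levels it understands;
    # anything else falls through to the generic handling.
    if isinstance(fault, UnsupportedFeature):
        return "resubmit without the unsupported element"
    if isinstance(fault, InvalidRequest):
        return "fix the value and resubmit"
    return "report to the user"


precise = client_action(NegativeCpuCount())
generic = client_action(GenericFault())
```

A client that has never heard of NegativeCpuCount still reacts sensibly, because the fault is also an InvalidRequest and a GenericFault. That only carries information if implementations are strict about when they signal the precise faults, which is the concern raised above.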

Christopher Smith

5:36 p.m.

So I will agree that the MAY should be MUST (as stated in the other email I just sent). As for the use of these, it's quite clear to me. One fault is used to indicate to the client that the JSDL they are providing (and the extensions to the JSDL that they use) are not implemented in the BES. The other is used to say that I do indeed recognize that JSDL construct, but you have put something in it that doesn't make sense to me. That is, to me, the principle guiding the use of one over the other.

-- Chris

On 08/2/07 19:01, "Karl Czajkowski" <karlcz@univa.com> wrote:

...

On Feb 08, Christopher Smith modulated: ...

...
The reason for the MAY is in the phrase "If the consuming system does not provide the requested operating system....". Some systems may choose to accept the JSDL as is, and might just have an activity whose resource requirements can never be satisfied (unless an operating system of that type is configured in the system and made available to the BES). This would be the case for my BES implementation on top of LSF, which allows one to specify resources that may never be satisfied.

Chris, I think you are trying to get at some awfully subtle distinctions here. Where do you draw the line between:

1. XML unparseable (malformed)
2. XML unvalidatable (extension schema not known)
3. semantically inconsistent (schema allows meaningless expressions)
4. not supported by implementation
5. resources "never" available (not configured in cluster, etc.)
6. resources "temporarily" unavailable (due to load, etc.)
7. resources unavailable for given user role (due to policy, etc.)
...

It would seem to me that you want to be careful in making sure faults are also interoperable if you are going to model them. Can a programmatic client behavior take a meaningfully different fault response action based on the fault type here? Or does this boil down to the fault indicating "the service did not like it" and there being some hints that a human might find useful during fault determination?

This leads me to wonder if there is any real information captured in these very similar fault messages. If a system MAY report something as invalid when it might really be extensible value content not understood by the system, and if a system MAY accept something that is contradictory, what exactly is being communicated?

It would seem to me that you need a fault "lattice" where the most generic fault at the bottom can be generated and understood by any sensible implementation, while the more precise faults should have more stringent requirements on when they can be signalled. I don't think you want to model subtle faults and then allow implementations to be liberal in what they generate here, as that would fail to encode real information for further processing decisions.

karl

Andreas Savva

6:13 a.m.

Chris, [Some comments inline.] Christopher Smith wrote:

...

On 06/2/07 19:41, "Andreas Savva" <andreas.savva@jp.fujitsu.com> wrote:

...
Chris,

Chris Smith wrote:

...
UnsupportedFeatureFault indicates that a particular element or attribute contained within the JSDL document is either not supported, or (for extension content) not supported or recognized.

InvalidRequestMessageFault indicates that the value of some element is invalid input. For example, if TotalCPUCount in JSDL was given as -10.

This is nice text and I hope it is included in the BES spec. "...not recognized" is not correct.

The recognized part referred to extension content where the element might not be known to the consuming system (as opposed to being known, but unsupported). I have no problems dropping it.

I think simplifying and tightening the fault definitions in BES would be a good idea.
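Read conservatively, the two definitions quoted above amount to a simple rule: an element the service does not support yields UnsupportedFeatureFault, while a supported element with a nonsensical value yields InvalidRequestMessageFault. The sketch below illustrates that reading only. The table of supported elements, the validator, and the function name are assumptions, and the request is modelled as a plain dictionary rather than a JSDL document.

```python
# Elements this toy service supports, each with a value check (assumption).
SUPPORTED_ELEMENTS = {
    "TotalCPUCount": lambda v: v > 0 and float(v).is_integer(),
}


def check_request(elements):
    """Return a list of (fault name, element name) pairs for a request."""
    faults = []
    for name, value in elements.items():
        validator = SUPPORTED_ELEMENTS.get(name)
        if validator is None:
            # The element itself is not supported by this service.
            faults.append(("UnsupportedFeatureFault", name))
        elif not validator(value):
            # The element is supported but its value is invalid input.
            faults.append(("InvalidRequestMessageFault", name))
    return faults


negative = check_request({"TotalCPUCount": -10})
fractional = check_request({"TotalCPUCount": 2.5})
extension = check_request({"SomeExtensionElement": 1})
valid = check_request({"TotalCPUCount": 4})
```

Under this reading the fault chosen depends only on whether the element is supported, never on how the service would go about satisfying the request.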

...

...
Also given the above, HPC Profile sections 3.9 and 3.10 specify the wrong value for the returned fault. For example in 3.9 it says

...
If the consuming system does not provide the requested operating system, or if the JSDL special token "other" is used as the content of the jsdl:OperatingSystemName sub-element, and if the consuming system does not understand the provided extension content, then the consuming system MAY return the BES InvalidRequestMessageFault to the requester.

It should be UnsupportedFeatureFault. (And why is the fault returned a MAY and not a MUST for the profile?)

Ahh ... it is InvalidRequestMessage ... that's because the element (OperatingSystemName) is recognized and supported by the system, but the "value" of OperatingSystemName is not recognized. I know this seems in conflict with the statements about UnsupportedFeatureFault above, but in this case, the extension elements are the "value" of the OperatingSystemName element (if that makes sense).

The reason for the MAY is in the phrase "If the consuming system does not provide the requested operating system....". Some systems may choose to accept the JSDL as is, and might just have an activity whose resource requirements can never be satisfied (unless an operating system of that type is configured in the system and made available to the BES). This would be the case for my BES implementation on top of LSF, which allows one to specify resources that may never be satisfied.

I agree with Karl's comments in a separate email that these distinctions seem too subtle and overload the meaning of the faults. The HPC Profile (being a profile) should be the same or more (and definitely not less) restrictive and precise than the BES specification on returned faults. Btw, on a different topic, HPC Profile section 3.11 TotalCPUCount says "non-negative integer" which includes the value of 0. I guess this type should be "positive integer". -- Andreas Savva Fujitsu Laboratories Ltd

Christopher Smith

5:34 p.m.

On the fault issue ... fair enough ... making the fault a MUST when the OS is not recognized (same text for CPU architecture) is ok with me. -- Chris On 08/2/07 22:13, "Andreas Savva" <andreas.savva@jp.fujitsu.com> wrote:

...

Chris,

[Some comments inline.]

Christopher Smith wrote:

...
On 06/2/07 19:41, "Andreas Savva" <andreas.savva@jp.fujitsu.com> wrote:

...
Chris,

Chris Smith wrote:

...
UnsupportedFeatureFault indicates that a particular element or attribute contained within the JSDL document is either not supported, or (for extension content) not supported or recognized.

InvalidRequestMessageFault indicates that the value of some element is invalid input. For example, if TotalCPUCount in JSDL was given as -10.

This is nice text and I hope it is included in the BES spec. "...not recognized" is not correct.

The recognized part referred to extension content where the element might not be known to the consuming system (as opposed to being known, but unsupported). I have no problems dropping it.

I think simplifying and tightening the fault definitions in BES would be a good idea.

...
...
Also given the above, HPC Profile sections 3.9 and 3.10 specify the wrong value for the returned fault. For example in 3.9 it says

...
If the consuming system does not provide the requested operating system, or if the JSDL special token "other" is used as the content of the jsdl:OperatingSystemName sub-element, and if the consuming system does not understand the provided extension content, then the consuming system MAY return the BES InvalidRequestMessageFault to the requester.

It should be UnsupportedFeatureFault. (And why is the fault returned a MAY and not a MUST for the profile?)

Ahh ... it is InvalidRequestMessage ... that's because the element (OperatingSystemName) is recognized and supported by the system, but the "value" of OperatingSystemName is not recognized. I know this seems in conflict with the statements about UnsupportedFeatureFault above, but in this case, the extension elements are the "value" of the OperatingSystemName element (if that makes sense).

The reason for the MAY is in the phrase "If the consuming system does not provide the requested operating system....". Some systems may choose to accept the JSDL as is, and might just have an activity whose resource requirements can never be satisfied (unless an operating system of that type is configured in the system and made available to the BES). This would be the case for my BES implementation on top of LSF, which allows one to specify resources that may never be satisfied.

I agree with Karl's comments in a separate email that these distinctions seem too subtle and overload the meaning of the faults. The HPC Profile (being a profile) should be the same or more (and definitely not less) restrictive and precise than the BES specification on returned faults.

Btw, on a different topic, HPC Profile section 3.11 TotalCPUCount says "non-negative integer" which includes the value of 0. I guess this type should be "positive integer".

Andreas Savva

13 Feb

6:29 a.m.

Hi Chris,

I would suggest as a way forward to divide and conquer. Can we agree first on the exact text for the definitions of the faults, staying on a single (perhaps very conservative) reading. And then choose which one to apply to the various cases?

I asked the initial question because I could not understand why the case of giving an OS value from the JSDL-defined enumeration (well-defined presumably) would fall in the same category as that of specifying a fractional or negative value for CPUCount.

Andreas

Christopher Smith wrote:

...

On the fault issue ... fair enough ... making the fault a MUST when the OS is not recognized (same text for CPU architecture) is ok with me.

-- Chris

-- Andreas Savva Fujitsu Laboratories Ltd

Karl Czajkowski

7:04 a.m.

On Feb 13, Andreas Savva modulated: ...

...

I asked the initial question because I could not understand why the case of giving an OS value from the JSDL-defined enumeration (well-defined presumably) would fall in the same category as that of specifying a fractional or negative value for CPUCount.

I think Chris sees this as an "out of range" error for the implementation, without distinguishing necessarily between "consistent, but not available in my resource manager", and "inconsistent". It is just a constraint that fails to match.

In the general case, this distinction may be incomputable, though some (optional) validation and analysis could weed out common inconsistencies before handing the constraint to the solver. (Right, Chris?)

This is an area where I can see the value in being liberal in the specification requirements to allow for a range of implementations. The only issue is making sure the concept space is modeled such that a less discerning implementation returns a more generic error and not a detailed but mischaracterized description of what is going on... it is that latter bit which pollutes the value of the detailed faults from an interop point of view.

karl

-- Karl Czajkowski karlcz@univa.com
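The idea of an optional validation pass in front of the solver can be sketched in Python as follows. This is only an illustration under stated assumptions: the constraint is reduced to a numeric range, the "solver" is a simple filter over available values, and the names `InconsistentConstraint`, `NoMatch`, `prevalidate`, and `match` are all hypothetical rather than drawn from BES or any resource manager.

```python
class InconsistentConstraint(Exception):
    """The constraint can never match anything (lower bound above upper bound)."""


class NoMatch(Exception):
    """Generic outcome: the constraint failed to match, with no claim about why."""


def prevalidate(lower, upper):
    """Optional analysis that catches one common inconsistency."""
    if lower > upper:
        raise InconsistentConstraint(f"{lower} > {upper}")


def match(lower, upper, available, validate=False):
    """Stand-in for the solver: return the available values inside the range."""
    if validate:
        prevalidate(lower, upper)
    hits = [n for n in available if lower <= n <= upper]
    if not hits:
        # Without validation we cannot tell "inconsistent" from
        # "consistent but not available", so only the generic error is raised.
        raise NoMatch("constraint failed to match")
    return hits
```

With `validate=False`, an inverted range such as `match(8, 2, [1, 2, 8])` surfaces only as the generic `NoMatch`; with `validate=True` the same call raises the more specific `InconsistentConstraint`. Either way the implementation never reports a detailed cause that it has not actually established.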

Christopher Smith

4:29 p.m.

Right ... as Karl says.

-- Chris

On 12/2/07 23:04, "Karl Czajkowski" <karlcz@univa.com> wrote:

...

On Feb 13, Andreas Savva modulated: ...

...
I asked the initial question because I could not understand why the case of giving an OS value from the JSDL-defined enumeration (well-defined presumably) would fall in the same category as that of specifying a fractional or negative value for CPUCount.

I think Chris sees this as an "out of range" error for the implementation, without distinguishing necessarily between "consistent, but not available in my resource manager", and "inconsistent". It is just a constraint that fails to match.

In the general case, this distinction may be incomputable, though some (optional) validation and analysis could weed out common inconsistencies before handing the constraint to the solver. (Right, Chris?)

This is an area where I can see the value in being liberal in the specification requirements to allow for a range of implementations. The only issue is making sure the concept space is modeled such that a less discerning implementation returns a more generic error and not a detailed but mischaracterized description of what is going on... it is that latter bit which pollutes the value of the detailed faults from an interop point of view.

karl

Andreas Savva

14 Feb

8:51 a.m.

Ok ... I'm really not trying to be difficult with this. It's just that I find confusing a fault called "InvalidRequestMessageFault" which can potentially mean both "it's a legal value that I don't support" and "it's an illegal value". Especially when there is another fault called "UnsupportedFeatureFault" which means "an element was not supported".

Updating the definitions of these faults would help, but is it possible to choose fault names that are sufficiently clear when viewed alone and in combination?

Anyway, I think there's been sufficient traffic on this issue. Feel free to dispose of it as you see fit and I'll probably give you a public comment if I don't like the result. ;-)

Andreas

Christopher Smith wrote:

...

Right ... as Karl says.

-- Chris

-- Andreas Savva Fujitsu Laboratories Ltd

Karl Czajkowski

9:12 a.m.

On Feb 14, Andreas Savva modulated:

...

Ok ... I'm really not trying to be difficult with this.

Hah, don't be discouraged! I am just trying to help "channel" Chris since I've bent his ear on this topic many times before.

I agree the specific faults seem a bit screwy... It sounds like there is a need for more separate faults:

InvalidRequest: IMHO should mean invalid, e.g. doesn't validate to schema, just in case the user's tooling didn't already stop that before it got to the service. Is this a MUST or a SHOULD for an implementation to validate input?

UnsupportedRequest: detected some feature or extension not implemented (an implementation limit, not a site policy), and I would advocate that a service MUST detect and refuse unsupported/unrecognized features rather than silently proceeding, unless you have some way to mark fields as mandatory or not.

Unavailable: we cannot give what you are asking for, and this one has to be more generic I think and could be thrown when an implementation cannot be bothered to distinguish any of the following.

The previous seem very important and basic, while these latter ones seem more useful to expose more information but allow various levels of implementation complexity at the same time, all being alternatives to the generic Unavailable:

Disallowed: if you want to report on site policy limits such as min/max thresholds, to help problem determination?

OutOfRange: the parameter of a supported feature is out of range for either making sense or being available (Chris calls this being "configured" in a cluster, I think) or due to site policy, e.g. overlaps with Disallowed.

Depleted: this is a temporary condition such as everything being allocated, so it is disjoint with Disallowed and OutOfRange.

karl

-- Karl Czajkowski karlcz@univa.com
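One way to read the proposal above is as a small class hierarchy in which the three detailed faults are refinements of the generic Unavailable. The Python sketch below is an illustration only. The class names follow the message, but the common `Fault` base class, the `advise` function, and its advice strings are hypothetical, and the sketch does not model the noted overlap between OutOfRange and Disallowed.

```python
class Fault(Exception):
    """Hypothetical common base for all faults in this sketch."""


class InvalidRequest(Fault):
    """The request is invalid, e.g. it does not validate against the schema."""


class UnsupportedRequest(Fault):
    """A feature or extension is not implemented (an implementation limit)."""


class Unavailable(Fault):
    """Generic: the service cannot give what was asked for."""


class Disallowed(Unavailable):
    """Refused because of a site policy limit."""


class OutOfRange(Unavailable):
    """A parameter of a supported feature is out of range."""


class Depleted(Unavailable):
    """Temporary condition, such as everything being allocated."""


def advise(fault):
    """Illustrative client reaction; specific cases come before the generic one."""
    if isinstance(fault, Depleted):
        return "retry later"
    if isinstance(fault, Unavailable):
        return "change the request or try another site"
    if isinstance(fault, (InvalidRequest, UnsupportedRequest)):
        return "fix the request"
    return "unknown"
```

Because the detailed faults subclass `Unavailable`, a client written against the generic fault still handles a service that raises `Disallowed`, `OutOfRange`, or `Depleted`, while a less discerning service can raise plain `Unavailable` without mischaracterizing the cause.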


participants (8)

Andreas Savva
Chris Smith
Christopher Smith
Donal K. Fellows
Joseph Bester
Karl Czajkowski
Mark Morgan
Stephen M Pickles
