Hi;
Coming from the point of view of the HPC Profile working group, I have
several questions about BES (including recent discussions on the mailing list),
as well as some straw man thoughts about how BES should relate to the HPC
Profile spec.
Based on the BES-1.3 spec that Andrew Grimshaw recently sent out, at an
abstract level, there seem to be the following aspects to BES:
·
A core set of operations around
activities:
·
CreateActivityFromJSDL
·
GetActivityStatus
·
RequestActivityStateChange
·
GetActivityJSDLDocuments
·
A set of BES factory-specific system
management operations and resource properties (RPs):
·
StartAcceptingNewActivities
·
StopAcceptingNewActivities
·
IsAcceptingNewActivities RP
·
Support for notifications.
·
Support for various resource
properties (or their equivalent in a non-WSRF version) having to do with an
information model for describing various things about a BES factory, the
associated container it represents, and any activities it is currently running.
·
An extensible activity state model.
Things explicitly NOT in the BES specification are:
·
Generic system management
interface.
·
Security design.
·
Interface for directly controlling/manipulating
an activity once it has been created.
Things that used to be in the BES spec but now seem to be extensions (please correct me if I’m wrong here!):
·
Data staging
·
Suspension
I have the following questions about BES and the various discussions
that have recently occurred (including the ESI integration):
·
Extensibility:
·
Given that BES has bought into the
notion of an extensible activity state diagram, it needs to also normatively
define how clients can learn of the extensions that a given BES service
supports. Is that something that will be added to the BES specification? Or
will the specification point to some other place where notions of extensibility
are defined more generically? (Personally, I’d vote for the former approach.)
·
Is the “base case” for
BES now fig.2, which shows states of {new, pending, running, canceled, failed,
finished}?
·
Previously included states, such
as Execution-Pending, will presumably be defined in suitable extension
profiles?
·
Assuming that data staging and
suspension are now extensions to the base BES spec, will they be defined as
such in an appendix of the spec, or as a separate extension profile?
·
The original BES spec describes a
fairly sophisticated data staging design that supports parallelism. Is
there any interest in defining a second, simpler data staging extension that
avoids the complexity of the parallelism support?
·
Will the suspension extension be
the simple one that is currently presented in sec. 4 as an example? Or do
people feel that a more complicated version, such as the ESI one, is
necessary/important? Can/should we define both?
·
Given that suspension is no longer
in the base design, presumably the createInSuspendedState parameter to
CreateActivityFromJSDL should disappear?
·
RequestActivityStateChange: I
believe this operation will pose challenges in an extensible design. The
current design is imperative by nature: it specifies an explicit state to move
an activity to. However, a client who does not know of all the extensions
that a BES service implements may not know how to pick the appropriate state to
transition to. It seems better to introduce a more declarative approach
in which clients specify “actions” they wish to occur, such as ‘CancelActivity’.
This approach would allow the BES service to make the appropriate state transition
in response to a desired action requested by a client.
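To make the action-based idea concrete, here is a minimal Python sketch. The class, the method names, and the "suspended" extension state are hypothetical and not taken from the BES spec; the only assumption is that the service, not the client, knows its full state diagram.

```python
# Terminal states of the base case (canceled, failed, finished).
BASE_TERMINAL_STATES = {"canceled", "failed", "finished"}


class SketchBesService:
    """Toy service: activities are just ids mapped to state names."""

    def __init__(self):
        self.activities = {}

    def create_activity(self, activity_id, state="new"):
        # 'state' may be an extension state the client has never heard of.
        self.activities[activity_id] = state

    def cancel_activity(self, activity_id):
        # Action-style request: the client names the intent and the
        # service decides which transition (if any) is appropriate.
        state = self.activities[activity_id]
        if state not in BASE_TERMINAL_STATES:
            self.activities[activity_id] = "canceled"
        return self.activities[activity_id]


service = SketchBesService()
service.create_activity("a1", state="suspended")  # hypothetical extension state
service.create_activity("a2", state="finished")
print(service.cancel_activity("a1"))
print(service.cancel_activity("a2"))
```

The client never has to name a target state, so it does not need to know which extensions the service implements in order to cancel an activity.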
·
Information model:
·
JSDL seems to inherently be
focused on describing a single job or a single computational resource. For
example, it has no notion of describing all the differing compute nodes of a (heterogeneous)
compute cluster. By incorporating JSDL elements into the BES information
model it seems that BES is foreclosing the ability to describe things like
compute clusters. This issue also affects what can get returned from GetActivityJSDLDocuments.
If I’m wrong about this, then it seems like it would be worth having an
explicit explanation about how to achieve this functionality somewhere in the
specification.
·
The BES information model now
includes various posix-specific elements of JSDL. How would other systems
– such as a Windows system – be described?
·
The spec requires that all BES
services “support” all the various attributes listed in sec. 5, but
they don’t have to implement them. What exactly does that
mean? For example, if a JSDL doc specifies a CPU-Speed requirement and a
particular BES service doesn’t implement it (meaning it doesn’t
keep track of it), then does the associated CreateActivityFromJSDL request have
to fail? If so, then do clients have to figure out what the minimal set
of implemented attributes is in a system and then only use those in job
descriptions? Is there a notion of “optional” attributes
that can be ignored, that specify desired attribute values rather than required
ones?
·
Is there any notion of specifying
that all compute nodes should have the same
value for some attribute (e.g. CPU architecture, CPU speed, NIC)? This
seems to be missing from the JSDL specification, but seems very important for
BES if it is to support things like compute clusters.
·
Some of the elements seem to be
incompletely specified, to have definitions that are open to multiple
interpretations, or to have definitions that would be very difficult to implement
in practice. In particular:
·
CPU architecture seems like it can’t
describe all the variations, let alone all the peripherals such as GPUs,
that a computing resource (or a whole cluster) might have.
·
CPU speed seems like the tip of an
iceberg having to do with characterizing the performance of a system, which
will depend on all manner of things like details of the processor chip used,
cache sizes, bus used, etc.
·
Network bandwidth: is this the
theoretical maximum of the NIC on a compute node or is it the current bandwidth
actually available in a (shared) system? Note that the latter is
difficult to measure in a practically useful way. Note also that network
bandwidth only describes one aspect of communications performance and that
several others are arguably equally important (e.g. latency).
All this leads to the question of whether BES will
have a notion of extending the information model that is supplied. If so, then
that leads to the question of what the base case should be and whether it
should include a smaller set of things than is currently listed in the spec.
Are there any plans to tighten the definitions of some
of the more vague information elements? (I guess this really is an issue
more for the JSDL WG than for BES.)
·
GetActivityJSDLDocuments returns a
JSDL document for each specified activity. Is this sufficient to capture the
entire “provenance” for what has happened to the activity? In
particular, would it be sufficient to allow someone to (a) run the same
activity on another BES service (assuming same hardware and software) and get
the same results and (b) debug what has happened to an errant activity? I
would argue that both capabilities have proven to be important in actual
systems.
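Going back to the earlier question about attributes that a service must “support” but need not implement: one possible reading, sketched below in Python, is that a request fails only when a required attribute is unimplemented, while attributes marked optional are ignored. The function name, the attribute names, and the required/optional split are all assumptions for illustration; neither BES nor JSDL is claimed to define them this way.

```python
def check_request(requested, implemented, optional=()):
    """Decide whether a job request can be accepted.

    requested   -- attribute names used in the job description
    implemented -- attribute names the service actually tracks
    optional    -- requested attributes the client marked as 'desired only'

    Returns (accepted, ignored, missing_required).
    """
    ignored = []
    missing_required = []
    for name in requested:
        if name in implemented:
            continue
        if name in optional:
            ignored.append(name)           # desired, not required: skip it
        else:
            missing_required.append(name)  # required but not tracked
    return (not missing_required, ignored, missing_required)


implemented = {"CPUArchitecture", "PhysicalMemory"}

# CPU speed is required but not implemented: the request is rejected.
print(check_request(["CPUArchitecture", "CPUSpeed"], implemented))

# The same request with CPU speed marked optional is accepted.
print(check_request(["CPUArchitecture", "CPUSpeed"], implemented,
                    optional=["CPUSpeed"]))
```

Under this reading a client would not have to discover the minimal implemented set up front, as long as it marks its non-essential attributes as optional.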
·
System management operations:
·
Currently BES supports two specific
system management operations: StartAcceptingNewActivities and StopAcceptingNewActivities. Most
schedulers support a variety of scheduling-specific system management
operations and I’m wondering why these two operations were singled out in
particular to be part of the base case?
·
These operations seem to require a
different set of authorization credentials than the other interface operations
since they should be invoked by system administrators rather than random users.
How will that work, given that these operations are in the same WSDL as
the other operations? Wouldn’t this argue for moving these
operations to a separate system management interface?
·
Array operations:
·
Currently one can create a single
activity, but all other operations accept an array of AEDs as input. Was
there some reason why an array creation operation wasn’t included so
that, for example, parameter sweep applications can be created with a single
request instead of N requests (where N can be in the thousands)?
·
Given that BES seems to have
bought into the notion of extensibility, should the base case be a “non-array”
one? For example, currently if you want to handle a fault for a RequestActivityStateChange
operation on a single activity you need to look inside the returned array of
results to see if a fault infoset was returned. None of the exception
handling machinery that modern tooling provides can be used because RequestActivityStateChange
never returns an actual fault message (as opposed to a fault infoset for the
appropriate array elements that are returned).
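The difference in client-side fault handling can be sketched in Python as follows. Both functions, the fault name, and the dictionary layout are hypothetical stand-ins; they only illustrate the contrast between a fault embedded in an array of results and a fault raised as an exception.

```python
class ActivityFault(Exception):
    """Stand-in for a real fault message returned by a single-activity call."""


def request_state_change_array(activity_ids, known_ids):
    # Array style: the call itself always succeeds; each failure is
    # embedded in the corresponding result element.
    return [
        {"activity": aid, "fault": None if aid in known_ids else "UnknownActivity"}
        for aid in activity_ids
    ]


def cancel_single(activity_id, known_ids):
    # Non-array style: a failure surfaces as an exception.
    if activity_id not in known_ids:
        raise ActivityFault(f"UnknownActivity: {activity_id}")
    return "canceled"


known = {"a1"}

# Array style: the client must dig into the results, even for one activity.
results = request_state_change_array(["a2"], known)
array_failed = results[0]["fault"] is not None

# Non-array style: ordinary exception handling applies.
try:
    cancel_single("a2", known)
    single_failed = False
except ActivityFault:
    single_failed = True

print(array_failed, single_failed)
```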
·
Other questions:
·
An entire (small) section is
devoted to talking about the optional use of WS-Names. However, since the
specification doesn’t require
them, it’s unclear to me whether BES needs to say anything about
WS-Names. As far as I understand things, whether an EPR is a WS-Name or
not can be determined by inspecting it. Hence the only reason to have a
special property on a BES service that indicates what kind of AEDs it returns
is to alert potential clients ahead of time about this feature of the service. But
it’s not clear to me what a client would do with that information, as
compared to deciding opportunistically to exploit a WS-Name AED for, e.g. resolution,
at the time that that would be necessary. Is there a use case that
describes how clients would exploit the AED-type resource property?
·
Since JSDL documents are
self-describing, a BES service can figure out by inspection whether the job
description infoset parameter to CreateActivityFromJSDL is JSDL or something
else. This would seem to imply that naming the operation CreateActivity
would lose no information and would allow for transparent extension to other
job description infosets simply by using them (assuming they are
self-describing).
·
Container attributes that I have
questions about:
·
LocalResourceManagerType: where do
these get defined normatively?
·
Job Credential Service and File
Credential Service: these imply a specific security model. Given
that security is undefined in the BES spec, is this appropriate –
especially given the rather vague definition of both?
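On the point above about self-describing job descriptions, a rough Python sketch of how a generic CreateActivity could identify its input by inspecting the root element is shown below. The namespace URI and the JobDefinition root element are what I believe JSDL 1.0 uses, but treat both as assumptions to check against the JSDL spec; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed JSDL 1.0 namespace; verify against the JSDL specification.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"


def root_identity(xml_text):
    """Return (namespace, local_name) of the document's root element."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # ElementTree encodes qualified names as '{namespace}local'.
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    return namespace, local_name


def job_description_kind(xml_text):
    """Classify a job description by inspection of its root element."""
    namespace, local_name = root_identity(xml_text)
    if namespace == JSDL_NS and local_name == "JobDefinition":
        return "jsdl"
    return "unknown"


jsdl_doc = '<jsdl:JobDefinition xmlns:jsdl="%s"/>' % JSDL_NS
other_doc = '<job xmlns="urn:example:other-job-language"/>'

print(job_description_kind(jsdl_doc))
print(job_description_kind(other_doc))
```

A service could add further branches for other job description languages without changing the operation name or signature.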
Given these questions, as well as the mandate for the HPC profile to
define a simple base interface, I would like to present the following straw man
proposal for a modified BES specification for feedback from this community:
·
Operations:
·
CreateActivity(jsdlDoc) → EPR
·
GetActivity(EPR) → activityState
·
GetActivityProvenance(EPR) → either JSDL doc (if that can describe all the necessary provenance info) or JSDL+
·
CancelActivity(EPR)
·
For non-WSRF versions: QueryResources() → schedulerResourcesInfoset
·
‘schedulerResourcesInfoset’
is essentially the union of the RPs that would be exported in a WSRF-based
version for describing the resources that are available for use at this BES
service. Note that a BES service might also want to expose other kinds of
information that would not be returned from this operation – this operation
is there so that clients can determine whether or not a BES service could
potentially meet their needs and is necessary for meta-scheduling scenarios.
·
One might argue that one could use
WS-Transfer for this operation. However, since a BES service might want
to export other kinds of information, this would require an extra level of
indirection so that the BES service could expose which EPRs to use for
retrieving which kinds of information.
·
Additional topics/summary:
·
Simple state diagram and no notion
of array operations, data staging, suspension, or notifications in base BES
case.
·
Extensions defined as separate profiles
for array operations, data staging, suspension, and notifications.
·
RequestActivityStateChange
replaced by operations specifying desired actions rather than states. Base
case supports activity cancellation; extensions can define additional
operations (e.g. SuspendActivity).
·
Information model: small base set
plus extensions model (which ones to include in the base set TBD)
·
All system management functions
moved out to a separate interface.
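For illustration, the straw man operations could look roughly like the in-memory Python sketch below. It is only a toy under stated assumptions: EPRs are plain strings, the JSDL document is an opaque string, provenance is just the stored document, and the class name, URN scheme, and resource keys are all made up.

```python
import itertools


class InMemoryBes:
    """Toy model of the straw man base interface (no scheduling, no security)."""

    TERMINAL = ("canceled", "failed", "finished")

    def __init__(self, resources):
        self._resources = dict(resources)
        self._activities = {}
        self._ids = itertools.count(1)

    def create_activity(self, jsdl_doc):
        # CreateActivity(jsdlDoc) -> EPR
        epr = f"urn:example:activity:{next(self._ids)}"
        self._activities[epr] = {"state": "pending", "jsdl": jsdl_doc}
        return epr

    def get_activity(self, epr):
        # GetActivity(EPR) -> activityState
        return self._activities[epr]["state"]

    def get_activity_provenance(self, epr):
        # GetActivityProvenance(EPR) -> here simply the submitted document
        return self._activities[epr]["jsdl"]

    def cancel_activity(self, epr):
        # CancelActivity(EPR): no effect on activities already in a terminal state
        record = self._activities[epr]
        if record["state"] not in self.TERMINAL:
            record["state"] = "canceled"
        return record["state"]

    def query_resources(self):
        # QueryResources() -> schedulerResourcesInfoset (a plain dict here)
        return dict(self._resources)


bes = InMemoryBes({"nodes": 16, "os": "Linux"})
epr = bes.create_activity("<JobDefinition/>")
print(bes.get_activity(epr))
print(bes.cancel_activity(epr))
print(bes.query_resources())
```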
Thanks for any and all feedback on these questions and this straw man
proposal,
Marvin.