Hi;
Stating the question somewhat differently: does LSF write a log record
to stable storage before running any given job request? If so, then
adding at-most-once semantics wouldn’t be too hard. Note, however,
that exactly-once semantics would require a (distributed) two-phase commit to
ensure that the log record accurately reflects whether or not the job actually
got started on some (remote) compute resource.
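A minimal sketch of the "log record before running the job" idea, under stated assumptions: the `SubmissionLog` class, the `request_id` field, and the `start_job` callback are hypothetical names for illustration and do not describe how LSF works internally. The record is forced to disk before the job is started, so a crash between the two steps leaves a logged but unstarted job, which is at-most-once rather than exactly-once.

```python
import json
import os


class SubmissionLog:
    """Append-only log of job requests, flushed to disk before a job may start."""

    def __init__(self, path):
        self.path = path
        self.seen = set()
        # Recover request ids from a previous run, if the log already exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        self.seen.add(json.loads(line)["request_id"])

    def record(self, request_id, command):
        """Return True if the request is new and is now durably logged."""
        if request_id in self.seen:
            return False  # duplicate: the job must not be started again
        entry = {"request_id": request_id, "command": command}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to stable storage
        self.seen.add(request_id)
        return True


def submit(log, request_id, command, start_job):
    """Start the job at most once per request_id (hypothetical helper)."""
    if log.record(request_id, command):
        start_job(command)  # a crash here leaves a logged, unstarted job
        return "started"
    return "duplicate"
```

Closing the remaining gap (knowing whether the logged job really started on the remote resource) is the part that would need the two-phase commit mentioned above.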
Marvin.
From: owner-ogsa-wg@ggf.org [mailto:owner-ogsa-wg@ggf.org] On Behalf Of Christopher Smith
Sent: Tuesday, March 21, 2006 6:32 PM
To: Ian Foster;
Cc: ogsa-wg@ggf.org
Subject: Re: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Yes ... in the LSF case they’re written to stable storage, but there are no
“transactional” semantics in the protocol for submission. That, I believe, is
what this discussion is about (at-most-once submission in the protocol), not
whether job submission is a reliable operation or not.
-- Chris
On 21/3/06 18:19, "Ian Foster" <foster@mcs.anl.gov> wrote:
Marvin:
I'm sure you are wrong in implying that systems such as LSF do not write
information about jobs to stable storage. LSF and other similar systems MUST be
highly reliable so that they can guarantee that jobs will not be lost. I am
sure that this means that they write some record of each job submitted to
stable storage. Chris or others will I am sure correct me if I am wrong in this
assertion.
That said, I certainly agree that we need input from the scheduler developers.
We should be careful not to use the term "transactional semantics" as
that is not what we are talking about.
Ian.
At 05:14 PM 3/21/2006 -0800,
Hi;
I know that systems like LSF get used in high-throughput settings where the
service time for a job request is an issue. An example is running very large
numbers of relatively short jobs through a compute cluster. Implementing
at-most-once semantics implies doing disk writes to get the necessary
persistence. I'm just speculating, but doing those disk writes efficiently
enough (e.g. via group commits) to support the throughputs that I've been told
of by, for example, customers in the financial industry is not a trivial design
and implementation task. If my assumption is correct, then this common
use case in the HPC world may be one that many, if not most, job schedulers
would have a hard time supporting if they have to provide at-most-once
transactional semantics for all job submissions.
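To illustrate what "group commit" means here, a minimal sketch under stated assumptions: `GroupCommitLog` and its `batch_size` parameter are hypothetical, and the batching is purely count-based. The point is only that many submission records share one `fsync`, so the disk-write cost per job drops as the batch grows.

```python
import os


class GroupCommitLog:
    """Buffers log records and makes a whole batch durable with one fsync."""

    def __init__(self, path, batch_size):
        self.file = open(path, "a", encoding="utf-8")
        self.batch_size = batch_size
        self.pending = []      # records accepted but not yet durable
        self.fsync_count = 0   # how many times we paid for a disk sync

    def append(self, record):
        """Queue a record; return the records made durable by this call."""
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            return self.commit()
        return []

    def commit(self):
        """Write all pending records, then pay for a single fsync."""
        if not self.pending:
            return []
        self.file.write("".join(r + "\n" for r in self.pending))
        self.file.flush()
        os.fsync(self.file.fileno())
        self.fsync_count += 1
        committed, self.pending = self.pending, []
        return committed

    def close(self):
        self.commit()
        self.file.close()
```

A submission may only be acknowledged to the client once its record appears in a committed batch. A production design would also flush on a timer so that a partially filled batch does not wait indefinitely, which is part of why this is not a trivial task.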
So do we exclude that
use case and tell the scheduler vendors servicing that market that they need to
come up with a separate interaction protocol for that use case? I would
prefer to define a base use case without transactional semantics and
immediately also define an extension that provides those semantics. Any
scheduler wanting to play in the wider grid world would implement the extension
because clients will be looking for/insisting on it. Any scheduler
wanting to provide ultra-high-throughput non-transactional semantics could do
so via the base case. Providing a front-end that implements transactional
semantics for a high-throughput scheduler is arguably better and easier than
forcing the high-throughput scheduler to implement either a complicated
high-performance job metadata repository or an additional separate protocol for
the ultra-high-throughput case.
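The base-case-plus-extension idea can be sketched as follows. Everything here is an assumption for illustration: `BaseScheduler`, `AtMostOnceFrontEnd`, the `extensions` attribute, and the extension name `"at-most-once-submit"` are hypothetical and are not taken from any existing scheduler or specification.

```python
class BaseScheduler:
    """Base case: every submit call creates a new job, even on a retry."""

    def __init__(self):
        self.jobs = []

    def submit(self, command):
        self.jobs.append(command)
        return len(self.jobs)  # job id


class AtMostOnceFrontEnd:
    """Extension: wraps a base scheduler and deduplicates by request id."""

    extensions = ("at-most-once-submit",)

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.job_for_request = {}  # a real front-end would persist this map

    def submit(self, command, request_id):
        # Forward to the underlying scheduler only the first time an id is seen.
        if request_id not in self.job_for_request:
            self.job_for_request[request_id] = self.scheduler.submit(command)
        return self.job_for_request[request_id]


def supports(service, extension):
    """Let a client discover whether a service advertises an extension."""
    return extension in getattr(service, "extensions", ())
```

A client that insists on the stronger semantics checks `supports(service, "at-most-once-submit")` before submitting, while a high-throughput client talks to the base interface directly.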
Now, of course, it may be the case that it's really not that hard to provide
transactional semantics for such ultra-high-throughput use cases for all the
schedulers we care about, in which case I'd be thrilled to agree to include
them in the base use case and move on. This is where I'd really like to obtain
input/guidance from representatives of those scheduler vendors/suppliers. If
anyone from Platform Computing, Altair, SUN, and the other scheduler
vendors/providers is monitoring this email thread then please speak up!
Marvin.
From: Ian Foster [mailto:foster@mcs.anl.gov]
Sent: Tuesday, March 21, 2006 10:53 AM
To: Marty Humphrey;
Cc: ogsa-wg@ggf.org;
Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Marty:
I wasn't trying to be philosophical, just commenting that at-most-once
submission semantics are important. If a client can't be sure whether a job
was submitted or not, clients get very complicated.
Ian.
At 01:43 PM 3/21/2006 -0500, Marty Humphrey wrote:
But this is not so simple. The knee-jerk reaction is to separate these two
concerns into implementation vs. interface, and develop each one
independently. But taken to the extreme, a system that appears to be rich in
its capabilities might not be so in reality for some time (if EVER!).
Let's assume that we truly separate these concerns and build sophisticated
interfaces. But then what about the potential consumer of such services?
Building an overly complex interface to such a service (without any practical
implementations behind it) might promote further complicated clients (which
promotes further complexity upstream...). "Build the interface and they will
come with implementations" is a variation on a theme that doesn't always come
true. Arguably, complexity is what we're trying to get away from.
And no, I'm not advocating only an interface that matches existing
capabilities. I'm just saying that it's NOT obvious that the most effective
approach is to entirely decouple these two concerns.
-- Marty
From: Ian Foster [mailto:foster@mcs.anl.gov]
Sent: Tuesday, March 21, 2006 1:34 PM
To:
Cc: humphrey@cs.virginia.edu; ogsa-wg@ggf.org;
Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Marvin:
I think you are mixing two things together: the capabilities of the scheduler
and the capabilities of the remote submission interface. The proposal that we
support at-most-once submission capabilities is a proposal for capabilities in
the remote submission interface, not the scheduler. I wouldn't expect existing
schedulers to provide this capability, just as they don't (for the most part)
support Web Services interfaces. But once we define a Web Services-based remote
submission interface, at-most-once submission capabilities become important.
Ian.
At 10:28 AM 3/21/2006 -0800,
Hi;
Whereas I agree with you that at-most-once semantics are very desirable, I
would like to point out that not all existing job schedulers implement them.
I know that both LSF and CCS (the Microsoft HPC job scheduler) don't.
I've been trying to find out whether PBS and SGE do or don't.
So, this brings up the following slightly more general question: should the
simplest base case be the simplest case that does something useful, or should
it be more complicated than that? I can see good arguments on both sides:
· Whittling things down to the simplest possible base case maximizes the
likelihood that parties can participate. Every feature added represents one
more feature that some existing system may not be able to support or that a
new system has to provide even when it's not needed in the context of that
system. Suppose, for example, that PBS and SGE don't provide transactional
semantics of the type you described. Then 4 of the 6 most common job scheduling
systems would not have this feature and would need to somehow add it to their
implementations. In this particular case it might not be too difficult to add in
practice, but in general there might be problems.
· On the other hand, since there are many clients and arguably far fewer
server implementations, features that substantially simplify client
behavior/programming and that are not too onerous to implement in existing and
future systems should be part of the base case. The problem, of course, is
that this is a slippery slope at the end of which lies the number 42 (ignore
that last phrase if you're not a fan of The Hitchhiker's Guide to the Galaxy).
Personally, the slippery slope argument makes me lean towards defining the
simplest possible base use case, since otherwise we'll spend a (potentially
very) long time arguing about which features are important enough to justify
being in the base case. One possible way forward on this issue is to have
people come up with lists of features that they feel belong in the base use
case and then we agree to include only those that have a large majority of the
community arguing for their inclusion in the base case.
Unfortunately, defining what "large majority" should be is also not easy or
obvious. Indeed, one can argue that we can't even afford to let all votes
be equal. Consider the following hypothetical (and contrived) case: 100
members of a particular academic research community show up and vote that the
base case must include support for a particular complicated scheduling policy
and the less-than-ten suppliers of existing job schedulers with significant
numbers of users all vote against it. Should it be included in the base case?
What happens if the major scheduler vendors/suppliers decide that they
can't justify implementing it and therefore can't be GGF spec-compliant and
therefore go off and define their own job scheduling standard? The hidden
issue is, of course, whether those voting are representative of the overall HPC
user population. I can't personally answer that question, but it does
again lead me to want to minimize the number of times I have to ask that
question, i.e. the number of features that I have to consider for inclusion in
the base case.
So this brings me to the question of next steps. Recall that the approach
I'm advocating, and that others have bought into as far as I can tell, is that we
define a base case and the mechanisms and approach to how extensions of the
base case are done. I assert that the absolutely most important part of
defining how extension should work is ensuring that multiple extensions don't
end up producing a hairball that's impossible to understand, implement, or use.
In practice this means coming up with a restricted form of extension
since history is pretty clear on the pitfalls of trying to support arbitrarily
general extension schemes.
This is one of the places where identification of common use cases comes in.
If we define the use cases that we think might actually occur then we can
ask whether a given approach to extension has a plausible way of achieving all
the identified use cases. Of course, future desired use cases might not
be achievable by the extension schemes we come up with now, but that
possibility is inevitable given anything less than a fully general extension
scheme. Indeed, even among the common use cases we identify now, we might
discover that there are trade-offs where a simpler (and hence probably more
understandable and easier to implement and use) extension scheme can cover 80%
of the use cases while a much more complicated scheme is required to cover 100%
of the use cases.
Given all this, here are the concrete next steps I'd like to propose:
· Everyone who is participating in this design effort should define what they feel should
be the HPC base use case. This represents the simplest use case and
associated features like transactional submit semantics that you feel everyone in the HPC grid world must
implement. We will take these use case candidates and debate which one to
actually settle on.
· Everyone should define the set of HPC use cases that they believe might actually occur
in practice. I will refer to these as the common use cases, in contrast
to the base use case. The goal here is not to define the most general HPC
use case, but rather the more restricted use cases that might occur in real
life. For example, not all systems will support job migration, so whereas
a fully general HPC use case would include the notion of job migration, I argue
that one or more common use cases will not include job migration.
Everyone should also prioritize and rank their common use cases so that we can
discuss 80/20-style trade-offs concerning which use cases to support with any
given approach to extension. This prioritization should include the
notion of how common you think a use case will actually be, and hence how
important it will be to actually support that use case.
· Everyone should start thinking about what kinds of extension approaches they believe we
should define, given the base use case and common use cases that they have
identified.
As multiple people have pointed out, an exploration of common HPC use cases has
already been done one or several times before, including in the
One very important point that I'd like to raise is the following: Time is short
and "best" is the enemy of "good enough". Microsoft is planning to provide a
Web services-based interoperability interface to its job scheduler sometime in
the next year or two. I know that many of the other job scheduler
vendors/suppliers are also interested in having an interoperability story in
place sooner rather than later. To meet this schedule on the Microsoft
side will require locking down a first fairly complete draft of whatever design
will be shipped by essentially the end of August. That's so that we can
do all the necessary debugging, interoperability testing, security threat
modeling, etc. that goes with shipping an actual finished product. What
that means for the HPC profile work is that, come the end of August, Microsoft
and possibly other scheduler vendors/suppliers will need to lock down and start
coding some version of Web Services-based job scheduling and data transfer
protocols. If there is a fairly well-defined, feasible set of
specs/profile coming out of the GGF HPC working group (for recommendation NOT
yet for actual standards approval) that has some reasonable level of consensus by
then, then that's what Microsoft will very likely go with. Otherwise
Microsoft will need to defer the idea of shipping anything that might be GGF
compliant to version 3 of our product, which will probably ship about 4 years
from now.
The chances of coming up with the "best" HPC profile by the end of August are
slim. Coming up with a fairly simple design that is "good enough" to cover
the most important common cases by means of a relatively simple,
restricted form of extension seems much more feasible. Covering a richer
set of use cases would need to be deferred to a future version of the profile,
much in the manner that BES has been defined to cover an important sub-category
of use cases now, with a fuller
Marvin.
From: Carl Kesselman [mailto:carl@isi.edu]
Sent: Thursday, March 16, 2006 12:49 AM
To:
Cc: humphrey@cs.virginia.edu; ogsa-wg@ggf.org
Subject: Re: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Hi,
In the interest of furthering agreement, I was not arguing that the application
had to be restartable. Rather, what has been shown to be important is that the
protocol be restartable in the following sense: if you submit a job and
the far-end server fails, is the job running or not? If you resubmit, do you
get another job instance? The GT submission protocol and Condor have
transactional semantics so that you can have at-most-once submit semantics
regardless of client and server failures. The fact that your application may
be non-idempotent is exactly why having well-defined semantics in this case is
important.
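The failure scenario described above can be sketched from the client's side. This is a toy model under stated assumptions: `Server`, `LostReply`, and `submit_with_retry` are hypothetical names, and the code does not reproduce the actual GT or Condor protocols. It only shows why a client-chosen submission id makes a resend safe when the reply to the first attempt is lost.

```python
class LostReply(Exception):
    """The request may or may not have reached the server."""


class Server:
    def __init__(self, drop_first_reply=True):
        self.instances = {}  # submission id -> job instance number
        self.drop_next_reply = drop_first_reply

    def submit(self, submission_id, command):
        # Idempotent on submission_id: a retry maps to the existing instance.
        if submission_id not in self.instances:
            self.instances[submission_id] = len(self.instances) + 1
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise LostReply(submission_id)  # job was created, reply was lost
        return self.instances[submission_id]


def submit_with_retry(server, submission_id, command, attempts=3):
    """Resend with the same id until a reply arrives or attempts run out."""
    for _ in range(attempts):
        try:
            return server.submit(submission_id, command)
        except LostReply:
            continue  # safe to resend: same id, so no second job instance
    raise RuntimeError("no reply from server")
```

Without the submission id, the client in this scenario could not tell whether resubmitting would create a second job instance.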
So what is the next step?
Carl
Dr. Carl Kesselman
email: carl@isi.edu
USC/Information Sciences Institute
WWW: http://www.isi.edu/~carl
4676 Admiralty Way, Suite 1001
Phone: (310) 448-9338
-----Original Message-----
From:
To: Carl Kesselman <carl@isi.edu>
CC:
Sent: Wed Mar 15 14:26:36 2006
Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Hi;
I suspect that we're mostly in agreement on things. In particular, I think
your list of four core aspects is a great starting point for a discussion on
the topic.
I just replied to an earlier email from
· Identification of the simplest base case that everyone will have to implement.
· Identification of common cases we want to optimize.
· Identification of how evolution and selective extension will work.
I totally agree with you that the base use case I described isn't really a
"grid" use case. But it is an HPC use case; in fact, it is arguably the most
common use case in current existence. :-) So I think it's important that we
understand how to seamlessly integrate and support that common and very simple
use case.
I also totally agree with you that we can't let a solution to the simplest HPC
use case paint us into a corner that prevents supporting the richer use cases
that grid computing is all about. That's why I'd like to spend significant
effort exploring and understanding the issues of how to support evolution and
selective extension. In an ideal world a legacy compute cluster job scheduler
could have a simple grid "shim" that let it participate at a basic level, in a
natural manner, in a grid environment, while smarter clients and HPC services
could interoperate with each other in various selectively richer manners by
means of extensions to the basic HPC grid design.
One place where I disagree with you is your assertion that everything needs to
be designed to be restartable. While that's a good goal to pursue, I'm not
convinced that you can achieve it in all cases. In particular, there are
at least two cases that I claim we want to support that aren't restartable:
· We want to be able to run applications that aren't restartable; for example,
because they perform non-idempotent operations on the external physical
environment. If such an application fails during execution then the only one
who can figure out what the proper next steps are is the end user.
· We want to be able to include (often-times legacy) systems that aren't fault
tolerant, such as simple small compute clusters where the owners didn't think
that fault tolerance was worth paying for.
Of course any acceptable design will have to enable systems that are fault
tolerant to export/expose that capability. To my mind it's more a matter
of ensuring that non-fault-tolerant systems aren't excluded from participation
in a grid.
Other things we agree on:
· We should certainly examine what remote job submission systems do. We should certainly look at
existing systems like Globus, Unicore, and Legion. In general, we should
be looking at everything that has any actual experience that we can learn from
and everything that is actually deployed and hence represents a system that we
potentially need to interoperate with. (Whether a final design is actually able
to interoperate at any but the most basic level with various exotic existing
systems is a separate issue.)
· We should absolutely focus on codifying what we know how to do and avoid
doing research as part of a standards process. I believe that thinking
carefully about how to support evolution and extension is our best hope for
allowing people to defer trying to bake their pet research topic into
standards since it provides a story for why today's standards don't preclude
tomorrow's improvements.
So I would propose that next steps are:
· Continue to explore and classify various HPC use cases of various differing
levels of complexity.
· Describe the requirements and limitations of existing job scheduling and
remote job submission systems.
· Continue identifying and discussing key "features" of use cases and
potential design solutions, such as the four that you identified in your last
email.
Marvin.
________________________________
From: Carl Kesselman [mailto:carl@isi.edu]
Sent: Tuesday, March 14, 2006 7:50 AM
To: Marty Humphrey; ogsa-wg@ggf.org
Cc:
Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Hi,
Just to be clear, I'm not trying to suggest that the scope be expanded. I agree
that the approach of focusing on a baby step is a good one, and many of the
assumptions stated in Marvin's list I am in total agreement with. However, in
taking baby steps I think that it is important that we end up walking, and that
in defining the use case, one can easily create solutions that will not get you
to the next step. This is my point about looking at what we know how to do and
have been doing in production settings for many years now. In my mind, one of
the scope grandness problems has been that there has been far too little focus
on codifying what we know how to do in favor of using a standards process as an
excuse to design new things. So at the risk of sounding partisan, the
simplified use case that Marvin is proposing is exactly the use case that GRAM
has been doing for over ten years now (I think the same can be said about
UNICORE and Legion).
So let me try to be constructive. One of the things that falls out
of Marvin's list could be a set of basic concepts/operations that need to be
defined. These include:
1) A way of describing "local" job configuration, i.e. where to find the
executable, data files, etc. This should be very conservative with its
assumptions on shared file systems and accessibility. In general, what needs to
be stated here is which aspects of the underlying resource are exposed to the
outward-facing interface.
2) A way of naming a submission point (should probably have a way of modeling
queues).
3) A core set of job management operations: submit, status, kill. These need to
be defined in such a way as to be tolerant of a variety of failure scenarios,
in that the state needs to be well defined in the case of failure.
4) A state model that one can use to describe what is going on with the jobs
and a way to access that state. Can be simple (queued, running, done),
may need to be extensible. One can view the accounting information as
being exposed
So, one thing to do would be to agree that these are (or are not) the right
four things that need to be defined and if so, start to flesh out these in a
way that supports the core use case but doesn't introduce assumptions that would
preclude more complex use cases in the future.
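As a starting point for fleshing these out, the four items can be sketched as types. This is only an illustration under stated assumptions: the names `JobDescription`, `SubmissionPoint`, `JobManager`, and `JobState`, and the form of the job id, are hypothetical and are not drawn from GRAM, JSDL, or any other existing interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):              # (4) a simple, extensible state model
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


@dataclass
class JobDescription:              # (1) "local" job configuration
    executable: str
    arguments: list = field(default_factory=list)
    data_files: list = field(default_factory=list)


@dataclass(frozen=True)
class SubmissionPoint:             # (2) a named submission point with a queue
    host: str
    queue: str = "default"


class JobManager:                  # (3) submit, status, kill
    def __init__(self, point):
        self.point = point
        self.states = {}           # job id -> JobState

    def submit(self, description):
        job_id = "%s/%s/%d" % (self.point.host, self.point.queue,
                               len(self.states) + 1)
        self.states[job_id] = JobState.QUEUED
        return job_id

    def status(self, job_id):
        return self.states[job_id]

    def kill(self, job_id):
        # Killing an already finished job is a no-op, so a retried kill
        # after a failure leaves the state well defined.
        self.states[job_id] = JobState.DONE
        return self.states[job_id]
```

For example, `JobManager(SubmissionPoint("cluster.example.org", "short"))` models one submission point; the host name is a placeholder.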
Carl
________________________________
From: owner-ogsa-wg@ggf.org [mailto:owner-ogsa-wg@ggf.org] On Behalf Of Marty Humphrey
Sent: Tuesday, March 14, 2006 6:32 AM
To: ogsa-wg@ggf.org
Cc: '
Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Carl,
Your comments are very important. We would love to have your active
participation in this effort. Your experience is, of course, matched by few!
I re-emphasize that this represents (my words, not anyone else's) "baby steps" that
are necessary and important for the Grid community. In my opinion, the
biggest challenge will be to fight the urge to expand the scope beyond a small
size. You cannot ignore the possibility that the GGF has NOT made as much progress
as it should have to date. Furthermore, one such plausible explanation is that
the scope is too grand.
-- Marty
________________________________
From: owner-ogsa-wg@ggf.org [mailto:owner-ogsa-wg@ggf.org] On Behalf Of Carl Kesselman
Sent: Tuesday, March 14, 2006 8:47 AM
To:
Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Hi,
While I have no wish to engage in the "what is a Grid" argument, there are some
elements of your base use case that I would be concerned about.
Specifically, the assumption that the submission is into a "local
cluster" on which there is an existing account may lead one to a solution that
may not generalize to the case of submission across autonomous
policy domains. I would also argue that ignoring issues of fault
tolerance from the beginning is also problematic. One must at least
design operations that are restartable (for example, at-most-once submission
semantics).
I would finally suggest that while examining existing job scheduling systems is a
good thing to do, we should also examine existing remote submission systems
(dare I say Grid systems). The basic HPC use case is one in which there
is a significant amount of implementation and usage experience.
Thanks,
Carl
________________________________
From: owner-ogsa-wg@ggf.org [mailto:owner-ogsa-wg@ggf.org]
On Behalf Of
Sent: Monday, March 13, 2006 2:42 PM
To: Ian Foster; ogsa-wg@ggf.org
Cc:
Subject: RE: [ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Hi;
Ian, you are correct that I view job submission to a cluster as being one of
the simplest, and hence most basic, HPC use cases to start with. Or, to be
slightly more general, I view job submission to a "black box" that can run jobs,
be it a cluster or an SMP or an SGI NUMA machine or what-have-you, as being the
simplest and hence most basic HPC use case to start with. The key
distinction for me is that the internals of the "box" are for the most part not
visible to the client, at least as far as submitting and running compute jobs
is concerned. There may well be a separate interface for dealing with
things like system management, but I want to explicitly separate those things
out in order to allow for use of "boxes" that might be managed by proprietary
means or by means obeying standards that a particular job submission client is
unfamiliar with.
I think the use case that Ravi Subramaniam posted to this mailing list back on
2/17 is a good one to start a discussion around. However, I'd like to
present it from a different point-of-view than he did. The manner in which the
use case is currently presented emphasizes all the capabilities and services
needed to handle the fully general case of submitting a batch job to a
computing utility/service. That's a great way of producing a taxonomy
against which any given system or design can be compared to see what it has to
offer. I would argue that the next step is to ask what's the simplest
subset that represents a useful system/design and how should one categorize the
various capabilities and services he has identified so as to arrive at
meaningful components that can be selectively used to obtain progressively more
capable systems.
Another useful exercise to do is to examine existing job scheduling systems in
order to understand what they provide. Since in the real world we will
have to deal with the legacy of existing systems it will be important to
understand how they relate to the use cases we explore. In the same vein,
it will be important to take into account and understand other existing
infrastructures that people use that are related to HPC use cases. I'm
thinking of things like security infrastructures, directory services, and so
forth. From the point-of-view of managing complexity and reducing
total-cost-of-ownership, it will be important to understand the extent to which
existing infrastructure and services can be reused rather than reinvented.
To kick off a discussion around the topic of a minimalist HPC use case, I
present a straw man description of such below and then present a first attempt
at categorizing various areas of extension. The categorization of
extension areas is not meant to be complete or even all that carefully
thought-out as far as componentization boundaries are concerned; it is merely
meant to be a first contribution to get the discussion going.
A basic HPC use case: Compute cluster embedded within an organization.
· This is your basic batch job scheduling
scenario. Only a very basic state transition diagram is visible to the
client, with the following states for a job: queued, running, finished.
Additional states -- and associated state transition request operations
and functionality -- are not supported. Examples of additional states and
associated functionality include suspension of jobs and migration of jobs.
· Only "standard" resources can be
described, for example: number of cpus/nodes needed, memory requirements, disk
requirements, etc. (think resources that are describable by JSDL).
· Once a job has been submitted it can be
cancelled, but its resource requests can't be modified.
· A distributed file system is accessible from
client desktop machines and client file servers, as well as compute nodes of
the compute cluster. This implies that no data staging is required, that
programs can be (for the most part) executed from existing file system
locations, and that no program "provisioning" is required (since you
can execute them from wherever they are already installed). Thus in this
use case all data transfer and program installation operations are the
responsibility of the user.
· Users already have accounts within the
existing security infrastructure (e.g. Kerberos). They would like to use
these and not have to create/manage additional authentication/authorization
credentials (at least at the level that is visible to them).
· The job scheduling service resides at a
well-known network name and it is aware of the compute cluster and its
resources by "private" means (e.g. it runs on the head node of the
cluster and employs private means to monitor and control the resources of the
cluster). This implies that there is no need for any sort of directory
services for finding the compute cluster or the resources it represents other
than basic DNS.
· Compute cluster system management is opaque to
users and is the concern of the compute cluster's owners. This implies
that system management is not part of the compute cluster's public job
scheduling interface. This also implies that there is no need for a
logging interface to the service. I assume that application-level logging
can be done by means of libraries that write to client files; i.e. that there
is no need for any sort of special system support for logging.
· A simple polling-based interface is the
simplest form of interface to something like a job scheduling service.
However, a simple call-back notification interface is a very useful
addition that potentially provides substantial performance benefits since it
can enable the avoidance of lots of unnecessary network traffic. Only job
state changes result in notification messages.
· There are no notions of fault tolerance. Jobs
that fail must be resubmitted by the client. Neither the cluster head
node nor its compute nodes are fault tolerant. I do expect the client
software to return an indication of failure-due-to-system-fault when appropriate.
(Note that this may also occur when things like network partitions
occur.)
· One does need some notion of how to deal with
orphaned resources and jobs. The notion of job lifetime and
post-expiration garbage collection is a natural approach here.
· The scheduling service provides a fixed set of
scheduling policies, with only a few basic choices (or maybe even just one),
such as FIFO or round-robin. There is no notion, in general, of SLAs
(which are a form of scheduling policy).
· Enough information must be returned to the
client when a job finishes to enable basic accounting functionality. This
means things like total wall-clock time the job ran and a summary of resources
used. There is not a need for the interface to support any sort of
grouping of accounting information. That is, jobs do not need to be
associated with projects, groups, or other accounting entities and the job
scheduling service is not responsible for tracking accounting information
across such entities. As long as basic resource utilization information
is returnable for each job, accounting can be done externally to the job
scheduling service. I do assume that jobs can be uniquely identified by
some means and can be uniquely associated with some principal entity existing
in the overall system, such as a user name.
· Just as there is no notion of requiring the
job scheduling service to track any but the most basic job-level accounting
information, there is no notion of the service enforcing quotas on jobs.
· Although it is generally useful to separate
the notions of resource reservation from resource usage (e.g. to enable
interactive and debugging use of resources), it is not a necessity for the most
basic of job scheduling services.
· There is no notion of tying multiple jobs
together, either to support things like dependency graphs or to support things
like workflows. Such capabilities must be implemented by clients of the
job scheduling service.
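Several of the points above (the three visible states, polling plus call-back notification, and lifetime-based garbage collection) can be sketched together. This is a minimal illustration under stated assumptions: `JobService`, the `ALLOWED` table, and the use of plain numbers for time are hypothetical choices, and cancellation is modeled as a direct move from queued to finished.

```python
# Legal transitions in the basic state diagram: queued, running, finished.
ALLOWED = {
    "queued": {"running", "finished"},  # finished directly if cancelled
    "running": {"finished"},
    "finished": set(),
}


class JobService:
    def __init__(self, lifetime):
        self.lifetime = lifetime  # how long a job record may live
        self.jobs = {}            # job id -> {"state": ..., "expires": ...}
        self.listeners = []       # call-backs invoked on state changes only

    def subscribe(self, callback):
        self.listeners.append(callback)

    def submit(self, job_id, now):
        self.jobs[job_id] = {"state": "queued", "expires": now + self.lifetime}

    def transition(self, job_id, new_state):
        job = self.jobs[job_id]
        if new_state not in ALLOWED[job["state"]]:
            raise ValueError("%s -> %s not allowed" % (job["state"], new_state))
        job["state"] = new_state
        for notify in self.listeners:
            notify(job_id, new_state)

    def poll(self, job_id):
        """Polling interface: the client asks for the current state."""
        return self.jobs[job_id]["state"]

    def collect_garbage(self, now):
        """Drop job records whose lifetime has expired (orphan cleanup)."""
        expired = [j for j, rec in self.jobs.items() if rec["expires"] <= now]
        for job_id in expired:
            del self.jobs[job_id]
        return expired
```

A client that subscribes receives one message per state change and need not poll, which is where the network traffic savings come from.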
Interesting extension areas:
· Additional scheduling policies
o Weighted fair-share, &
o Multiple queues
o SLAs
o ...
· Extended resource descriptions
o Additional resource types, such as GPUs
o Additional types of compute resources, such as desktop
computers
o Condor-style class ads
· Extended job descriptions (as returned
to requesting clients and sys admins)
· Additional classes of security
credentials
· Reservations separated from execution
o Enabling interactive and debugging jobs
o Support for multiple competing schedulers (incl.
desktop cycle stealing and market-based approaches to scheduling compute
resources)
· Ability to modify jobs during their
existence
· Fault tolerance
o Automatic rescheduling of jobs that failed due to
system faults
o Highly available resources: This is partly a
policy statement by a scheduling service about its characteristics and partly
the ability to rebind clients to migrated service endpoints
· Extended state transition diagrams and
associated functionalities
o Job suspension
o Job migration
o ...
· Accounting & quotas
· Operating on arrays of jobs
· Meta-schedulers, multiple schedulers,
and ecologies and hierarchies of multiple schedulers
o Meta-schedulers
· Hierarchical job scheduling with a
meta-scheduler as the only entry point; forwarding jobs to the meta-scheduler
from other subsidiary schedulers
o Condor-style matchmaking
· Directory services
o Using existing directory services
o Abstract directory service interface(s)
· Data transfer topics
o Application data staging
· Naming
· Efficiency
· Convenience
· Cleanup
o Program staging/provisioning
· Description
· Installation
· Cleanup
Marvin.
________________________________
From: Ian Foster [mailto:foster@mcs.anl.gov]
Sent: Monday, February 20, 2006 9:20 AM
To:
Cc:
Subject: Re: [ogsa-wg] Paper proposing "evolutionary vertical design
efforts"
Dear All:
The most important thing to understand at this point (IMHO) is the scope of
this "HPC use case," as this will determine just how minimal we can
be.
I get the impression that the principal goal may be "job submission to a
cluster." Is that correct? How do we start to circumscribe the scope more
explicitly?
Ian.
At 05:45 AM 2/16/2006 -0800,
Enclosed is a paper that advocates an additional set of activities that the
authors believe that the OGSA working groups should engage in.
Broadly speaking, the OGSA and related working groups are already doing a bunch
of important things:
· There is broad
exploration of the big picture, including enumeration of use cases, taxonomy of
areas, identification of research issues, etc.
· There is work going on
in each of the horizontal areas that have been identified, such as
· There is work going on
around individual specifications, such as BES, JSDL, etc.
Given that individual specifications are beginning to come to fruition, the
authors believe it is time to also start defining vertical profiles that
precisely describe how groups of individual specifications should be employed
to implement specific use cases in an interoperable manner. The authors
also believe that the process of defining these profiles offers an opportunity
to close the design loop by relating the various on-going protocol and standards
efforts back to the use cases in a very concrete manner. This provides an
end-to-end setting in which to identify holes and issues that might require additional
protocols and/or (incremental) changes to existing protocols. The paper
introduces both the general notion of doing focused vertical design efforts and
then focuses on a specific vertical design effort, namely a minimal HPC design.
The paper derives a specific HPC design in a first-principles manner since the
authors believe that this increases the chances of identifying issues. As
a consequence, existing specifications and the activities of existing working
groups are not mentioned and this paper is not an attempt to actually define a
specifications profile. Also, the absence of references to existing work
is not meant to imply that such work is in any way irrelevant or inappropriate.
The paper should be viewed as a first abstract attempt to propose a new
kind of activity within OGSA. The expectation is that future open
discussions and publications will explore the concrete details of such a
proposal.
This paper was recently sent to a few key individuals in order to get feedback
from them before submitting it to the wider GGF community. Unfortunately that
process took longer than intended and some members of the community may have
already seen a copy of the paper without knowing the context within which it was
written. This email should hopefully dispel any misconceptions that may
have occurred.
For those people who will be around for the F2F meetings on Friday,
_______________________________________________________________
Ian Foster
www.mcs.anl.gov/~foster
<http://www.mcs.anl.gov/~foster>
Math & Computer Science Div. Dept of Computer Science
Argonne National Laboratory The
Tel: 630 252 4619
Fax: 630 252 1997
Globus Alliance, www.globus.org
<http://www.globus.org/>
_______________________________________________________________