Ian,

Thanks for your response. I joined the ogsa-bes-wg mailing list last week
and I am looking forward to hearing more about the progress made in this
area. I will look for the ESI specification in the ogsa-bes-wg mailing
archive.

Thanks,
Susanne
Susanne:

I'd like to respond to your comments.

I believe that the reference to "network partitions" refers to the fact
that in a distributed environment, unlike a single-machine environment, we
cannot be sure that messages will be delivered: a network failure can
result in any message being lost. Thus, a job submission may not receive a
response, and in that case we cannot know whether the job was submitted
(i.e., the request got through, but the response was lost) or not (i.e.,
the request was lost).

One convenient way of dealing with this problem is to allow users to
associate a "unique job id" with a job request. A scheduler that receives
a second or subsequent submission with the same job id should simply
return the response it provided to the first request received.
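A minimal sketch of that idea (the class and field names here are
illustrative, not from any real scheduler API): the scheduler caches its
response keyed by the client-supplied unique job id, so a retried request
whose original response was lost on the network gets the same answer back.

```python
class Scheduler:
    """Toy scheduler illustrating idempotent job submission."""

    def __init__(self):
        # unique job id -> response returned for the first submission
        self._responses = {}

    def submit(self, job_id, job_spec):
        # A second or subsequent submission with the same job id simply
        # returns the response produced for the first request received,
        # so a client retry after a lost response is harmless.
        if job_id in self._responses:
            return self._responses[job_id]
        response = {"job_id": job_id, "state": "QUEUED", "spec": job_spec}
        self._responses[job_id] = response
        return response
```

With this in place, the client's recovery strategy after a timeout is
simply "resubmit with the same id", with no risk of launching the job
twice.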
It's true that a user can achieve a similar effect by searching for the
submitted job in the scheduler queue. However, this approach is more
complex, and it also suffers from the problem that the job might have
already completed and thus cannot be found that way.

In our recently circulated ESI specification, we proposed an optional
"unique job id" field in JSDL as a way of addressing this requirement.
This notion was discussed on a BES call, and people seemed sympathetic to
the idea.

Ian.
At 12:20 PM 4/28/2006 -0400, Balle, Susanne wrote:

Marvin,

Enclosed find the remainder of my comments:
Page 5. (top paragraph) I think I know what you mean by "with the
ambiguity of distinguishing between scheduler crashes and network
partitions". "Scheduler crashes" is obvious. I am assuming that by
"network partitions" you are implying that various sub-networks are going
to have different response times, which will have an effect on the time it
takes to deliver a call-back message.

Reading further along in the same paragraph, I am now not sure I know what
you mean by "network partitions".
Page 5. Section 3.3

The topic of this section is clear (described in the first line of the
paragraph), but the rest of the section is a little confusing.

"possibility that a client cannot directly tell whether its job submission
request has been successful ..." --> Do we expect the client to re-submit
the job if the submission failed, or do we expect users to inspect whether
their job has in fact been submitted and resubmit if needed? If we assume
the latter, I wonder whether that wouldn't result in users re-launching
their jobs several times if they do not see their job listed in some state
when polling the job scheduler for the state of their job.

I guess I do not understand why so much emphasis is put on "At-Most-Once"
or "Exactly-Once".
Can't the client poll the job scheduler and ask the JS for a list of jobs
queued, running, terminated, failed, etc.? It might be useful for the
client to be able to submit jobs with a special keyword like
JOB_SUBMITTED_BY, since that would reduce the list it gets back. It would
be nice if the value for the keyword was a unique identifier, but it
doesn't have to be. Most schedulers allow you to name programs or
associate a group with them, so that feature could be used as the special
keyword.
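For concreteness, a sketch of this polling alternative (the field names,
including the JOB_SUBMITTED_BY-style tag, are hypothetical): the client
filters the scheduler's job list by its own identifier and resubmits only
if the job appears in no state at all.

```python
def job_was_submitted(scheduler_jobs, submitted_by, job_name):
    """Return True if a job tagged with our identifier is visible in any
    state (queued, running, terminated, failed, ...) in the scheduler's
    job list; illustrative field names, not a real scheduler API."""
    for job in scheduler_jobs:
        if job.get("submitted_by") == submitted_by and job.get("name") == job_name:
            return True
    return False
```

Note this still has the weakness Ian points out above: if the scheduler
purges completed jobs from its list, a finished job is indistinguishable
from one that was never submitted.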
Page 6. Section 3.4

General question: Are you taking into account that user applications will
require different software?

1. For example, if my executable is compiled for a Linux/Intel platform,
then I would like to run it on a Linux/Intel system and not a Linux/AMD
system.

2. Are you assuming that the program will be compiled on the fly on the
allocated system, or pre-compiled and then staged?

I agree that staging the data is going to be an interesting topic.

All this is probably out-of-band for the HPC JS Profile but should be
considered somewhere. I am sure it is; I just don't know where.

I like the section on virtual machines and think that they will be used
more and more in the future.
Page 7. Extended Resource Features

The second approach (arbitrary resource types ...) is the only one that
makes sense to me, since that approach is extensible. I believe that Moab
is implementing this approach as well.
Page 8. Extended Client/System Administrator Operations

Are you assuming that system administrators will be able to perform sys
admin operations on somebody else's system? I don't think that is right.

You mention suspend-resume. Are you thinking of suspending a job running
across several clusters that are in different organizations? Or just
suspending a job on a single cluster/server?

Again, I am trying to figure out how this fits in with "One important
aspect is that the individual clusters should remain under the control of
their local system administrator and/or of their local policies".

I believe that suspend-resume is a JS operation or an operation to be
performed by the local sys admin, NOT by remote sys admins.
If we are now talking about a meta-scheduler, then yes, it makes sense. In
the case of a meta-scheduler, it might take over the individual JSs and
schedule jobs based on its own policies, its job reservation system, etc.
In this case I look at it as having one deciding entity (the
meta-scheduler) and several "slaves". Moab and Maui are the only
meta-schedulers I am familiar with, and they do take over the scheduling
decisions, node allocations, etc., and just submit jobs to the local job
schedulers.

This does of course assume that the local system administrators have
agreed on a schedule when their cluster is shared within this greater
infrastructure. This is a different approach than having jobs passed on to
their local scheduler and run on their systems.

This just seems to be a different approach from the one that is taken in
this paper.

I might be wrong. If I am, please educate me.
Page 9. Section 3.10

Don't forget UPC (Unified Parallel C: http://upc.nersc.gov/). This
parallel programming paradigm is getting more and more interest from
several communities. We'll need to provide support for UPC as well.
Page 10. Section 3.13

A meta-scheduler approach that makes sense to me is to allow developers to
submit their jobs to their local cluster using their "favorite" scheduler
commands and then have the meta-scheduler load-balance the work and
forward the job to another system/cluster if needed. Moab from Cluster
Resources supports this approach even if the clusters have different JSs.
They have a list of supported JSs such as LSF, PBSPro, SLURM, etc., and
they can "translate" one JS's commands into another within that supported
set.
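The "translation" idea can be sketched as a simple mapping from abstract
actions to each scheduler's native command. The CLI names below (bsub/bkill
for LSF, qsub/qdel for PBSPro, sbatch/scancel for SLURM) are the real
entry points, but the table itself is only an illustration of the concept,
not Moab's actual mechanism.

```python
# Abstract action -> per-scheduler native command (illustrative mapping).
TRANSLATION = {
    "submit": {"LSF": "bsub", "PBSPro": "qsub", "SLURM": "sbatch"},
    "cancel": {"LSF": "bkill", "PBSPro": "qdel", "SLURM": "scancel"},
}

def translate(action, target_js):
    """Return the target scheduler's command for an abstract action."""
    return TRANSLATION[action][target_js]
```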
Page 11.

SLURM is missing.

Let me know what you think,

Regards,
Susanne
---------------------------------------------------------------
Susanne M. Balle
Hewlett-Packard
High Performance Computing R&D Organization
110 Spit Brook Road
Nashua, NH 03062
Phone: 603-884-7732
Fax: 603-884-0630
Susanne.Balle@hp.com
_______________________________________________________________
Ian Foster, Director, Computation Institute
Argonne National Laboratory & University of Chicago
Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439
Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637
Tel: +1 630 252 4619. Web: www.ci.uchicago.edu.
Globus Alliance: www.globus.org.