Hiro:
So our email exchange has gone as follows:
* You mentioned in an email to OGSA-WG that a certain set of execution
management commands has been defined as out of scope for BES because they
relate to "job management" instead of "container management."
* I said that this partitioning of function didn't make sense to me, for
reasons that I listed.
* You replied by reiterating the position that the BES charter is
defined to exclude what you term "job management" functions.
In my view, this reply doesn't address the issues that I raised, in that
it just restates the position I was objecting to.
I also note that the BES charter on gridforge.org states that
"The scope of the working group is on the definition of services for
service instantiation and management. For example, using the OGSA V1
nomenclature, service containers and jobs."
Thus I don't even see that this limitation is justified based on the
charter.
I'll stop raising the issue soon, but I do so one last time because I am
concerned that a BES specification that is defined so narrowly is not
going to be useful and thus will not gain broad acceptance.
Regards,
Ian.
At 01:58 AM 5/26/2005 +0900, Hiro Kishimoto wrote:
Hi Ian,
We discussed your input on Sunday (at the OGSA-BES meeting) and Monday
(at the OGSA-WG meeting). The attachment shows the attendees'
understanding.
Each box corresponds to an EMS service, and each dotted line shows the
coverage of OGSA-BES WG's charter, OGSA-RSS WG's (initial) charter, and
the GRAM interface.
Some comments:
- We think MJFS corresponds to the container rather than the Job Manager.
- MJFS and others cover most of the "container" interface, but not all.
For example, BES-WG will define a check-pointing interface, which
is not supported by GRAM.
- GRAM covers the "job" interface, which is out of BES-WG's scope.
Meeting minutes will be available shortly, and you can find more detail
in the minutes.
Thanks,
----
Hiro Kishimoto
-----Original Message-----
From: Ian Foster [mailto:foster@mcs.anl.gov]
Sent: Monday, May 23, 2005 11:20 PM
To: Hiro Kishimoto; 'ogsa-wg'; OGSA-BES-bof@ggf.org
Subject: RE: [ogsa-wg] Perhaps useful input to BES discussion
Hiro:
The "BES is for container not job manager" argument doesn't make sense
to me.
The question of where you are permitted to direct operations for purposes
of monitoring and control--to container, job, or both--is orthogonal to
the question of what operations need to be supported.
The draft BES document defines "check status" and "terminate" operations,
which are certainly required. However, more are needed, e.g.:
* soft-state lifetime management, to avoid orphan jobs
* subscribe-on-status-change operations, to avoid repeated polling.
Simply saying "we're not going to consider those because they are defined
in WSRF" makes no sense to me. WSRF also defines "check status" and
"terminate" operations, but you're not ignoring those.
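To make the two additional operations listed above concrete, here is a minimal sketch in Python. It is an illustration only, not the BES, WSRF, or GRAM interface: the class names (Job, Factory), the method names (renew, subscribe, sweep), and the state strings are all hypothetical. A job carries a termination time that the client must keep renewing, so a job whose client disappears is eventually reclaimed, and clients register a callback instead of polling for status.

```python
import time
from typing import Callable, Dict, List, Optional

# Hypothetical terminal states, chosen for this sketch only.
TERMINAL_STATES = {"Done", "Failed", "Terminated"}


class Job:
    """Hypothetical job record with soft-state lifetime and subscriptions."""

    def __init__(self, job_id: str, lifetime_s: float, now: Optional[float] = None):
        start = time.time() if now is None else now
        self.job_id = job_id
        self.state = "Pending"
        self.termination_time = start + lifetime_s
        self._subscribers: List[Callable[[str, str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str, str], None]) -> None:
        """Register callback(job_id, old_state, new_state) for state changes."""
        self._subscribers.append(callback)

    def set_state(self, new_state: str) -> None:
        """Change state and notify subscribers only if the state differs."""
        old_state = self.state
        if old_state == new_state:
            return
        self.state = new_state
        for callback in self._subscribers:
            callback(self.job_id, old_state, new_state)

    def renew(self, lifetime_s: float, now: Optional[float] = None) -> None:
        """Extend the lifetime; a client that stops renewing lets the job expire."""
        start = time.time() if now is None else now
        self.termination_time = start + lifetime_s


class Factory:
    """Hypothetical container/factory that owns jobs and reaps expired ones."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}

    def create(self, job_id: str, lifetime_s: float, now: Optional[float] = None) -> Job:
        job = Job(job_id, lifetime_s, now)
        self.jobs[job_id] = job
        return job

    def sweep(self, now: Optional[float] = None) -> List[str]:
        """Terminate unfinished jobs whose lifetime has passed; return their ids."""
        current = time.time() if now is None else now
        expired = []
        for job in self.jobs.values():
            if job.state not in TERMINAL_STATES and current >= job.termination_time:
                job.set_state("Terminated")
                expired.append(job.job_id)
        return expired
```

In this sketch the same small set of operations applies whether a request is addressed to the factory or to an individual job, which is the sense in which the addressing question is separate from the question of which operations exist.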
Another generic issue that is not addressed in the BES document is how
you model the state associated with the factory and an individual job.
Regardless of how you choose to provide access to that state, via
standardized WSRF operations or some custom operations, a schema needs to
be defined implicitly or explicitly, and this must surely encompass more
than just "job status." E.g., see below for those defined in GT4 GRAM.
With respect to your questions below:
#1: Yes, in my view.
#2: I certainly think you need to consider and address these issues
together.
Ian.
Job modeling, from
http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Public_Interfaces.html#id2844424
2.3.2. Managed Job Port Type
* serviceLevelAgreement: A wrapper around fields containing the single-job
and multi-job descriptions or RSLs. Only one of these sub-fields shall
have a non-null value.
* state: The current state of the job.
* fault: The fault (if generated) indicating the reason for failure of the
job to complete.
* localUserId: The job owner's local user account name.
* userSubject: The GSI certificate DN of the job owner.
* holding: Indicates whether a hold has been placed on this job.
2.3.3. Managed Executable Job Port Type
* stdoutURL: A GridFTP URL to the file generated by the job which contains
the stdout.
* stderrURL: A GridFTP URL to the file generated by the job which contains
the stderr.
* credentialPath: The path (relative to the job process) to the file
containing the user proxy used by the job to authenticate out to other
services.
* exitCode: The exit code generated by the job process.
2.3.4. Managed Multi-Job Port Type
* subJobEndpoint: A set of endpoint references to the sub-jobs created by
this multi-job.
2.3.5. Faults
* FaultType: This is the base fault for runtime errors that occur while
managing a job. It extends the OGSI FaultType.
* CredentialSerializationFaultType: This fault indicates that the managed
job service was unable to serialize or deserialize a delegated credential.
* InsufficientCredentialsFaultType: This fault indicates that the managed
job service was unable to perform some action on behalf of the owner of
the job submission because the owner has delegated insufficient
credentials.
* InternalFaultType: This fault indicates that an internal operation
failed.
* InvalidCredentialsFaultType: This fault indicates that the managed job
service was unable to use a delegated credential.
* ServiceLevelAgreementFaultType: Fault for runtime errors which are
directly related to a particular part of the ServiceLevelAgreement
document passed to the createService method. This fault type contains the
fragment of the ServiceLevelAgreement related to the fault as one of its
elements.
* ExecutionFailedFaultType: This fault indicates that the Managed Job
service was unable to begin the execution of the job.
* FilePermissionsFaultType: This fault indicates that the ManagedJob
service does not have permissions to access a file referenced in the
ServiceLevelAgreement.
* InvalidPathFaultType: This fault indicates that a file or directory path
referenced in the ServiceLevelAgreement contains an invalid path.
* StagingFaultType: This fault indicates that part of the file staging
requirements of the ServiceLevelAgreement could not be completed.
* UnsupportedFeatureFaultType: This fault indicates that an error occurred
because the RSL depended on a feature not implemented by a particular
GRAM scheduler.
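As a rough illustration of how the per-job state listed above might be written down as an explicit schema, here is a sketch using Python dataclasses. The field names follow the GRAM properties quoted above, converted to snake_case; the class names, the state strings, and the has_failed helper are assumptions made for this example and are not part of GRAM or BES.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagedJobState:
    """Fields mirroring the Managed Job port type properties listed above."""
    state: str                   # current job state; the value set is assumed here
    local_user_id: str           # job owner's local account name
    user_subject: str            # GSI certificate DN of the job owner
    holding: bool = False        # whether a hold has been placed on the job
    fault: Optional[str] = None  # fault type name, if the job failed


@dataclass
class ManagedExecutableJobState(ManagedJobState):
    """Adds the Managed Executable Job port type properties."""
    stdout_url: Optional[str] = None
    stderr_url: Optional[str] = None
    credential_path: Optional[str] = None
    exit_code: Optional[int] = None


def has_failed(job: ManagedExecutableJobState) -> bool:
    """Hypothetical helper: a fault or a nonzero exit code counts as failure."""
    return job.fault is not None or job.exit_code not in (None, 0)
```

Even this small sketch holds more than a single "job status" value, which is the point made in the message above: whatever access mechanism is chosen, the schema still has to be agreed on.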
At 04:53 PM 5/23/2005 +0900, Hiro Kishimoto wrote:
Hi Ian,
Thank you for your excellent and thoughtful document!
Yes, we had a closely related discussion at the meeting yesterday.
We discussed that BES defines a subset of your 8 operations (1, 2, 7,
and 8). Please remember that BES is for the Container, not for the Job
Manager.
The sense of the meeting was "container (factory) interface only, no
job interface." The reason is that operations 2 and 7 are already
specified in WSRF.
However, I am still wondering about the following two issues:
(1) Even though the interface is already defined in WSRF, don't we need
to define domain-specific semantics and behavior (e.g. job destroy means
soft kill)?
(2) Given that the Job Manager defines the Job interface explained in
Ian's document, does the combination of Job Manager and Container
introduce unexpected complexity in the EMS architecture? (The Job itself
has its own interface in the context of the Job Manager but has no
interface in the context of the container.)
Your thoughts?
----
Hiro Kishimoto
-----Original Message-----
From: owner-ogsa-wg@ggf.org [mailto:owner-ogsa-wg@ggf.org] On Behalf Of
Ian Foster
Sent: Sunday, May 22, 2005 7:37 AM
To: ogsa-wg; OGSA-BES-bof@ggf.org
Subject: [ogsa-wg] Perhaps useful input to BES discussion
Dear All:
I am sending this draft document in case it is relevant to the OGSA-WG
and/or BES discussions.
In this document, I use a simple example (a skeleton execution service)
to compare and contrast four approaches to representing state, namely
WSRF, WS-Transfer, REST, and "state id."
I haven't sent this earlier because I'd hoped to integrate numerous
comments that I've received from Savas and others. I hope to do so in the
next week or two, but perhaps this draft is still of interest.
Regards -- Ian.
_______________________________________________________________
Ian Foster                    www.mcs.anl.gov/~foster
Math & Computer Science Div.  Dept of Computer Science
Argonne National Laboratory   The University of Chicago
Argonne, IL 60439, U.S.A.     Chicago, IL 60637, U.S.A.
Tel: 630 252 4619             Fax: 630 252 1997
Globus Alliance, www.globus.org