Hiro:
The "BES is for container not job manager" argument doesn't
make sense to me. The question of where you are permitted to direct
operations for purposes of monitoring and control--to container, job, or
both--is orthogonal to the question of what operations need to be
supported.
The draft BES document defines "check status" and
"terminate" operations, which are certainly required. However,
more are needed, e.g.:
* soft-state lifetime management, to avoid orphan jobs
* subscribe-on-status-change operations, to avoid repeated
polling.
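To make these two missing operations concrete, here is a minimal in-memory sketch in Python of a container that supports both. All names (JobContainer, set_termination_time, subscribe, sweep_expired) are hypothetical. They are only loosely analogous to WS-ResourceLifetime SetTerminationTime and WS-BaseNotification Subscribe, and this is not a proposed BES interface. A job whose client stops renewing its termination time is reclaimed by the sweeper rather than orphaned. Subscribers are called back on each status change, so they do not need to poll.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical status-change callback: (job_id, old_status, new_status)
Listener = Callable[[str, str, str], None]


@dataclass
class Job:
    job_id: str
    status: str
    termination_time: float  # absolute time after which the job is reclaimed
    listeners: List[Listener] = field(default_factory=list)


class JobContainer:
    """Toy container illustrating soft-state lifetime and status subscription."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._jobs: Dict[str, Job] = {}
        self._next_id = 0

    def create_job(self, initial_lifetime: float) -> str:
        self._next_id += 1
        job_id = f"job-{self._next_id}"
        self._jobs[job_id] = Job(job_id, "Pending", self._clock() + initial_lifetime)
        return job_id

    def set_termination_time(self, job_id: str, lifetime: float) -> None:
        # Soft state: the client must keep renewing, or the job expires.
        self._jobs[job_id].termination_time = self._clock() + lifetime

    def subscribe(self, job_id: str, listener: Listener) -> None:
        self._jobs[job_id].listeners.append(listener)

    def set_status(self, job_id: str, new_status: str) -> None:
        job = self._jobs[job_id]
        old, job.status = job.status, new_status
        if old != new_status:
            for notify in job.listeners:
                notify(job_id, old, new_status)  # push, instead of client polling

    def sweep_expired(self) -> List[str]:
        """Terminate and forget every job whose lifetime has lapsed."""
        now = self._clock()
        expired = [j for j, job in self._jobs.items() if job.termination_time <= now]
        for job_id in expired:
            self.set_status(job_id, "Terminated")
            del self._jobs[job_id]
        return expired

    def status(self, job_id: str) -> str:
        return self._jobs[job_id].status
```

A client that crashes simply stops calling set_termination_time, and the next sweep_expired call reclaims its job. No separate "orphan detection" protocol is needed.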
Simply saying "we're not going to consider those because they are
defined in WSRF" makes no sense to me. WSRF also defines "check
status" and "terminate" operations, but you're not
ignoring those.
Another generic issue that is not addressed in the BES document is how
you model the state associated with the factory and an individual job.
Regardless of how you choose to provide access to that state, via
standardized WSRF operations or some custom operations, a schema needs to
be defined implicitly or explicitly, and this must surely encompass more
than just "job status." E.g., see below for the job state elements and
faults defined in GT4 GRAM.
With respect to your questions below:
#1: Yes, in my view.
#2: I certainly think you need to consider and address these issues
together.
Ian.
Job modeling, from
http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Public_Interfaces.html#id2844424
2.3.2. Managed Job Port Type
- serviceLevelAgreement: A wrapper around fields
containing the single-job and multi-job descriptions or RSLs. Only one of
these sub-fields shall have a non-null value.
- state: The current state of the job.
- fault: The fault (if generated) indicating the
reason for failure of the job to complete.
- localUserId: The job owner's local user account
name.
- userSubject: The GSI certificate DN of the job
owner.
- holding: Indicates whether a hold has been
placed on this job.
2.3.3. Managed Executable Job Port Type
- stdoutURL: A GridFTP URL to the file generated
by the job which contains the stdout.
- stderrURL: A GridFTP URL to the file generated
by the job which contains the stderr.
- credentialPath: The path (relative to the job
process) to the file containing the user proxy used by the job to
authenticate out to other services.
- exitCode: The exit code generated by the job
process.
2.3.4. Managed Multi-Job Port Type
- subJobEndpoint: A set of endpoint references to
the sub-jobs created by this multi-job.
2.3.5. Faults
- FaultType: This is the base fault for runtime
errors that occur while managing a job. It extends the OGSI
FaultType.
- CredentialSerializationFaultType: This fault
indicates that the managed job service was unable to serialize or
deserialize a delegated credential.
- InsufficientCredentialsFaultType: This fault
indicates that the managed job service was unable to perform some action
on behalf of the owner of the job submission because the owner has
delegated insufficient credentials.
- InternalFaultType: This fault indicates that an
internal operation failed.
- InvalidCredentialsFaultType: This fault
indicates that the managed job service was unable to use a delegated
credential.
- ServiceLevelAgreementFaultType: Fault for
runtime errors which are directly related to a particular part of the
ServiceLevelAgreement document passed to the createService method. This
fault type contains the fragment of the ServiceLevelAgreement related to
the fault as one of its elements.
- ExecutionFailedFaultType: This fault indicates
that the Managed Job service was unable to begin the execution of the
job.
- FilePermissionsFaultType: This fault indicates
that the ManagedJob service does not have permissions to access a file
referenced in the ServiceLevelAgreement.
- InvalidPathFaultType: This fault indicates that
a file or directory path referenced in the ServiceLevelAgreement contains
an invalid path.
- StagingFaultType: This fault indicates that
part of the file staging requirements of the ServiceLevelAgreement could
not be completed.
- UnsupportedFeatureFaultType: This fault
indicates that an error occurred because the RSL depended on a feature
not implemented by a particular GRAM scheduler.
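The BES draft has a single "job status" field. For comparison, the GT4 elements above might be transcribed into a typed model roughly as follows. This is a hypothetical Python transliteration for illustration only, not GT4's actual XML schema. Three choices are assumptions of this sketch: the class names, modeling the executable and multi-job port types as extensions of the base job state, and the flat fault hierarchy under FaultType.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceLevelAgreement:
    """Wrapper around the single-job and multi-job descriptions (RSLs)."""
    single_job: Optional[str] = None
    multi_job: Optional[str] = None

    def __post_init__(self) -> None:
        # "Only one of these sub-fields shall have a non-null value";
        # this sketch reads that as exactly one.
        if (self.single_job is None) == (self.multi_job is None):
            raise ValueError("exactly one of single_job/multi_job must be set")


class FaultType(Exception):
    """Base fault for runtime errors that occur while managing a job."""


class CredentialSerializationFault(FaultType): pass
class InsufficientCredentialsFault(FaultType): pass
class InternalFault(FaultType): pass
class InvalidCredentialsFault(FaultType): pass
class ExecutionFailedFault(FaultType): pass
class FilePermissionsFault(FaultType): pass
class InvalidPathFault(FaultType): pass
class StagingFault(FaultType): pass
class UnsupportedFeatureFault(FaultType): pass


class ServiceLevelAgreementFault(FaultType):
    """Carries the fragment of the SLA that the fault relates to."""

    def __init__(self, message: str, sla_fragment: str) -> None:
        super().__init__(message)
        self.sla_fragment = sla_fragment


@dataclass
class ManagedJobState:
    service_level_agreement: ServiceLevelAgreement
    state: str                         # current state of the job
    local_user_id: str                 # owner's local user account name
    user_subject: str                  # owner's GSI certificate DN
    holding: bool = False              # whether a hold is placed on the job
    fault: Optional[FaultType] = None  # reason for failure, if any


@dataclass
class ManagedExecutableJobState(ManagedJobState):
    stdout_url: Optional[str] = None       # GridFTP URL of the stdout file
    stderr_url: Optional[str] = None       # GridFTP URL of the stderr file
    credential_path: Optional[str] = None  # proxy file, relative to the job process
    exit_code: Optional[int] = None


@dataclass
class ManagedMultiJobState(ManagedJobState):
    sub_job_endpoints: List[str] = field(default_factory=list)  # sub-job EPRs
```

Even this compact form has a dozen or so state fields and ten fault types. That is the point: "job status" alone does not describe a job.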
At 04:53 PM 5/23/2005 +0900, Hiro Kishimoto wrote:
Hi Ian,
Thank you for your excellent and thoughtful document!
Yes, we had a closely related discussion at the meeting yesterday.
We discussed that BES defines a subset of your 8 operations (1, 2, 7,
and 8). Please remember that BES is for the Container, not for the Job
Manager.
The sense of the meeting was "container (factory) interface only, no
job interface," and the reason is that operations 2 and 7 are already
specified in WSRF.
However, I am still wondering about the following two issues:
(1) Even though the interface is already defined in WSRF, don't we need
to define domain-specific semantics and behavior (e.g., job destroy
means soft kill)?
(2) Given that the Job Manager defines the Job interface explained in
Ian's document, does combining the Job Manager and the Container
introduce unexpected complexity into the EMS architecture? (A Job has
its own interface in the context of the Job Manager but has no
interface in the context of the Container.)
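Hiro's example in (1) shows why the generic operation may not be enough on its own. A generic Destroy does not by itself say what should happen to a still-running process. Below is a minimal sketch of one possible "soft kill" reading, applied to a local process with Python's subprocess module. The function name destroy_job_soft and the terminate-then-kill policy with a grace period are assumptions for illustration. Neither BES nor WSRF specifies them.

```python
import subprocess


def destroy_job_soft(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Hypothetical 'soft kill' semantics for a job Destroy operation.

    First asks the job to stop (SIGTERM on POSIX), giving it a grace period
    to clean up; only if it is still running afterwards is it killed hard.
    Returns the process exit status.
    """
    if proc.poll() is not None:  # already finished: nothing left to stop
        return proc.returncode
    proc.terminate()  # polite request to stop
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # grace period expired: escalate to a hard kill
        return proc.wait()
```

A "hard kill" reading would call kill() immediately. The interface is the same, but the behavior differs. That is exactly the kind of domain-specific semantics that would need to be written down.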
Your thoughts?
----
Hiro Kishimoto
-----Original Message-----
From: owner-ogsa-wg@ggf.org [mailto:owner-ogsa-wg@ggf.org] On Behalf Of Ian Foster
Sent: Sunday, May 22, 2005 7:37 AM
To: ogsa-wg; OGSA-BES-bof@ggf.org
Subject: [ogsa-wg] Perhaps useful input to BES discussion
Dear All:
I am sending this draft document in case it is relevant to the OGSA-WG
and/or BES discussions.
In this document, I use a simple example (a skeleton execution service)
to compare and contrast four approaches to representing state, namely
WSRF, WS-Transfer, REST, and "state id."
I haven't sent this earlier because I'd hoped to integrate numerous
comments that I've received from Savas and others. I hope to do so in
the next week or two, but perhaps this draft is still of interest.
Regards -- Ian.
_______________________________________________________________
Ian Foster                     www.mcs.anl.gov/~foster
Math & Computer Science Div.   Dept of Computer Science
Argonne National Laboratory    The University of Chicago
Argonne, IL 60439, U.S.A.      Chicago, IL 60637, U.S.A.
Tel: 630 252 4619              Fax: 630 252 1997
Globus Alliance, www.globus.org
_______________________________________________________________