Re: [Pgi-wg] OGF PGI : Draft PGI Execution Service Specification

15 Jan 2010

      Balazs, Morris, Luigi, Johannes and all,

Many thanks to Bernd for his remarks and suggestions concerning the 
Draft PGI Specification.

1) CreateActivity
-----------------
    You can NOT have SIMULTANEOUSLY :
    -  Vectors of Jobs, each vector containing many jobs,
    -  Full Job Validation (including even Storage Reservation for early 
stage-in),
    -  Quick response time in order to avoid timeouts when returning JobIds.

    Therefore :
    1.1) If you need 'Vectors of Jobs' and 'Full Job Validation', you 
have to give up 'Quick response time', and manage timeouts.

    1.2) If you need 'Full Job Validation' and 'Quick response time', 
then you have to submit only one Job at a time (give up 'Vectors of Jobs').

    1.3) If you need 'Vectors of Jobs' and 'Quick response time', then 
you must perform only XML, Schema, and Semantic validation before 
returning JobIds (give up functional validation, in particular Storage 
Reservation for early stage-in).
         This requires either that :
         - The Job Submitter repeatedly polls the Job Status, or
         - We use a messaging layer permitting notifications.

    We have to make clear choices.
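
    To illustrate choice 1.3, here is a minimal Python sketch, NOT a 
proposal for the specification.  Everything in it is an assumption made 
for illustration : the class and method names, the state names 
'Validated' and 'Failed', and the reduction of 'XML, Schema, and 
Semantic validation' to a simple well-formedness check.  It only shows 
that JobIds can be returned before functional validation, and that the 
Job Submitter then has to poll.

```python
import uuid
import xml.etree.ElementTree as ET


class ExecutionService:
    """Toy in-memory service; every name here is hypothetical."""

    def __init__(self):
        self.states = {}  # JobId -> current state string

    def create_activities(self, job_documents):
        """Return one (job_id, error) pair per document, in input order.

        Only a cheap well-formedness check runs before returning, so the
        response time does not depend on storage reservation.
        """
        results = []
        for document in job_documents:
            try:
                ET.fromstring(document)
            except ET.ParseError as exc:
                results.append((None, str(exc)))
                continue
            job_id = str(uuid.uuid4())  # generated by the service
            self.states[job_id] = "Submitted"
            results.append((job_id, None))
        return results

    def run_functional_validation(self, job_id, storage_available):
        """Deferred step, e.g. storage reservation for early stage-in."""
        # 'Validated' and 'Failed' are placeholder state names.
        self.states[job_id] = "Validated" if storage_available else "Failed"

    def get_status(self, job_id):
        return self.states[job_id]


def poll_until_settled(service, job_id, max_polls=10):
    """Client-side polling; a real client would sleep between polls."""
    for _ in range(max_polls):
        state = service.get_status(job_id)
        if state != "Submitted":
            return state
    return "Submitted"
```

    With a notification layer, poll_until_settled() would disappear 
from the client, at the price of the messaging infrastructure.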

2) Change activity state
------------------------
    Is Bernd talking about the 'State Model' itself ?
    If yes, I confirm that this model is absolutely necessary to verify 
that operations are consistent and can really be implemented inside an 
Execution Service.

    For an activity (= Job), we need to manage :

    -  NOT only the Processing state (start, abort, hold, resume, ...), 
where the Execution Service dedicates all needed Computing resources to 
the Job,

    -  but also other lengthy states where the Execution Service can 
allocate these Computing resources to other Jobs :
       - Manual Stage-in  (or other Pre-processing  tasks),
       - Manual Stage-out (or other Post-processing tasks).

    This is clearly shown by the current 'State Model'.
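
    As a rough illustration of why an explicit model helps, here is a 
small Python sketch.  The state names, the transition table and the 
function names are simplified assumptions of mine ('Finished' in 
particular), NOT the content of the draft 'State Model'.  It shows a 
state change request being checked against the model, and that only the 
Processing state holds Computing resources.

```python
# Hypothetical, simplified state table; NOT the draft PGI State Model.
HOLDS_COMPUTE = {
    "Submitted": False,
    "Manual Stage-in": False,
    "Processing": True,
    "Manual Stage-out": False,
    "Finished": False,
}

# Allowed target states for each current state (assumed for illustration).
ALLOWED = {
    "Submitted": {"Manual Stage-in", "Processing"},
    "Manual Stage-in": {"Processing"},
    "Processing": {"Manual Stage-out", "Finished"},
    "Manual Stage-out": {"Finished"},
    "Finished": set(),
}


def request_state_change(current, target):
    """Return the new state, or raise if the model forbids the change."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def compute_slots_in_use(job_states):
    """Count the Jobs that currently hold Computing resources."""
    return sum(1 for state in job_states if HOLDS_COMPUTE[state])
```

    For example, with one Job in each of 'Manual Stage-in', 
'Processing' and 'Manual Stage-out', compute_slots_in_use() counts a 
single Job, so the resources of the two other Jobs are free for other 
work.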

7) 'Delegated' state
--------------------
    I strongly confirm that you can NOT suppose that the Execution 
Service has an intimate knowledge of the Batch System which will really 
execute the Job.
    -  Execution Services are NOT obliged to implement 'Manual 
Stage-in'.  In case of delegation to an Off-site Execution Service (like 
a different Grid middleware), Manual Stage-in simply MAY be impossible.
    -  In case of delegation to an off-site execution service, the Job 
State is 'Delegated:Running'.  An Execution Service MAY provide an EPR 
permitting the Job Submitter to query the precise Job State inside the 
Off-site Execution Service (recursively if needed).
    -  Credential delegation is a real issue which we have to 
investigate.  It can be performed automatically ONLY using Security 
profiles.  So far, the EDGeS project has solved Credential Delegation in 
an ad hoc manner.  See :
       'PGI Security Model'  at 
http://forge.gridforum.org/sf/go/doc15584?nav=1
       Presentation of Security Profiles by Morris  (I cannot find it at 
the moment.  Morris, please help.)
       'Security related parts for GES strawman'  at 
http://forge.gridforum.org/sf/go/doc15836?nav=1
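
    The recursive state query mentioned in the second point above can 
be sketched as follows in Python.  This is only an illustration under 
assumptions : the EPR is modelled as a plain (service, remote JobId) 
pair instead of a real endpoint reference, and all class and function 
names are hypothetical.  Since providing the EPR is only a MAY, the 
chain can stop at 'Delegated:Running'.

```python
class Service:
    """Toy Execution Service; all names here are hypothetical."""

    def __init__(self, name):
        self.name = name
        # job_id -> (state, epr); epr is None or (other_service, remote_job_id)
        self.jobs = {}

    def add_local_job(self, job_id, state):
        self.jobs[job_id] = (state, None)

    def add_delegated_job(self, job_id, epr=None):
        # Providing the EPR is optional (MAY), so it can stay None.
        self.jobs[job_id] = ("Delegated:Running", epr)

    def get_state(self, job_id):
        return self.jobs[job_id]


def resolve_state(service, job_id, max_depth=8):
    """Follow EPRs until a service reports no further delegation.

    Returns the list of (service name, state) pairs seen along the way.
    max_depth guards against delegation loops.
    """
    chain = []
    for _ in range(max_depth):
        state, epr = service.get_state(job_id)
        chain.append((service.name, state))
        if epr is None:
            return chain
        service, job_id = epr
    raise RuntimeError("delegation chain deeper than max_depth")
```

    The depth limit is a design choice of the sketch : without it, two 
services delegating to each other would make the query loop forever.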

+----------------+
|  Other points  |
+----------------+

Meeting Notes
-------------
Could Johannes systematically save the meeting notes on GridForge after 
each telephone conference ?

Draft Specification Document
----------------------------
As I remember from the previous telephone conference, this document 
should have been :
-  updated with agreements achieved at this telephone conference,
-  renamed as 'PGI Execution Service Specification'

Client-side UUID for Job identification
---------------------------------------
After a little thinking :

Client-side UUID can NEVER be guaranteed to be really unique.
Any client could easily poison Grid databases with repeated identical 
UUIDs (maliciously, or by client misconfiguration, ...).

So I am most strongly opposed to using Client-side UUID for Job 
identification.

It is still possible to add in the 'PGI Execution Service Specification' 
that an Execution Service MAY accept (potentially non-unique) Client 
tags, and MAY permit the client to retrieve the list of all Jobs having 
a given tag.
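
A minimal Python sketch of this idea follows.  The class and method 
names are hypothetical and nothing here comes from the draft : the 
service generates the JobId itself, and the client tag is only a 
non-unique label used for retrieval.

```python
import uuid
from collections import defaultdict


class JobRegistry:
    """Toy registry; names are hypothetical, not from the PGI draft."""

    def __init__(self):
        self._jobs = {}                   # JobId -> job description
        self._by_tag = defaultdict(list)  # client tag -> list of JobIds

    def create_job(self, description, client_tag=None):
        # The JobId is always generated by the service, never by the client.
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = description
        if client_tag is not None:
            # Tags are not required to be unique: several jobs may share one.
            self._by_tag[client_tag].append(job_id)
        return job_id

    def jobs_with_tag(self, client_tag):
        """Return the JobIds of all jobs carrying the given tag."""
        return list(self._by_tag.get(client_tag, []))
```

A client repeating the same tag, maliciously or by misconfiguration, 
then only lengthens the list returned by jobs_with_tag(), and can NOT 
make two Jobs share a JobId.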

Summary of previous discussions
-------------------------------
-  Consistency between the 'State model' and the Operations (which are 
supposed to be implemented using SOAP).
    There are 3 ways to implement message transfers between Job 
Submitter and Execution Service :

    - Execution Service and Job Submitter both implement notifications :
      This spares the Job Submitter from repeatedly polling the Job 
status, but this is NOT easily supported by SOAP.

    - Job submitter has to poll the Job status.
      In order to minimize polling, the Execution Service MAY perhaps 
return the stage-in location early inside the 'Submitted' state, but 
this requires that the Execution Service is really able to perform 
stage-in inside the 'Submitted' state, which is NOT always the case.

-  For the 'CreateActivity' operation :
    - The Execution Service MAY perhaps provide the stage-in location at 
this early stage (this strongly impacts the 'Submitted' state),
    - Proposal to add a 'Notification EPR' as an additional optional 
input parameter.

-  Exclude internal states
    (in particular, suppress the 'Submitted:Incoming' substate).

-  Second level Job states are also mandatory (but third level Job 
states are still optional).

-  Inside the 'Submitted' state :
    - The Execution Service can return a JobId to the submitter only if 
it has completely validated the JSDL of the Job, therefore rename 
'Waiting' substate as 'Validated',
    - The Execution Service MAY perhaps already allocate resources, 
therefore add 'Hold' substate, which could permit early stage-in.
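
The 'Notification EPR' proposal above can be sketched as follows in 
Python.  This is an illustration under assumptions, NOT the draft 
interface : the EPR is modelled as a plain Python callable, the JobId 
format and all names are hypothetical, and JSDL validation is omitted.  
When no 'Notification EPR' is given, the Job Submitter falls back to 
polling.

```python
class ExecutionService:
    """Toy service; names and JobId format are hypothetical."""

    def __init__(self):
        self._states = {}     # JobId -> state string
        self._listeners = {}  # JobId -> notification callable
        self._next_id = 0

    def create_activity(self, jsdl, notification_epr=None):
        # JSDL validation is omitted; the JobId is returned once the
        # Job is in the (proposed) 'Submitted:Validated' substate.
        self._next_id += 1
        job_id = f"job-{self._next_id}"
        self._states[job_id] = "Submitted:Validated"
        if notification_epr is not None:
            self._listeners[job_id] = notification_epr
        return job_id

    def set_state(self, job_id, state):
        """Internal state change; notifies the submitter if it asked to be."""
        self._states[job_id] = state
        listener = self._listeners.get(job_id)
        if listener is not None:
            listener(job_id, state)

    def get_status(self, job_id):
        """Polling entry point, always available."""
        return self._states[job_id]
```

A submitter passing a callable receives each state change (for example 
'Submitted:Hold') without polling, while a submitter passing nothing 
still gets the same information through get_status().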

Best regards.

-----------------------------------------------------
Etienne URBAH         LAL, Univ Paris-Sud, IN2P3/CNRS
                       Bat 200   91898 ORSAY    France
Tel: +33 1 64 46 84 87      Skype: etienne.urbah
Mob: +33 6 22 30 53 27      mailto:urbah@lal.in2p3.fr
-----------------------------------------------------

On Thu, 14/01/2010 10:59, Bernd Schuller wrote:
...
Hi PGI folks,
after reading through the current draft 0.38 from
http://forge.gridforum.org/sf/go/doc15839?nav=1 and listening to a
presentation by Morris, I want to make a few comments.
I'll try to focus on compute functionality, to keep this mail reasonably
short.
Overall I think the PGI looks very promising and I really appreciate
your hard work! Having been present in the UMD/EMI project preparation I
know exactly how hard it can be ;)
So here goes.
0) the requirements doc mentioned in the introduction is not accessible
for lesser mortals, on https://forge.gridforum.org/sf/go/doc15590
I get "permission denied". Maybe you could copy it to the pgi-wg area?
1) CreateActivity
Since the validation steps can take some time, it is impractical to wait
for these steps to finish before assigning activity IDs and returning
the response. Clients or intermediaries will run into timeouts. The
system should create the activities immediately, and assign them a state
like "new" or "validating". IMO every remote operation that can take
more than a couple of seconds to generate the response should be made
asynchronous. Just think of held locks and shared resources like DB
connections together with concurrent access by many clients... we've
been there with UNICORE and have been forced to keep web service
processing times as low as possible.
2) Change activity state
I don't really see a reason for all this generic stuff. In reality you
want to start, abort, hold, resume etc the processing of an activity, so
why not make this more explicit. A compromise might be to do
something like requestActivityStateChange("Hold"), etc, and define the
mandatory list of "target states" supported by this operation.
3) Cancel activity
Isn't this a special case of "Change activity state" ?
4) Wipe
ditto
5) Delegation port type
Nice idea. However you should support also SAML assertions here (proxy
certs are so 1995!)
6) in Section 5.1.2
What does "automatic resubmission" mean? Resubmission to the batch
system? Or do you possibly see the PGI execution service as something
"above" a normal execution service (like e.g. a glite wms?). Section
5.1.7 seems to support this view.
IMO resubmitting a failed job to the batch system makes no sense, it
will probably just fail again ;) So what is the idea?
7) "Delegated" state (Section 5.1.4)
Allowing delegation to an off-site execution service (like a different
Grid middleware) adds complexity and messes up a lot of things, like
credential delegation, state, working directory access, etc etc.
Should "PGI execute" not focus on a simple, practical service for job
execution? This forwarding business seems to be quite out of scope...
How shall manual data staging be done if the session directory is
off-site?
In the intro it lists "request routing" as a requirement, but I'd
reconsider that.
8) "Output sandbox"
I'd try to avoid glite specific terms :) Maybe the "directory containing
the output files produced by the job". At least define the term "output
sandbox" somewhere.
9) I fully support Steven's statements regarding the reuse of JSDL. In
some places you duplicate parts that already exist in JSDL and
JSDL-POSIX, sometimes with less functionality. Some examples:
- 7.2 executable name, path, arguments. This can be done by a
JSDL-Posix  element, which covers even more, such as environment,
stdout/err/in.
- 7.3.1.4 UserTag can be replaced by JSDL JobAnnotation
- 7.3.6.2 Input,output,error,environment ->  JSDL-Posix
IMO JSDL-Posix (possibly with extensions) can be used in all places
where you need to directly specify the execution of a process. Similarly
the normal Application (ApplicationName, ApplicationVersion) (again
possibly with extensions) can be used to define execution of a
pre-installed software.
10) other JSDL related comments
   - 7.3.2.9 LogDir in the interest of interoperability I'd assume that
the internals of how a middleware stores its "grid-specific diagnostics"
is irrelevant to the job description. E.g. UNICORE would store this in a
database, not in a directory on the execution system.
11) In general it is not clear to me which of these elements MUST be
supported by a PGI implementation.
12) 7.3.2.14 Start time. This is reservation functionality which opens a
new can of worms :-) What happens if the RMS does not support this, or
the request cannot be granted? If you want to support reservation, you
need to reflect this in the state model and in the possible errors a
user might get. Also reservation is not listed as a requirement in the
Introduction.
13) 7.3.2.15 Notifications This should not be "custom format" but "comma
separated list of e-mail addresses"
Summarizing: I like the port types and the basic data and execution
model, also data staging and credential delegation looks good. You
should re-consider the job description part and clearly identify the
minimal set that has to be supported by every compliant implementation.
Also I'd try to keep all implementation-specific behaviour out of the
spec, like where logs are stored and what is purged by a "purge"
operation. What is important is the behaviour and session directory
access that a user can expect of any PGI service in each activity state
(maybe a table would be helpful).
Best regards,
Bernd.
--
Dr. Bernd Schuller
Distributed Systems and Grid Computing
Juelich Supercomputing Centre, http://www.fz-juelich.de/jsc
Phone: +49 246161-8736 (fax -8556)
Personal blog: www.jroller.com/page/gridhaus