
Dear Etienne,
- For all future OGF PGI telephone conferences, is it possible that a secretary or a chair takes meeting notes, writes them up in an understandable form, and publishes them at the above-mentioned page?
Johannes did a good job and took notes of the last meeting - he will put them in GridForge soon and highlight the most important points today in the telcon.
-Strawman Rendering
-------------------
-I will work on the ODT version of 'Strawman Rendering' at
-http://forge.gridforum.org/sf/go/doc15628?nav=1 in order to:
-
-- include the above clarifications on states,
-
-- include the 'Types of grid Jobs' section of my 'PGI Execution Service
-Overview' document,
Nice - please don't forget to track changes.

Take care,
Morris

------------------------------------------------------------
Morris Riedel
SW - Engineer
Distributed Systems and Grid Computing Division
Jülich Supercomputing Centre (JSC)
Forschungszentrum Juelich
Wilhelm-Johnen-Str. 1
D - 52425 Juelich
Germany

Email: m.riedel@fz-juelich.de
Info: http://www.fz-juelich.de/jsc/JSCPeople/riedel
Phone: +49 2461 61 - 3651
Fax: +49 2461 61 - 6656

Skype: MorrisRiedel

"We work to better ourselves, and the rest of humanity"

Registered office: Jülich
Registered in the commercial register of the Düren district court, No. HR B 3498
Chair of the Supervisory Board: MinDirig'in Bärbel Brumme-Bothe
Board of Directors: Prof. Dr. Achim Bachem (Chairman), Dr. Ulrich Krafft (Deputy Chairman)
------Original Message-----
-From: Etienne URBAH [mailto:urbah@lal.in2p3.fr]
-Sent: Thursday, August 13, 2009 11:50 PM
-To: balazs.konya@hep.lu.se; Riedel, Morris; pgi-wg@ogf.org
-Cc: lodygens@lal.in2p3.fr; edges-na3@mail.edges-grid.eu
-Subject: OGF PGI - Job State Model - Execution Service Strawman
-
-Balazs, Morris and all,
-
-
-Concerning the last OGF PGI telephone conference on 05 August 2009:
-
-
-Meeting notes
--------------
-I see NO meeting notes about this telephone conference at
-http://forge.gridforum.org/sf/discussion/do/listTopics/projects.pgi-wg/discussion.meetings
-
-So I am working with my own (fragmentary) notes.
-
-For all future OGF PGI telephone conferences, is it possible that a
-secretary or a chair takes meeting notes, writes them up in an
-understandable form, and publishes them at the above-mentioned page?
-
-
-Creation of a 'Submitted:Hold' substate?
-----------------------------------------
-First, as general rules, I consider that:
-
-- In order to AVOID keeping (potentially large) grid resources while
-NOT computing, grid Jobs should be designed to be processed completely
-automatically, with NO provision for 'Hold' substates,
-
-- A grid Job needing many 'Hold' substates can NOT be handled by an
-automatic Submitter, but should be submitted by a human grid User as an
-'Interactive Job', as described for example at
-https://edms.cern.ch/file/722398//gLite-3-UserGuide.html#SECTION00084400000000000000
-
-
-
-Someone asked for the creation of a 'Hold' substate inside the
-'Submitted' state, like inside other states.
-
-This 'Submitted:Hold' substate would make sense only if the Job
-Submitter could perform an operation on this substate.
-
-In order to request such an operation, the Job Submitter needs the Jobid
-(or Job EPR).
-
-This Jobid (or Job EPR) is guaranteed to be allocated by the Execution
-Service only at the END of the 'Submitted' state, but NOT before.
-
-Therefore, I consider that the 'Submitted' state can NOT contain a
-'Hold' substate.
-
-If anyone thinks otherwise, can he/she please present a convincing Use
-Case?
-
-
-Clarifications about the 'Finished with Success or Error' state
-----------------------------------------------------------------
-Someone asked that the 'Error' case of the 'Finished with Success or
-Error' state should be moved to the 'Failed' state.
-
-In fact, inside the current Job State Model, a Job reaches the 'Finished
-with Success or Error' state if and only if it successively reached the
-end of the following states, without failure or cancellation at the JOB
-level:
-- 'Pre-processing'
-- 'Delegated', whatever the Application result:
-  - Success = Application return code equal to zero
-  - Error   = Application return code different from zero
-- 'Post-processing'
-
-Inside the 'Finished with Success or Error' state:
-- Success means 'Application return code was equal to zero',
-- Error means 'Application return code was different from zero'.
-
-I copied this behavior from the Job State Model of 'gLite', where the
-'Done' state contains both the 'Success' and 'Exit Code !=0' cases, as
-can be seen in the 'bookkeeping information' at
-https://edms.cern.ch/file/722398//gLite-3-UserGuide.html#SECTION00084100000000000000
-
-
-I consider this design, and the strong separation between the
-'Failed' and 'Finished with Success or Error' states, fully justified
-for the following reasons:
-
-- Whenever a Job reaches the 'Failed' state, the grid Execution Service
-has detected an unrecoverable inconsistency at the JOB level.
-  Therefore, the Job output sandbox and the post-processed Application
-output files may NOT be consistent and may NOT even be accessible
-by the Job Submitter.
-  In order to investigate the Job failure, the grid User then needs
-some grid knowledge (and often experience and expertise) to retrieve and
-interpret:
-  - the Job failure code and message,
-  - the Job logging and bookkeeping, in comparison with the Job
-description.
-  This 'grid level' investigation can sometimes prove that the cause
-of the Job failure came from the Application, but is ALWAYS necessary.
-
-- Whenever a Job reaches the 'Finished with Success or Error' state,
-the grid Execution Service was able to create the Job output sandbox, and
-perform post-processing on Application output files, WITHOUT detecting
-any unrecoverable inconsistency at the JOB level.
-  Therefore, the Job output sandbox, and the post-processed
-Application output files, can be assumed to be consistent and easily
-accessible by the Job Submitter.
-  On a non-zero return code of the Application, the grid User:
-  - first has to look (WITHOUT needing any grid knowledge) at the Job
-output sandbox and at the post-processed Application output files for an
-Application problem,
-  - before, if necessary, using grid knowledge (and often experience
-and expertise) to provide any evidence that the Application error was
-caused by a faulty Job description, the Batch system, or the grid
-Execution Service.
-
-In summary, I consider that the 'Error' case of the 'Finished with
-Success or Error' state should be kept as it is, and NOT be moved to the
-'Failed' state.
-
-If anyone thinks otherwise, can he/she please present convincing reasons?
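
The separation argued above between 'Failed' and 'Finished with Success or Error' can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the PGI Job State Model or of the Strawman: the names FinalState, Outcome and classify_final_state, and the boolean job_level_failure flag, are hypothetical, and cancellation is left out.

```python
from enum import Enum
from typing import Optional, Tuple


class FinalState(Enum):
    # State names taken from the message; the enum itself is hypothetical.
    FAILED = "Failed"
    FINISHED = "Finished with Success or Error"


class Outcome(Enum):
    SUCCESS = "Success"  # Application return code equal to zero
    ERROR = "Error"      # Application return code different from zero


def classify_final_state(
    job_level_failure: bool, app_return_code: Optional[int]
) -> Tuple[FinalState, Optional[Outcome]]:
    """Map a completed Job to its final state (assumed helper, not a PGI API).

    job_level_failure: True if the Execution Service detected an
        unrecoverable inconsistency at the JOB level.
    app_return_code: the Application return code, if one is available.
    """
    if job_level_failure:
        # Job-level failure always wins: the output sandbox may be
        # inconsistent, so the Application return code is not trusted.
        return FinalState.FAILED, None
    if app_return_code is None:
        raise ValueError("a Job without job-level failure needs a return code")
    # No job-level failure: the Job is 'Finished' whatever the Application
    # result; the return code only selects Success or Error.
    outcome = Outcome.SUCCESS if app_return_code == 0 else Outcome.ERROR
    return FinalState.FINISHED, outcome
```

In this sketch a non-zero Application return code never produces 'Failed'; only a failure at the JOB level does, which is the distinction the message asks to keep.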
-
-
-Strawman Rendering
-------------------
-I will work on the ODT version of 'Strawman Rendering' at
-http://forge.gridforum.org/sf/go/doc15628?nav=1 in order to:
-
-- include the above clarifications on states,
-
-- include the 'Types of grid Jobs' section of my 'PGI Execution Service
-Overview' document,
-
-- check consistency, and present the relationships between the
-operations described in chapter 2 'Interface: Execution Port-Type' and
-the different states of the different types of grid Jobs.
-
-
-Joining +9900827049931906 (plus perhaps Skype typing) on Friday 14
-August 2009 at 16h CET.
-
-Best regards.
-
------------------------------------------------------
-Etienne URBAH         LAL, Univ Paris-Sud, IN2P3/CNRS
-                      Bat 200   91898 ORSAY    France
-Tel: +33 1 64 46 84 87      Skype: etienne.urbah
-Mob: +33 6 22 30 53 27      mailto:urbah@lal.in2p3.fr
------------------------------------------------------