Hi,
In some scenarios that rely on fault tolerance at the DRM level, a job must be marked as "rerunnable". Do we want to add this attribute to the DRMAAv2 JobTemplate?
Cheers,
On 23 August 2010 15:42, Daniel Templeton <daniel.templeton@oracle.com> wrote:
I have a customer who has the resubmission of failed jobs in a greater workflow as a critical requirement. That's not actually something that OGE itself supports, so I'm all for having it in DRMAA to plug the hole.
Daniel
On 08/23/10 02:50 AM, Peter Tröger wrote:
We already have some understanding of persistency, so the implementation effort is manageable. I am more concerned about a clear separation of live monitoring information and original submission data. For the latter, I saw no use case so far ...
Best, Peter.
On 29.07.2010 at 11:02, Andre Merzky wrote:
Our use case for having access to the original complete job template is that the user can easily resubmit the same job - just changing for example some command line parameter, but leaving the remainder fixed. In SAGA this would look like:
saga::job::service js ("drmaa://torque.remote.net/");
saga::job::job j1 = js.get_job (jobid); // std::string
saga::job::description jd = j1.get_description ();
jd.set_attributes ("Arguments", new_args); // std::vector<std::string>
saga::job::job j2 = js.create_job (jd);
I understand that the backend may not be able to keep the original job template - in that case, a 'DoesNotExist' exception on 'get_description()' would be appropriate, IMHO. If the DRMAA implementation can cache that description somewhere, fine :-)
My $0.02, Andre.
PS: saga::job::description == drmaa::job::template
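A minimal, self-contained sketch of the resubmission use case described above, including the case where the backend did not keep the description. All types here (JobDescription, JobService, DoesNotExist, forget_description, resubmit_with_args) are hypothetical stand-ins written for illustration; they are not the SAGA or DRMAA API.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the SAGA or DRMAA types.
struct JobDescription {
    std::string executable;
    std::vector<std::string> arguments;
};

// Hypothetical exception, mirroring the 'DoesNotExist' idea above.
struct DoesNotExist : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class JobService {
public:
    // "Submits" a job and caches its description; returns the new job id.
    std::string create_job(const JobDescription& jd) {
        std::string id = "job-" + std::to_string(next_id_++);
        cache_[id] = jd;
        return id;
    }

    // Simulates a backend that did not keep the original description.
    void forget_description(const std::string& id) { cache_.erase(id); }

    // Returns the cached description, or throws DoesNotExist.
    JobDescription get_description(const std::string& id) const {
        auto it = cache_.find(id);
        if (it == cache_.end())
            throw DoesNotExist("no description cached for " + id);
        return it->second;
    }

private:
    std::map<std::string, JobDescription> cache_;
    int next_id_ = 1;
};

// Resubmits job `id` with new arguments, leaving everything else unchanged.
std::string resubmit_with_args(JobService& js, const std::string& id,
                               const std::vector<std::string>& new_args) {
    JobDescription jd = js.get_description(id);  // may throw DoesNotExist
    jd.arguments = new_args;
    return js.create_job(jd);
}
```

A caller that wants to tolerate backends without a cached description would catch DoesNotExist and fall back to building the description from scratch.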
Quoting [Peter Tröger] (Jul 29 2010):
From: Peter Tröger <peter@troeger.eu>
Date: Thu, 29 Jul 2010 10:07:23 +0200
To: Mariusz Mamoński <mamonski@man.poznan.pl>, drmaa-wg@ogf.org
Subject: Re: [DRMAA-WG] Monitoring JobTemplate attributes for running jobs
On 28.07.2010 at 23:42, Mariusz Mamoński wrote:
Hi,

2010/7/28 Peter Tröger<peter@troeger.eu>:
Hi, Agenda item #8 was not discussed in the call today, but it is the burning issue for me at the moment. Please have a look at the "Attributes in JobInfo" tab: http://spreadsheets.google.com/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGth...
Currently, we allow access to the original JobTemplate from a JobInfo object. The idea was to get, besides the job monitoring information, also the information about what was requested at submission time. While doing the Condor mapping, I figured out that most of the JobTemplate attributes can also be monitored for a running job. This includes things such as the executable name and the working directory. Normally they should be the same as in the JobTemplate, but Condor and SGE (at least) have this magic job wrapper stuff, where the admin can automatically and silently reconfigure / reinterpret everything in a JobTemplate. This might lead to the situation where the user asks for A and silently gets B.
The question: Should we drop the support for getting the JobTemplate as part of JobInfo, because the information is useless? Instead, we could add some (or maybe most) of the JobTemplate attributes as true dynamic monitoring information to JobInfo.

In my opinion, repeating almost all attributes in this case brings additional redundancy into the DRMAA API (another reason may be performance - the JobTemplate attributes are more likely to be immutable). Why not simply request the expected behavior in the spec? E.g.:
a) the JobTemplate that is part of the JobInfo struct is a reference to the JobTemplate used for submission (for jobs submitted outside the session it MUST be NULL)
b) the JobTemplate reflects the actual attributes of a job (without the obligation that all attributes must be available - e.g. in Torque the actually executed command is hidden in the script)
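Option a) could look roughly like the following sketch, in which JobInfo carries an optional reference to the submitted template. The struct layouts and the helper submitted_command are assumptions made for illustration; they are heavily reduced and are not the DRMAAv2 definitions.

```cpp
#include <memory>
#include <string>

// Hypothetical, heavily reduced types; not the DRMAAv2 definitions.
struct JobTemplate {
    std::string remoteCommand;
    std::string workingDirectory;
};

struct JobInfo {
    std::string jobId;
    // Template used at submission time; null for jobs that were
    // submitted outside the session.
    std::shared_ptr<const JobTemplate> jobTemplate;
};

// Returns the command requested at submission, or a placeholder when
// no template is attached to the JobInfo.
std::string submitted_command(const JobInfo& info) {
    return info.jobTemplate ? info.jobTemplate->remoteCommand : "<unknown>";
}
```

With this shape every consumer has to check for the null case before reading any submission attribute.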
The interesting thing is that we already started to do this replication, for example: JobTemplate::candidateMachines vs. JobInfo::allocatedMachines. I still vote for finishing this replication and removing the JT reference from JobInfo as compensation. I also have a problem with fetching live data from a structure called "template".
Your example from Torque underlines my argument - we should choose a monitorable subset of JobTemplate and add it to the JobInfo structure, instead of linking the JobTemplate directly.
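The replication approach could be sketched as follows: JobInfo holds its own monitored copies of selected attributes, so "asked for A, got B" becomes detectable by comparison. Only candidateMachines and allocatedMachines are named in this thread; the remoteCommand fields and the helper command_was_rewritten are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical, reduced types. Field names other than candidateMachines
// and allocatedMachines are assumptions.
struct JobTemplate {
    std::string remoteCommand;                   // what the user requested
    std::vector<std::string> candidateMachines;  // where it may run
};

struct JobInfo {
    std::string remoteCommand;                   // what is actually executed
    std::vector<std::string> allocatedMachines;  // where it actually runs
};

// True if the DRM system runs a different command than the one requested,
// for example because a site-level job wrapper rewrote it.
bool command_was_rewritten(const JobTemplate& requested, const JobInfo& actual) {
    return requested.remoteCommand != actual.remoteCommand;
}
```

Here the template stays pure submission data and JobInfo stays pure monitoring data, at the cost of duplicated attribute definitions.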
Any other opinions ?
Peter. -- Nothing is ever easy.
-- drmaa-wg mailing list drmaa-wg@ogf.org http://www.ogf.org/mailman/listinfo/drmaa-wg
-- Mariusz