In-Reply-To: <20060402164336.9544A12CBA@mailbouncer.mcs.anl.gov>
Content-class: urn:content-classes:message
Subject: Re: [drmaa-wg] DRMAA-WG April 4, 2006 call
Date: Wed, 5 Apr 2006 01:36:45 -0700
Message-ID: <4433819D.4080000@hpi.uni-potsdam.de>
References: <20060402164336.9544A12CBA@mailbouncer.mcs.anl.gov>
From: "Peter Troeger" <peter.troeger@hpi.uni-potsdam.de>
Sender: <owner-drmaa-wg@ggf.org>
To: "DRMAA Working Group" <drmaa-wg@gridforum.org>

Meeting minutes for April 4 phone conference:

- March 21 meeting minutes accepted without changes

- Upcoming SGE 6.0U8 release will still be DRMAA 0.95 compliant
   - DRMAA 1.0 compliance with SGE 6.0U9 release (3-6 months from now)
     or with the CVS main trunk

- Small problem with strtok_r() in the test suite under Solaris,
  Dan will commit a patched version to the SourceForge CVS

- Latest text proposal for drmaa_wifexited() discussed (tracker #1125),
  accepted on condition that the term "ended" is removed
   - Peter adds updated text to the tracker

- Job state after resuming from suspend state
   - Rough agreement that the Condor and GridWay approach of restarting
     the job is something different from suspend ("rescheduling")
   - Added as post 1.0 DRMAA feature (tracker #1787)
   - Suspend feature and the corresponding state transition back to
     PS_RUNNING remain mandatory for DRMAA 1.0 (no test suite changes)
   - Peter informs Ruben

- Discussion about job rejection in case of invalid job template
   - Would ease the Condor implementation, since this system detects
     invalid input files at job submission
   - Agreement that early rejection of invalid jobs should
     always be possible (e.g. compute centre checks)
   - Proposal for text change in new tracker #1786

- Document submission to GGF on Friday
   - Pending SGE experience report (Dan)
   - Pending updated Condor experience report (Peter)
   - Pending final DRMAA spec (Hrabri)
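
The drmaa_wifexited() wording proposed in tracker #1125 can be read as a
small decision rule. What follows is only a minimal Python sketch of that
reading, not the C binding: the WaitStat record and its fields are
hypothetical stand-ins for the opaque stat value returned by drmaa_wait(),
and the lowercase function names merely mirror the DRMAA names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitStat:
    """Hypothetical stand-in for the opaque 'stat' value from drmaa_wait()."""
    ran: Optional[bool]              # True, False, or None if it is unknown
    details_available: bool          # can the DRM report how the job ended?
    exit_code: Optional[int] = None  # only meaningful if details are available


def wifexited(stat: WaitStat) -> int:
    """Non-zero only if the job is known to have run and details exist."""
    return 1 if (stat.ran is True and stat.details_available) else 0


def wexitstatus(stat: WaitStat) -> Optional[int]:
    """Return the exit code, or None whenever 'exited' is zero."""
    if not wifexited(stat):
        return None  # SHALL NOT provide exit status information
    return stat.exit_code
```

Under these assumptions, both zero cases of the proposal (job ran but no
further information, or unknown whether the job ran) end up in the same
branch, which is why wexitstatus() never has to distinguish them.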

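The two behaviours discussed for invalid job templates (tracker #1786) can
be contrasted in a toy model. This is a hedged Python sketch, not DRMAA
code: submit(), its parameters, and the DeniedByDrmError class are
hypothetical names chosen for illustration; only the DRMAA_PS_* state names
and the DRMAA_ERRNO_DENIED_BY_DRM error it stands for come from the spec
discussion.

```python
import itertools


class DeniedByDrmError(Exception):
    """Stands in for the DRMAA_ERRNO_DENIED_BY_DRM error code."""


_job_ids = itertools.count(1)  # toy job ID generator


def submit(template: dict, readable_files: set, reject_early: bool):
    """Toy submission: return (job_id, state) or raise DeniedByDrmError.

    reject_early=True models a DRM that validates the template at
    submission time; reject_early=False models a DRM that accepts the job
    and only later reports it as failed.
    """
    input_ok = template.get("input_path") in readable_files
    if not input_ok and reject_early:
        # No job ID is created, so there is no job state to query.
        raise DeniedByDrmError("invalid input file: %r" % template.get("input_path"))
    job_id = next(_job_ids)
    state = "DRMAA_PS_RUNNING" if input_ok else "DRMAA_PS_FAILED"
    return job_id, state
```

The sketch shows why the attribute descriptions would need relaxing: with
early rejection there is never a job ID, so a required DRMAA_PS_FAILED
state cannot be observed.
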
Regards,
Peter.

> *** new phone numbers ***
>
>
> The bi-weekly DRMAA call is scheduled for 16:00 UTC (8:00PDT - Pacific
> Daylight Time /10:00CDT/ 17:00 Central Europe). All participants should use
> the following information to reach the conference call:
>
> ------------------------------------
> * Toll Free Dial In Number for North America:   1 800 867-8609
> * Toll Free Dial In Number for Germany:         0 800 101-4546
> * Int'l Access/Caller Paid Dial In Number:      +49 069509594678
> * ACCESS CODE: 7223898
> ------------------------------------
>
> Attachments to this email:
>
>       - March 21 meeting minutes
>
>
> Meeting Agenda:
>
> A. Meeting secretary for this meeting?
>
> B. Acceptance of the March 21, 2006 meeting minutes
>
> C. Admin
>   - third chair update
>
> F. Open/general issues discussion
>   - experience documents
>   - #1125 Tracker - see the included text at the end of the agenda
>   - Job suspension is different from triggering job rescheduling in Condor
>            (see attached "Re: GridWay Experience Report" mail)
>   - Status of the test suite
>       - post ver 1.0 issues
>       - handling exit status for bad input / output / error streams
>            (see attached "Re: [drmaa-wg] DRMAA TEST SUITE" mail)
>   - misc
>
>
> Cheers,
>   Hrabri
>
>
> ------------------------- Tracker #1125 proposed change ----------------------------
>
> Currently we have:
> "Evaluates into 'exited', a non-zero value if stat was returned for a job
> that terminated normally. A zero value can also indicate that although the
> job has terminated normally an exit status is not available or that it is
> not known whether the job terminated normally. In both cases
> drmaa_wexitstatus() SHALL NOT provide exit status information.
> A non-zero 'exited' value indicates more detailed diagnosis can be provided
> by means of drmaa_wifsignaled(), drmaa_wtermsig(), drmaa_wexitstatus(), and
> drmaa_wcoredump()."
>
> It was proposed (Hrabri's adaptation of Peter's latest proposal) to change
> it to
>
> "Evaluates into 'exited' a non-zero value if stat was returned for a ended job
> that either failed after running or finished after running (see section 2.6).
> A non-zero 'exited' value indicates more detailed diagnosis can be provided by
> means of drmaa_wifsignaled(), drmaa_wtermsig(), drmaa_wexitstatus(), and
> drmaa_wcoredump() functions.
> A zero result for the 'exited' parameter either indicates that
>    1) although it is known that the job was running, more information is not
>       available
>    2) it is not known whether the job was running
>
> In both cases drmaa_wexitstatus() SHALL NOT provide exit status
> information."
>
>
> ------------------------------------------------------------------------
>
> Subject: Re: [drmaa-wg] DRMAA TEST SUITE
> From: Peter Tröger <peter.troeger@hpi.uni-potsdam.de>
> Date: Thu, 23 Mar 2006 16:00:06 -0500
> To: "Ruben Santiago Montero" <rubensm@dacya.ucm.es>
> CC: "DRMAA Working Group" <drmaa-wg@gridforum.org>
> Sender: <owner-drmaa-wg@ggf.org>
> References: <200603181416.03350.rubensm@dacya.ucm.es>
>  <200603211155.00824.rubensm@dacya.ucm.es>
>  <4420656E.1020806@hpi.uni-potsdam.de>
>  <200603231154.56859.rubensm@dacya.ucm.es>
> Message-ID: <44230C56.5020000@hpi.uni-potsdam.de>
> In-Reply-To: <200603231154.56859.rubensm@dacya.ucm.es>
>
>
>>> Our proposal is to remove the call of drmaa_wifaborted() for
>>> ST_INPUT_FILE_FAILURE / ST_ERROR_FILE_FAILURE / ST_OUTPUT_FILE_FAILURE.
>>> The drmaa_wait() call does not hurt (since all submitted jobs must be
>>> waitable), but the crucial part is the testing for the result of
>>> drmaa_synchronize(). After this change, I would expect the test cases to
>>> be successful also on your system. In case of malicious input / output /
>>> error files, the DRMAA implementation would only be expected to state a
>>> job failure. This should work for all GridWay-supported systems, right?
>>> Could you accept this proposal?
>>>
>> Sure. It makes sense to me also.
>>
>> There is also a validator in the state diagram (Section 2.6). I am just
>> wondering if a DRMAA implementation could just reject the jobs in these
>> tests at submission with a DRMAA_ERRNO_DENIED_BY_DRM.
>
> The spec is unclear here, since the description of the input / output /
> error parameters demands a particular job state - DRMAA_PS_FAILED. You
> can only have a job state when you have a job id. You can only have a
> job id when drmaa_run() was successful. I really would like to have the
> opportunity of DRMAA_ERRNO_DENIED_BY_DRM also in this case, but then we
> have to relax the description of the corresponding job template attributes.
>
> Sounds like another issue for the next phone call. Hrabri?
>
> Regards,
> Peter.
>
>
> ------------------------------------------------------------------------
>
> Subject: [drmaa-wg] Minutes for DRMAA WG con-call 03/21/2006
> From: "Andreas Haas" <Andreas.Haas@Sun.COM>
> Date: Tue, 21 Mar 2006 12:58:11 -0500
> To: "DRMAA Working Group" <drmaa-wg@gridforum.org>
> Sender: <owner-drmaa-wg@ggf.org>
> Message-ID: <Pine.GSO.4.53.0603211807160.41800@sr-ergb01-01>
>
>
> Attendees: Roger, Peter, Daniel, Hrabri and Andreas
>
> Last meeting minutes accepted without corrections.
>
> * Hrabri proposes to add Peter as 3rd chair for DRMAA WG.
>   Peter says he would be willing to do it. Result of the
>   election is 5 votes pro and 0 votes against!
>
> * Discussion about ST_INPUT_FILE_FAILURE test case
>   brought up by Ruben Santiago Montero. There is agreement
>   the testing procedure needs to be changed to comply with
>   the specification, as proposed by Ruben.
>
> * Andreas to review change in spec for tracker item 1125
>
>
> ------------------------------------------------------------------------
>
> Subject: Re: GridWay Experience Report
> From: "Peter Troeger" <peter.troeger@hpi.uni-potsdam.de>
> Date: Thu, 23 Mar 2006 10:33:19 -0500
> To: "Andreas Haas" <Andreas.Haas@Sun.COM>
> CC: "Ruben Santiago Montero" <rubensm@dacya.ucm.es>, "Hrabri Rajic"
>  <hrabri@sbcglobal.net>, Ignacio Martín Llorente <llorente@dacya.ucm.es>,
>  "Roger Brobst" <rbrobst@cadence.com>, "Daniel Templeton"
>  <Dan.Templeton@Sun.COM>
> References: <200603211212.41381.rubensm@dacya.ucm.es>
>  <44207279.4090500@hpi.uni-potsdam.de>
>  <200603231153.39610.rubensm@dacya.ucm.es>
>  <Pine.GSO.4.53.0603231428390.41800@sr-ergb01-01>
> Message-ID: <4422BFBF.1000800@hpi.uni-potsdam.de>
> In-Reply-To: <Pine.GSO.4.53.0603231428390.41800@sr-ergb01-01>
>
>
>>>>- State of jobs after suspension: I loved to read this, since I had
>>>>exactly the same problem in the Condor DRMAA implementation. I ended up
>>>>with marking such jobs as "was suspended before", in order to give the
>>>>right active state afterwards. If we want to change the spec according
>>>>to this, we have a post 1.0 issue.
>>>
>>>Great! I think I can just do the same thing in GridWay DRMAA.
>>
>>
>> Hm ... I doubt this is a good idea. Job suspension is different
>> from triggering job rescheduling. If implementing job suspension
>> is a severe problem for DRM vendors, I believe that should be
>> an argument for not making it mandatory rather than for deviating
>> from the standard.
>
> Even though we are running out of time for spec changes, this should be
> a topic for the next DRMAA phone conference. Hrabri, could you put this
> on the agenda?
>
> Regards,
> Peter.
>
