Conference call - June 1st - 19:00 UTC
Dear all,

the next DRMAA conf call is scheduled for June 1st, 19:00 UTC. We meet on Skype, please find me under my user name "potsdam_pit".

Preliminary meeting agenda:

1. Meeting secretary for this meeting?
2. DRMAAv2 Draft 5 (see attachment)

Best regards,
Peter.
Hi,

2011/5/31 Peter Tröger <peter@troeger.eu>:
Dear all,
the next DRMAA conf call is scheduled for June 1st, 19:00 UTC. We meet on Skype, please find me under my user name "potsdam_pit".
Preliminary meeting agenda:
1. Meeting secretary for this meeting?
2. DRMAAv2 Draft 5 (see attachment)
Best regards, Peter.
Here is a new spreadsheet tab which tries to summarize how different resource limits are handled in GE/LSF/Torque:

https://spreadsheets.google.com/spreadsheet/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGthc1lDTFE&hl=en_US#gid=13

and a proposal for restructuring section 5.6.25 (text in brackets [] == my comment):

5.6.26 resourceLimits [not hardResourceLimits]

This attribute specifies the limits on resource utilization of the job(s) on the execution host(s). The valid dictionary keys and their value semantics are defined in Section 4.3.

The CORE_FILE_SIZE, DATA_SEG_SIZE, FILE_SIZE, OPEN_FILES, STACK_SIZE and VIRTUAL_MEMORY limits SHOULD be implemented as soft resource limits. An implementation MAY map them to a setrlimit call in the operating system. [I think the actual use case for those resources is to increase the system default limit rather than to actually limit the application]

The WALLCLOCK_TIME and CPU_TIME limits should be implemented as hard resource limits, i.e. exceeding the resource limit SHOULD eventually lead to termination of the job, either by the DRM system or by the application itself. The DRM system MAY first notify the application upon reaching the limit (e.g. by sending a signal that can be handled) before ultimately terminating it (e.g. by sending a SIGKILL signal).

All the resource limits SHOULD be enforced on a per-process [not job] basis.
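As a rough illustration of the "MAY map them to a setrlimit call" idea, the sketch below changes only the soft limit and leaves the hard limit untouched. The key-to-rlimit mapping, the function names, and the assumption that values are already in rlimit units (bytes, or a count for OPEN_FILES) are illustrative and not taken from the draft; it also assumes Python's resource module on a POSIX host.

```python
import resource

# Assumed mapping from the limit names in the proposal to POSIX rlimit
# constants; the draft itself does not prescribe this table.
RLIMIT_FOR_KEY = {
    "CORE_FILE_SIZE": resource.RLIMIT_CORE,
    "DATA_SEG_SIZE": resource.RLIMIT_DATA,
    "FILE_SIZE": resource.RLIMIT_FSIZE,
    "OPEN_FILES": resource.RLIMIT_NOFILE,
    "STACK_SIZE": resource.RLIMIT_STACK,
    "VIRTUAL_MEMORY": resource.RLIMIT_AS,
}


def clamp_soft_limit(requested, hard):
    """Return the soft limit to set: the requested value, capped at the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    if requested == resource.RLIM_INFINITY or requested > hard:
        return hard
    return requested


def apply_soft_limits(limits):
    """Set the soft limit for every known key in the calling process.

    Keys without an rlimit counterpart (e.g. WALLCLOCK_TIME) are skipped here,
    because they would have to be enforced by other means.
    """
    applied = {}
    for key, requested in limits.items():
        rlimit = RLIMIT_FOR_KEY.get(key)
        if rlimit is None:
            continue
        _current_soft, hard = resource.getrlimit(rlimit)
        soft = clamp_soft_limit(requested, hard)
        resource.setrlimit(rlimit, (soft, hard))
        applied[key] = soft
    return applied
```

Because only the soft value changes, the job can still raise it again up to the hard limit, which matches the "increase the system default" use case mentioned in the comment above.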
-- drmaa-wg mailing list drmaa-wg@ogf.org http://www.ogf.org/mailman/listinfo/drmaa-wg
-- Mariusz
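The "notify first, then terminate" behaviour proposed for WALLCLOCK_TIME could look roughly like the sketch below. It is only an assumption of how a DRM system might do it: SIGTERM is picked as the catchable notification signal, and the function name, the grace period, and the returned status strings are made up for this example.

```python
import signal
import subprocess
import sys


def run_with_wallclock_limit(argv, limit_s, grace_s):
    """Run argv; at the limit send SIGTERM, after grace_s more seconds SIGKILL.

    Returns a (status, returncode) pair. The status strings are hypothetical.
    """
    proc = subprocess.Popen(argv)
    try:
        proc.wait(timeout=limit_s)
        return "finished", proc.returncode
    except subprocess.TimeoutExpired:
        # Limit reached: notify the application with a signal it can handle.
        proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_s)
        return "terminated_after_notice", proc.returncode
    except subprocess.TimeoutExpired:
        # The application ignored the notice: SIGKILL cannot be caught.
        proc.kill()
        proc.wait()
        return "killed", proc.returncode


if __name__ == "__main__":
    # A child that catches SIGTERM and exits cleanly with status 0.
    child = ("import signal, sys, time; "
             "signal.signal(signal.SIGTERM, lambda *a: sys.exit(0)); "
             "time.sleep(30)")
    print(run_with_wallclock_limit([sys.executable, "-c", child], 2, 1))
```

A child that handles the notification and exits with status 0 is indistinguishable from a successful job by its exit code alone, so the caller has to keep track of the fact that the limit was hit.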
I like the proposal, makes sense to me.

Best regards,
Peter.
Participants: Peter, Daniel G., Mariusz, Roger

* JobInfo:slots
  * Line 545: Not necessary from Daniel's perspective
    * Exclusive complex: Possible to book a whole machine with only one slot (e.g. memory amount) - difficult to detect
    * Better to make clear DRMAA semantics, JobInfo:slots should be within the range from the job template
  * Line 547:
    * Currently similar to MPI approach of host file
    * Might have large memory footprint (more slots)
    * Alternative with additional struct
    * Only necessary for reporting
    * Use case: Cluster monitoring, generate MPI machine file based on this information
    * Decision: Introduce new structure with machine name and slot count
    * Decision: Remove optional sentence
* Complete IDL list in draft 5 lacks DrmaaCapability structure
* Line 899: remove optional
* Line 933: Should be clarified that it is intended to fill out templates and structs
  * Set and get give the impression that they are intended for the DrmaaReflective interface attributes themselves
* Line 735:
  * make it mandatory -> AR might not be implemented, but AR created outside should be supported then
  * InvalidValue as generic value feasible
* Research on resource limits
  * Line 244: DATA_SEG_SIZE has no use case, but we only take out things if they are not implementable, so leave it in
  * Decision: Job failing cannot be promised on resource violation (see Google spreadsheet), application might catch signal
  * Decision: Add sentence that application will be notified by some OS-dependent means
  * Line 762 - might be wrong, rethink it
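The decision to report slots through a structure with machine name and slot count, and the MPI machine file use case, can be sketched as follows. The names SlotInfo, machine_name, slots, min_slots and max_slots are placeholders rather than draft IDL, and the "host slots=N" line format is the Open MPI hostfile style, chosen here only as an example target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotInfo:
    """Hypothetical reporting structure: one entry per machine."""
    machine_name: str
    slots: int


def total_slots(allocation):
    """Sum of the slots over all machines of the job."""
    return sum(entry.slots for entry in allocation)


def slots_within_template(allocation, min_slots, max_slots):
    """Check that the reported slot count lies in the job template range."""
    return min_slots <= total_slots(allocation) <= max_slots


def to_machine_file(allocation):
    """Render the allocation as Open MPI style hostfile text."""
    return "".join(
        f"{entry.machine_name} slots={entry.slots}\n" for entry in allocation
    )


if __name__ == "__main__":
    allocation = [SlotInfo("node1", 4), SlotInfo("node2", 2)]
    print(to_machine_file(allocation), end="")
```

Compared with listing the host name once per slot, this keeps one entry per machine, so the size of the report grows with the number of machines and not with the number of slots.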
participants (2)
- Mariusz Mamoński
- Peter Tröger