Conference call - Apr 27th - 19:00 UTC
Dear all,

the next DRMAA conf call is scheduled for Apr 27th, 19:00 UTC. We meet on Skype; please find me under my user name "potsdam_pit".

Preliminary meeting agenda:

1. Meeting secretary for this meeting?
2. Solving remaining issues in DRMAAv2 Draft 3 (see attachment, starting from page 18)

Sorry, I didn't have the time to prepare a new draft.

Best regards,
Peter.
Participants: Daniel, Mariusz, Roger, Andre (SAGA), Peter

Quick check of last week's decisions, all agreed

Line 530 - checkpointability attribute in job template
- Grid Engine expresses checkpointability as string reference to checkpointing environment
- would be boolean flag in Condor, indicating standard universe
- From SAGA perspective, no real use case
- Decision: Dropped

Line 578 - optional eMail attribute
- accepted by group

Line 609 - Staging support
- reformulate to allow submission and execution host being the same machine
- denote support for 'hierarchical copying' as implementation-specific
- reformulate to state that with parallel jobs, copy must target at least the master node, and may also copy the files to other hosts
- clarify relationship between job working directory and relative paths

Line 707 - Reaction on reaching soft / hard limits
- Grid Engine: Signal depends on particular limit type
- Agreement that crossing a hard limit should lead to FAILED state of the DRMAA job
- Agreement to remove softResourceLimits completely, since DRMAA cannot promise any kind of common semantics, and since the attribute is not important enough to add it as opaque concept (as with slots)

Section 9.2.4 / 9.2.7:
- reservedSlots should be mandatory information, reservedMachines should be optional information

Agreement to specify possible error codes per method once some implementations are done

Line 751 - Reservation without time frame
- Makes no sense, since it might be way too short for the user -> raise invalid argument exception on UNSET/UNSET/UNSET
- add rationale why startTime=UNSET is not equal to startTime=NOW
  - handy concept supported by some, but not all DRM systems
  - Emulation in the DRMAA library is not a valid option, since this would lead to situations where the reservation already arrives 'too late' in the DRM system

Best regards,
Peter.
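The Line 751 decision (reject a reservation request in which startTime, endTime and duration are all UNSET) could be sketched roughly as follows. This is only an illustration under assumptions: `UNSET`, `InvalidArgumentError` and `validate_time_frame` are hypothetical names that do not come from the draft, and `None` stands in for the UNSET value.

```python
from datetime import datetime, timedelta
from typing import Optional

UNSET = None  # hypothetical stand-in for the DRMAA UNSET value


class InvalidArgumentError(ValueError):
    """Hypothetical stand-in for the DRMAA invalid argument exception."""


def validate_time_frame(start_time: Optional[datetime],
                        end_time: Optional[datetime],
                        duration: Optional[timedelta]) -> None:
    """Reject a reservation request that carries no time frame at all."""
    if start_time is UNSET and end_time is UNSET and duration is UNSET:
        raise InvalidArgumentError(
            "startTime, endTime and duration are all UNSET")


# A request with at least one value set passes this particular check.
validate_time_frame(datetime(2011, 4, 27, 19, 0), UNSET, timedelta(hours=1))
```

The check deliberately covers only the UNSET/UNSET/UNSET case from the minutes; which partially set combinations are valid is not decided here.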
-- drmaa-wg mailing list drmaa-wg@ogf.org http://www.ogf.org/mailman/listinfo/drmaa-wg
There are currently two problems from the Grid Engine side:

- There seems to be no way of getting the desired NOW behavior as a GE-specific enhancement without breaking compatibility (at least in this section, the *optional* NOW keyword is not defined).
- GE currently has no sliding window support for the SET/SET/SET case in which duration is shorter than endTime-startTime (the GE DRMAA implementation then has a problem similar to that of the other DRM systems which do not support NOW as startTime).

Suggestions for this section (5.6.2):

- Add the *optional* "NOW" constant -> if an implementation does not support it, it is treated like UNSET (InvalidAttributeException).
- If startTime, endTime and duration are set and duration is shorter than endTime-startTime, the sliding window approach (take "the earliest point in time") could be made optional. That means: take startTime and duration, or *optionally* search for the earliest point in time.

I know we discussed it more than once, but having these options would make it much easier to get a compatible implementation.

Cheers,
Daniel
--------------------------------------------------------------------- Notice from Univa Postmaster: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. This message has been content scanned by the Univa Mail system. ---------------------------------------------------------------------
There are currently two problems from Grid Engine: - There seems no way for getting the desired NOW behavior (at least in this section the *optional* NOW keyword is not defined) for an GE specific enhancement, without breaking compatibility - In GE there is no currently no sliding windows support for the SET/SET/SET in case of duration is shorter than endTime-startTime (GE DRMAA implementation have then a similar problem then the other DRM which do not support NOW as startTime)
Following suggestions for this section (5.6.2):
- Add the *optional* "NOW" constant -> if an implementation does not support it, it is treated like UNSET (InvalidAttributeException)
My understanding of the agreed result was a little bit more radical. NOW is not supported by all DRM systems, and it is not as crucial as slots ;-), so we can just leave it out. Applications will then start to build their own "NOW" workarounds (current local time plus ... hmmm .... 10s), which is completely fine in this specific case.
- If startTime, endTime and duration is set and duration is shorter than endTime-startTime, the sliding windows approach (take "the earliest point in time") could made optional. That means: take startTime and duration or *optionally* search the earliest point in time.
I don't understand this. What is the alternative to searching for the earliest feasible startTime? Ignoring the duration value? Or ignoring the end time?

Best regards,
Peter.
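The application-side "NOW" workaround mentioned above (current local time plus a small margin) might look like the following minimal sketch. The function name `approximate_now`, the `clock` parameter and the use of a timezone-aware local time are assumptions made for illustration; only the 10 second margin comes from the message.

```python
from datetime import datetime, timedelta
from typing import Optional


def approximate_now(safety_margin: timedelta = timedelta(seconds=10),
                    clock: Optional[datetime] = None) -> datetime:
    """Return a startTime slightly in the future, emulating 'NOW'.

    clock is a hypothetical hook that lets callers (and tests) inject
    the current time; by default the local time is used.
    """
    current = clock if clock is not None else datetime.now().astimezone()
    # The margin should cover the delay until the request reaches the DRM
    # system; otherwise the reservation would arrive 'too late'.
    return current + safety_margin
```

The margin is a guess by the application, which is exactly why such an emulation is considered acceptable in applications but not inside the DRMAA library.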
On 29.04.2011 at 14:14, Peter Tröger wrote:
There are currently two problems from Grid Engine: - There seems no way for getting the desired NOW behavior (at least in this section the *optional* NOW keyword is not defined) for an GE specific enhancement, without breaking compatibility - In GE there is no currently no sliding windows support for the SET/SET/SET in case of duration is shorter than endTime-startTime (GE DRMAA implementation have then a similar problem then the other DRM which do not support NOW as startTime)
Following suggestions for this section (5.6.2):
- Add the *optional* "NOW" constant -> if an implementation does not support it, it is treated like UNSET (InvalidAttributeException)
My understanding of the agreed result was a little but more radical. NOW is not supported by all DRM systems, and it is not as crucial as slots ;-), so we can just leave it out. Applications then will start to build their own "NOW" workarounds (current local time plus ... hmmm .... 10s), which is completely fine in this specific case.
We should really take an optional NOW constant into account. Applications could build their own workaround, but *if* a DRM has built-in support for this, leaving NOW out makes it really hard to offer that functionality; it simply prevents it from being implemented optionally. What do we lose with a new optional "NOW" constant?
- If startTime, endTime and duration is set and duration is shorter than endTime-startTime, the sliding windows approach (take "the earliest point in time") could made optional. That means: take startTime and duration or *optionally* search the earliest point in time.
I don't understand this. What is the alternative for searching the earliest feasible startTime ? Ignoring the duration value ? Or ignoring end time ?
One of them; for me it really does not matter which one is ignored, it should just be defined. Maybe the best solution would be to ignore "end time" when "start time + duration" <= "end time".

Cheers,
Daniel
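The rule proposed here (ignore "end time" when "start time + duration" <= "end time") can be sketched as a small resolution step. The function name `resolve_time_frame` is hypothetical, and raising `ValueError` when the duration does not fit into the window is an assumption; the thread does not define that case.

```python
from datetime import datetime, timedelta
from typing import Tuple


def resolve_time_frame(start_time: datetime,
                       end_time: datetime,
                       duration: timedelta) -> Tuple[datetime, datetime]:
    """Resolve a SET/SET/SET request without sliding window support."""
    if start_time + duration <= end_time:
        # endTime is ignored: reserve exactly 'duration' from startTime on.
        return start_time, start_time + duration
    # Assumption: a duration longer than the window is treated as an error.
    raise ValueError("duration does not fit between startTime and endTime")
```

With this rule a DRM system that cannot search for the earliest feasible slot still gives a well-defined answer for every request whose duration fits into the window.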
2011/4/29 Daniel Gruber <dgruber@univa.com>:
On 29.04.2011 at 14:14, Peter Tröger wrote:
There are currently two problems from Grid Engine: - There seems no way for getting the desired NOW behavior (at least in this section the *optional* NOW keyword is not defined) for an GE specific enhancement, without breaking compatibility - In GE there is no currently no sliding windows support for the SET/SET/SET in case of duration is shorter than endTime-startTime (GE DRMAA implementation have then a similar problem then the other DRM which do not support NOW as startTime) Following suggestions for this section (5.6.2): - Add the *optional* "NOW" constant -> if an implementation does not support it, it is treated like UNSET (InvalidAttributeException)
My understanding of the agreed result was a little but more radical. NOW is not supported by all DRM systems, and it is not as crucial as slots ;-), so we can just leave it out. Applications then will start to build their own "NOW" workarounds (current local time plus ... hmmm .... 10s), which is completely fine in this specific case.
We should really take an optional NOW constant really into account. Application could do their own workaround but *if* a DRM has build-in support for this, it is really hard to offer this functionality. It just prevents that this can be optionally implemented. What do we loose with an new optional "NOW" constant??
For me it is OK, as long as we can introspect whether NOW is supported by the given DRM system.
- If startTime, endTime and duration is set and duration is shorter than endTime-startTime, the sliding windows approach (take "the earliest point in time") could made optional. That means: take startTime and duration or *optionally* search the earliest point in time.
I don't understand this. What is the alternative for searching the earliest feasible startTime ? Ignoring the duration value ? Or ignoring end time ?
One of them, for me it really does not matter which is going to be ignored, it should just be defined. Maybe the best solution would be ignore "end time" when start "time + duration" <= "end time". Cheers, Daniel
The same problem. In my initial proposal:

http://fury.man.poznan.pl/~mmamonski/wiki/index.php/DRMAAv2/Advance_Reservat...

"duration Reservation duration. If reservation duration is shorter than endTime - startTime the earliest reservation (matching the requirements, e.g.: slotsCount) will be created. If this attribute is omitted then the duration is assumed to be equal to endTime - startTime. Optional attribute."

I wanted the "duration" to be optional. Now I remember why ;-) This was an easy way to determine whether the DRM system supports searching for the earliest feasible reservation in the given time window, i.e. if the system supports the duration attribute, it also offers the aforementioned functionality.
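The semantics quoted from the initial proposal (an omitted duration equals endTime - startTime; a shorter duration selects the earliest matching reservation in the window) can be illustrated with a simplified search. This is a sketch under strong assumptions: a single resource whose existing reservations are given as a list of busy intervals, and the hypothetical names `earliest_reservation` and `busy`. A real DRM system would also match requirements such as slotsCount.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[datetime, datetime]


def earliest_reservation(start_time: datetime,
                         end_time: datetime,
                         duration: Optional[timedelta],
                         busy: List[Interval]) -> Optional[Interval]:
    """Find the earliest free slot of 'duration' inside the time window."""
    if duration is None:
        # Omitted duration: the reservation spans the whole window.
        duration = end_time - start_time
    candidate = start_time
    for busy_start, busy_end in sorted(busy):
        if busy_end <= candidate:
            continue  # this interval lies entirely before the candidate slot
        if busy_start >= candidate + duration:
            break  # the gap before this interval is large enough
        candidate = busy_end  # overlap: retry right after the busy interval
    if candidate + duration <= end_time:
        return candidate, candidate + duration
    return None  # no feasible slot inside the window
```

For a window from 10:00 to 14:00 with the resource busy until 11:30, a one hour request would be placed at 11:30 to 12:30 under this model.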
-- Mariusz
One of them, for me it really does not matter which is going to be ignored, it should just be defined. Maybe the best solution would be ignore "end time" when start "time + duration" <= "end time". Cheers, Daniel
the same problem. In my initial proposal:
http://fury.man.poznan.pl/~mmamonski/wiki/index.php/DRMAAv2/Advance_Reservat...
"duration Reservation duration. If reservation duration is shorter than endTime - startTime the earliest reservation (matching the requirements, e.g.: slotsCount) will be created. If this attribute is omitted then the duration is assumed to be equal to endTime - startTime. Optional attribute."
i wanted the "duration" to be optional. Now i remember why ;-) This was an easy way to determine if the DRM system support searching earliest feasible reservation in the given time window. I.E. if the system support the Duration attribute it means that it also offer the aforementioned functionality.
Sounds very reasonable. I like this one. Best, Peter.
On 29.04.2011 at 14:55, Peter Tröger wrote:
One of them, for me it really does not matter which is going to be ignored, it should just be defined. Maybe the best solution would be ignore "end time" when start "time + duration" <= "end time". Cheers, Daniel
the same problem. In my initial proposal:
http://fury.man.poznan.pl/~mmamonski/wiki/index.php/DRMAAv2/Advance_Reservat...
"duration Reservation duration. If reservation duration is shorter than endTime - startTime the earliest reservation (matching the requirements, e.g.: slotsCount) will be created. If this attribute is omitted then the duration is assumed to be equal to endTime - startTime. Optional attribute."
i wanted the "duration" to be optional. Now i remember why ;-) This was an easy way to determine if the DRM system support searching earliest feasible reservation in the given time window. I.E. if the system support the Duration attribute it means that it also offer the aforementioned functionality.
Sounds very reasonable. I like this one.
Me too.

Daniel
On 29.04.2011 at 14:27, Daniel Gruber wrote:
On 29.04.2011 at 14:14, Peter Tröger wrote:
There are currently two problems from Grid Engine: - There seems no way for getting the desired NOW behavior (at least in this section the *optional* NOW keyword is not defined) for an GE specific enhancement, without breaking compatibility - In GE there is no currently no sliding windows support for the SET/SET/SET in case of duration is shorter than endTime-startTime (GE DRMAA implementation have then a similar problem then the other DRM which do not support NOW as startTime)
Following suggestions for this section (5.6.2):
- Add the *optional* "NOW" constant -> if an implementation does not support it, it is treated like UNSET (InvalidAttributeException)
My understanding of the agreed result was a little but more radical. NOW is not supported by all DRM systems, and it is not as crucial as slots ;-), so we can just leave it out. Applications then will start to build their own "NOW" workarounds (current local time plus ... hmmm .... 10s), which is completely fine in this specific case.
We should really take an optional NOW constant really into account. Application could do their own workaround but *if* a DRM has build-in support for this, it is really hard to offer this functionality. It just prevents that this can be optionally implemented. What do we loose with an new optional "NOW" constant??
Every single optional attribute weakens the standard. In the end, we all want to achieve portable applications without a lot of ifs and whens. We will take a majority decision on this in the next conf call.
- If startTime, endTime and duration is set and duration is shorter than endTime-startTime, the sliding windows approach (take "the earliest point in time") could made optional. That means: take startTime and duration or *optionally* search the earliest point in time.
I don't understand this. What is the alternative for searching the earliest feasible startTime ? Ignoring the duration value ? Or ignoring end time ?
One of them, for me it really does not matter which is going to be ignored, it should just be defined. Maybe the best solution would be ignore "end time" when start "time + duration" <= "end time".
Ok, if there is no further objection, I will add this one. Best, Peter.
2011/4/27 Peter Tröger <peter@troeger.eu>:
Participants: Daniel, Mariusz, Roger, Andre (SAGA), Peter
Quick check of last weeks decisions, all agreed
Line 530 - checkpointability attribute in job template - Grid Engine expresses checkpointability as string reference to checkpointing environment - would be boolean flag in Condor, indicating standard universe - From SAGA perspective, no real use case - Decision: Dropped
Line 578 - optional eMail attribute - accepted by group
Line 609 - Staging support - reformulate to allow submission and execution host being the same machine - denote support for 'hierarchical copying' as implementation-specific - reformulate to state that with parallel jobs, copy must target at least the master node, and may also copy the files to other hosts - clarify relationship between job working directory and relative paths
Line 707 - Reaction on reaching soft / hard limits - Grid Engine: Signal depends on particular limit type - Agreement that crossing a hard limit should lead to FAILED state of the DRMAA job - Agreement to remove softResourceLimits completely, since DRMAA cannot promise any kind of common semantics, and since the attribute is not important enough to add it as opaque concept (as with slots)
I promised to do some research, so:

We are mixing different resources whose limits have different purposes and thus different associated policies:

enum ResourceLimitType { CORE_FILE_SIZE , CPU_TIME , DATA_SEG_SIZE , FILE_SIZE , OPEN_FILES , STACK_SIZE , VIRTUAL_MEMORY , WALLCLOCK_TIME };

Let's take the first one, CORE_FILE_SIZE, and the Grid Engine man page queue_conf:

" The remaining parameters in the queue configuration template specify per job soft and hard resource limits as implemented by the setrlimit(2) ..."

man setrlimit:

" RLIMIT_CORE Maximum size of core file. When 0 no core dump files are created. When non-zero, larger dumps are truncated to this size."

and the difference between soft and hard limit is defined as follows:

" The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit."

Exceeding other limits such as OPEN_FILES would just result in errors on calls like open(), which the application can handle and still exit with 0.

So the agreement that "crossing a hard limit should lead to FAILED" should be valid only for some of the limits, e.g. WALLCLOCK_TIME and CPU_TIME.
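The distinction drawn here could be captured as a simple policy table. The sketch below is only an illustration: the enum members mirror the ResourceLimitType list above, while the set `FATAL_WHEN_EXCEEDED` and the function `hard_limit_implies_failed` are hypothetical names. Only WALLCLOCK_TIME and CPU_TIME are named in the message as examples; how the remaining limit types should be treated is left open in the thread, so treating them as non-fatal is an assumption.

```python
from enum import Enum


class ResourceLimitType(Enum):
    CORE_FILE_SIZE = "core_file_size"
    CPU_TIME = "cpu_time"
    DATA_SEG_SIZE = "data_seg_size"
    FILE_SIZE = "file_size"
    OPEN_FILES = "open_files"
    STACK_SIZE = "stack_size"
    VIRTUAL_MEMORY = "virtual_memory"
    WALLCLOCK_TIME = "wallclock_time"


# Assumption: only the limits named as examples in the message are fatal.
FATAL_WHEN_EXCEEDED = frozenset({
    ResourceLimitType.WALLCLOCK_TIME,
    ResourceLimitType.CPU_TIME,
})


def hard_limit_implies_failed(limit: ResourceLimitType) -> bool:
    """Return True if crossing this hard limit should yield a FAILED job."""
    return limit in FATAL_WHEN_EXCEEDED
```

Under this table, a job that hits OPEN_FILES merely sees open() fail and may still finish with exit status 0, whereas a job that exceeds WALLCLOCK_TIME ends up FAILED.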
Section 9.2.4 / 9.2.7: - reservedSlots should be mandatory information, reservedMachines should be optional information
Agreement to specify possible error codes per method after some implementations were done
Line 751 - Reservation without time frame
- Makes no sense, since it might be way too short for the user -> raise invalid argument exception on UNSET/UNSET/UNSET
- add rationale why startTime=UNSET is not equal to startTime=NOW
  - handy concept supported by some, but not all DRM systems
  - Emulation in the DRMAA library is not a valid option, since this would lead to situations where the reservation already arrives 'too late' in the DRM system
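The UNSET/UNSET/UNSET rule from the minutes could be checked roughly as follows. This is only a sketch: the class name, the field names (assumed here to correspond to startTime, endTime and duration), the use of None for UNSET, and the exception class are illustrative assumptions, not text from the draft.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

UNSET = None  # assumption: None stands in for the spec's UNSET marker


class InvalidArgumentException(Exception):
    """Hypothetical binding-level exception for rejected arguments."""


@dataclass
class ReservationTemplate:
    # Hypothetical field names for the three time-frame attributes.
    start_time: Optional[datetime] = UNSET
    end_time: Optional[datetime] = UNSET
    duration: Optional[timedelta] = UNSET


def validate_time_frame(template: ReservationTemplate) -> ReservationTemplate:
    """Reject a reservation request that carries no time frame at all."""
    if (template.start_time is UNSET
            and template.end_time is UNSET
            and template.duration is UNSET):
        raise InvalidArgumentException(
            "at least one of start time, end time, duration must be set")
    # start_time is deliberately left UNSET instead of being replaced by
    # "now": the DRM system picks the start, the library does not emulate it.
    return template
```

A request with only a duration passes the check and keeps its start time UNSET, which mirrors the point that UNSET is not the same as NOW.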
Best regards, Peter.
-- Mariusz
Hi,
Participants: Daniel, Mariusz, Roger, Andre (SAGA), Peter
Line 707 - Reaction on reaching soft / hard limits
- Grid Engine: Signal depends on particular limit type
- Agreement that crossing a hard limit should lead to FAILED state of the DRMAA job
- Agreement to remove softResourceLimits completely, since DRMAA cannot promise any kind of common semantics, and since the attribute is not important enough to add it as opaque concept (as with slots)
I promised to do some research, so:
We are mixing different resources, whose limits have different purposes and thus different associated policies:
enum ResourceLimitType { CORE_FILE_SIZE, CPU_TIME, DATA_SEG_SIZE, FILE_SIZE, OPEN_FILES, STACK_SIZE, VIRTUAL_MEMORY, WALLCLOCK_TIME };
Let's take the first one:
CORE_FILE_SIZE and Grid Engine
man queue_conf: "The remaining parameters in the queue configuration template specify per job soft and hard resource limits as implemented by the setrlimit(2) ..."
man setrlimit: "RLIMIT_CORE Maximum size of core file. When 0 no core dump files are created. When non-zero, larger dumps are truncated to this size."
The difference between soft and hard limit is defined as follows: "The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit."
Exceeding other limits like OPEN_FILES would just result in errors on calls like open(), which the application can handle and still exit with 0.
So the agreement that "crossing a hard limit should lead to FAILED" should be valid only for some of the limits, e.g. WALLCLOCK_TIME, CPU_TIME.
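The OPEN_FILES case can be reproduced with Python's standard resource module on a POSIX system. The sketch below lowers the soft RLIMIT_NOFILE, opens files until the limit is hit, handles the error and restores the old limit. The function name and the value 64 are arbitrary choices for illustration.

```python
import errno
import os
import resource


def demo_open_files_limit(new_soft=64):
    """Hit the soft open-files limit, recover, and return the errno seen."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        new_soft = min(new_soft, soft)
    # An unprivileged process may move its soft limit anywhere up to the
    # hard limit, so lowering it and restoring it later is always allowed.
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    fds = []
    seen = None
    try:
        # At most new_soft descriptors can exist, so this loop must fail.
        for _ in range(new_soft + 1):
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        seen = exc.errno  # the limit shows up as an error code, not a signal
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return seen


if __name__ == "__main__":
    code = demo_open_files_limit()
    print("open() failed with", errno.errorcode.get(code, code))
```

The process keeps running after the failed open() and can still exit with status 0, which is why a blanket "limit crossed means FAILED" rule does not fit this limit type.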
That's an issue. I see basically three options here:
1) We define the hard limit violation behavior per parameter. In this case, we could add the soft limits again with the same approach.
2) We declare the job termination as MAY happen at any time after violation, and stick with leaving out the soft limits.
3) We drop resource limits completely.
Number 1 is most explicit (== good), but demands careful research on operating system level. Number 2 is our usual safety net. Number 3 is as explicit as number 1, but people may miss the feature. And no, doing it the 'slots' way is not an option ;-) ...
Best regards, Peter.
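To make option 1 more concrete, a per-parameter definition might look like the table below. This is a hypothetical sketch, not a group decision: the effect descriptions are paraphrased from typical Linux setrlimit(2) behavior and would need the operating-system research mentioned above, and the set of limits that lead to FAILED follows only the suggestion made earlier in this thread (WALLCLOCK_TIME, CPU_TIME).

```python
from enum import Enum


class ResourceLimitType(Enum):
    CORE_FILE_SIZE = 1
    CPU_TIME = 2
    DATA_SEG_SIZE = 3
    FILE_SIZE = 4
    OPEN_FILES = 5
    STACK_SIZE = 6
    VIRTUAL_MEMORY = 7
    WALLCLOCK_TIME = 8


# Typical effect of reaching the limit on Linux (illustrative, unverified
# for other operating systems and DRM systems).
TYPICAL_EFFECT = {
    ResourceLimitType.CORE_FILE_SIZE: "core dump is truncated",
    ResourceLimitType.CPU_TIME: "SIGXCPU at the soft limit, SIGKILL at the hard limit",
    ResourceLimitType.DATA_SEG_SIZE: "brk()/sbrk() fail with ENOMEM",
    ResourceLimitType.FILE_SIZE: "SIGXFSZ, or EFBIG from write() if the signal is handled",
    ResourceLimitType.OPEN_FILES: "open() and similar calls fail with EMFILE",
    ResourceLimitType.STACK_SIZE: "SIGSEGV when the stack cannot grow further",
    ResourceLimitType.VIRTUAL_MEMORY: "brk()/mmap() fail with ENOMEM",
    ResourceLimitType.WALLCLOCK_TIME: "no setrlimit(2) counterpart, enforced by the DRM system",
}

# Assumption taken from the proposal in this thread, not from the draft.
FAILED_ON_HARD_LIMIT = {ResourceLimitType.CPU_TIME, ResourceLimitType.WALLCLOCK_TIME}


def state_after_hard_limit(limit):
    """Return "FAILED" where that is promised, else None (no promise made)."""
    return "FAILED" if limit in FAILED_ON_HARD_LIMIT else None
```

For example, state_after_hard_limit(ResourceLimitType.OPEN_FILES) returns None, meaning the job outcome is left to the application.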
Hi,
I finally managed to read the current version of the spec more carefully. Below are some comments (line numbering corresponds to the version annotated as "draft3"):
line 81: DRMAA1 -> DRMAA Version 1 [reference]
94,95: A Exec.. -> An Exec
159: advanced -> advance
296: "Machine structure" - should we include machine state here (e.g. down, administratively down, available, busy, ...)?
316: consistent... -> consistent among all Machine struct instances. Moreover, any reported name should be a syntactically correct input for the candidateMachines attribute of the JobTemplate. ???
361: any jobSubState - is there really any case where this would be a complex object? Why not just use a string here? (Yes, I know, in the spec there is a requirement that the language binding should define conversion to String for every object, but this may be complex... ;-)
370: missing \n
377-383: running, buffered, purged -> I think this section needs to be more precise and verbose. In DRMAA 1.0 the wait call was responsible for reaping the jobs. This is important because some DRMS do not "buffer" jobs at all (or do it for a very short time), so the buffering has to be done in the DRMAA library (for the session's jobs only). This implies the question: how long to buffer the job information...
395: exitStatus - should we state here that the valid exitStatus values are 0-125?
445: cpuTime - should we state here that it is cumulative time among all the job processes? I.e. CPU time can be greater than wall clock time for parallel jobs.
497: maybe we should add "Dictionary consumableResources;" @see Nadev e-mail. I also raised this during one of the last telcos...
594: "execution host" -> "submission host" ???
652: maxSlots should be optional (e.g. Torque does not support range values)
657: SHOULD -> MAY - at least until we have predefined JobCategories ;-)
785: SessionManagementException - what is the added value of this exception? Can it be thrown from operations other than open/close/destroy Session? If not, then why don't we have WaitException, RunException? ;-)
791: OutOfMemoryException - can we also throw this exception when the user-supplied buffer was too small?
829: reservationSupported - maybe we can move it now to the DrmaaReflective interface?
948: FAILED vs DONE - maybe we should be more precise for the situation when the job was started but e.g. exited with exit code != 0 (I believe this should be DONE), was signalled, or was terminated via DRMAA.
967: REQUEUED, REQUEUED_HELD and BES states. The BES state model prohibits the transition from Running to Pending, so it should be the Running state. Also the state names in brackets look like a specialization of one of the BES implementations (I will not say which implementation ;-) so they are definitely non-normative.
1035: "The largest valid value for endIndex MUST be defined by the language binding." - there may also be a DRMS constraint.
1047: "only one of the active thread..." - is this requirement really needed? I'm asking because I'm afraid this would increase the complexity of the implementations (do you remember the "session any" and its coincidence with run job operations?). This may be related to comment 377-383.
1063: "DrmaaCallback Interface".... I just wonder if the requirement "An implementation SHOULD also disallow any library calls while the callback function is running, to avoid recursion scenarios. It is RECOMMENDED to raise TryLaterException in this case." is really needed. If we want to keep this requirement, is the Job object useful at all, as we can only read the jobId from it?
1109-1110: why do those methods return the Job objects?
1262: footnote 30 (what about symmetry ;-) Also the last decision was to have a separate ReservationInfo struct: http://www.mail-archive.com/drmaa-wg@ogf.org/msg00250.html (when was it revoked?)
1508: reservationInfoOpt, reservationInfoImpl - what if one wants to provide more information about the reservation? Also the symmetry rule ;-), relates to 1262.
Should we also move the drmsJobCategoryNames here (from MonitoringSession)?
Sorry for not waiting for the newest version, but I wanted to finish it before I go on holiday (I will not be able to join the next telco).
All the best,
-- Mariusz
Hi,
thanks for your huge contribution with this. Here are my comments. If the OGF.ORG SVN works again, I will update the document sources on the server. The new PDF follows today.
There are three kinds of reactions I state:
(1) Added. No discussion needed, I just performed the corresponding document modification.
(2) Ignored. I am pretty confident that this was debated and decided well enough in the group, so I am not willing to re-open discussion again. The group is free to disagree.
(3) Obsoleted. Recent document modifications already established the proposal as a fact.
Best regards, Peter.
Hi,
I finally managed to read the current version of the spec more carefully. Below are some comments (line numbering corresponds to the version annotated as "draft3"):
line 81: DRMAA1 -> DRMAA Version 1 [reference]
94,95: A Exec.. -> An Exec
159: advanced -> advance
296: "Machine structure" - should we include machine state here (e.g. down, administratively down, available, busy, ...)?
316: consistent... -> consistent among all Machine struct instances.
Added.
Moreover any reported name should be a syntactically correct input for the candidateMachines attribute of the JobTemplate. ???
candidateMachines takes a MachineList as input.
361: any jobSubState - is there really any case where this would be a complex object? Why not just use a string here? (Yes, I know, in the spec there is a requirement that the language binding should define conversion to String for every object, but this may be complex... ;-)
Ignored, this was already discussed and decided.
370: missing \n
Added.
377-383: running, buffered, purged -> I think this section needs to be more precise and verbose. In DRMAA 1.0 the wait call was responsible for reaping the jobs. This is important because some DRMS do not "buffer" jobs at all (or do it for a very short time), so the buffering has to be done in the DRMAA library (for the session's jobs only). This implies the question: how long to buffer the job information...
Added as ToDo.
395: exitStatus - should we state here that the valid exitStatus values are 0-125 ?
Ignored, this was already discussed and decided.
445: cpuTime - should we state here that it is cumulative time among all the job processes? I.e. CPU time can be greater than wall clock time for parallel jobs.
Added, also for wall clock time.
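A small worked example of why cumulative CPU time can exceed wall clock time for a parallel job. The helper name and the numbers are made up for illustration only.

```python
def cumulative_cpu_time(per_process_cpu_seconds):
    """Sum the CPU seconds consumed by every process of one job."""
    return sum(per_process_cpu_seconds)


# Hypothetical parallel job: four processes, each busy for 9.5 CPU seconds,
# all running side by side within 10 seconds of wall clock time.
wall_clock_seconds = 10.0
cpu_seconds = cumulative_cpu_time([9.5, 9.5, 9.5, 9.5])
print(cpu_seconds, ">", wall_clock_seconds)
```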
497: maybe we should add "Dictionary consumableResources;" @see Nadev e-mail I also raised this during one of the last telcos...
See meeting minutes.
594: "execution host" -> "submission host" ???
Why this ? inputPath and friends relate to files that are used by the running job on the execution host.
652: maxSlots should be optional (e.g. Torque does not support range values)
Added as ToDo.
657: SHOULD -> MAY - at least until we don't have predefined JobCategories ;-)
Ignored, this was already discussed and decided.
785: SessionManagementException - what is the added value of this exception? Can it be thrown from operations other than open/close/destroy Session? If not, then why don't we have WaitException, RunException? ;-)
Added as ToDo.
791: OutOfMemoryException - can we also throw this exception when the user-supplied buffer was too small?
Added.
829: reservationSupported - maybe we can move it now to DrmaaReflective interface?
Obsoleted.
948: FAILED vs DONE - maybe we should be more precise for the situation when the job was started but e.g. exited with exit code != 0 (I believe this should be DONE), was signalled, or was terminated via DRMAA.
Ignored, this was already discussed and decided.
967: REQUEUED, REQUEUED_HELD and BES states. The BES state model prohibits the transition from Running to Pending, so it should be the Running state. Also the state names in brackets look like a specialization of one of the BES implementations (I will not say which implementation ;-) so they are definitely non-normative.
Added. And yes, this is why the table title contains "example"
1035: The largest valid value for endIndex MUST be defined by the language binding. - there may be also DRMS constraint.
Added.
1047: "only one of the active thread..." - is this requirement really needed? i'm asking because i'm afraid this would increase complexity of the implementations (do you remember the "session any" and its coincidence with run job operations?). This may be related to comment 377-383.
Ignored, this was already discussed and decided.
1063: "DrmaaCallback Interface"....
I just wonder if the requirement "An implementation SHOULD also disallow any library calls while the callback function is running, to avoid recursion scenarios. It is RECOMMENDED to raise TryLaterException in this case." is really needed. If we want to keep this requirement is the Job object useful at all as we can only read the jobId from it?
Added as ToDo.
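The quoted requirement for line 1063 could be met with a simple re-entrancy guard, sketched below. All names are hypothetical and do not come from the draft. The guard here is per thread, which is one possible reading of "while the callback function is running"; a process-wide flag would be the stricter alternative.

```python
import threading


class TryLaterException(Exception):
    """Hypothetical counterpart of the TryLaterException named in the draft."""


class GuardedLibrary:
    """Toy library facade that rejects calls made from inside its callback."""

    def __init__(self):
        self._state = threading.local()
        self._callback = None

    def register_callback(self, callback):
        self._callback = callback

    def _reject_if_in_callback(self):
        if getattr(self._state, "in_callback", False):
            raise TryLaterException("library call issued from inside a callback")

    def get_job_state(self, job_id):
        self._reject_if_in_callback()
        return "UNDETERMINED"  # placeholder, no real DRM system is queried

    def deliver(self, job_id, event):
        """Invoke the registered callback with the guard switched on."""
        self._state.in_callback = True
        try:
            self._callback(job_id, event)
        finally:
            self._state.in_callback = False
```

With such a guard the callback can still use the job identifier it receives, but any attempt to query the library from within the callback raises TryLaterException, which illustrates the concern about how useful a full Job object would be there.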
1109-1110: why do those methods return the Job objects?
Ignored, this was already discussed and decided.
1262: footnote 30 (what about symmetry ;-) Also last decision was to have separate ReservationInfo struct: http://www.mail-archive.com/drmaa-wg@ogf.org/msg00250.html (when it was revoked?)
Obsoleted.
1508: reservationInfoOpt, reservationInfoImpl - what if one wants to provide more information about the reservation? Also the symmetry rule ;-), relates to 1262.
Obsoleted.
should we also move the drmsJobCategoryNames here (from MonitoringSession)?
No, since DrmaaReflective is only about introspection support for optional / impl. specific attributes. Added ToDo to clarify if this should move to the new generic capability check.
participants (3)
- Daniel Gruber
- Mariusz Mamoński
- Peter Tröger