New synchronization approach for DRMAA2
Dear all,
in reflection of the last phone conference (check the minutes), here is a possible realization of the new synchronization approach as IDL snippet:
--- snip ---
enum DrmaaEvent {
    NEW_STATE_UNDETERMINED, NEW_STATE_QUEUED_ACTIVE, NEW_STATE_HOLD,
    NEW_STATE_RUNNING, NEW_STATE_SYSTEM_SUSPENDED, NEW_STATE_USER_SUSPENDED,
    NEW_STATE_USER_SYSTEM_SUSPENDED, NEW_STATE_DONE, NEW_STATE_FAILED, ...
};
interface DrmaaCallback {
    void notify(in DrmaaEvent event, in Job job);
};
interface JobSession {
    readonly attribute string contact;
    void registerEventNotification(in DrmaaCallback callback) raises UnsupportedFeatureException, ....
    JobTemplate createJobTemplate();
    void deleteJobTemplate(in DRMAA::JobTemplate jobTemplate);
    Job runJob(in DRMAA::JobTemplate jobTemplate);
    sequence<Job> runBulkJobs(in DRMAA::JobTemplate jobTemplate, in long beginIndex, in long endIndex, in long step);
    sequence<Job> waitAnyStarted(in sequence<Job> jobs, in long long timeout);
    sequence<Job> waitAnyTerminated(in sequence<Job> jobs, in long long timeout);
    ...
};
interface Job {
    void suspend();
    void resume();
    void hold();
    void release();
    void terminate();
    JobState getState(out native subState);
    void waitStarted(in long long timeout);
    void waitTerminated(in long long timeout);
    JobInfo getInfo();
};
--- snip ---
waitAnyStarted() would return if any of the provided jobs has one of the states RUNNING, SYSTEM_SUSPENDED, USER_SUSPENDED, or USER_SYSTEM_SUSPENDED. It returns the corresponding job(s) as result, which allows subsequent calls by the application with a reduced list.
waitAnyTerminated() would return if any of the provided jobs has either FAILED or DONE state.
waitStarted() and waitTerminated() on job level work in a similar way for only one job. The timeout parameter keeps the DRMAA1 semantics.
JobSession::registerEventNotification() would accept the function pointer / object reference for the callback sink implemented by the application. This is the first time that we introduce an optional method in DRMAA, therefore we need the new UnsupportedFeatureException to express whether this function is supported or not. The callback function signature is also standardized by the language binding (DrmaaCallback), so that all DRMAA libraries for one language can work with any application in a portable way. For the sake of portability, we then also need to standardize the possible events. New / other proposals for this enumeration (DrmaaEvent) are welcome.
Please comment.
Thanks, Peter.
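The wait and callback semantics proposed above can be illustrated with a small in-memory model. This is only a sketch in Python, not a DRMAA binding: the class and method names are Python-style renderings of the IDL, `set_state()` is a hypothetical helper that stands in for the DRM system, the callback receives the new job state in place of a `DrmaaEvent`, and the wait calls only check the current states instead of blocking with a timeout.

```python
from enum import Enum, auto


class JobState(Enum):
    UNDETERMINED = auto()
    QUEUED_ACTIVE = auto()
    HOLD = auto()
    RUNNING = auto()
    SYSTEM_SUSPENDED = auto()
    USER_SUSPENDED = auto()
    USER_SYSTEM_SUSPENDED = auto()
    DONE = auto()
    FAILED = auto()


# States that count as "started" and "terminated" in the proposal.
STARTED = {JobState.RUNNING, JobState.SYSTEM_SUSPENDED,
           JobState.USER_SUSPENDED, JobState.USER_SYSTEM_SUSPENDED}
TERMINATED = {JobState.DONE, JobState.FAILED}


class UnsupportedFeatureException(Exception):
    """Raised when an optional feature is not offered by the library."""


class Job:
    def __init__(self, name):
        self.name = name
        self.state = JobState.QUEUED_ACTIVE


class JobSession:
    """Toy session: state changes are injected by hand via set_state()."""

    def __init__(self, supports_events=True):
        self._supports_events = supports_events
        self._callbacks = []

    def register_event_notification(self, callback):
        # Optional feature: a library without event support refuses here.
        if not self._supports_events:
            raise UnsupportedFeatureException("event notification")
        self._callbacks.append(callback)

    def set_state(self, job, state):
        # Hypothetical helper standing in for the DRM system.
        job.state = state
        for callback in self._callbacks:
            callback(state, job)

    def wait_any_started(self, jobs):
        # Non-blocking check only; a real binding would honour a timeout.
        return [j for j in jobs if j.state in STARTED]

    def wait_any_terminated(self, jobs):
        return [j for j in jobs if j.state in TERMINATED]


session = JobSession()
events = []
session.register_event_notification(
    lambda state, job: events.append((job.name, state.name)))

a, b, c = Job("a"), Job("b"), Job("c")
pending = [a, b, c]
session.set_state(a, JobState.RUNNING)
session.set_state(b, JobState.DONE)

# The returned jobs let the application call again with a reduced list.
finished = session.wait_any_terminated(pending)
pending = [j for j in pending if j not in finished]
```

After this run, `finished` holds only job "b" and `pending` is reduced to jobs "a" and "c", which is the "subsequent calls with a reduced list" pattern the proposal describes.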
Just a quick comment: The UnsupportedAttributeException should probably become a subclass of UnsupportedFeatureException.
Daniel
Peter Tröger wrote:
Dear all,
in reflection of the last phone conference (check the minutes), here is a possible realization of the new synchronization approach as IDL snippet:
[...]
-- drmaa-wg mailing list drmaa-wg@ogf.org http://www.ogf.org/mailman/listinfo/drmaa-wg
Exception hierarchies are intentionally not part of the DRMAA IDL spec. You can do it in the Java binding, if you want:
<quote>
Language bindings MAY decide to introduce a hierarchical ordering of the DRMAA exceptions through class derivation. In this case it MAY also happen that new exceptions are introduced for behavior aggregation. In this case, those exceptions SHALL be marked as abstract, to prevent them from being thrown.
</quote>
I am also interested in your "long" comment ;-)
Peter.
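As an illustration of what such a binding-level hierarchy could look like, here is a minimal sketch. It is written in Python rather than Java, and it is not taken from any existing binding: `DrmaaException` is a hypothetical aggregate base class, `set_attribute()` is a hypothetical call, and the derivation of UnsupportedAttributeException from UnsupportedFeatureException simply follows Daniel's suggestion. Python's `abc` machinery does not block instantiation of exception classes, so the "abstract" rule is enforced by hand in the constructor.

```python
class DrmaaException(Exception):
    """Hypothetical aggregate base; abstract, so it is never raised itself."""

    def __init__(self, *args):
        # Refuse direct instantiation, which also prevents raising it.
        if type(self) is DrmaaException:
            raise TypeError("DrmaaException is abstract")
        super().__init__(*args)


class UnsupportedFeatureException(DrmaaException):
    """An optional feature is not offered by this library."""


class UnsupportedAttributeException(UnsupportedFeatureException):
    """An optional attribute is not offered by this library."""


def set_attribute(name):
    # Hypothetical call that rejects every attribute, for demonstration.
    raise UnsupportedAttributeException(name)


# A handler for the more general exception also catches the specific one.
try:
    set_attribute("someAttribute")
except UnsupportedFeatureException as exc:
    caught = type(exc).__name__
```

With this ordering an application can handle all "unsupported" conditions in one place, while the aggregate base class itself can never be thrown.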
FWIW, this matches the SAGA model, where we have a 'metric' called job.state, for which callbacks can be registered. Those fire whenever the job state changes. Nice.
Best, Andre.
Quoting [Peter Tröger] (Aug 31 2009):
Dear all,
in reflection of the last phone conference (check the minutes), here is a possible realization of the new synchronization approach as IDL snippet:
[...]
-- Nothing is ever easy.
participants (3)
- Andre Merzky
- Daniel Templeton
- Peter Tröger