Hi,

thanks for your huge contribution with this. Here are my comments. Once the OGF.ORG SVN works again, I will update the document sources on the server. The new PDF follows today.

There are three kinds of reactions I state:

(1) Added. No discussion needed, I just performed the corresponding document modification.
(2) Ignored. I am pretty confident that this was debated and decided well enough in the group, so I am not willing to re-open the discussion. The group is free to disagree.
(3) Obsoleted. Recent document modifications already established the proposal as a fact.

Best regards,
Peter.
Hi,
I finally managed to read the current version of the spec more carefully. Below are some comments (line numbering corresponds to the version annotated as "draft3"):
line 81: DRMAA1 -> DRMAA Version 1 [reference]
94,95: A Exec.. -> An Exec
159: advanced -> advance
296: "Machine structure" - should we include machine state here (e.g. down, administratively down, available, busy, ...)?
316: consistent... -> consistent among all Machine struct instances.
Added.
Moreover any reported name should be a syntactically correct input for the candidateMachines attribute of the JobTemplate. ???
candidateMachines takes a MachineList as input.
361: any jobSubState - is there really any case where this would be a complex object? Why not just use a string here? (Yes, I know, in the spec there is a requirement that the language binding should define a conversion to String for every object, but this may be complex... ;-)
Ignored, this was already discussed and decided.
370: missing \n
Added.
377-383: running, buffered, purged -> I think this section needs to be more precise and verbose. In DRMAA 1.0 the wait call was responsible for reaping the jobs. This is important because some DRMS do not "buffer" jobs at all (or do so only for a very short time), so the buffering has to be done in the DRMAA library (for the session's jobs only). This raises the question: how long to buffer the job information...
Added as ToDo.
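To make the buffering question above concrete, here is a minimal sketch of library-side buffering with a retention period. All names (TerminatedJobBuffer, retention_seconds, store, reap) are hypothetical and not taken from the spec; the retention value is exactly the open parameter the comment asks about.

```python
import time


class TerminatedJobBuffer:
    """Keeps final job info in the library for a limited time (hypothetical sketch)."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self._retention = retention_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # job_id -> (stored_at, info)

    def store(self, job_id, info):
        """Buffer the final info of a terminated job of this session."""
        self._entries[job_id] = (self._clock(), info)

    def purge_expired(self):
        """Drop every entry that is older than the retention period."""
        now = self._clock()
        expired = [job_id for job_id, (stored_at, _) in self._entries.items()
                   if now - stored_at > self._retention]
        for job_id in expired:
            del self._entries[job_id]

    def reap(self, job_id):
        """Return the buffered info once and forget it; None if purged or unknown."""
        self.purge_expired()
        entry = self._entries.pop(job_id, None)
        return entry[1] if entry else None


# Usage with a fake clock: job-1 is reaped in time, job-2 is purged first.
now = [0.0]
buf = TerminatedJobBuffer(retention_seconds=10, clock=lambda: now[0])
buf.store("job-1", {"exitStatus": 0})
buf.store("job-2", {"exitStatus": 1})
first = buf.reap("job-1")    # buffered info
second = buf.reap("job-1")   # already reaped
now[0] = 11.0
third = buf.reap("job-2")    # retention period exceeded
```

Whether reaping should remove the entry (as wait did in DRMAA 1.0) or only the timeout should, is a design choice the sketch does not settle.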
395: exitStatus - should we state here that the valid exitStatus values are 0-125 ?
Ignored, this was already discussed and decided.
445: cpuTime - should we state here that it is the cumulative time among all the job processes? I.e. CPU time can be greater than wall clock time for parallel jobs.
Added, also for wall clock time.
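As a small illustration of the cumulative reading, with made-up numbers and a hypothetical helper name:

```python
def cumulative_cpu_time(per_process_cpu_seconds):
    """Sum the CPU time consumed by all processes of one job."""
    return sum(per_process_cpu_seconds)


# Hypothetical parallel job: four processes, each busy for most of a
# 60 second run.
wall_clock_seconds = 60.0
cpu_seconds = cumulative_cpu_time([55.0, 58.0, 57.0, 59.0])

# The cumulative CPU time exceeds the elapsed wall clock time.
exceeds_wall_clock = cpu_seconds > wall_clock_seconds
```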
497: maybe we should add "Dictionary consumableResources;" @see Nadev e-mail I also raised this during one of the last telcos...
See meeting minutes.
594: "execution host" -> "submission host" ???
Why this? inputPath and friends relate to files that are used by the running job on the execution host.
652: maxSlots should be optional (e.g. Torque does not support range values)
Added as ToDo.
657: SHOULD -> MAY - at least as long as we don't have predefined JobCategories ;-)
Ignored, this was already discussed and decided.
785: SessionManagementException - what is the added value of this exception? Can it be thrown from operations other than open/close/destroy Session? If not, then why don't we have WaitException, RunException? ;-)
Added as ToDo.
791: OutOfMemoryException - can we also throw this exception when the user-supplied buffer was too small?
Added.
829: reservationSupported - maybe we can move it now to DrmaaReflective interface?
Obsoleted.
948: FAILED vs DONE - maybe we should be more precise about the situation when the job was started but, e.g., exited with exit code != 0 (I believe this should be DONE), was signalled, or was terminated via DRMAA.
Ignored, this was already discussed and decided.
967: REQUEUED, REQUEUED_HELD and BES states. The BES state model prohibits the transition from Running to Pending, so it should be the Running state. Also, the state names in brackets look like a specialization of one of the BES implementations (I will not say which implementation ;-) so they are definitely non-normative.
Added. And yes, this is why the table title contains "example"
1035: "The largest valid value for endIndex MUST be defined by the language binding." - there may also be a DRMS constraint.
Added.
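One possible reading of the combined constraint, as a sketch only: the effective upper bound for endIndex is the smaller of the language binding limit and the DRMS limit, if the DRMS has one. The function names and both example limits are hypothetical.

```python
def effective_max_end_index(binding_limit, drms_limit=None):
    """Smaller of the language binding limit and the optional DRMS limit."""
    if drms_limit is None:
        return binding_limit
    return min(binding_limit, drms_limit)


def check_end_index(begin_index, end_index, binding_limit, drms_limit=None):
    """Raise ValueError if the bulk job index range exceeds the effective limit."""
    limit = effective_max_end_index(binding_limit, drms_limit)
    if end_index < begin_index:
        raise ValueError("endIndex must not be smaller than beginIndex")
    if end_index > limit:
        raise ValueError(f"endIndex {end_index} exceeds the limit {limit}")
    return limit


# Hypothetical limits: a 32-bit signed integer in the binding, and a
# configured array job maximum of 75000 tasks in the DRMS.
BINDING_LIMIT = 2**31 - 1
DRMS_LIMIT = 75000

limit = check_end_index(1, 1000, BINDING_LIMIT, DRMS_LIMIT)
```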
1047: "only one of the active thread..." - is this requirement really needed? i'm asking because i'm afraid this would increase complexity of the implementations (do you remember the "session any" and its coincidence with run job operations?). This may be related to comment 377-383.
Ignored, this was already discussed and decided.
1063: "DrmaaCallback Interface"....
I just wonder if the requirement "An implementation SHOULD also disallow any library calls while the callback function is running, to avoid recursion scenarios. It is RECOMMENDED to raise TryLaterException in this case." is really needed. If we want to keep this requirement is the Job object useful at all as we can only read the jobId from it?
Added as ToDo.
1109-1110: why do those methods return the Job objects?
Ignored, this was already discussed and decided.
1262: footnote 30 (what about symmetry ;-) Also, the last decision was to have a separate ReservationInfo struct: http://www.mail-archive.com/drmaa-wg@ogf.org/msg00250.html (when was it revoked?)
Obsoleted.
1508: reservationInfoOpt, reservationInfoImpl - what if one wants to provide more information about the reservation? Also the symmetry rule ;-), relates to 1262
Obsoleted.
should we also move the drmsJobCategoryNames here (from MonitoringSession)?
No, since DrmaaReflective is only about introspection support for optional / impl. specific attributes. Added ToDo to clarify if this should move to the new generic capability check.