Fwd: SAGA GridCPR API document review
Folks,
I'm forwarding Nathan Stone's comments regarding the SAGA_CPR document:
- Derek
Begin forwarded message:
From: Nathan Stone
Date: January 6, 2010 4:47:58 PM EST
To: Derek Simmel
Subject: Re: SAGA GridCPR API document review
Hi Derek,
Regarding the SAGA_CPR document:
- CPRFrequency (page 4) should probably have a default more like 1 hour than 1 day, from previous machine experience.
- CPRTimeToLive (page 5) sounds like a bad idea. Checkpoints should be kept until the job is complete, without regard for how long that is. Otherwise you may have long-running jobs that will thereby have their recoveries cut out from under them -- while the job innocently and unknowingly gets swapped out for a higher priority job.
- class "directory" (page 9) seems to have nothing to do with CPR. I hope that's all just a restatement for purposes of clarity...
- class "checkpoint" (page 10) seems to be the same thing as class "directory". If they really are the same, then why define an identical object? Why not just say that a checkpoint is a special instantiation of a directory object? Or, if you're really gung-ho on the object framework, derive a "checkpoint" class from the "directory" class... and add one more special function (or a special constructor status?).
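The last suggestion, together with the retention and frequency points, could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Directory`, `Checkpoint`, `may_purge`, and `DEFAULT_CPR_FREQUENCY` are hypothetical stand-ins, not the classes or attributes defined in the SAGA_CPR document.

```python
from datetime import timedelta

# Hypothetical default: checkpoint roughly hourly rather than daily.
DEFAULT_CPR_FREQUENCY = timedelta(hours=1)


class Directory:
    """Stand-in for a generic directory object (not the real SAGA class)."""

    def __init__(self, url):
        self.url = url
        self.entries = []

    def add_entry(self, name):
        self.entries.append(name)

    def list(self):
        # Return a copy so callers cannot mutate internal state.
        return list(self.entries)


class Checkpoint(Directory):
    """A checkpoint as a directory plus one checkpoint-specific function."""

    def __init__(self, url, job_id):
        super().__init__(url)
        self.job_id = job_id

    def may_purge(self, job_is_complete):
        # Retention is tied to job completion, not to a time-to-live.
        return job_is_complete


# Hypothetical usage: the URL and job id are made up for illustration.
cp = Checkpoint("file:///scratch/job42/cpr", job_id="job42")
cp.add_entry("state.0001")
```

Deriving from the directory class means everything a directory can do is inherited unchanged, and only the checkpoint-specific behaviour has to be specified.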
The other CPR_Architecture document appears to be the same as when we last left it, and as such certainly seems fine to me.
Feel free to pass along my comments to the current document steward.
Thanks, Nathan.
---
Derek Simmel
Pittsburgh Supercomputing Center
(412) 268-1035
Here is the latest package draft...
A
Quoting [Derek Simmel] (Jan 13 2010):
From: Derek Simmel
Date: Wed, 13 Jan 2010 14:21:50 -0500
To: saga-rg@ggf.org
Subject: [SAGA-RG] Fwd: SAGA GridCPR API document review
Folks,
I'm forwarding Nathan Stone's comments regarding the SAGA_CPR document:
- Derek
Begin forwarded message:
From: Nathan Stone <[1]stone@psc.edu>
Date: January 6, 2010 4:47:58 PM EST
To: Derek Simmel <[2]dsimmel@psc.edu>
Subject: Re: SAGA GridCPR API document review
Hi Derek,
Regarding the SAGA_CPR document:
- CPRFrequency (page 4) should probably have a default more like 1 hour than 1 day, from previous machine experience.
- CPRTimeToLive (page 5) sounds like a bad idea. Checkpoints should be kept until the job is complete, without regard for how long that is. Otherwise you may have long-running jobs that will thereby have their recoveries cut out from under them -- while the job innocently and unknowingly gets swapped out for a higher priority job.
- class "directory" (page 9) seems to have nothing to do with CPR. I hope that's all just a restatement for purposes of clarity...
- class "checkpoint" (page 10) seems to be the same thing as class "directory". If they really are the same, then why define an identical object? Why not just say that a checkpoint is a special instantiation of a directory object? Or, if you're really gung-ho on the object framework, derive a "checkpoint" class from the "directory" class... and add one more special function (or a special constructor status?).
The other CPR_Architecture document appears to be the same as when we last left it, and as such certainly seems fine to me.
Feel free to pass along my comments to the current document steward.
Thanks, Nathan.
---
Derek Simmel
Pittsburgh Supercomputing Center
(412) 268-1035
References
1. mailto:stone@psc.edu
2. mailto:dsimmel@psc.edu
--
Nothing is ever easy.
Participants (2): Andre Merzky, Derek Simmel