I would also vote for the total number of cores and sockets :) We could also think about reporting the number of concurrent threads supported by the hardware (hyperthreading in the case of Intel, or chip multithreading in the case of Sun T2 processors). This would save the user from puzzling out what is meant by a core (is it a real one, or a hyperthreading/CMT thread?). If not, we should at least define that a core is a physical core.
Daniel
On 03/25/10 15:44, Daniel Templeton wrote:
I would tend to agree that total core count is more useful. SGE also reports socket count as of 6.2u5, by the way. (That's actually thanks to our own Daniel Gruber.)
Daniel
On 03/25/10 07:03, Mariusz Mamoński wrote:
Also for me. As we are talking about the monitoring interface, I propose two more changes to the machine monitoring interface:
1. Add a new data struct called "MachineInfo" with attributes like Load, PhysMemory, ... and a getMachineInfo(in String machineName) method in the Monitoring interface. Rationale: the same as for JobInfo (consistency; fetching all of a machine's attributes at once is more natural in DRMS APIs than querying for each attribute separately).
2. Change machineCoresPerSocket to machinesCores. If one has machineSockets, he or she can easily determine machineCoresPerSocket. The problem with the current API is that if the DRM does not support "machineSockets" (as far as I checked, only LSF provides this two-level granularity, @see Google Doc), we lose the most essential information: "how many single processing units do we have on a single machine?"
Cheers,
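The two proposals above could be sketched roughly as follows. This is only an illustration in Python, not the DRMAA IDL: the field names, the `cores_per_socket` helper, and the in-memory `table` lookup are assumptions made for the example. It shows the total core count as the primary attribute, with cores per socket derived only when the DRM reports a socket count.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MachineInfo:
    """Hypothetical record bundling per-machine attributes (names are illustrative)."""
    name: str
    cores: int                         # total physical cores on the machine
    sockets: Optional[int] = None      # None when the DRM cannot report sockets
    load: Optional[float] = None       # load value, if the DRM reports one
    phys_memory: Optional[int] = None  # physical memory in bytes, if known

    def cores_per_socket(self) -> Optional[int]:
        """Derive cores per socket; None if the socket count is unknown."""
        if not self.sockets:
            return None
        return self.cores // self.sockets


def get_machine_info(machine_name: str, table: Dict[str, MachineInfo]) -> MachineInfo:
    """Stand-in for getMachineInfo: fetch all attributes of one machine at once."""
    return table[machine_name]
```

With this shape, a DRM without socket information can still answer the "how many processing units" question through `cores`, and callers that need the two-level view check `cores_per_socket()` for `None`.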
On 23 March 2010 23:00, Daniel Templeton <daniel.templeton@oracle.com> wrote:
That's fine with me.
Daniel
On 03/23/10 13:51, Peter Tröger wrote:
Any non-SGE opinions?
Here is mine:
I could only find one single source that explains the load average source in Condor :)
http://www.patentstorm.us/patents/5978829/description.html
Condor provides only the 1-minute load average from the uptime command.
Same holds for Moab: http://www.clusterresources.com/products/mwm/docs/commands/checknode.shtml
And PBS: http://wiki.egee-see.org/index.php/Installing_and_configuring_guide_for_MonA...
And MAUI: https://psiren.cs.nott.ac.uk/projects/procksi/wiki/JobManagement
I vote for reporting only the 1-minute load average.
/Peter.
And BTW, by using the uptime(1) load semantics, we lose Windows support. There is no such attribute there; load is measured as a percentage of non-idle time and has no direct relationship to the ready queue length.
Best, Peter.
On 22.03.2010 at 16:02, Daniel Templeton wrote:
SGE tends to look at the 5-minute average, although any can be configured. You could solve it the same way we did for SGE -- offer three: machineLoadShort, machineLoadMed, machineLoadLong.
Daniel
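A minimal sketch of the three-value idea, assuming the values map onto the 1/5/15-minute averages that uptime(1) reports. The `MachineLoad` type and `local_machine_load` function are hypothetical names for this example; only `os.getloadavg()` is a real Python call, and it is not available on Windows, which is the portability concern raised elsewhere in this thread.

```python
import os
from typing import NamedTuple, Optional


class MachineLoad(NamedTuple):
    """Hypothetical container mirroring machineLoadShort/Med/Long."""
    short: float  # 1-minute load average
    med: float    # 5-minute load average
    long: float   # 15-minute load average


def local_machine_load() -> Optional[MachineLoad]:
    """Return the local load averages, or None where the platform has none."""
    getloadavg = getattr(os, "getloadavg", None)  # attribute is absent on Windows
    if getloadavg is None:
        return None
    try:
        return MachineLoad(*getloadavg())
    except OSError:
        # Raised when the load average cannot be obtained on this system.
        return None
```

Returning `None` rather than a made-up number keeps the "not supported on this platform" case visible to the caller; an interface that reports only the 1-minute value would simply expose `short`.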
On 03/22/10 06:05, Peter Tröger wrote:
> Hi,
>
> next remaining thing from OGF28:
>
> We support the determination of machineLoad average in the
> MonitoringSession interface. At OGF, we could not agree on which of
> the typical intervals (1/5/15 minutes) we want to use here. Maybe
> all of them?
>
> Best,
> Peter.
--
drmaa-wg mailing list
drmaa-wg@ogf.org
http://www.ogf.org/mailman/listinfo/drmaa-wg