In order to trigger some discussion, here is my first quick evaluation for LSF, based on:

http://www.zdv.uni-mainz.de/cms-extern/lsf/lsf6.0/pdf/manuals/lsf_ref_6.0.pd...
http://www.cisl.ucar.edu/docs/LSF/7.0.3/command_reference/lshosts.cmdref.htm...
http://ams.cern.ch/AMS/7/admin/troubleshooting.html
http://www.slac.stanford.edu/comp/unix/package/lsf/LSF6.1_doc/html/lsf6.1_ad...

enum OperatingSystem {HPUX, LINUX, TRU64, DUNIX, OSF1, MACOS, SUNOS, WIN, WINNT, AIX, UNIXWARE, BSD, OTHER}
enum CpuArchitecture {ALPHA, PA-RISC, X86, X64, IA-64, MIPS, PPC, PPC64, SPARC, SPARC64, OTHER}
interface MonitoringSessions{
    readonly attribute string[] drmVersionString;
    readonly attribute string[] drmMachineNames;
    int machineSockets(in string machineName);
    int machineCoresPerSocket(in string machineName);
    int machineLoad(in string machineName, in long coreNumber);
    int machinePhysMemory(in string machineName);
    int machineVirtMemory(in string machineName);
    OperatingSystem machineOS(in string machineName);
    string machineOSVersion(in string machineName);
    CpuArchitecture machineArch(in string machineName);
};

LSF supports the "lshosts" command, which can show (among other things) the following machine information:

- host name (== machineName)
- type, e.g. CRAYJ, SUNSOL, ALPHA, RS6K, SGI6, HPPA, LINUX86 (== machineOS ???)
- model, e.g. Ultra2, SunSparc, DEC3000, IBM350, R10K, HP715, Intel_IA64, Ultra5S, PowerPC_G4, HP300 (== machineArch)
- cpuf, the relative CPU performance factor (== ???)
- ncpus (== machineSockets * machineCoresPerSocket)
- nprocs (== machineSockets)
- ncores (== machineCoresPerSocket)
- maxmem (== machinePhysMemory)
- maxswp (== machineVirtMemory - machinePhysMemory)
- ndisks, the number of local disks (== ???)
- maxtmp, the maximum available temporary space (== ???)

This mapping is still incomplete, but at first glance, our interface seems to fit. The analysis of machine load information is still unclear to me. LSF seems to support a lot of mainframe / Unix architectures that are missing in our current list. The amount of tmp space available might be an interesting addition.

Best,
Peter.

P.S.: In case nobody finds time to contribute, I will skip tomorrow's phone call. We need to do the offline work first.

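As a rough illustration of the lshosts mapping listed in the message above, here is a minimal Python sketch. It assumes the lshosts output has already been parsed into a dict keyed by the column names from the list, and that the numeric values are plain integers in one common unit. The names MachineInfo, from_lshosts and the "host_name" key are hypothetical and not part of LSF or DRMAA.

```python
from dataclasses import dataclass


@dataclass
class MachineInfo:
    """Subset of the proposed monitoring attributes for one host."""
    machine_name: str
    sockets: int
    cores_per_socket: int
    phys_memory: int   # same unit as maxmem
    virt_memory: int   # physical memory plus swap


def from_lshosts(fields: dict) -> MachineInfo:
    """Map one already-parsed lshosts record to the proposed attributes.

    `fields` is a hypothetical dict keyed by the column names used in the
    mapping; numeric values are assumed to be plain integers.
    """
    sockets = fields["nprocs"]
    cores = fields["ncores"]
    # Consistency check taken from the mapping:
    # ncpus == machineSockets * machineCoresPerSocket
    if fields["ncpus"] != sockets * cores:
        raise ValueError("ncpus does not match nprocs * ncores")
    return MachineInfo(
        machine_name=fields["host_name"],
        sockets=sockets,
        cores_per_socket=cores,
        phys_memory=fields["maxmem"],
        # maxswp == machineVirtMemory - machinePhysMemory
        virt_memory=fields["maxmem"] + fields["maxswp"],
    )
```

The fields marked "???" in the mapping (cpuf, ndisks, maxtmp) are left out on purpose, since the interface has no counterpart for them yet.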
--- snip
Some rationales:
The list of operating systems is a reduced version of the DMTF list I sent earlier, and currently only considers the supported OS types in Condor. The list of CPU architectures is a combination of the supported identifiers in Condor + Debian.
It is assumed that each OS identifier only makes sense together with an OS version number string, which is not standardized by DRMAA. It is tempting to derive this version number string from "uname -r" by default. However, this might be too much information for a DRMAA application. You would get the Darwin kernel version on MacOS, or the specific minor build revision with a Linux kernel. I think that such information is not really useful for job submission decisions. Instead, I favour the interpretation as the true "operating system version", something that does not change when you do software updates on the machine. Some examples:
Snow Leopard: "MACOS" + "10.6"
Windows 7: "WINNT" + "6.1"
Ubuntu Jaunty Jackalope: "LINUX" + "2.6"
Solaris 10: "SUNOS" + "5.10"
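
One possible way to get from a detailed release string to such a stable version string is to keep only the leading "major.minor" part. The following Python sketch shows the idea; the function name os_version and the sample input strings are hypothetical, and it assumes that the input on MacOS is the product version rather than the Darwin kernel release.

```python
import re


def os_version(release: str) -> str:
    """Reduce a release string to its leading 'major.minor' part.

    Strings that do not start with a 'major.minor' pattern are
    returned unchanged.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return release
    return f"{match.group(1)}.{match.group(2)}"


# Hypothetical raw inputs, paired with the OS identifiers from the examples.
samples = [
    ("MACOS", "10.6.2"),
    ("WINNT", "6.1.7600"),
    ("LINUX", "2.6.28-11-generic"),
    ("SUNOS", "5.10"),
]

for os_id, raw in samples:
    print(os_id, os_version(raw))
```

Whether truncating to two components is the right rule for every DRM system and platform is an open question; it merely reproduces the four examples.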
Things I am not sure about:
- Do we need to distinguish the different BSD derivatives?
- Do we really need support for non-NT Windows and OSF/1?
- Is SCO OpenServer something different from SCO UnixWare? Is this a relevant separation?
- Do we need to add mainframe operating systems?
- Do we need a more fine-grained distinction between different Sparc processors?
- What is missing for LSF / PBS / Globus / ... ?
Best, Peter.
------------------------------------------------------------------------
--
  drmaa-wg mailing list
  drmaa-wg@ogf.org
  http://www.ogf.org/mailman/listinfo/drmaa-wg