Definition of Memory with Multi-core CPUs.

Hi Sergio,

I just received the following GGUS ticket. Looking at the Glue 1.3 schema, it seems that the definition of GlueHostMainMemoryRAMSize and GlueHostMainMemoryVirtualSize needs to be improved.

Detailed description: I've recently received a request from a VO user to change the memory info published by the site (GlueHostMainMemoryRAMSize, GlueHostMainMemoryVirtualSize) to be per CPU core and not per WN. From the yaim documentation I was under the impression that memory info was per WN:

CE_MINPHYSMEM - RAM size (Mbytes) (per WN and not per CPU)
CE_MINVIRTMEM - Virtual Memory size in (Mbytes) (per WN and not per CPU)

but it seems the RB/WMS can't schedule jobs correctly unless the information is per CPU core (a JDL requirement can't check how much memory a single-CPU job gets if the info is per WN). Can someone check this and fix the docs?

glue-wg-bounces@ogf.org on behalf of Laurence said:
> I just received the following GGUS ticket. Looking at the Glue 1.3 schema, it seems that the definition of GlueHostMainMemoryRAMSize and GlueHostMainMemoryVirtualSize needs to be improved.
This is a known issue. It may be in one of Sergio's trackers already; I know I've mentioned it before. The trouble is that a) the schema does say that it should be the memory per node, so sites shouldn't just unilaterally do something different, and b) you don't really know how much memory a job will get: the physical memory is an upper limit, and the actual amount may well be less, but not necessarily as low as RAM/slots.

Anyway, if you want, you can always put something like RAM/SMPSize in the JDL. Having said all that, I know that some sites do publish the estimated memory per job, so it isn't very consistent anyway ...

Stephen
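
The RAM/SMPSize suggestion above can be sketched in Python as follows. This is only an illustration, not anything the WMS actually runs: the function names and the example figures are hypothetical, and only the two Glue attribute names come from the thread. As noted above, the result is an average estimate and not a guarantee, since a job may get anything up to the full physical memory.

```python
def estimated_memory_per_job_mb(ram_size_mb: int, smp_size: int) -> float:
    """Estimate average memory per job on a worker node.

    ram_size_mb stands for GlueHostMainMemoryRAMSize (per WN, in MB) and
    smp_size for GlueHostArchitectureSMPSize. Both names are taken from
    the schema; the function itself is a hypothetical helper.
    """
    if smp_size < 1:
        raise ValueError("smp_size must be at least 1")
    return ram_size_mb / smp_size


def meets_requirement(ram_size_mb: int, smp_size: int, needed_mb: int) -> bool:
    """Return True if the per-job estimate covers what the job asks for."""
    return estimated_memory_per_job_mb(ram_size_mb, smp_size) >= needed_mb


if __name__ == "__main__":
    # Hypothetical node: 8192 MB of RAM published per WN, SMPSize of 4.
    print(estimated_memory_per_job_mb(8192, 4))
    print(meets_requirement(8192, 4, needed_mb=4096))
```

With these made-up numbers, a requirement written against the raw per-WN value would accept the node for a 4096 MB job, while the per-job estimate would not, which is the mismatch the ticket describes.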

glue-wg-bounces@ogf.org on behalf of Burke, S (Stephen) said:
> This is a known issue - it may be in one of Sergio's trackers already, I know I've mentioned it before.
This was my reply last time this came up (May), probably still about right ...

Like many things this is about as clear as mud :) The way the schema is written, the RAMSize comes out of the physical host description and hence was supposed to be the memory for the whole WN. However, for job submission purposes it's more useful to know how much memory a job will get, so many sites divide by the number of job slots per CPU. Of course, in general there is no real guarantee anyway about how much memory a job will actually get, the values are set by hand so may be wrong, and many clusters will have WNs with varying amounts of memory ... the last time this was discussed I think the conclusion was that people should publish the memory per WN since a) it's what the schema says, b) it is a fairly well-defined number, and c) a job might get that much if it happens to get the node to itself or if it shares with a small-memory job. If we ever had the passing of requirements to the LRMS it could also be used to ensure that jobs could get memory up to the physical amount in the machine.

If you wanted to estimate what you get per job on average the right value to divide by is not LogicalCPUs but ArchitectureSMPSize - except that that was defined before we had multi-core and hyperthreading so it's not entirely clear how to deal with them (personally I would count them) ... and of course sites don't necessarily configure one slot per CPU, although it's pretty common. SMPSize does at least look to be set vaguely sensibly ...

     44 GlueHostArchitectureSMPSize: 1
      1 GlueHostArchitectureSMPSize: 16
    231 GlueHostArchitectureSMPSize: 2
     32 GlueHostArchitectureSMPSize: 4
      1 GlueHostArchitectureSMPSize: 8

Stephen
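
To make the SMPSize counts quoted above easier to read, here is a small Python tally. The counts are copied from the message; the variable names are hypothetical, and it is an assumption that each count is a number of published entries (the message does not say what was counted).

```python
# Counts of published GlueHostArchitectureSMPSize values, copied from the
# message above as {SMPSize: number of entries}.
smp_size_counts = {1: 44, 2: 231, 4: 32, 8: 1, 16: 1}

# Total number of entries in the tally.
total_entries = sum(smp_size_counts.values())

# The SMPSize value published most often.
most_common_smp_size = max(smp_size_counts, key=smp_size_counts.get)

# Share of entries that publish an SMPSize greater than one, i.e. where
# per-WN memory and a per-job estimate would differ.
multi_entries = sum(n for size, n in smp_size_counts.items() if size > 1)
multi_fraction = multi_entries / total_entries

if __name__ == "__main__":
    print(total_entries, most_common_smp_size, round(multi_fraction, 3))
```

On these figures most entries publish an SMPSize above one, so the per-WN versus per-job question affects the bulk of the published values.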

Hi Laurence,

Laurence wrote:
> Hi Sergio,
> I just received the following GGUS ticket. Looking at the Glue 1.3 schema, it seems that the definition of GlueHostMainMemoryRAMSize and GlueHostMainMemoryVirtualSize needs to be improved.
I remember that this issue has also been raised in the past. I think the best way to proceed for GLUE 2 is to shape it as a use case to be added to the document. When we move to attribute definitions driven by use cases, we'll address it. Can you take care of that?
> Detailed description: I've recently received a request for a VO user to change the memory info published by the site (GlueHostMainMemoryRAMSize, GlueHostMainMemoryVirtualSize) to be per CPU core and not per WN.
> From the yaim documentation I was under the impression memory info was per WN:
> CE_MINPHYSMEM - RAM size (Mbytes) (per WN and not per CPU)
> CE_MINVIRTMEM - Virtual Memory size in (Mbytes) (per WN and not per CPU)
> but it seems RB/WMS can't schedule jobs correctly unless the information is per CPU core (JDL requirement can't check how much memory a single CPU job gets if info is per WN).
> Can someone check this and fix the docs?
For GLUE 1.3, I'll take a note and then verify it together with Stephen's comments.

Cheers, Sergio

participants (3)
- Burke, S (Stephen)
- Laurence
- Sergio Andreozzi