Re: [ogsa-bes-wg] Questions and potential changes to BES, as seen from HPC Profile point-of-view

6 Jun 2006

Hi,

Comments are in-lined below.

Marvin Theimer wrote:
<snip>
...
·        Information model:
·        JSDL seems to inherently be focused on describing a single 
job or a single computational resource.  For example, it has no notion 
of describing all the differing compute nodes of a (heterogeneous) 
compute cluster.  By incorporating JSDL elements into the BES 
information model it seems that BES is foreclosing the ability to 
describe things like compute clusters.  This issue also affects what 
can get returned from GetActivityJSDLDocuments.  If I'm wrong about 
this, then it seems like it would be worth having an explicit 
explanation about how to achieve this functionality somewhere in the 
specification.
Let me say here that JSDL 1.0 is not meant to be the final answer. The 
resource model used within it is meant to be a "rough place holder", 
as we expect the CIM people to provide something much more robust and 
appropriate. We simply wanted something simple to get things going. We 
hope that when this model is available we can roll it into a later 
release of JSDL. If you're interested in following this up, I'd suggest 
getting involved with the OGSA information model people (and, 
obviously, the JSDL people).
...
·        The BES information model now includes various posix-specific 
elements of JSDL.  How would other systems -- such as a Windows system 
-- be described?
The POSIX elements appear only inside the POSIXApplicationType. The best 
way would be to define a WindowsApplicationType (or something similar). 
The JSDL group would be very interested in this.
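
As a rough illustration of that suggestion (a sketch only; nothing like 
this exists in JSDL 1.0), the snippet below builds a JSDL Application 
element that carries a made-up WindowsApplication extension in the place 
where a POSIX application description would normally sit. The 
urn:example:jsdl-windows namespace and every element name inside it are 
hypothetical, and the JSDL namespace URI is quoted from memory of the 
1.0 specification, so it should be checked before use.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace as recalled from the spec; verify before relying on it.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
# Hypothetical namespace for a Windows extension; not defined by any spec.
WIN_NS = "urn:example:jsdl-windows"


def build_windows_job(executable, arguments):
    """Return a JobDefinition element using a hypothetical WindowsApplication."""
    job = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL_NS}}}JobDescription")
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    # The extension element sits where a POSIX application description
    # would otherwise go; its child names are invented for this sketch.
    win = ET.SubElement(app, f"{{{WIN_NS}}}WindowsApplication")
    ET.SubElement(win, f"{{{WIN_NS}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(win, f"{{{WIN_NS}}}Argument").text = arg
    return job


if __name__ == "__main__":
    # Register readable prefixes so the serialized XML is easier to scan.
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-win", WIN_NS)
    doc = build_windows_job(r"C:\apps\solver.exe", ["/input", "data.dat"])
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping the platform-specific part in its own namespace means a consumer 
that only understands the POSIX description can still recognise the rest 
of the document and simply reject the application section it does not 
know.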
...
·        Is there any notion of specifying that all compute nodes 
should have the /same/ value for some attribute (e.g. CPU 
architecture, CPU speed, NIC card)?  This seems to be missing from the 
JSDL specification, but seems very important for BES if it is to 
support things like compute clusters.
Again, this will hopefully come in the future.
...
·        Some of the elements seem either incompletely specified, have 
definitions that are open to multiple interpretations, or have 
definitions that would be very difficult to implement in practice.  In 
particular:
·        CPU architecture seems like it can't describe all the 
variations -- let alone all the peripherals such as GPUs -- that a 
computing resource might have (let alone a cluster).
·        CPU speed seems like the tip of an iceberg having to do with 
characterizing the performance of a system, which will depend on all 
manner of things like details of the processor chip used, cache sizes, 
bus used, etc.
·        Network bandwidth: is this the theoretical maximum of the NIC 
on a compute node or is it the current bandwidth actually available in 
a (shared) system?  Note that the latter is difficult to measure in a 
practically useful way.  Note also that network bandwidth only 
describes one aspect of communications performance and that several 
others are arguably equally important (e.g. latency).
All this leads to the question of whether BES will have a notion of 
extending the information model that is supplied. If so, then that 
leads to the question of what the base case should be and whether it 
should include a smaller set of things than is currently listed in the 
spec.
Are there any plans to tighten the definitions of some of the more 
vague information elements?  (I guess this really is an issue more for 
the JSDL WG than for BES.)
Again, this is where we are now planning to go with JSDL. JSDL 1.0 
should be seen as a starting point and not the end. We hope most of 
these things can be handled through extensions to JSDL 1.0. Those that 
can't, we'll need to add in future versions.
...
·        GetActivityJSDLDocuments returns a JSDL document for each 
specified activity.  Is this sufficient to capture the entire 
"provenance" for what has happened to the activity?  In particular, 
would it be sufficient to allow someone to (a) run the same activity 
on another BES service (assuming same hardware and software) and get 
the same results and (b) debug what has happened to an errant 
activity?  I would argue that both capabilities have proven to be 
important in actual systems.
A JSDL document is (by the definition of our charter) a Job Submission 
document. As such, things like provenance were ruled out of scope (not by 
us but by GGF in general; it was felt that this was too much to do all 
in one go). However, there is now scope to go back and re-address 
these issues. There were some interesting discussions at the last GGF 
meeting about where non-submission information could be placed. The 
suggestions included placing it in a wrapper around the JSDL document 
or (from my recollection, the more popular option) placing it at the 
outermost level of the JSDL document.
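
To make the two placements concrete, here is a minimal sketch. It is not 
taken from any GGF document: the urn:example:activity-info namespace and 
the ActivityRecord, ExecutionInfo and ExecutionHost names are all 
hypothetical, and the JSDL namespace URI is again quoted from memory.

```python
import copy
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace as recalled from the spec; verify before relying on it.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
# Hypothetical namespace for non-submission information.
EXT_NS = "urn:example:activity-info"


def make_info(host):
    """Build a hypothetical block of non-submission information."""
    info = ET.Element(f"{{{EXT_NS}}}ExecutionInfo")
    ET.SubElement(info, f"{{{EXT_NS}}}ExecutionHost").text = host
    return info


def wrap(job, info):
    """Option 1: a wrapper holding the untouched JSDL document plus the info."""
    wrapper = ET.Element(f"{{{EXT_NS}}}ActivityRecord")
    wrapper.append(copy.deepcopy(job))
    wrapper.append(info)
    return wrapper


def embed(job, info):
    """Option 2: the info as a child of the outermost JSDL element."""
    extended = copy.deepcopy(job)  # leave the submitted document unchanged
    extended.append(info)
    return extended
```

With the wrapper, the document that was originally submitted stays 
intact inside the record; with the embedded form, there is a single JSDL 
document, but consumers have to tolerate extension elements at the top 
level.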

Hope this helps,

steve..
...
-- 
------------------------------------------------------------------------
Dr A. Stephen McGough                       http://www.doc.ic.ac.uk/~asm
------------------------------------------------------------------------
Technical Coordinator, London e-Science Centre, Imperial College London,
Department of Computing, 180 Queen's Gate, London SW7 2BZ, UK
tel: +44 (0)207-594-8409                        fax: +44 (0)207-581-8024
------------------------------------------------------------------------
