OGSA HPC Profile Minutes – Fri Aug 25 2006
Participants:
Marty Humphrey (
Marvin Theimer (MSFT)
Glenn Wasson (
Michel Drescher (Fujitsu Europe)
Minutes: Marty Humphrey
* Summary of Actions:
AI-HPCP-0825a: Everyone will re-read Donal's
version of the HPC Profile Application
in the JSDL section of Gridforge by
Friday Sept 1 call.
AI-HPCP-0825b: Check with BES doc and/or WG to
determine the "final word"
on issues of idempotency
and subscriptions (and others).
AI-HPCP-0825c: Marvin will suggest phrasing re:
vectors to the email
group for consensus.
AI-HPCP-0825d: Marvin will mopdify HPC Profile
doc to resolve Chris' comments,
restrict to integer the
total CPU count, total resource count
and individual CPU
count.
AI-HPCP-0825e: Marty to send email to
BES/JSDL/HPCP: "We are looking
for prototype and
interop participants."
* Previous minutes: Approved
* Review of Actions
AI-HPCP-0818a: Donal Fellows to review HPC
Profile Application.
DONE. On Mon Aug-21, Donal Fellows
uploaded a new version to the JSDL space in
Gridforge. The main modification is
a more comprehensive security section.
Our group discussion in today's
telecon on this subject resulted in
AI-HPCP-0825a:
AI-HPCP-0825a: "Everyone"
will re-read it (in the JSDL section of
Gridforge). On the Sept
1 call, we will determine (via vote)
if we believe that it's
ready for ratification via the JSDL
group and the large OGF
at large.
In short, Michel suggested that we
keep it in the HPCP
gridforge section while it is being
actively worked on and transition it to
the JSDL section when it's ready for
"normative radification". Because it's
currently in the JSDL forge section,
we will only "pull it back" to HPCP if,
upon reading it, we decide
(collectively) that it needs modifications (see
AI-HPCP-0825a).
AI-HPCP-0818b: Marvin Theimer will add paragraph
to HPC Basic Profile
stating: "all other
elements of the JSDL 1.0 MAY appear
but MAY also
result in a 'not implemented' fault, but
only the list here must
be supported"
DONE. This is discussed in this
telecon, below.
AI-HPCP-0818c: Marvin Theimer will add first
draft of section 4 to
HPC Basic Profile that
restricts BES operations to
singletons
DONE. This is discussed in this
telecon, below.
* Go over changes to HPC Profile that Marvin sent out
-----------------------------------------------------
Marvin: Before giving an overview of the
modifications, there's an issue/question
to BES guys: what about idempotency
and subscriptions (and perhaps other
issues)? That is, this was either in
a previous version of the BES space *OR*
explicitly mentioned to be included
in a future version (but it's not mentioned
in the current BES spec). We
understand that this will appear as an option in
the BES spec, but this has
implications on the HPC Profile (e.g., we might like
to restrict this and/or claim that
such considerations are out-of-scope).
Marty: Yes, we knew this going in -- that JSDL
is "easier" in that it's fixed in
stone, while BES is a moving target.
This makes profiling difficult.
AI-HPCP-0825b: Check with BES doc
and/or WG to determine the "final word"
on this issue.
Marvin gives an overview of his modifications,
beginning with the proposed text
in the Intro paragrph to Section 3
(consensus: working is fine).
Marvin described Mods to Section 4: changed the
intro, and vectors must be exactly
length 1.
The group discussed at depth the issue of the
treatment of vectors
Initially, the group wanted to
insert something akin to Michel's
proposed test in email
8/25
Glenn: isn't there potential
confusion: the client sends a vector of
size greater than 1...
what exactly does the service do? Does
the service have a
choice which one to choose?
Marvin: BES needs to deal with this
case, although there are two different
classes of faults. There
is a BES fault - "error in
inputs (the following
inputs -- activity IDs -- were unknown)" and
an HPC Profile fault
("the vector is too large")
Glenn: suggests that the base
profile should say 1 (and return a size of 1)
and greater than 1 is a
fault; extension: here's how it maps
differing size inputs to
differing size outputs
Marvin: asks for clarification:
extension to BES, right? this idea that
"I don't return
exactly what was specified" is a BES concept
Glenn: yes
Michel: saying that you'll return a
vector of size 1 retricts composability;
instead, return a vector
of the same size
Marvin: Good point -- it may handle
things of size greater than one -- if
so, then it must return
a vector of the same size. But it may fault.
It MUST handle a vector
of size 1. If it handles a vector of size
larger than one, then it
would publish an attribute indicating so.
Michel suggests wording that the
client "SHOULD" try to use a vector of size 1
but the group consensus
is (while the intent is good) that this is not
the best wording because
it could be interpreted to make clients ONLY
work in terms of vectors
of size 1 -- this is not the intent and might
restrict clients.
AI-HPCP-0825c: Marvin
will suggest phrasing and send it to the email
group for
consensus. Marty to make this into a tracker.
this concluded the discussion of
vectors.
Marty: What about total CPU count?
"non-exact" clearly has value, but it just
seems too complicated for interop,
for compliance
Marvin: In addition, not all schedulers
implement a range, so let's make it exact.
Marty: Section 3.2.5.8. What is "total
resource count"?
Michel: this is for "tiling" - e.g.,
5x10processes.
Marty: Let's make this exact as well. Group
consensus is that this is reasonable.
Marvin: some of these issues can arise and be
resolved during implementations as well
Marty: what about Chris' comments? Re: RAM...
"total physical memory" is certainly easier.
Michel: JSDL had long discussions... it comes
down to requirements and capabilities.
Marvin: this must be clarified in the HPC
profile. in our JSDL section, make these
appear as requirements. That is, the
phrasing of the text in the JSDL section
of the HPC Profile should clearly
indicate that they're *requirements* (not
capabilities). In the BES section of
the HPC profile, make the text clearly
reflect that these are capabilities
(not requirements).
Michel: Section 3 of HPCP is clear. Section 4
needs clarification: if BES exposes this
via JSDL terms, then we need to
clarify that these are capabilities, which
are static values.
Marvin: 3.2.5.6 and 3.2.5.7 need to have
clarifications changed.
Glenn: we need to restrict to integer in the
total CPU count, total resource count
and individual CPU count
Group consensus on this
AI-HPCP-0825d: Marvin will resolve
Chris' comments by making clarifications that
these are requirements.
In addition, Marvin will change the document to
restrict to integer the
total CPU count, total resource count
and individual CPU count
* Discuss prototyping and interop opportunities and
next-steps.
---------------------------------------------------------------
The group agreed that the HPC Profile document
is reasonably close to completion, so it
was suggested that we start
developing prototypes and perform interop testing
to further disambiguate the document
(and start developing a compliance suite)
Michel: this document should get into public
comments first, so we should wait before
creating implementations
Marvin: I propose something slightly
different... I propose that we put up for
public comment and in parallel do
implementations... this is primarily
due to time contraints.
Marty: Public comment is good, of course, but
can we do anything to ensure that
the vendors are engaged/committed?
Marvin: I will send it directly to Sun, Altair,
Condor, Globus, ESI, etc....
"please please review this
carefully and contribute..."
Marty: if we get Marvin's changes in the BES
(and "security section"), we will
SUSPEND development of the HPCP
docment
AI-HPCP-0825e: Marty to send email
to BES/JSDL/HPCP: "We are looking
for prototype and
interop participants. We will be start to
discuss compliance
suite. This will be the agenda for next week's
call. first round of
interop will be attempted BEFORE ggf18
(exactly what? we're not
sure). Two phases: phase 1: quick,
phase 2: for interop
after GGF18. Discussion fo extensions
can be accomodated RIGHT
NOW in parallel to this -- to keep
the momentum going.
Meeting concluded. Next call: Friday Sep 1 2006 7am Pacific
time.