OGSA HPC Profile Minutes – Fri Aug 25 2006

 

Participants:

            Marty Humphrey            (University of Virginia)

            Marvin Theimer             (MSFT)

            Glenn Wasson               (University of Virginia)

            Michel Drescher            (Fujitsu Europe)

 

Minutes: Marty Humphrey

 

 

* Summary of Actions:

 

            AI-HPCP-0825a: Everyone will re-read Donal's version of the HPC Profile Application

                        in the JSDL section of Gridforge by the Friday Sept 1 call.

 

            AI-HPCP-0825b: Check with BES doc and/or WG to determine the "final word"

                                    on issues of idempotency and subscriptions (and others).

 

            AI-HPCP-0825c: Marvin will suggest phrasing re: vectors to the email

                                    group for consensus.

 

            AI-HPCP-0825d: Marvin will modify HPC Profile doc to resolve Chris' comments,

                                    restrict to integer the total CPU count, total resource count

                                    and individual CPU count.

 

            AI-HPCP-0825e: Marty to send email to BES/JSDL/HPCP: "We are looking

                                    for prototype and interop participants."

 

 

* Previous minutes: Approved

 

* Review of Actions

 

            AI-HPCP-0818a: Donal Fellows to review HPC Profile Application.

 

                        DONE. On Mon Aug-21, Donal Fellows uploaded a new version to the JSDL space in

                        Gridforge. The main modification is a more comprehensive security section.

                        Our group discussion in today's telecon on this subject resulted in

                        AI-HPCP-0825a:

 

                        AI-HPCP-0825a: "Everyone" will re-read it (in the JSDL section of

                                    Gridforge). On the Sept 1 call, we will determine (via vote)

                                    if we believe that it's ready for ratification via the JSDL

                                    group and the OGF at large.

 

                        In short,  Michel suggested that we keep it in the HPCP

                        Gridforge section while it is being actively worked on and transition it to

                        the JSDL section when it's ready for "normative ratification". Because it's

                        currently in the JSDL forge section, we will only "pull it back" to HPCP if,

                        upon reading it, we decide (collectively) that it needs modifications (see

                        AI-HPCP-0825a).

 

            AI-HPCP-0818b: Marvin Theimer will add paragraph to HPC Basic Profile

                                    stating: "all other elements of the JSDL 1.0 MAY appear

                                    but MAY also result in a 'not implemented' fault, but

                                    only the list here must be supported"

                       

                        DONE. This is discussed in this telecon, below. 

 

            AI-HPCP-0818c: Marvin Theimer will add first draft of section 4 to

                                    HPC Basic Profile that restricts BES operations to

                                    singletons

 

                        DONE. This is discussed in this telecon, below. 

 

* Go over changes to HPC Profile that Marvin sent out

-----------------------------------------------------

 

            Marvin: Before giving an overview of the modifications, there's an issue/question

                        to BES guys: what about idempotency and subscriptions (and perhaps other

                        issues)? That is, this was either in a previous version of the BES spec *OR*

                        explicitly mentioned to be included in a future version (but it's not mentioned

                        in the current BES spec). We understand that this will appear as an option in

                        the BES spec, but this has implications on the HPC Profile (e.g., we might like

                        to restrict this and/or claim that such considerations are out-of-scope).

            Marty: Yes, we knew this going in -- that JSDL is "easier" in that it's fixed in

                        stone, while BES is a moving target. This makes profiling difficult.

 

                        AI-HPCP-0825b: Check with BES doc and/or WG to determine the "final word"

                                    on this issue.

 

            Marvin gives an overview of his modifications, beginning with the proposed text

                        in the Intro paragraph to Section 3 (consensus: wording is fine).

            Marvin described Mods to Section 4: changed the intro, and vectors must be exactly

                        length 1.

            The group discussed in depth the treatment of vectors:

                        Initially, the group wanted to insert something akin to Michel's

                                    proposed text in email 8/25

                        Glenn: isn't there potential confusion: the client sends a vector of

                                    size greater than 1... what exactly does the service do? Does

                                    the service have a choice which one to choose?

                        Marvin: BES needs to deal with this case, although there are two different

                                    classes of faults. There is a BES fault - "error in

                                    inputs (the following inputs -- activity IDs -- were unknown)" and

                                    an HPC Profile fault ("the vector is too large")

                        Glenn: suggests that the base profile should say 1 (and return a size of 1)

                                    and greater than 1 is a fault; extension: here's how it maps

                                    differing size inputs to differing size outputs

                        Marvin: asks for clarification: extension to BES, right? this idea that

                                    "I don't return exactly what was specified" is a BES concept      

                        Glenn: yes

                        Michel: saying that you'll return a vector of size 1 restricts composability;

                                    instead, return a vector of the same size

                        Marvin: Good point -- it may handle things of size greater than one -- if

                                    so, then it must return a vector of the same size. But it may fault.

                                    It MUST handle a vector of size 1. If it handles a vector of size

                                    larger than one, then it would publish an attribute indicating so.

                        Michel suggests wording that the client "SHOULD" try to use a vector of size 1

                                    but the group consensus is (while the intent is good) that this is not

                                    the best wording because it could be interpreted to make clients ONLY

                                    work in terms of vectors of size 1 -- this is not the intent and might

                                    restrict clients.

 

                                    AI-HPCP-0825c: Marvin will suggest phrasing and send it to the email

                                                group for consensus. Marty to make this into a tracker.

 

                        This concluded the discussion of vectors.
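
A minimal sketch of the rule as Marvin stated it above, pending the agreed wording from AI-HPCP-0825c: a service MUST handle a vector of size 1, MAY handle larger vectors if it publishes an attribute saying so, must return a vector of the same size as its input, and otherwise faults. The names ProfiledService, max_vector_size and VectorTooLargeFault are hypothetical; they do not come from the BES spec or the HPC Profile document.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


class VectorTooLargeFault(Exception):
    """Hypothetical stand-in for the HPC Profile fault "the vector is too large"."""


@dataclass
class ProfiledService:
    # Hypothetical published attribute: the largest input vector accepted.
    # The base profile value is 1; an extension could advertise more.
    max_vector_size: int = 1

    def __post_init__(self) -> None:
        # Per the discussion, a service MUST handle a vector of size 1.
        if self.max_vector_size < 1:
            raise ValueError("max_vector_size must be at least 1")

    def apply(self, activity_ids: Sequence[str],
              operation: Callable[[str], str]) -> List[str]:
        """Apply `operation` to each activity ID, or fault if the vector is too large."""
        if len(activity_ids) > self.max_vector_size:
            raise VectorTooLargeFault(
                f"got {len(activity_ids)} entries, at most "
                f"{self.max_vector_size} supported"
            )
        # One result per input, in input order, so the output vector
        # always has the same size as the input vector.
        return [operation(activity_id) for activity_id in activity_ids]
```

In this sketch the service never silently picks one entry from a larger vector, which is the ambiguity Glenn raised; it either answers every entry or faults.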

           

            Marty: What about total CPU count? "non-exact" clearly has value, but it just

                        seems too complicated for interop, for compliance

            Marvin: In addition, not all schedulers implement a range, so let's make it exact.

            Marty: Section 3.2.5.8. What is "total resource count"?

            Michel: this is for "tiling" - e.g., 5x10 processes.

            Marty: Let's make this exact as well. Group consensus is that this is reasonable.

            Marvin: some of these issues can arise and be resolved during implementations as well

 

            Marty: what about Chris' comments? Re: RAM... "total physical memory" is certainly easier.

            Michel: JSDL had long discussions... it comes down to requirements and capabilities.

            Marvin: this must be clarified in the HPC profile.  in our JSDL section, make these

                        appear as requirements. That is, the phrasing of the text in the JSDL section

                        of the HPC Profile should clearly indicate that they're *requirements* (not

                        capabilities). In the BES section of the HPC profile, make the text clearly

                        reflect that these are capabilities (not requirements).

            Michel: Section 3 of HPCP is clear. Section 4 needs clarification: if BES exposes this

                        via JSDL terms, then we need to clarify that these are capabilities, which

                        are static values.

            Marvin: 3.2.5.6 and 3.2.5.7 need to have clarifications changed.

                       

            Glenn: we need to restrict to integer in the total CPU count, total resource count

                        and individual CPU count

            Group consensus on this

 

                        AI-HPCP-0825d: Marvin will resolve Chris' comments by making clarifications that

                                    these are requirements. In addition, Marvin will change the document to

                                    restrict to integer the total CPU count, total resource count

                                    and individual CPU count
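
A minimal sketch of the check implied by the consensus above (the three counts are exact rather than ranges, and restricted to integers). The dictionary keys and the function name check_counts are hypothetical and used only for illustration; the sketch does not parse JSDL itself.

```python
from numbers import Real
from typing import Dict, Mapping

# Hypothetical keys standing in for the three counts named in the minutes.
COUNT_FIELDS = (
    "total_cpu_count",
    "total_resource_count",
    "individual_cpu_count",
)


def check_counts(resources: Mapping[str, object]) -> Dict[str, int]:
    """Return the supplied counts as ints, rejecting ranges and non-integers."""
    checked: Dict[str, int] = {}
    for name in COUNT_FIELDS:
        if name not in resources:
            continue  # Counts that are absent are not checked in this sketch.
        value = resources[name]
        # A range (e.g. a (low, high) tuple) is not a single exact number.
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"{name} must be one exact number, got {value!r}")
        # 8.0 is accepted as the integer 8; 2.5, inf and nan are rejected.
        if not float(value).is_integer():
            raise ValueError(f"{name} must be an integer, got {value!r}")
        checked[name] = int(value)
    return checked
```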

 

* Discuss prototyping and interop opportunities and next-steps.

---------------------------------------------------------------        

 

            The group agreed that the HPC Profile document is reasonably close to completion, so it

                        was suggested that we start developing prototypes and perform interop testing

                        to further disambiguate the document (and start developing a compliance suite)

            Michel: this document should get into public comments first, so we should wait before

                        creating implementations

            Marvin: I propose something slightly different... I propose that we put up for

                        public comment and in parallel do implementations... this is primarily

                        due to time constraints.

 

            Marty: Public comment is good, of course, but can we do anything to ensure that

                        the vendors are engaged/committed?

            Marvin: I will send it directly to Sun, Altair, Condor, Globus, ESI, etc....

                        "please please review this carefully and contribute..."

 

            Marty: if we get Marvin's changes in the BES (and "security section"), we will

                        SUSPEND development of the HPCP document

 

                        AI-HPCP-0825e: Marty to send email to BES/JSDL/HPCP: "We are looking

                                    for prototype and interop participants. We will start to

                                    discuss a compliance suite. This will be the agenda for next week's

                                    call. The first round of interop will be attempted BEFORE GGF18

                                    (exactly what? we're not sure). Two phases: phase 1: quick,

                                    phase 2: for interop after GGF18. Discussion of extensions

                                    can be accommodated RIGHT NOW in parallel to this -- to keep

                                    the momentum going."

 

Meeting concluded. Next call: Friday Sep 1 2006 7am Pacific time.