
On a GLUE2 call a week or two ago I agreed to review the currently proposed schema for publishing software information and comment on whether it addresses TeraGrid requirements and use cases.

The TeraGrid has been publishing software information in information services since the summer of 2007. Our goals were to support the following use cases:

1) users can discover which compute resources offer a specific software package
2) users can discover the version, or versions, of available software packages
3) users can discover whether a package is in the default login/execution environment (users or applications do not need to do anything to access this software)
4) if a package is not in the default login/execution environment, users can discover how a user or job can access the software package

We support the above use cases by publishing the following attributes about the software available on each compute resource:

- TeraGrid standard software name
- TeraGrid standard software version
- In default environment (yes/no)
- How to access the software:
  - access technique (the TeraGrid currently only uses the "softenv" technique)
  - access key (a key/string understood by a technique handler)

Some notes:

Even though we currently only use the "softenv" technique, we abstracted the schema a little so we could eventually support other techniques (e.g. "modules", "path", ...).

A software component may include multiple binaries, have man page directories, have various library directories, have multiple binary directories (bin/, sbin/, etc.), be parallel/non-parallel, threaded/non-threaded, scripted/compiled, and have other arbitrary but relevant user information. We chose not to design or implement a schema complex enough to communicate all this information, but instead expect users will discover such details through other methods.
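
To make the attribute list above a little more concrete, here is a rough Python sketch of one published software entry and of lookups for the four use cases. It is only an illustration: the class, the field names, the resource names, the versions and the softenv keys are all invented for this example and are not the actual TeraGrid schema or published data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SoftwareRecord:
    """One published software entry (hypothetical field names)."""
    resource: str                           # compute resource offering the package
    name: str                               # TeraGrid standard software name
    version: str                            # TeraGrid standard software version
    in_default_env: bool                    # "In default environment (yes/no)"
    access_technique: Optional[str] = None  # e.g. "softenv"
    access_key: Optional[str] = None        # string understood by the technique handler


# Invented sample data, for illustration only.
CATALOG: List[SoftwareRecord] = [
    SoftwareRecord("resource-a", "mpich-gm", "1.0", False, "softenv", "+mpich-gm-example"),
    SoftwareRecord("resource-b", "mpich-gm", "1.1", True),
    SoftwareRecord("resource-b", "hdf5", "1.0", False, "softenv", "+hdf5-example"),
]


def resources_offering(catalog: List[SoftwareRecord], name: str) -> List[str]:
    """Use case 1: which compute resources offer this package?"""
    return sorted({r.resource for r in catalog if r.name == name})


def versions_of(catalog: List[SoftwareRecord], name: str) -> List[str]:
    """Use case 2: which versions of this package are available?"""
    return sorted({r.version for r in catalog if r.name == name})


def how_to_access(record: SoftwareRecord) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Use cases 3 and 4: None means the package is already in the default
    environment; otherwise return the (technique, key) pair to request."""
    if record.in_default_env:
        return None
    return (record.access_technique, record.access_key)
```

With the sample data, resources_offering(CATALOG, "mpich-gm") lists both resources, and how_to_access() returns None for the entry that is already in the default environment.
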
What we offer is a way for users/jobs to request that a specific piece of software be available in their login or execution environment, and we will set all the appropriate environment variables for them to make that software component available (using the access technique and key listed above).

Example: User wants to use "mpich-gm" and must know ahead of time how to compile (mpicc, mpicxx, mpif77, mpif90, etc.) and run (mpirun).

1) a single info service query can return the compute resources that offer "mpich-gm", which version(s), the access technique and key for each available "mpich-gm", and the endpoint of the execution service where each "mpich-gm" is available
2) the user then submits jobs to the endpoint with the information:

     request software "softenv:<an-mpich-gm-key>"
     mpirun <myprogram> <arguments>

The softenv handler will make sure the libraries, paths, and anything else needed by the application are configured correctly. The user doesn't need to know the mpirun binary path because the correct mpirun will be first in their PATH (and LD_LIBRARY_PATH and other variables will also be set).

Comparing the proposed GLUE2 ApplicationEnvironment entity attributes:

  ID
  Name
  Version
  State
  License
  LifeTime
  InstalledRoot
  EnvironmentSetup
  Description

The attributes that would directly map to TeraGrid attributes, or be generated automatically, include:

  ID
  Name
  Version
  Description

For the TeraGrid the "EnvironmentSetup" attribute is a tuple of "EnvironmentSetupMethod" and "EnvironmentSetupKey". This could be represented as a compound value inside "EnvironmentSetup" (i.e. softenv:<key>), but it would be better if the schema had the attributes separately. We propose "EnvironmentSetupMethod" and "EnvironmentSetupKey".

We currently have no need for License, LifeTime, or InstalledRoot, so we would suggest that these attributes all be OPTIONAL.

Lastly, the TeraGrid's "in default environment" attribute doesn't map to any proposed GLUE2 attributes.
It could be added as an OPTIONAL attribute, or the TeraGrid could add it as a local extension in our implementation of GLUE2.

Regards,
JP