Software publishing - TeraGrid use cases

On a GLUE2 call last week or the week before, I agreed to review the currently proposed schema for publishing software information and comment on whether it addresses TeraGrid requirements and use cases.

The TeraGrid has been publishing software information in information services since the summer of 2007. Our goals were to support the following use cases:

1) users can discover which compute resources offer a specific software package
2) users can discover the version, or versions, of available software packages
3) users can discover whether a package is in the default login/execution environment (users or applications do not need to do anything to access this software)
4) if a package is not in the default login/execution environment, users can discover how a user or job can access the software package

We support the above use cases by publishing the following attributes about the software available on each compute resource:

- TeraGrid standard software name
- TeraGrid standard software version
- In default environment (yes/no)
- How to access the software:
  - access technique (the TeraGrid currently only uses the "softenv" technique)
  - access key (a key/string understood by a technique handler)

Some notes:

Even though we currently only use the "softenv" technique, we abstracted the schema a little so we could eventually support other techniques (e.g. "modules", "path", ...).

A software component may include multiple binaries, have man page directories, have various library directories, have multiple binary directories (bin/, sbin/, etc.), be parallel/non-parallel, threaded/non-threaded, scripted/compiled, and have other arbitrary but relevant user information. We chose not to design or implement a schema complex enough to communicate all this information, but instead expect users will discover such details through other methods. What we offer is a way for users/jobs to request that a specific piece of software be available in their login or execution environment; the technique handler then sets all the appropriate environment variables to make that software component available (using the access technique and key listed above).

Example: a user wants to use "mpich-gm" and must know ahead of time how to compile (mpicc, mpicxx, mpif77, mpif90, etc.) and run (mpirun).

1) a single info service query can return the compute resources that offer "mpich-gm", which version(s), the access technique and key for each available "mpich-gm", and the endpoint of the execution service where each "mpich-gm" is available
2) the user then submits jobs to the endpoint with the information:

     request software "softenv:<an-mpich-gm-key>"
     mpirun <myprogram> <arguments>

The softenv handler will make sure the libraries, paths, and anything else needed by the application are configured correctly. The user doesn't need to know the mpirun binary path because the correct mpirun will be first in their PATH (and LD_LIBRARY_PATH and other variables will be set also).

Comparing the proposed GLUE2 ApplicationEnvironment entity attributes:

  ID
  Name
  Version
  State
  License
  LifeTime
  InstalledRoot
  EnvironmentSetup
  Description

The attributes that would directly map to TeraGrid attributes, or be generated automatically, include:

  ID
  Name
  Version
  Description

For the TeraGrid the "EnvironmentSetup" attribute is a tuple of "EnvironmentSetupMethod" and "EnvironmentSetupKey". This could be represented as a compound value inside "EnvironmentSetup" (e.g. softenv:<key>), but it would be better if the schema had the attributes separately. We propose "EnvironmentSetupMethod" and "EnvironmentSetupKey".

We currently have no need for License, LifeTime, or InstalledRoot, so we would suggest that these attributes all be OPTIONAL.

Lastly, the TeraGrid's "in default environment" attribute doesn't map to any proposed GLUE2 attributes. It could be added as an OPTIONAL attribute, or the TeraGrid could add it as a local extension in our implementation of GLUE2.

Regards,
JP
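
For readers who want something concrete, the four use cases and the published attributes in the message above could be sketched roughly as follows. This is only an illustration, not the TeraGrid implementation: the class name, field names, resource names, endpoint URLs, and the version string are all assumptions made up for the example; only the attribute list, the "softenv" technique, and the "technique:key" request form come from the message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SoftwareRecord:
    """One published software entry (all field names are illustrative)."""
    resource: str                            # compute resource offering the package
    endpoint: str                            # execution service endpoint (hypothetical URL)
    name: str                                # TeraGrid standard software name
    version: str                             # TeraGrid standard software version
    in_default_env: bool                     # True: nothing to do to access the software
    access_technique: Optional[str] = None   # e.g. "softenv"
    access_key: Optional[str] = None         # string understood by the technique handler


def find_software(records: List[SoftwareRecord], name: str) -> List[SoftwareRecord]:
    """Use cases 1 and 2: which resources offer `name`, and in which versions."""
    return [r for r in records if r.name == name]


def software_request(record: SoftwareRecord) -> Optional[str]:
    """Use cases 3 and 4: None when the package is already in the default
    environment, otherwise the "technique:key" string a job would request."""
    if record.in_default_env:
        return None
    return f"{record.access_technique}:{record.access_key}"


# Hypothetical published data; hosts, URLs, and versions are invented.
records = [
    SoftwareRecord("cluster-a.example.org", "https://cluster-a.example.org/exec",
                   "mpich-gm", "1.2.5..10", False,
                   "softenv", "+mpich-gm-1.2.5..10-intel"),
    SoftwareRecord("cluster-b.example.org", "https://cluster-b.example.org/exec",
                   "gcc", "3.4.3", True),
]

for rec in find_software(records, "mpich-gm"):
    print(rec.resource, rec.version, rec.endpoint, software_request(rec))
```

A single pass over the records answers all four questions at once, which mirrors the "single info service query" in the example.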

glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of JP Navarro said:

> Lastly the TeraGrid's "in default environment" attribute doesn't map to any proposed GLUE2 attributes. It could be added as an OPTIONAL attribute, or the TeraGrid could add it as a local extension in our implementation of GLUE2.

You could also treat it as another EnvironmentSetupMethod, e.g. Method=default means that no explicit setup is needed.

Stephen
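
As a rough sketch of this suggestion, the yes/no "in default environment" flag can be folded into the setup method itself. The function names below are hypothetical, and the literal value "default" is taken from the Method=default example above rather than from any agreed enumeration.

```python
def setup_method(in_default_env: bool, access_technique: str) -> str:
    """Map the TeraGrid yes/no flag onto a single method value.

    "default" means no explicit setup is needed; otherwise the
    published access technique (e.g. "softenv") is used as the method.
    """
    return "default" if in_default_env else access_technique


def needs_explicit_setup(method: str) -> bool:
    """A consumer only has to look at the method to decide what to do."""
    return method != "default"


print(setup_method(True, "softenv"))    # package already in the default environment
print(setup_method(False, "softenv"))   # package needs the softenv handler
```

With this mapping a separate boolean attribute is not needed, at the cost of reserving one method name for the "nothing to do" case.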

Hi JP,

thanks for the detailed feedback. For the proposed changes:

OK to split EnvironmentSetup into:

1. SetupMethod, enumeration: softenv, ... (need to investigate other methods/values)
2. Setup[Key/Script]: need to check what this can contain, depending on the method

Questions:
- Can you clarify what a Key is in your system? <an-mpich-gm-key>
- Is Stephen's proposal of having a default value for SetupMethod a good replacement for the "in default environment" attribute?

Cheers,
Sergio

JP Navarro wrote:
_______________________________________________ glue-wg mailing list glue-wg@ogf.org http://www.ogf.org/mailman/listinfo/glue-wg
--
Sergio Andreozzi
INFN-CNAF,                    Tel: +39 051 609 2860
Viale Berti Pichat, 6/2       Fax: +39 051 609 2746
40126 Bologna (Italy)         Web: http://www.cnaf.infn.it/~andreozzi

On Feb 19, 2008, at 6:08 AM, Sergio Andreozzi wrote:

> Hi JP,
>
> thanks for the detailed feedback. For the proposed changes:
>
> OK to split EnvironmentSetup into:
>
> 1. SetupMethod, enumeration: softenv, ... (need to investigate other methods/values)
> 2. Setup[Key/Script]: need to check what this can contain, depending on the method
>
> Questions:
> - Can you clarify what a Key is in your system? <an-mpich-gm-key>

Some examples, including an mpich-gm one:

  +java-sun-1.4.2
  +java-sun-1.5.0
  +gcc-3.4.3
  +mvapich-0.9.8
  +mpich-gm-1.2.5..10-intel

> - Is Stephen's proposal of having a default value for SetupMethod a good replacement for the "in default environment" attribute?

Yes, Stephen's proposal would work great.

Thanks,
JP

> Cheers,
> Sergio
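
To illustrate the compound form discussed in this thread (softenv:<key>) against the example keys JP lists, here is a minimal sketch of how a consumer might split such a value back into a method and a key. The function name is hypothetical, and splitting on the first colon is an assumption of this example; nothing in the thread specifies a parsing rule.

```python
from typing import Tuple


def split_environment_setup(value: str) -> Tuple[str, str]:
    """Split a compound "method:key" value on the first colon.

    Keys such as "+mpich-gm-1.2.5..10-intel" contain dots, dashes, and a
    leading "+", so only the first colon is treated as the separator.
    """
    method, sep, key = value.partition(":")
    if not sep or not method or not key:
        raise ValueError(f"not a method:key value: {value!r}")
    return method, key


# The example keys from the message above, wrapped in the compound form.
example_keys = [
    "+java-sun-1.4.2",
    "+java-sun-1.5.0",
    "+gcc-3.4.3",
    "+mvapich-0.9.8",
    "+mpich-gm-1.2.5..10-intel",
]

for key in example_keys:
    method, parsed_key = split_environment_setup(f"softenv:{key}")
    print(method, parsed_key)
```

Publishing the method and key as two separate attributes, as proposed in the thread, removes the need for any such parsing convention and avoids ambiguity if a future method name or key happens to contain the separator.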
participants (3)
- Burke, S (Stephen)
- JP Navarro
- Sergio Andreozzi