Comments on GLUE Schema 2.0

Hi,
Gabriele and I went through the GLUE document. We have the following comments/questions:
1. Figure 1, GLUE main entities and their relationships: Based on the description for Endpoint, the cardinality of Endpoint -> Service should be 1 and not 0..1. Maybe the cardinality in the UML diagram is still in the works?
2. AccessPolicy: How can an access policy be used to express the Access control Base Rule? Additional info on this would be useful. Another question we had was: is it possible to express policies on FQANs, and if so how? Is there an extension to this scheme, and will it allow policies to be expressed on all the elements of the certificate/proxies? For example, will it be possible to express a policy stating that "As a site I will not accept any jobs from users who have proxies valid for less than a day"?
3. Conceptual Model of the Computing Service: Based on the description of the various attributes, the cardinality for ComputingEndPoint->ComputingService, ComputingShare->ComputingService and ComputingResource->ComputingService should be 1..* instead of *. It might be useful in general to make sure that the descriptions of the different entities are consistent with the cardinality shown in the UML, and vice versa.
4. MPI jobs: From the description of ApplicationEnvironment and ApplicationHandle it is not clear whether there is a way to express the compiler versions used to compile MPI libraries. MPI applications are quite sensitive to the compiler version and the architecture on which the libraries were compiled, and need to know this in order to compile against the correct set of libraries. We feel that there is no way to express this information. Also, it is not clear how one can express the connection between ApplicationEnvironment and ApplicationHandle.
5. Storage: We have a means to mention the storage reservation through the GLUE. Also, it seems that most of the use cases are covered.
6. Appendix A, UNKNOWN data: Is there a particular reason to have multiple UNDEFINED types? Can't we just have UNDEFINED instead of UNDEFINEDVALUE, UNDEFINEDPATH, UNDEFINEDUSER, etc? Are we really buying anything by having multiple UNDEFINED types?
7. It would be useful to sort Appendix B alphabetically.
--
Thanks & Regards
=============================================================
Parag Mhashilkar
Fermi National Accelerator Laboratory, MS 120
Wilson & Kirk Road, Batavia, IL - 60510.
Location: Wilson Hall, WH863
Phone: +1 (630) 840-6530
Fax: +1 (630) 840-2783
Email: parag@fnal.gov
=============================================================

glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Parag Mhashilkar said:
Based on the description for Endpoint, cardinality of Endpoint -> Service should be 1 and not 0..1. Maybe cardinality in UML diagram is still in works?
I agree that it looks wrong - in rare cases you could have services without endpoints, but I don't think you can have an endpoint without a service.
2. AccessPolicy: How can access policy be used to express Access control Base Rule? Additional info on this will be useful. Another question we had was, is it possible to express policies on FQANs and if so how? Is there an extension to this scheme and will it allow to express policies on all the elements of the certificate/proxies? For example, will it be possible to express a policy stating that "As a site I will not accept any jobs from users who have proxies valid for less than a day?"
You may have noticed that this is quite controversial! As things stand the only real restriction comes from the general structure of the schema, which effectively means that you have a default-deny, and then a set of "allow" rules applied independently, so if any rule matches you assume that access is allowed. There has been some discussion about adding explicit "deny" rules to override the "allow"s, but so far I don't think we have any agreement on that (it's a non-trivial thing to implement).
The format of the rules themselves is extensible. Basically they are strings encoding URIs, where we have so far defined three schemes: one for explicit DNs, one for VO names and one for VOMS FQANs. For EGEE that seems to be all we need at the moment (with a rather limited wildcard extension for the FQANs). Other Grids are free to define their own schemes as long as they can fit into the overall schema structure.
Basically I would suggest that you try to decide what your real use-cases are and then see if they can be satisfied. On the face of it your test on proxy validity could be difficult because it's effectively a DENY rule, but there are ways you could do it, for example extending the FQAN format to include the minimum lifetime. However, there would be a price to pay in more complex matching rules, so I'd suggest that you consider how important it really is, and whether you need to represent the general case or something more restricted.
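As a rough illustration of the default-deny, allow-only matching described above, here is a minimal Python sketch. The scheme prefixes (`dn:`, `vo:`, `voms:`), the `Credential` class and both function names are assumptions made up for this example and are not the rule strings defined by GLUE; the FQAN wildcard extension is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Credential:
    """Hypothetical view of what a user presents; not a GLUE entity."""
    dn: str
    vo: str
    fqans: list[str] = field(default_factory=list)


def rule_matches(rule: str, cred: Credential) -> bool:
    """Return True if a single allow rule matches the credential."""
    # Split on the first ':' only, since DNs and FQANs may contain colons.
    scheme, _, value = rule.partition(":")
    if scheme == "dn":      # assumed prefix for explicit DNs
        return value == cred.dn
    if scheme == "vo":      # assumed prefix for VO names
        return value == cred.vo
    if scheme == "voms":    # assumed prefix for VOMS FQANs (exact match only)
        return value in cred.fqans
    return False            # unrecognised schemes never grant access


def is_allowed(rules: list[str], cred: Credential) -> bool:
    """Default deny: access is granted only if at least one rule matches."""
    return any(rule_matches(rule, cred) for rule in rules)
```

Because every rule can only add access, a condition such as "reject proxies valid for less than a day" has no natural place in this structure, which is why it is described above as effectively a DENY rule.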
3. Conceptual Model of the Computing Service: Based on the description of various attributes, cardinality for ComputingEndPoint->ComputingService, ComputingShare->ComputingService and ComputingResource->ComputingService should be 1..* instead of *.
That diagram looks a bit weird in general, it's not obvious what the *s are attached to, they just float in space ... but indeed there should always be a service object.
4. MPI jobs:
From the description of ApplicationEnvironment and ApplicationHandle it is not clear if there is a way to express Compiler Versions used to compile MPI libraries.
I haven't been following this, but it looks a bit odd to me too, it isn't obvious how the application is specified - there is a Name attribute, but in the rest of the schema that's just a human-readable tag and not something you query on.
Also it is not clear how one can express connection between ApplicationEnvironment and ApplicationHandle.
There is a relationship, as you can see from the diagram but currently not from the text. Basically one AE can have any number of AHs, each of which specifies a way to set up the application environment on the WN.
Stephen
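The one-to-many relationship between ApplicationEnvironment and ApplicationHandle described in this reply could be pictured roughly as below. This is only a sketch: the field names `kind` and `value`, the `add_handle` helper and the sample values are hypothetical and are not attributes taken from the GLUE document.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationHandle:
    """One way of setting up the application environment on a worker node."""
    kind: str   # hypothetical: which set-up mechanism is used
    value: str  # hypothetical: what is handed to that mechanism


@dataclass
class ApplicationEnvironment:
    """An application environment that owns any number of handles (0..*)."""
    name: str
    handles: list[ApplicationHandle] = field(default_factory=list)

    def add_handle(self, kind: str, value: str) -> ApplicationHandle:
        """Attach another way of setting up this environment."""
        handle = ApplicationHandle(kind, value)
        self.handles.append(handle)
        return handle


# Hypothetical usage: one environment reachable through two handles.
env = ApplicationEnvironment("example-mpi")
env.add_handle("module", "example-mpi/1.0")
env.add_handle("path", "/opt/example-mpi/bin")
```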

Hi Parag,
On Friday 02 May 2008 20:24:21 Parag Mhashilkar wrote:
Gabriele and I went through the GLUE document.
Thanks for your comments; they are all appreciated. I'm concentrating on comment 6, as Stephen has replied to the others.
We have following comments/questions [...] 6. Appendix A: UNKNOWN data Is there a particular reason to have multiple UNDEFINED types?
Yes. The overall idea is to have well-defined "unknown" values that are specific to each attribute type (URI, integer, email address, etc.) and that are both valid (so any conforming client software can parse the data) and carry this precise additional semantic meaning. This is to satisfy the two specific use-cases mentioned in the Appendix, but other use-cases may well exist.
There are three main reasons for having multiple "unknown" values:
1. To allow the "unknown" value to propagate within the information system. This is desirable as it prevents a site (or a service) from simply disappearing when a single attribute is "unknown". It also reduces the barriers to getting something working and allows "intelligent agent" software to check for problems grid-wide (which has several advantages over deploying site-local checks). Information systems may (and many *do*) implement validation of incoming data; this requires that any "unknown" value must be valid for that attribute type. Since no simple string is valid for all attribute types, there must be more than one "unknown" value.
2. To provide a hint of the correct form of the missing data. With the first scenario (no sane default config. value), this gives the site-admin a hint of what the correct value should be; for example, if the unknown value looks like an FQDN instead of a URI, the site-admin knows what is expected.
3. To allow a standard way of encoding additional meaning within the unknown value. People might want to specify why a particular value is unavailable (indicating what is causing the problem) or to provide additional hints to the site-admin when configuring an info-provider. Trivially, this encoding of additional information requires there to be multiple unknown values.
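The validation argument in reason 1 can be illustrated with a small Python sketch. The validators are deliberately simplistic, and every placeholder value in `PLACEHOLDERS` is hypothetical, chosen only so that it passes the matching check; none of them is quoted from Appendix A.

```python
from urllib.parse import urlsplit


def is_int32(text: str) -> bool:
    """True if the text parses as a signed 32-bit integer."""
    try:
        return -2**31 <= int(text) < 2**31
    except ValueError:
        return False


def is_absolute_path(text: str) -> bool:
    """True if the text starts with a path separator."""
    return text.startswith(("/", "\\"))


def is_uri(text: str) -> bool:
    """Simplified check: an absolute URI must at least carry a scheme."""
    return bool(urlsplit(text).scheme)


VALIDATORS = {"int32": is_int32, "path": is_absolute_path, "uri": is_uri}

# Hypothetical per-type placeholders (assumptions, not the GLUE values).
PLACEHOLDERS = {
    "int32": "999999999",
    "path": "/UNDEFINEDPATH",
    "uri": "http://unknown.invalid/",
}


def rejected_types(value: str) -> list[str]:
    """Names of the attribute types whose validator rejects the value."""
    return [name for name, check in VALIDATORS.items() if not check(value)]
```

With these checks, `rejected_types("UNDEFINED")` lists all three types, while each entry in `PLACEHOLDERS` passes the validator for its own type, so a validating information system could let it through.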
Can't we just have UNDEFINED instead of UNDEFINEDVALUE, UNDEFINEDPATH, UNDEFINEDUSER, etc? Are we really buying anything having multiple UNDEFINED types?
"UNDEFINED" is fine for an ASCII/UTF-8 string, but is invalid if referring to:
a. an absolute path (it doesn't start with "/" or "\")
b. an FQDN (or, at least, it is not very helpful; see the RFC 2606 discussion on "example" FQDNs and the invalid TLD)
c. an email address
d. a URI
e. an IPv4 (or v6) address
f. an integer value
g. longitude or latitude
etc.
(We went with "UNDEFINEDVALUE" to remain consistent with "UNDEFINEDUSER", "UNDEFINEDPATH", etc.)
The information system may simply reject some (or all) of the provided information because it does not validate correctly. This will lead to difficult-to-debug situations where it isn't clear what is wrong, only that information isn't getting through (perhaps with a baffling validation error message).
Using "UNKNOWN" for a URI attribute as a specific example, some GLUE implementations might allow this value to propagate whilst others would not (c.f. URI as a datatype in SQL and XML-Schema). If the value is propagated to the client software then the observed behaviour will be implementation-specific: the URI parsers should reject the "UNKNOWN" string as invalid (as per RFC 3986). This would require client software to also handle invalid entries independently of its main code, increasing the complexity of the client software.
A related issue is that, for some attribute types, "UNKNOWN" is simply unrepresentable; for example, if the attribute is a counter (represented as a 32-bit integer), how does one represent the UTF-8 string "UNKNOWN"?
I hope this helps explain the motivation for Appendix A.
Cheers,
Paul.
participants (3)
- Burke, S (Stephen)
- Parag Mhashilkar
- Paul Millar