Updated version of appendix D.

Hi all, I've tried to incorporate the comments from the meeting in the new version of Appendix D. Specifically: o included Stephen's ideas on how to embed additional information, o updated the section on whether metrics are required or not (para 2 of intro), o added the initial slash to the filepath, o included multiple examples, o removed the section on counting numbers, o added a new section on email addresses (previously missing), o tidied up the sections on URIs and DNs, o changed the integer section so it's using "all nines" instead (inc. a mention of Benford's law). Please let me know if I've missed anything. Cheers, Paul. --- Appendix D : place-holder values for unknown data (v1.0) ---- Introduction --- Whilst people endeavour to provide accurate information, there may be situations where specific GLUE values may be assigned place-holder (or dummy) values. These place-holder values carry some additional semantic meaning; specifically, that the correct value is currently unknown and the presented value should be ignored. This appendix describes a recommended set of place-holder values to use. Some metrics within the GLUE schema are required whilst others are optional. If the metric is optional and the corresponding information is unavailable, the information provider may choose to publish a place-holder or it may choose not to publish the metric. If the metric is required, then the information must either publish a place-holder value or refrain from publishing the GLUE object. If a place-holder value is published, it must conform to the scheme described in this appendix. This is to increase the likelihood that software will understand the nature of the information it receives. To avoid confusion, these place-holder values have be chosen so they are obvious "wrong" to humans, unlikely to occur under normal operation and valid within the metric type. This also allows for detection of failing information provider components. 
Use-cases: --- There are two principle use-cases for place-holder values, although others may exist. Scenario 1. a static value has no good default value and has not been configured for a particular site. Some provisions for GLUE Schema provide templates. These templates may contain static values that have no good default value; for example, a value may require some detailed knowledge of a site. Whilst there may be the expectation that value be configured it is possible that this did not happen, so exposing the application's default configuration. Scenario 2. information provider is unable to obtain a dynamic value. A dynamic value is provided by an information provider by querying the underlying grid resources. This query will use a number of ancillary resources (e.g., DNS, network hardware) that might fail; the grid services might also fail. If a metric is required and the current value is unobtainable, a place-holder must be used. Place-holder values: --- This section describes a number of values that can be represented within a given address space (e.g., UTF-8, Integers, FQDNs, IPv4 address space). Each of the different types are introduced along with the proposed value and a brief discussion on the rational and any other considerations. 1. Simple strings (ASCII/UTF-8) should use "UNDEFINEDVALUE" or should start "UNDEFINEDVALUE:" Upper-case letters make it easier to spot and a single word avoids any white-space issues. A short error message can be incorporated into the message by appending the message after the colon. Examples: UNDEFINEDVALUE UNDEFINEDVALUE: Unable to contact torque daemon. Using UNDEFINEDVALUE is a default option for strings that have no widely-known structure. If a value is of a more restrictive sub-type (e.g., FQDNs, FQANs) described below, then the rules for more restrictive form must be used. 2. Fully qualified domain names: must use a hostname ending either "example.org" for scenario 1, or "invalid" for scenario 2. 
RFC 2606 reserves the "invalid" Top-Level-Domain (TLD) as always invalid and clearly so. For dynamic information gathering, a value ending "invalid" must be used. It is recommended that this is "unknown.invalid" be used unless the class of machine is known. RFC 2606 also defines two second-level domains: "example.org" and "example.com". These domains have the advantage of ending with a recognisable TLD, so are recognisable as a DNS name. Default configuration (scenario 1, above) must use DNS names that end "example.com" or "example.org" Additional information can be included by specifying a prefix to the more broad part; for example, "your-CE" can be appended to "example.org" in a configuration file to form "your-CE.example.org". This may be used to specify the class of machine that should be present. Examples: www.example.org your-CE.example.org unknown.invalid site-local-BDII.invalid 3. IPv4 addr: should use 192.0.2.250 There are several portions of IPv4 addresses that should not appear on a network, but none that are reserved for documentation or to specify a non-existent address. Using any address leads to the risk of side-effects, should this value be used. The best option is an IP address from the 192.0.2.0/24 subnet. This subnet is defined in RFC 3330 as "TEST-NET" for use in documentation and example code. For consistency, the value 192.0.2.250 must be used. 5. IPv6 addr: should use 2001:DB8::FFFF There is no documented undefined IPv6 address. RFC 3849 reserves the address prefix 2001:DB8::/32 for documentation. For consistency, the address 2001:DB8::FFFF must be used. 6. Integers: must use "all nines" For uint32/int32 this is 999,999,999 " int64/int64 this is 999,999,999,999,999,999 For integers, all numbers expressible within the encoding (int32/uint32/etc...) are valid so there is no safe choice. If an unsigned integer is encoded as a signed integer, it is possible to use negative numbers safely. 
However, these numbers will be unrepresentable if the number is stored as an unsigned integer. For this reason a negative number place-holder must not be used. The number was chosen for three reasons. First, metric scales are often chosen to reduce the likelihood of overflow: numbers towards MAXINT (the large number representable in an integer domain) are less likely to appear. Second, repeated numbers stand out more clearly to humans. Finally, the statistical frequency of measured values often follows Benford's law, which indicates that numbers starting with "1" occur far more frequently than those starting with "9" (about six times the probability). For these reasons, information providers must use all-nines to indicate an unknown value. 7. Filepath: must start either "/UNDEFINEDPATH" or "\UNDEFINEDPATH". As with the simple string, a single upper-case word is recommended. The initial slash indicates that the value is a path. Implementations must use whichever slash is most appropriate for the corresponding system (Unix-like systems use a forward-slash). Software should accept either value as an unknown-value place-holder. Additional information can be encoded as data beyond the initial UNDEFINEDPATH, separated by the same slash as started the value. Additional comments should not use any of the following characters: \ [ ] ; = " \ : | , * . Examples: /UNDEFINEDPATH \UNDEFINEDPATH /UNDEFINEDPATH/Broker unavailable 8. Email addresses: must use an undefined FQDN for the domain. RFC 2822 defines emails addresses to have the form: <local-part> '@' <domain> The <domain> must be an undefined FQDN; see above for a complete description. For email addresses, information providers should use "example.org" for scenario 1. and "unknown.invalid" for scenario 2. The <local-part> may be used to encode a small amount of additional information, for example, the class of user to whom the email address should be delivered. 
If no such information is to be encoded the value "user" should be used. Examples: site-local-contact@example.org local-admin@example.org user@unknown.invalid 9. Uniform Resource Identifier (URI): schema-specific RFC 3986 defines URIs as a "federated and extensible naming system." All URIs start with a schema-name part and no schema-name has been reserved for undefined or documenting example values. For any given URI schema ("http", for example), it may be possible to define an undefined value within that name-space. If a GLUE value has only one valid schema, the undefined value must be taken from that schema. If several schemata are possible, one must be chosen from the available options, which should be the most commonly used. Take care with the URI encoding. All unknown URI values must be valid URIs. If additional information is included, it must be encoded so the resulting URI is valid. For schemata that include a FQDN (e.g., a reference to an Internet host), an undefined URI must use an undefined FQDN; see above for details on undefined FQDNs. URI schemata that reference a remote file (e.g., "http", "https"), additional information may be included as the path. The FQDN indicates that the value is a place-holder, indicating an unknown value, so information providers need not specify "UNDEFINEDPATH". For "file" URIs, the path part must identify the value as unknown and must use the forward-slash variant; see above for details on undefined paths. For "mailto" URIs [RFC 2368] encapsulates valid email addresses with additional information (such as email headers and message body). Unknown mailto URIs must use an unknown email address (see above). Any additional information must be included in the email body. There may be other schemata in use that are not explicitly covered in this section. A place-holder value should be agreed upon within whichever domain such schemata are used. 
This place-holder value should be in the spirit of the place-holder values described so far. Examples: http://www.example.org/ httpg://your-CE.example.org/path/to/end-point mailto:site-admin@example.org mailto:user@maildomain.invalid?body=Problem%20connecting%20to%20RB file:///UNDEFINEDPATH file:///UNDEFINEDPATH/path%20to%20some%20directory 10. X509 Distinguished Names: must include a RDN of CN=UNDEFINEDUSER X509 uses a X500 namespace, represented as several Relative Domain-Names (RDNs) concatenated by forward-slashes. The final RDN is usually a single common name (CN), although multiple CNs are allowed. Unknown DN values must have at least two entries: an initial O=Grid followed immediately by CN=UNDEFINEDUSER. Additional information can be encoded using extra CN entries. These must come after CN=UNDEFINEDUSER. Examples: /O=Grid/CN=UNDEFINEDUSE /O=Grid/CN=UNDEFINEDUSER/CN=Your Grid certificate DN here /O=Grid/CN=UNDEFINEDUSER/CN=Cannot access SE Definition of words: --- The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are used deliberately and take their meaning from RFC 2119. A brief summary is given here. 1. MUST (or "REQUIRED") means that no deviation is allowed from conforming software. 2. MUST NOT means complete prohibition of this behaviour with conforming software. 3. SHOULD (or "RECOMMENDED") means that there may be reasons why conforming software does not to adopt this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course. 4. SHOULD NOT (or "NOT RECOMMENDED") means that there may be reasons why conforming software adopts this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course. 5. MAY (or "OPTIONAL") means an item is completely optional.
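For anyone who wants to sanity-check the integer section, here is a minimal Python sketch. It is not part of the appendix, and the function names (all_nines, benford) are made up for illustration. It derives the "all nines" value for a given signed width and evaluates Benford's law, P(d) = log10(1 + 1/d), for leading digits 1 and 9.

```python
import math

def all_nines(bits: int) -> int:
    """Largest all-nines decimal value that fits a signed integer of `bits` bits.

    MAXINT for a signed type is 2**(bits-1) - 1. An all-nines number with as
    many digits as MAXINT can never fit, so one digit fewer is used.
    """
    max_signed = 2 ** (bits - 1) - 1
    digits = len(str(max_signed)) - 1
    return int("9" * digits)

def benford(d: int) -> float:
    """Benford's law: expected frequency of leading decimal digit d (1..9)."""
    return math.log10(1 + 1 / d)

if __name__ == "__main__":
    print(all_nines(32))              # place-holder for int32/uint32
    print(all_nines(64))              # place-holder for int64/uint64
    print(benford(1) / benford(9))    # how much likelier a leading 1 is than a leading 9
```

The ratio comes out a little above six and a half, which is in line with the "about six times" figure quoted in the text. Because the value is chosen to fit the signed type, it also fits the unsigned type of the same width.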

glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Paul Millar said:
Some metrics within the GLUE schema are required whilst others are optional.
One minor thing, "metrics" is not really right (they aren't necessarily measuring anything), I think "attributes" is a better term.
Examples: /O=Grid/CN=UNDEFINEDUSE /O=Grid/CN=UNDEFINEDUSER/CN=Your Grid certificate DN here /O=Grid/CN=UNDEFINEDUSER/CN=Cannot access SE
R missing for the first one. Stephen

On Thursday 24 January 2008 14:50:38 Burke, S (Stephen) wrote:
One minor thing, "metrics" is not really right (they aren't necessarily measuring anything), I think "attributes" is a better term.
Thanks. I've replaced all "metric"s with "attribute"s.
Examples: /O=Grid/CN=UNDEFINEDUSE /O=Grid/CN=UNDEFINEDUSER/CN=Your Grid certificate DN here /O=Grid/CN=UNDEFINEDUSER/CN=Cannot access SE
R missing for the first one.
Thanks. That should be fixed now. Also fixed the numbering and tidied up some loose wording. Do we want to consider floating-point numbers, too? Cheers, Paul. Appendix D : place-holder values for unknown data (v1.1) ---- Introduction --- Whilst people endeavour to provide accurate information, there may be situations where specific GLUE values may be assigned place-holder (or dummy) values. These place-holder values carry some additional semantic meaning; specifically, that the correct value is currently unknown and the presented value should be ignored. This appendix describes a recommended set of place-holder values to use. Some attributes within the GLUE schema are required whilst others are optional. If the attribute is optional and the corresponding information is unavailable, the information provider must either publish a place-holder or not to publish the attribute. If the attribute is required, then the information must either publish a place-holder value or refrain from publishing the GLUE object. If a place-holder value is published, it must conform to the scheme described in this appendix. This is to increase the likelihood that software will understand the nature of the information it receives. This appendix describes place-holder values that have be chosen so they are obvious "wrong" to humans, unlikely to occur under normal operation and valid within the attribute type. This also allows for detection of failing information provider components. Use-cases: --- There are two principle use-cases for place-holder values, although others may exist. Scenario 1. a static value has no good default value and has not been configured for a particular site. Some provisions for GLUE Schema provide templates. These templates may contain static values that have no good default value; for example, a value may require some detailed knowledge of a site. 
Whilst there may be the expectation that value be configured it is possible that this did not happen, so exposing the attribute's default value. Scenario 2. information provider is unable to obtain a dynamic value. A dynamic value is provided by an information provider by querying the underlying grid resources. This query will use a number of ancillary resources (e.g., DNS, network hardware) that might fail; the grid services might also fail. If an attribute is required and the current value is unobtainable, a place-holder must be used. Place-holder values: --- This section describes a number of values that can be represented within a given address space (e.g., Strings/UTF-8, Integers, FQDNs, IPv4 address space). Each of the different types are introduced along with the place-holder value and a brief discussion on usage, rational and any other considerations. 1. Simple strings (ASCII/UTF-8) should use "UNDEFINEDVALUE" or should start "UNDEFINEDVALUE:" Upper-case letters make it easier to spot and a single word avoids any white-space issues. A short error message can be incorporated into the message by appending the message after the colon. Examples: UNDEFINEDVALUE UNDEFINEDVALUE: Unable to contact torque daemon. Using UNDEFINEDVALUE is a default option for strings that have no widely-known structure. If a value is of a more restrictive sub-type (e.g., FQDNs, URIs) described below, then the rules for more restrictive form must be used. 2. Fully qualified domain names: must use a hostname ending either "example.org" for scenario 1, or "invalid" for scenario 2. RFC 2606 defines two second-level domains: "example.org" and "example.com". These domains have the advantage of ending with a recognisable TLD, so are recognisable as a DNS name. Default configuration (scenario 1, above) must use DNS names that end "example.org" RFC 2606 also reserves the "invalid" Top-Level-Domain (TLD) as always invalid and clearly so. 
For dynamic information gathering, a value ending "invalid" must be used. In both cases, additional information may be included by specifying a prefix to "example.org" or "invalid". This may be used to specify the class of machine that should be present. For dynamic infomation, if the class of machine is not published then the FQDN "unknown.invalid" must be used. Examples: www.example.org your-CE.example.org unknown.invalid site-local-BDII.invalid 3. IPv4 addr: must use 192.0.2.250 There are several portions of IPv4 addresses that should not appear on a network, but none that are reserved for documentation or to specify a non-existent address. Using any address leads to the risk of side-effects, should this value be used. The best option is an IP address from the 192.0.2.0/24 subnet. This subnet is defined in RFC 3330 as "TEST-NET" for use in documentation and example code. For consistency, the value 192.0.2.250 must be used. 4. IPv6 addr: must use 2001:DB8::FFFF There is no documented undefined IPv6 address. RFC 3849 reserves the address prefix 2001:DB8::/32 for documentation. For consistency, the address 2001:DB8::FFFF must be used. 5. Integers: must use "all nines" For uint32/int32 this is 999,999,999 " int64/int64 this is 999,999,999,999,999,999 For integers, all numbers expressible within the encoding (int32/uint32/etc.) are valid so there is no safe choice. If an unsigned integer is encoded as a signed integer, it is possible to use negative numbers safely. However, these numbers will be unrepresentable if the number is stored as an unsigned integer. For this reason a negative number place-holder must not be used. The number was chosen for three reasons. First, attribute scales are often chosen to reduce the likelihood of overflow: numbers towards MAXINT (the large number representable in an integer domain) are less likely to appear. Second, repeated numbers stand out more clearly to humans. 
Finally, the statistical frequency of measured values often follows Benford's law, which indicates that numbers starting with "1" occur far more frequently than those starting with "9" (about six times more likely). For these reasons, information providers must use all-nines to indicate an unknown value. 6. Filepath: must start either "/UNDEFINEDPATH" or "\UNDEFINEDPATH". As with the simple string, a single upper-case word is recommended. The initial slash indicates that the value is a path. Implementations must use whichever slash is most appropriate for the underlying system (Unix-like systems use a forward-slash). Software should accept either value as an unknown-value place-holder. Additional information can be encoded as data beyond the initial UNDEFINEDPATH, separated by the same slash as started the value. Additional comments should not use any of the following characters: \ [ ] ; = " \ : | , * . Examples: /UNDEFINEDPATH \UNDEFINEDPATH /UNDEFINEDPATH/Broker unavailable 7. Email addresses: must use an undefined FQDN for the domain. RFC 2822 defines emails addresses to have the form: <local-part> '@' <domain> The <domain> must be an undefined FQDN; see above for a complete description. For email addresses, information providers should use "example.org" for scenario 1. and "unknown.invalid" for scenario 2. The <local-part> may be used to encode a small amount of additional information; for example, it may indicate the class of user to whom the email address should be delivered. If no such information is to be encoded the value "user" must be used. Examples: user@example.org user@unknown.invalid site-local-contact@example.org local-admin@example.org 8. Uniform Resource Identifier (URI): schema-specific RFC 3986 defines URIs as a "federated and extensible naming system." All URIs start with a schema-name part (e.g., "http") and no schema-name has been reserved for undefined or documenting example values. 
For any given URI schema ("http", for example), it may be possible to define an unknown value within that name-space. If a GLUE value has only one valid schema, the undefined value must be taken from that schema. If several schemata are possible, one must be chosen from the available options. This should be the most commonly used. Take care with the URI encoding. All unknown URI values must be valid URIs. If additional information is included, it must be encoded so the resulting URI is valid. For schemata that may include a FQDN (e.g., a reference to an Internet host), an undefined URI must use an undefined FQDN; see above for details on undefined FQDNs. URI schemata that reference a remote file (e.g., "http", "ftp", "https"), additional information may be included as the path. The FQDN indicates that the value is a place-holder, indicating an unknown value, so information providers should not specify "UNDEFINEDPATH". For "file" URIs, the path part must identify the value as unknown and must use the forward-slash variant; see above for details on undefined paths. For "mailto" URIs [RFC 2368] encapsulates valid email addresses with additional information (such as email headers and message body). Unknown mailto URIs must use an unknown email address (see above). Any additional information must be included in the email body. There may be other schemata in use that are not explicitly covered in this section. A place-holder value should be agreed upon within whichever domain such schemata are used. This place-holder value should be in the spirit of the place-holder values described so far. Examples: http://www.example.org/ httpg://your-CE.example.org/path/to/end-point httpg://unknown.invalid/User%20certificate%20has%20expired mailto:site-admin@example.org mailto:user@maildomain.invalid?body=Problem%20connecting%20to%20WLMS file:///UNDEFINEDPATH file:///UNDEFINEDPATH/path%20to%20some%20directory 9. 
X509 Distinguished Names: must include a RDN of CN=UNDEFINEDUSER X509 uses a X500 namespace, represented as several Relative Domain-Names (RDNs) concatenated by forward-slashes. The final RDN is usually a single common name (CN), although multiple CNs are allowed. Unknown DN values must have at least two entries: an initial O=Grid followed immediately by CN=UNDEFINEDUSER. Additional information can be encoded using extra CN entries. These must come after CN=UNDEFINEDUSER. Examples: /O=Grid/CN=UNDEFINEDUSER /O=Grid/CN=UNDEFINEDUSER/CN=Your Grid certificate DN here /O=Grid/CN=UNDEFINEDUSER/CN=Cannot access SE Definition of words: --- The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are used deliberately and take their meaning from RFC 2119. A brief summary is given here. 1. MUST (or "REQUIRED") means that no deviation is allowed from conforming software. 2. MUST NOT means complete prohibition of this behaviour with conforming software. 3. SHOULD (or "RECOMMENDED") means that there may be reasons why conforming software does not to adopt this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course. 4. SHOULD NOT (or "NOT RECOMMENDED") means that there may be reasons why conforming software adopts this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course. 5. MAY (or "OPTIONAL") means an item is completely optional.
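To illustrate how consuming software might recognise the FQDN-based place-holders (FQDNs, email addresses and URIs that embed a host), here is a rough Python sketch. It is only one possible reading of the rules above; the function names and the return convention (1, 2 or None) are assumptions made for the example and are not defined by the appendix.

```python
from typing import Optional
from urllib.parse import urlsplit

def fqdn_scenario(host: str) -> Optional[int]:
    """Return 1 (unconfigured default), 2 (dynamic lookup failed) or None (not a place-holder)."""
    name = host.rstrip(".").lower()
    if name == "example.org" or name.endswith(".example.org"):
        return 1
    if name == "invalid" or name.endswith(".invalid"):
        return 2
    return None

def email_scenario(address: str) -> Optional[int]:
    """Classify an email address by the FQDN in its <domain> part."""
    _local, sep, domain = address.rpartition("@")
    return fqdn_scenario(domain) if sep else None

def uri_scenario(uri: str) -> Optional[int]:
    """Classify a URI that carries a host, or a mailto URI, by its FQDN."""
    parts = urlsplit(uri)
    if parts.scheme == "mailto":
        # For mailto URIs the address is the path; any "?body=..." lands in the query.
        return email_scenario(parts.path)
    if parts.hostname:
        return fqdn_scenario(parts.hostname)
    return None

if __name__ == "__main__":
    for value in ("http://www.example.org/",
                  "httpg://unknown.invalid/User%20certificate%20has%20expired",
                  "mailto:user@maildomain.invalid?body=Problem%20connecting%20to%20WLMS"):
        print(value, "->", uri_scenario(value))
```

"file" URIs are deliberately left out of this sketch, since they are identified by the UNDEFINEDPATH prefix rather than by an FQDN.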

Paul Millar [mailto:paul.millar@desy.de] said:
Do we want to consider floating-point numbers, too?
The only ones I can think of off-hand are the latitude and longitude, but they possibly need special treatment anyway ... you might perhaps have a percentage as either float or int, but I don't think we actually have at the moment. Generic floating-point, i.e. with an exponent, is probably unlikely, and for other things we usually choose units small enough to use ints. Stephen

On Thursday 24 January 2008 15:55:13 Burke, S (Stephen) wrote:
Paul Millar [mailto:paul.millar@desy.de] said:
Do we want to consider floating-point numbers, too?
The only ones I can think of off-hand are the latitude and longitude, but they possibly need special treatment anyway ... you might perhaps have a percentage as either float or int, but I don't think we actually have at the moment. Generic floating-point, i.e. with an exponent, is probably unlikely, and for other things we usually choose units small enough to use ints.
OK, we should probably leave floating-point numbers for now. For longitude/latitude, should we use (0,0)? This is located off the west coast of Africa: http://www.mapquest.com/maps/map.adp?latlongtype=decimal&latitude=0&longitude=0 It would seem a safe bet that no one will build a Grid site there. Would it be OK to add this for longitude,latitude attributes? Cheers, Paul.

On Friday 25 January 2008 15:31:57 Burke, S (Stephen) wrote:
Paul Millar [mailto:paul.millar@desy.de] said:
For longitude/latitude, should we use (0,0)? This is located off the west coast of Africa:
Yes, I think that's what we used up to now. (Amazingly it seems that no-one is publishing it right now!)
I was taken aback by that, too!

The latest version is copied below. I've added a section for Geographic locations. I've also taken the liberty of including an FQAN section (should people decide to publish these).

Cheers,

Paul.

Appendix D : place-holder values for unknown data (v1.2)
----

Introduction
---

Whilst people endeavour to provide accurate information, there may be situations where specific GLUE values may be assigned place-holder (or dummy) values. These place-holder values carry some additional semantic meaning; specifically, that the correct value is currently unknown and the presented value should be ignored. This appendix describes a recommended set of place-holder values to use.

Some attributes within the GLUE schema are required whilst others are optional. If the attribute is optional and the corresponding information is unavailable, the information provider must either publish a place-holder or not publish the attribute. If the attribute is required, then the information provider must either publish a place-holder value or refrain from publishing the GLUE object.

If a place-holder value is published, it must conform to the scheme described in this appendix. This is to increase the likelihood that software will understand the nature of the information it receives.

This appendix describes place-holder values that have been chosen so they are obviously "wrong" to humans, unlikely to occur under normal operation and valid within the attribute type. This also allows for detection of failing information provider components.

Use-cases:
---

There are two principal use-cases for place-holder values, although others may exist.

Scenario 1. a static value has no good default value and has not been configured for a particular site.

Some provisions for GLUE Schema provide templates. These templates may contain static values that have no good default value; for example, a value may require some detailed knowledge of a site. Whilst there may be the expectation that the value be configured, it is possible that this did not happen, so exposing the attribute's default value.

Scenario 2. information provider is unable to obtain a dynamic value.

A dynamic value is provided by an information provider by querying the underlying grid resources. This query will use a number of ancillary resources (e.g., DNS, network hardware) that might fail; the grid services might also fail. If an attribute is required and the current value is unobtainable, a place-holder must be used.

Place-holder values:
---

This section describes a number of values that can be represented within a given address space (e.g., Strings/UTF-8, Integers, FQDNs, IPv4 address space). Each of the different types is introduced along with the place-holder value and a brief discussion on usage, rationale and any other considerations.

1. Simple strings (ASCII/UTF-8) should use "UNDEFINEDVALUE" or should start "UNDEFINEDVALUE:"

Upper-case letters make it easier to spot and a single word avoids any white-space issues. A short error message can be incorporated into the value by appending the message after the colon.

Examples:
  UNDEFINEDVALUE
  UNDEFINEDVALUE: Unable to contact torque daemon.

Using UNDEFINEDVALUE is a default option for strings that have no widely-known structure. If a value is of a more restrictive sub-type (e.g., FQDNs, URIs) described below, then the rules for the more restrictive form must be used.

2. Fully qualified domain names: must use a hostname ending either "example.org" for scenario 1, or "invalid" for scenario 2.

RFC 2606 reserves several second-level domains, including "example.org" and "example.com". These domains have the advantage of ending with a recognisable TLD, so are recognisable as a DNS name. Default configuration (scenario 1, above) must use DNS names that end "example.org".

RFC 2606 also reserves the "invalid" Top-Level-Domain (TLD) as always invalid and clearly so.
For dynamic information gathering, a value ending "invalid" must be used.

In both cases, additional information may be included by specifying a prefix to "example.org" or "invalid". This may be used to specify the class of machine that should be present. For dynamic information, if the class of machine is not published then the FQDN "unknown.invalid" must be used.

Examples:
  www.example.org
  your-CE.example.org
  unknown.invalid
  site-local-BDII.invalid

3. IPv4 addr: must use 192.0.2.250

There are several portions of the IPv4 address space that should not appear on a network, but none that is reserved to specify a non-existent or unknown address. Using any address leads to the risk of side-effects, should this value be used. The best option is an IP address from the 192.0.2.0/24 subnet. This subnet is defined in RFC 3330 as "TEST-NET" for use in documentation and example code. For consistency, the value 192.0.2.250 must be used.

4. IPv6 addr: must use 2001:DB8::FFFF

There is no documented undefined IPv6 address. RFC 3849 reserves the address prefix 2001:DB8::/32 for documentation. For consistency, the address 2001:DB8::FFFF must be used.

5. Integers: must use "all nines"

  For uint32/int32 this is 999,999,999
  For uint64/int64 this is 999,999,999,999,999,999

For integers, all numbers expressible within the encoding (int32/uint32/etc.) are valid so there is no safe choice. If an unsigned integer is encoded as a signed integer, it is possible to use negative numbers safely. However, these numbers will be unrepresentable if the number is stored as an unsigned integer. For this reason a negative number place-holder must not be used.

The number was chosen for three reasons. First, attribute scales are often chosen to reduce the likelihood of overflow: numbers towards MAXINT (the largest number representable in an integer domain) are less likely to appear. Second, repeated digits stand out more clearly to humans. Finally, the statistical frequency of measured values often follows Benford's law, which indicates that numbers starting with "1" occur far more frequently than those starting with "9" (about six times more likely). For these reasons, information providers must use all-nines to indicate an unknown value.

6. Filepath: must start either "/UNDEFINEDPATH" or "\UNDEFINEDPATH".

As with the simple string, a single upper-case word is recommended. The initial slash indicates that the value is a path. Implementations must use whichever slash is most appropriate for the underlying system (Unix-like systems use a forward-slash). Software should accept either value as an unknown-value place-holder.

Additional information can be encoded as data beyond the initial UNDEFINEDPATH, separated by the same slash as started the value. Additional comments should not use any of the following characters: \ [ ] ; = " \ : | , * .

Examples:
  /UNDEFINEDPATH
  \UNDEFINEDPATH
  /UNDEFINEDPATH/Broker unavailable

7. Email addresses: must use an undefined FQDN for the domain.

RFC 2822 defines email addresses to have the form:

  <local-part> '@' <domain>

The <domain> must be an undefined FQDN; see above for a complete description. For email addresses, information providers should use "example.org" for scenario 1 and "unknown.invalid" for scenario 2.

The <local-part> may be used to encode a small amount of additional information; for example, it may indicate the class of user to whom the email should be delivered. If no such information is to be encoded, the value "user" must be used.

Examples:
  user@example.org
  user@unknown.invalid
  site-local-contact@example.org
  local-admin@example.org

8. Uniform Resource Identifier (URI): schema-specific

RFC 3986 defines URIs as a "federated and extensible naming system." All URIs start with a schema-name part (e.g., "http") and no schema-name has been reserved for undefined or documenting example values.
For any given URI schema ("http", for example), it may be possible to define an unknown value within that name-space. If a GLUE value has only one valid schema, the undefined value must be taken from that schema. If several schemata are possible, one must be chosen from the available options. This should be the most commonly used.

Take care with the URI encoding. All unknown URI values must be valid URIs. If additional information is included, it must be encoded so the resulting URI is valid.

For schemata that may include a FQDN (e.g., a reference to an Internet host), an undefined URI must use an undefined FQDN; see above for details on undefined FQDNs.

For URI schemata that reference a remote file (e.g., "http", "ftp", "https"), additional information may be included as the path. The FQDN already indicates that the value is a place-holder for an unknown value, so information providers should not specify "UNDEFINEDPATH".

For "file" URIs, the path part must identify the value as unknown and must use the forward-slash variant; see above for details on undefined paths.

A "mailto" URI [RFC 2368] encapsulates a valid email address with additional information (such as email headers and message body). Unknown mailto URIs must use an unknown email address (see above). Any additional information must be included in the email body.

There may be other schemata in use that are not explicitly covered in this section. A place-holder value should be agreed upon within whichever domain such schemata are used. This place-holder value should be in the spirit of the place-holder values described so far.

Examples:
  http://www.example.org/
  httpg://your-CE.example.org/path/to/end-point
  httpg://unknown.invalid/User%20certificate%20has%20expired
  mailto:site-admin@example.org
  mailto:user@maildomain.invalid?body=Problem%20connecting%20to%20WLMS
  file:///UNDEFINEDPATH
  file:///UNDEFINEDPATH/path%20to%20some%20directory

9. X509 Distinguished Names: must include an RDN of CN=UNDEFINEDUSER

X509 uses an X500 namespace, represented as several Relative Distinguished Names (RDNs) concatenated by forward-slashes. The final RDN is usually a single common name (CN), although multiple CNs are allowed.

Unknown DN values must have at least two entries: an initial O=Grid followed immediately by CN=UNDEFINEDUSER. Additional information can be encoded using extra CN entries. These must come after CN=UNDEFINEDUSER.

Examples:
  /O=Grid/CN=UNDEFINEDUSER
  /O=Grid/CN=UNDEFINEDUSER/CN=Your Grid certificate DN here
  /O=Grid/CN=UNDEFINEDUSER/CN=Cannot access SE

10. Fully Qualified Attribute Name (FQAN): must use a VO of "vo.example.org" (for scenario 1.) or "unknown.illegal" (for scenario 2).

The "VOMS Credential Format" document,

  http://edg-wp2.web.cern.ch/edg-wp2/security/voms/edg-voms-credential.pdf

states that FQANs must have the form:

  /VO[/group[/subgroup(s)]][/Role=role][/Capability=cap]

where VO is a well-formed DNS name. Unlike DNS names, VO names must be lower-case.

The unknown place-holder value for FQAN is derived from the unknown DNS name (see above). It must have no subgroup(s) or Capability specified. Any additional information must be encoded within a single Role name. Care should be taken that only valid characters (A-Z, a-z, 0-9 and dash) are included.

Examples:
  /vo.example.org
  /vo.example.org/Role=Replace-this-example-with-your-FQAN
  /unknown.illegal
  /unknown.illegal/Role=Unable-to-contact-CE-Error-42

11. Geographic locations: must use longitude 0 degrees, latitude 0 degrees.

Meridians of longitude are taken from (-180,180] degrees, whilst parallels of latitude are taken from [-90,90] degrees. For a place-holder value to be a valid location, it must also be taken from these ranges.

By a happy coincidence, the (0,0) location is within the Atlantic Ocean, some 380 miles (611 kilometers) south of the nearest country (Ghana).
Since this location is unlikely to be used and repeated numbers are easier for humans to spot, (0,0) must be used to specify an unknown location.

Definition of words:
---

The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are used deliberately and take their meaning from RFC 2119. A brief summary is given here.

1. MUST (or "REQUIRED") means that no deviation is allowed for conforming software.
2. MUST NOT means complete prohibition of this behaviour for conforming software.
3. SHOULD (or "RECOMMENDED") means that there may be reasons why conforming software does not adopt this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course.
4. SHOULD NOT (or "NOT RECOMMENDED") means that there may be reasons why conforming software adopts this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course.
5. MAY (or "OPTIONAL") means an item is completely optional.
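
As a quick sanity check on the integer place-holders proposed in section 5 above, the following Python sketch confirms that both all-nines values fit in the signed and unsigned types they are proposed for, and computes the Benford's-law ratio mentioned there. The constant and function names are purely illustrative and are not part of GLUE.

```python
import math

# Hypothetical names; the values themselves are those given in section 5.
ALL_NINES_32 = 999_999_999                # nine nines
ALL_NINES_64 = 999_999_999_999_999_999    # eighteen nines

INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def benford_probability(digit: int) -> float:
    """Probability of a given leading digit (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def is_integer_placeholder(value: int, bits: int) -> bool:
    """True if value is the all-nines place-holder for a 32- or 64-bit attribute."""
    expected = {32: ALL_NINES_32, 64: ALL_NINES_64}[bits]
    return value == expected


# The place-holders are representable whether the attribute is signed or unsigned.
assert ALL_NINES_32 <= INT32_MAX <= UINT32_MAX
assert ALL_NINES_64 <= INT64_MAX <= UINT64_MAX

# How much more likely is a leading "1" than a leading "9"?
ratio = benford_probability(1) / benford_probability(9)
print(f"P(1)/P(9) = {ratio:.2f}")
```

Note that a tenth nine would no longer fit in either int32 or uint32, which is presumably why nine digits are used for the 32-bit case.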

Paul Millar [mailto:paul.millar@desy.de] said:
I was taken aback by that, too!
I think Gidon Moont (real time monitor) ticketed sites which were publishing 0,0 - so it is useful to be able to see clearly which sites are wrong (although that doesn't help the ones with the wrong sign for the longitude!).
I've also taken the liberty to include a FQAN section (should people decide to publish these).
We publish generic authz info so I'm not sure if we should tie this to VOMS - but we are still undecided on the policy representation anyway, so it may make sense to have a default for VOMS as for a specific URL scheme.
10. Fully Qualified Attribute Name (FQAN): must use a VO of "vo.example.org" (for scenario 1.) or "unknown.illegal" (for scenario 2).
Don't you mean unknown.invalid? Stephen

On Saturday 26 January 2008 11:48:28 Burke, S (Stephen) wrote:
I think Gidon Moont (real time monitor) ticketed sites which were publishing 0,0 - so it is useful to be able to see clearly which sites are wrong
Yes, that's a good example for scenario 1. Hopefully standardising these values would help Gstat check for them in their validation tests.
(although that doesn't help the ones with the wrong sign for the longitude!).
Ha! I remember him talking about the difficulty people seemed to have in publishing the correct information. With the longitude, I believe (as a work-around) he had to hard-code that certain locations were on a particular side of the prime meridian and flip the sign if they published a value with the wrong sign.
I've also taken the liberty to include a FQAN section (should people decide to publish these).
We publish generic authz info so I'm not sure if we should tie this to VOMS - but we are still undecided on the policy representation anyway, so it may make sense to have a default for VOMS as for a specific URL scheme.
OK, I'll leave it in for now. We can always remove it or specify a place-holder value for a more generic authz class. As an idea, would identifying authz info as a URI make sense? This would require specifying a schema-name part for FQAN. For example, this could be "fqan", with "fqan:/vo.example.org/Role=An-example" as an example FQAN. I believe no one currently writes FQANs as URIs, but doing so would allow GLUE to support additional authz schemes without redefining the data-type. Since GLUE currently doesn't specify any authz info, this may be a little premature.
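
To make the idea a little more concrete, here is a minimal Python sketch of how such a value might be parsed. The "fqan" schema-name is only the suggestion made above, not something registered or currently used by GLUE, and the function name is made up for illustration.

```python
from urllib.parse import urlsplit


def parse_authz_uri(value: str) -> tuple[str, str, dict[str, str]]:
    """Split e.g. 'fqan:/vo.example.org/Role=An-example' into
    (schema-name, VO, attributes). Hypothetical helper, not a GLUE API."""
    parts = urlsplit(value)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.scheme or not segments:
        raise ValueError(f"not an authz URI: {value!r}")
    vo = segments[0]
    attributes = {}
    for segment in segments[1:]:
        key, sep, val = segment.partition("=")
        if sep:  # Role=... or Capability=...; group segments are ignored here
            attributes[key] = val
    return parts.scheme, vo, attributes


print(parse_authz_uri("fqan:/vo.example.org/Role=An-example"))
```

Because the schema-name is split off first, a consumer could dispatch on it and support other authz schemes with the same data-type.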
10. Fully Qualified Attribute Name (FQAN): must use a VO of "vo.example.org" (for scenario 1.) or "unknown.illegal" (for scenario 2).
Don't you mean unknown.invalid?
Thanks, well spotted. I've also added some minor changes to the introduction.

Cheers,

Paul.

---

Appendix D : place-holder values for unknown data (v1.3)
----

Introduction
---

Whilst people endeavour to provide accurate information, there may be situations where specific GLUE attributes are assigned place-holder (or dummy) values. These place-holder values carry some additional semantic meaning; specifically, that the correct value is currently unknown and the presented value should be ignored. This appendix describes a set of such place-holder values.

Some attributes within the GLUE schema are required whilst others are optional. If the attribute is optional and the corresponding information is unavailable, the information provider must either publish a place-holder or not publish the attribute. If the attribute is required, then the information provider must either publish a place-holder value or refrain from publishing the GLUE object.

If a place-holder value is published, it must conform to the scheme described in this appendix. This is to increase the likelihood that software will understand the nature of the information it receives.

This appendix describes place-holder values that have been chosen so they are obviously "wrong" to humans, unlikely to occur under normal operation and valid within the attribute type. This also allows for detection of failing information provider components.

Use-cases:
---

There are two principal use-cases for place-holder values, although others may exist.

Scenario 1. a static value has no good default value and has not been configured for a particular site.

Some provisions for GLUE Schema provide templates. These templates may contain attributes that have no good default value; for example, supplying the correct value may require site-specific knowledge. Whilst it is expected that these attributes be configured, it is possible that this does not happen, so exposing the attributes' default values.

Scenario 2. information provider is unable to obtain a dynamic value.

A dynamic value is provided by an information provider by querying the underlying grid resources. This query will use a number of ancillary resources (e.g., DNS, network hardware) that might fail; the grid services might also fail. If an attribute is required and the current value is unobtainable, a place-holder value must be used.

Place-holder values:
---

This section describes a number of values that can be represented within a given address space (e.g., Strings/UTF-8, Integers, FQDNs, IPv4 address space). Each of the different types is introduced along with the place-holder value and a brief discussion on usage, rationale and any other considerations.

1. Simple strings (ASCII/UTF-8) should use "UNDEFINEDVALUE" or should start "UNDEFINEDVALUE:"

Upper-case letters make it easier to spot and a single word avoids any white-space issues. A short error message can be incorporated into the value by appending the message after the colon.

Examples:
  UNDEFINEDVALUE
  UNDEFINEDVALUE: unable to contact torque daemon.

Using UNDEFINEDVALUE is a default option for strings that have no widely-known structure. If a value is of a more restrictive sub-type (e.g., FQDNs, FQANs, URIs) described below, then the rules for the more restrictive form must be used.

2. Fully qualified domain names: must use a hostname ending either "example.org" for scenario 1, or "invalid" for scenario 2.

RFC 2606 reserves three second-level domains: "example.com", "example.net" and "example.org". These domains have the advantage of ending with a recognisable TLD, so are recognisable as a DNS name. Default configuration (scenario 1, above) must use DNS names that end "example.org".

RFC 2606 also reserves the "invalid" Top-Level-Domain (TLD) as always invalid and clearly so. For dynamic information gathering, a value ending "invalid" must be used.

In both cases, additional information may be included by specifying a prefix to "example.org" or "invalid".
This may be used to specify the class of machine that should be present. For dynamic information, if the class of machine is not published then the FQDN "unknown.invalid" must be used.

Examples:
  www.example.org
  your-CE.example.org
  unknown.invalid
  site-local-BDII.invalid

3. IPv4 addr: must use 192.0.2.250

There are several portions of the IPv4 address space that should not appear on a network, but none that is reserved to specify a non-existent address. Using any address leads to the risk of side-effects, should this value be used. The best option is an IP address from the 192.0.2.0/24 subnet. This subnet is defined in RFC 3330 as "TEST-NET" for use in documentation and example code. For consistency, the value 192.0.2.250 must be used.

4. IPv6 addr: must use 2001:DB8::FFFF

There is no documented undefined IPv6 address. RFC 3849 reserves the address prefix 2001:DB8::/32 for documentation. For consistency, the address 2001:DB8::FFFF must be used.

5. Integers: must use "all nines"

  For uint32/int32 this is 999,999,999
  For uint64/int64 this is 999,999,999,999,999,999

For integers, all numbers expressible within the encoding (int32/uint32/etc.) are valid so there is no safe choice. If an unsigned integer is encoded as a signed integer, it is possible to use negative numbers safely. However, these numbers will be unrepresentable if the number is stored as an unsigned integer. For this reason a negative number place-holder must not be used.

The number was chosen for three reasons. First, attribute scales are often chosen to reduce the likelihood of overflow: numbers towards MAXINT (the largest number representable in an integer domain) are less likely to appear. Second, repeated digits stand out more clearly to humans. Finally, the statistical frequency of measured values often follows Benford's law, which indicates that numbers starting with "1" occur far more frequently than those starting with "9" (about six times more likely).
For these reasons, information providers must use all-nines to indicate an unknown value.

6. Filepath: must start either "/UNDEFINEDPATH" or "\UNDEFINEDPATH".

As with the simple string, a single upper-case word is recommended. The initial slash indicates that the value is a path. Implementations must use whichever slash is most appropriate for the underlying system (Unix-like systems use a forward-slash). Software should accept either value as an unknown-value place-holder.

Additional information can be encoded as data beyond the initial UNDEFINEDPATH, separated by the same slash as started the value. Additional comments should not use any of the following characters: \ [ ] ; = " : | , * .

Examples:
  /UNDEFINEDPATH
  \UNDEFINEDPATH
  /UNDEFINEDPATH/Path to storage area
  /UNDEFINEDPATH/Broker unavailable

7. Email addresses: must use an undefined FQDN for the domain.

RFC 2822 defines email addresses to have the form:

  <local-part> '@' <domain>

The <domain> must be an undefined FQDN; see above for a complete description. For email addresses, information providers should use "example.org" for scenario 1 and "unknown.invalid" for scenario 2.

The <local-part> may be used to encode a small amount of additional information; for example, it may indicate the class of user to whom the email should be delivered. If no such information is to be encoded the value "user" must be used.

Examples:
  user@example.org
  user@unknown.invalid
  site-local-contact@example.org
  local-admin@example.org

8. Uniform Resource Identifier (URI): schema-specific

RFC 3986 defines URIs as a "federated and extensible naming system." All URIs start with a schema-name part (e.g., "http") and no schema-name has been reserved for undefined or documenting example values.

For any given URI schema ("http", for example), it may be possible to define an unknown value within that name-space. If a GLUE value has only one valid schema, the undefined value must be taken from that schema.
If several schemata are possible, one must be chosen from the available options. This should be the most commonly used.

Take care with the URI encoding. All unknown URI values must be valid URIs. If additional information is included, it must be encoded so the resulting URI is valid.

For schemata that may include a FQDN (e.g., a reference to an Internet host), an undefined URI must use an undefined FQDN; see above for details on undefined FQDNs.

For URI schemata that reference a remote file (e.g., "http", "ftp", "https"), additional information may be included as the path. The FQDN already indicates that the value is a place-holder for an unknown value, so information providers should not specify "UNDEFINEDPATH".

For "file" URIs, the path part must identify the value as unknown and must use the forward-slash variant; see above for details on undefined paths.

A "mailto" URI [RFC 2368] encapsulates a valid email address with additional information (such as email headers and message body). Unknown mailto URIs must use an unknown email address (see above). Any additional information must be included in the email body.

There may be other schemata in use that are not explicitly covered in this section. A place-holder value should be agreed upon within whichever domain such schemata are used. This place-holder value should be in the spirit of the place-holder values described so far.

Examples:
  http://www.example.org/
  httpg://your-CE.example.org/path/to/end-point
  httpg://unknown.invalid/User%20certificate%20has%20expired
  mailto:site-admin@example.org
  mailto:user@maildomain.invalid?body=Problem%20connecting%20to%20WLMS
  file:///UNDEFINEDPATH
  file:///UNDEFINEDPATH/path%20to%20some%20directory

9. X509 Distinguished Names: must start /O=Grid/CN=UNDEFINEDUSER

X509 uses an X500 namespace, represented as several Relative Distinguished Names (RDNs) concatenated by forward-slashes. The final RDN is usually a single common name (CN), although multiple CNs are allowed.
Unknown DN values must have at least two entries: an initial O=Grid followed immediately by CN=UNDEFINEDUSER. Additional information can be encoded using extra CN entries. These must come after CN=UNDEFINEDUSER.

Examples:
  /O=Grid/CN=UNDEFINEDUSER
  /O=Grid/CN=UNDEFINEDUSER/CN=Your Grid certificate DN here
  /O=Grid/CN=UNDEFINEDUSER/CN=Cannot access SE

10. Fully Qualified Attribute Name (FQAN): must use a VO of "vo.example.org" (for scenario 1) or "unknown.invalid" (for scenario 2).

The "VOMS Credential Format" document,

  http://edg-wp2.web.cern.ch/edg-wp2/security/voms/edg-voms-credential.pdf

states that FQANs must have the form:

  /VO[/group[/subgroup(s)]][/Role=role][/Capability=cap]

where VO is a well-formed DNS name. Unlike DNS names, VO names must be lower-case.

The unknown place-holder value for FQAN is derived from the unknown DNS name (see above). It must have no subgroup(s) or Capability specified. Any additional information must be encoded within a single Role name. Care should be taken that only valid characters (A-Z, a-z, 0-9 and dash) are included.

Examples:
  /vo.example.org
  /vo.example.org/Role=Replace-this-example-with-your-FQAN
  /unknown.invalid
  /unknown.invalid/Role=Unable-to-contact-CE-Error-42

11. Geographic locations: must use longitude 0 degrees, latitude 0 degrees.

Meridians of longitude are taken from (-180,180] degrees, whilst parallels of latitude are taken from [-90,90] degrees. For a place-holder value to be a valid location, it must also be taken from these ranges.

By a happy coincidence, the (0,0) location is within the Atlantic Ocean, some 380 miles (611 kilometers) south of the nearest country (Ghana). Since this location is unlikely to be used and repeated numbers are easier for humans to spot, (0,0) must be used to specify an unknown location.
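
The rules above lend themselves to simple checks on the consuming side. The following Python sketch shows one way software might recognise a few of the place-holders (sections 1, 2, 6, 7, 9 and 11). The function names are illustrative only and the checks are deliberately simplistic; this is not a complete validator.

```python
def is_placeholder_string(value: str) -> bool:
    """Section 1: 'UNDEFINEDVALUE' alone, or followed by ':' and a message."""
    return value == "UNDEFINEDVALUE" or value.startswith("UNDEFINEDVALUE:")


def is_placeholder_fqdn(value: str) -> bool:
    """Section 2: names under example.org (scenario 1) or .invalid (scenario 2)."""
    name = value.lower().rstrip(".")
    return (name == "example.org"
            or name.endswith(".example.org")
            or name.endswith(".invalid"))


def is_placeholder_path(value: str) -> bool:
    """Section 6: accept either slash variant, with an optional trailing comment."""
    for slash in ("/", "\\"):
        marker = slash + "UNDEFINEDPATH"
        if value == marker or value.startswith(marker + slash):
            return True
    return False


def is_placeholder_email(value: str) -> bool:
    """Section 7: the domain part must be a place-holder FQDN."""
    _local, sep, domain = value.rpartition("@")
    return bool(sep) and is_placeholder_fqdn(domain)


def is_placeholder_dn(value: str) -> bool:
    """Section 9: the DN must start /O=Grid/CN=UNDEFINEDUSER."""
    prefix = "/O=Grid/CN=UNDEFINEDUSER"
    return value == prefix or value.startswith(prefix + "/")


def is_placeholder_location(longitude: float, latitude: float) -> bool:
    """Section 11: (0, 0) marks an unknown location."""
    return longitude == 0 and latitude == 0
```

A validation tool could run checks like these over published values and report which attributes are still carrying place-holders.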

Definition of words:
---

The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are used deliberately and take their meaning from RFC 2119. A brief summary is given here.

1. MUST (or "REQUIRED") means that no deviation is allowed for conforming software.
2. MUST NOT means complete prohibition of this behaviour for conforming software.
3. SHOULD (or "RECOMMENDED") means that there may be reasons why conforming software does not adopt this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course.
4. SHOULD NOT (or "NOT RECOMMENDED") means that there may be reasons why conforming software adopts this behaviour, but all the effects of an alternative behaviour must be understood and considered before choosing a different course.
5. MAY (or "OPTIONAL") means an item is completely optional.
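
Section 8 asks that any additional information be encoded so that the resulting URI stays valid. A minimal Python sketch of that step, using the standard library's percent-encoding, might look as follows. The helper names and default arguments are made up for illustration, and "httpg" is used only because it appears in the examples above.

```python
from urllib.parse import quote


def placeholder_url(scheme: str, message: str = "", host: str = "unknown.invalid") -> str:
    """Build e.g. httpg://unknown.invalid/<percent-encoded message>."""
    return f"{scheme}://{host}/{quote(message, safe='')}"


def placeholder_mailto(message: str = "", address: str = "user@unknown.invalid") -> str:
    """Build a mailto: URI, carrying any message in the body field."""
    if not message:
        return f"mailto:{address}"
    return f"mailto:{address}?body={quote(message, safe='')}"


def placeholder_file_uri(comment: str = "") -> str:
    """Build file:///UNDEFINEDPATH[/comment], forward-slash variant only."""
    suffix = "/" + quote(comment, safe="") if comment else ""
    return "file:///UNDEFINEDPATH" + suffix


print(placeholder_url("httpg", "User certificate has expired"))
print(placeholder_mailto("Problem connecting to WLMS"))
print(placeholder_file_uri("path to some directory"))
```

Passing safe='' makes quote() encode every reserved character, including "/", so a free-text message cannot accidentally introduce extra path segments or query delimiters.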

Paul Millar [mailto:paul.millar@desy.de] said:
As an idea, would identifying authz info as a URI make sense?
This would require specifying a schema-name part for FQAN. For example, this could be "fqan", with "fqan:/vo.example.org/Role=An-example"
This is still under debate, we need some way of representing authz info but no-one is quite sure what the best way is. The current (1.3) solution does do pretty much what you suggest, in fact we publish something like "VOMS:/atlas/Role=Production", as well as the traditional "VO:atlas" form. One question is whether we would ever need to be able to support more than one authz scheme for the same resource/service. Stephen

On Monday 28 January 2008 15:49:48 Burke, S (Stephen) wrote:
Paul Millar [mailto:paul.millar@desy.de] said:
This would require specifying a schema-name part for FQAN. For example, this could be "fqan", with "fqan:/vo.example.org/Role=An-example"
This is still under debate, we need some way of representing authz info but no-one is quite sure what the best way is. The current (1.3) solution does do pretty much what you suggest, in fact we publish something like "VOMS:/atlas/Role=Production", as well as the traditional "VO:atlas" form.
Ah, so we could use the "voms" schema-type, rather than "fqan", and perhaps deprecate vo:atlas in favour of voms:/atlas ?
One question is whether we would ever need to be able to support more than one authz scheme for the same resource/service.
I don't know how widely known this is, but there's a UK-based JISC project (VPMan) that is looking into "merging" multiple authorisation schemes. Part of the project involved a use-case capture, which is available here: http://sec.cs.kent.ac.uk/vpman/D1-2v1.doc (I've placed a PDF version here: http://www.desy.de/%7Epaul/tmp/D1-2v1.pdf but some of the diagrams seem to have been lost). In particular, they mention VOMS and PERMIS, but Shibboleth also gets a mention. HTH, Paul.

Paul Millar [mailto:paul.millar@desy.de] said:
Ah, so we could use the "voms" schema-type, rather than "fqan", and perhaps deprecate vo:atlas in favour of voms:/atlas ?
They aren't necessarily the same, it depends on the matching rules - see the authz document I sent the link for. For now I think we should keep both. Stephen