More comments on the draft spec.

Hi all,

Some more comments.

A general comment: to answer some questions, multiple passes through the info-system may be required. For example, see "Which sites have SEs which support the gsidcap protocol?" on Stephen's page, here: http://egee-uig.web.cern.ch/egee-uig/production_pages/Advancedldapsearch.htm...

With LDAP, the object's RDN is often built from its ID or LocalID. So fields derived from an object's RDN (such as GlueChunkKey and GlueForeignKey) will depend on LocalID (or ID).

I believe Glue currently makes no statement about how long LocalID or ID should remain constant for the same object: the persistence of the LocalID value. Although silly, I believe it would currently be acceptable to generate fresh random LocalIDs for each object every time data is published, provided all references were updated at the same time.

However, if one uses the RDN (for example, by following a GlueForeignKey value in a separate query), there is a tacit assumption that the RDN of the target object will not change (or, at least, that it is unlikely to change in the period between successive queries). This is only true if the ID or LocalID used to build the RDN doesn't change. So it seems we have a requirement for IDs or LocalIDs to be persistent over time. This should be stated somewhere, probably in section 3 (General Comments).

Some more comments:

*** Page 12
"activity" is misspelt in the description of the Associate End for Share.LocalID.

*** Page 28
When providing a diagram representing the specialisation of entities (e.g., Fig. 3), the inherited associations are not shown. Could these be added somehow (e.g., within the Main Entities section)? Also, it isn't immediately clear that the entities in "Storage Entities" are the same as those in "Storage Entities - Inheritance". Could this identification be included in the diagram?
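To make the LocalID persistence argument concrete, here is a minimal Python sketch that models two successive publications as plain dictionaries keyed by RDN, instead of querying a real LDAP server. The attribute names imitate the GLUE 1.x LDAP style (GlueSEUniqueID, GlueSALocalID, GlueChunkKey), but the flattened RDN layout, the hostname and the share name are assumptions for illustration only. It shows that every snapshot can be internally consistent, yet a client following a key across two separate queries still fails if IDs are regenerated in between.

```python
import uuid
from typing import Dict, Optional

# In-memory stand-in for one published snapshot: RDN -> attributes.
Snapshot = Dict[str, Dict[str, str]]


def publish(random_ids: bool) -> Snapshot:
    """Publish one SE and one storage area (SA) that references it.

    With random_ids=True, fresh IDs are generated on every call, but the SA's
    GlueChunkKey is updated to match, so each snapshot is self-consistent.
    """
    se_id = uuid.uuid4().hex if random_ids else "se01.example.org"  # hypothetical
    sa_id = uuid.uuid4().hex if random_ids else "atlas-sa"  # hypothetical
    se_rdn = f"GlueSEUniqueID={se_id}"
    return {
        se_rdn: {"GlueSEUniqueID": se_id},
        f"GlueSALocalID={sa_id}": {
            "GlueSALocalID": sa_id,
            "GlueChunkKey": se_rdn,  # reference derived from the SE's RDN
        },
    }


def follow_chunk_key(first: Snapshot, second: Snapshot) -> Optional[Dict[str, str]]:
    """Two-pass query: read the SA from one snapshot, then resolve its
    GlueChunkKey against a later snapshot."""
    sa = next(v for rdn, v in first.items() if rdn.startswith("GlueSALocalID="))
    return second.get(sa["GlueChunkKey"])


if __name__ == "__main__":
    print(follow_chunk_key(publish(False), publish(False)))  # found: IDs are stable
    print(follow_chunk_key(publish(True), publish(True)))    # None: the RDN changed
    snap = publish(True)
    print(follow_chunk_key(snap, snap))  # found: consistent within one snapshot
```

The point is that "all references updated at the same time" only protects single-query clients; any multi-pass client needs IDs that persist between passes.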
*** Page 30
It's a little unclear why we have both StorageAccessProtocol and StorageEndpoint, since StorageEndpoint can represent access protocols. StorageAccessProtocol seems to be of use only when talking about the CE-SE-bind objects. If so, perhaps the description of StorageAccessProtocol could be updated to mention this.

There's still the question of why Capability is a required property of StorageEndpoint. This echoes my previous point, where I suggested that it be made optional in Endpoint, or simply removed from Endpoint and added to a new subclass of Endpoint.

*** Page 31
I feel we should mention in the description of StorageShare that it is: A UserDomain's view of [a utilization target for a set of StorageResources ...]. This may not be obvious, especially as the UserDomain--StorageShare association is currently not shown on the Storage entities diagram (Fig. 3), as it's inherited.

*** Page 33
StorageResource: the Latency is the maximum latency under normal operating conditions, not the maximum under any circumstance.

Cheers,
Paul.
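The two alternatives suggested for Capability can be sketched as Python dataclasses. This is a hypothetical modelling for illustration, not the GLUE schema or any of its renderings: only the names Endpoint, StorageEndpoint and Capability come from the draft, while the other fields, the subclass name `CapableEndpoint`, the URLs and the capability string are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


# Option 1: keep Capability on Endpoint, but make it optional.
@dataclass
class EndpointWithOptionalCapability:
    url: str
    interface_name: str
    capability: Optional[List[str]] = None  # may simply be left unpublished


# Option 2: no Capability on the base Endpoint ...
@dataclass
class Endpoint:
    url: str
    interface_name: str


# ... a new (hypothetical) subclass carries it as a required property ...
@dataclass
class CapableEndpoint(Endpoint):
    capability: List[str]

    def __post_init__(self) -> None:
        # Required in this subclass, so reject an empty list.
        if not self.capability:
            raise ValueError("CapableEndpoint requires at least one Capability value")


# ... and StorageEndpoint no longer has to publish a Capability at all.
@dataclass
class StorageEndpoint(Endpoint):
    """Inherits only url and interface_name in this sketch."""
```

Option 1 changes a single cardinality; option 2 keeps Capability mandatory where it is meaningful, at the cost of one extra entity in the model.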

Paul Millar said: So, it seems we have a requirement for IDs or LocalIDs to be persistent over time. This should be stated somewhere, probably in section 3
You're right. For UniqueIDs there may be other requirements, because they may be stored in external services like catalogues or databases (e.g. consider the site names, which start in the GOCDB and propagate all over the place - not trivial to change).
It's a little unclear why we have both StorageAccessProtocol and StorageEndpoint since StorageEndpoint can represent access protocols.
Well, it isn't entirely obvious that StorageEndpoint *can* represent APs; that was one of my questions. But in any case, for SRM the protocol endpoints would normally not be published, because you don't use them directly: you're supposed to ask the SRM for a TURL. However, you would still like to have the list of protocol types which are supported. There could also be other complications; for example, an SE might have a gridftp server for reasons other than it being an access protocol (e.g. the CE currently has one to allow RTE tags to be published ...)
Stephen
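Stephen's SRM point can be sketched as follows: the client needs only the list of supported protocol types to build its preference list, and the host and port arrive inside the TURL returned by the SRM. `fake_srm_prepare_to_get` is a hypothetical stand-in, loosely modelled on SRM's prepare-to-get step; it is not a real SRM client API, and the SURL, door hosts and ports are invented.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical door assignments inside the storage system (invented hosts/ports).
_DOORS = {
    "gsidcap": "door07.example.org:22128",
    "gsiftp": "door03.example.org:2811",
}


def choose_protocols(client_supported: List[str], se_published: List[str]) -> List[str]:
    """Keep the client's preference order, restricted to types the SE advertises."""
    return [p for p in client_supported if p in se_published]


def fake_srm_prepare_to_get(surl: str, protocols: List[str]) -> str:
    """Hypothetical SRM stand-in, not a real client API.

    Picks the first requested protocol it can serve and builds a TURL. The host
    and port come from the storage system, so the client never needed them.
    """
    path = urlparse(surl).path
    for proto in protocols:
        if proto in _DOORS:
            return f"{proto}://{_DOORS[proto]}{path}"
    raise ValueError("no mutually supported transfer protocol")


if __name__ == "__main__":
    prefs = choose_protocols(["gsidcap", "rfio", "gsiftp"], ["gsiftp", "gsidcap"])
    print(prefs)  # ['gsidcap', 'gsiftp']
    print(fake_srm_prepare_to_get("srm://se01.example.org/pnfs/example.org/data/file1", prefs))
```

Under this flow the published protocol types are useful for choosing what to ask for, while per-door hostnames and ports would go unused, which is the case for leaving them unpublished behind an SRM.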

On Friday 16 May 2008 18:31:40 Burke, S (Stephen) wrote:
Paul Millar said: So, it seems we have a requirement for IDs or LocalIDs to be persistent over time. This should be stated somewhere, probably in section 3
You're right. For UniqueIDs there may be other requirements, because they may be stored in external services like catalogues or databases (e.g. consider the site names, which start in the GOCDB and propagate all over the place - not trivial to change).
Actually, I'd say that storing UniqueIDs in something like a catalogue is "broken" client behaviour. I certainly feel we (GLUE) should *not* make it a requirement that UniqueIDs never change in order to support this broken behaviour.

It's reasonable to require information providers to provide consistent UniqueIDs for (choosing a duration out of thin air) an hour. Supporting constant UniqueIDs for longer periods would be better, with the expectation being that (for a production system) LocalIDs remain constant for many (many) days. (Stating the same idea, but without choosing a random number: an object's UniqueID value should remain constant for a period far greater than the latency of the underlying information system under normal operating conditions.)

However, LocalID values may be derived from the system configuration, which might change in a non-trivial way. If such a system change takes place, it may prove impossible to maintain LocalIDs, so the values *will* change. This is unavoidable.

Of course, clients are free to store these values wherever they wish (we can hardly stop them!), but they should be aware that LocalIDs cannot be guaranteed in perpetuity. If the values are stored in a catalogue, then their software must be prepared for these values to change at an unannounced time in the future.

Ideally (IMHO) clients should always derive LocalIDs by querying the underlying system against known values of properties. This process may be optimised by (for example) caching LocalIDs, but if so, these caches are understood to be volatile and not recorded somewhere permanent, such as a catalogue. This is why I feel there should be a clear statement of this somewhere in the GLUE specification.
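The client behaviour Paul describes (resolve IDs from known property values, cache them only volatilely) might look like the minimal sketch below. The `query` callable is a hypothetical stand-in for whatever information-system lookup a client uses, and the one-hour default lifetime merely echoes the duration picked in the message; it is not a GLUE requirement.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[Tuple[str, str], ...]
# Hypothetical lookup: given known property values, return the object's current ID.
Query = Callable[[Key], Optional[str]]


class VolatileIdCache:
    """In-memory only: entries expire and are never written to permanent storage."""

    def __init__(self, query: Query, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query = query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}  # key -> (ID, fetch time)

    @staticmethod
    def _key(properties: Dict[str, str]) -> Key:
        return tuple(sorted(properties.items()))

    def resolve(self, properties: Dict[str, str]) -> Optional[str]:
        key = self._key(properties)
        hit = self._entries.get(key)
        if hit is not None and self._clock() - hit[1] < self._ttl:
            return hit[0]
        # Missing or stale: ask the information system again.
        current = self._query(key)
        if current is None:
            self._entries.pop(key, None)
        else:
            self._entries[key] = (current, self._clock())
        return current

    def invalidate(self, properties: Dict[str, str]) -> None:
        """Drop an entry early, e.g. when a cached ID no longer resolves."""
        self._entries.pop(self._key(properties), None)
```

The cache can serve an outdated ID for at most one lifetime after a reconfiguration, and `invalidate` lets a client recover sooner when a lookup by the cached ID fails.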
It's a little unclear why we have both StorageAccessProtocol and StorageEndpoint since StorageEndpoint can represent access protocols.
Well, it isn't entirely obvious that StorageEndpoint *can* represent APs, that was one of my questions...
For LAN access, you might want to advertise on which machines, and on which port, the protocol is available (dcap, gsidcap, xrootd, rfio), or you might not (AFS, NFS, Lustre, GPFS). However, properties like hostname and port seem to be missing from the StorageAccessProtocol entity. If you want to publish these properties, then currently you must use a StorageEndpoint.
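The split between server-based and mount-based access protocols could be captured by giving an access-protocol record optional host and port fields. This is a hypothetical sketch, not the draft's StorageAccessProtocol definition: the protocol classification simply follows the two lists in the message, and the field names, version strings, hostnames and port numbers are invented.

```python
from dataclasses import dataclass
from typing import Optional

# Per the message, these are reached through a mount, so host/port are not advertised.
# Others (dcap, gsidcap, xrootd, rfio) may optionally advertise a host and port.
MOUNTED = {"afs", "nfs", "lustre", "gpfs"}


@dataclass
class AccessProtocol:
    protocol_type: str
    version: str
    host: Optional[str] = None  # hypothetical field: not in the draft entity
    port: Optional[int] = None  # hypothetical field: not in the draft entity

    def __post_init__(self) -> None:
        if (self.host is None) != (self.port is None):
            raise ValueError("publish host and port together, or neither")
        if self.host is not None and self.protocol_type.lower() in MOUNTED:
            raise ValueError(f"{self.protocol_type} is accessed via a mount, not host:port")
```

With optional fields, one entity could cover both cases without forcing a StorageEndpoint to be published just to carry a hostname.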
However, you would still like to have the list of protocol types which are supported.
Sure.
Also there could be other complications, for example an SE might have a gridftp server for reasons other than it being an access protocol (e.g. the CE currently has one to allow RTE tags to be published ...)
Sorry, what are "RTE tags"? My knowledge of CEs is rather limited, but I thought the current CEs have a gridftp server to allow transfers of user sandboxes. Cheers, Paul.

Paul Millar [mailto:paul.millar@desy.de] said:
Actually, I'd say that storing the UniqueIDs in something like a catalogue is "broken" client behaviour. I certainly feel we (GLUE) should *not* make it a requirement that UniqueIDs never change in order to support this broken behaviour.
The requirement isn't necessarily that they never change, but that there may be consequences if they do. The same is often going to be true for any URL/URI: for example, DESY could change its domain name from desy.de to desy.org, but there would be many consequences, e.g. lots of email addresses and web links stored all over the world would become invalid. I think this is in fact an underappreciated point; e.g. there are still plenty of links around that point to http://uimon.cern.ch even though it was moved a couple of years ago (at least the redirection is still alive). Being more of a purist, I could argue that if you change the unique name of something, it in fact becomes a different thing that just happens to share some of the properties. However, in practice most Glue UIDs are not that critical; probably SE and Site IDs are the most important to preserve (and of course the VO names).
For LAN access, you might want to advertise which machines and from which port the protocol is available (dcap, gsidcap, xrootd, rfio), or you might not (AFS, NFS, Lustre, GPFS).
However, properties like hostname and port information seem to be missing from the StorageAccessProtocol entities.
If you want to publish these properties then currently one must use a StorageEndpoint.
How would that help? You still have to ask the SRM for a TURL, so what would you do with the information? (For a classic SE it's different: there you do need the endpoints.)
Also there could be other complications, for example an SE might have a gridftp server for reasons other than it being an access protocol (e.g. the CE currently has one to allow RTE tags to be published ...)
Sorry, what are "RTE tags"?
RunTimeEnvironment is an attribute of a CE used to publish arbitrary information about installed software, much like the OtherInfo attributes. VOs need to be able to add their own tags to advertise VO-specific software installations, so the CE has a gridftp server to allow them to deposit the information where the info provider can find it. Potentially you could have the same kind of thing on the SE.
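The deposit-and-publish mechanism described above can be sketched as an information provider that reads VO-supplied tag files and turns them into RunTimeEnvironment values. Everything concrete here is an assumption for illustration: the one-file-per-VO layout with one tag per line, the `*.list` naming, the tag strings, and the LDIF-style output using the GLUE 1.x attribute name GlueHostApplicationSoftwareRunTimeEnvironment. It is not the provider actually deployed on CEs.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_tags(tag_dir: Path) -> List[str]:
    """Read every *.list file in tag_dir (hypothetical layout: one tag per line)."""
    tags: List[str] = []
    for tag_file in sorted(tag_dir.glob("*.list")):
        for line in tag_file.read_text().splitlines():
            tag = line.strip()
            # Skip blank lines, comments and tags already published by another file.
            if tag and not tag.startswith("#") and tag not in tags:
                tags.append(tag)
    return tags


def to_ldif(tags: List[str]) -> str:
    """Render tags as LDIF-style attribute lines for the CE's host entry."""
    return "\n".join(f"GlueHostApplicationSoftwareRunTimeEnvironment: {t}" for t in tags)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # A VO would upload a file like this via the CE's gridftp server.
        (Path(d) / "atlas.list").write_text("VO-atlas-release-a\n\nVO-atlas-release-b\n")
        print(to_ldif(collect_tags(Path(d))))
```

The same pattern would work on an SE, which is why an SE's gridftp server cannot automatically be read as a data access protocol.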
My knowledge of CEs is rather limited, but I thought the current CEs have a gridftp server to allow transfers of user sandboxes.
That server is on the RB, but that is indeed another example of a gridftp server not used for generic data storage. Stephen

Hi Paul,
It's reasonable to require information providers to provide consistent UniqueIDs for (choosing a duration out of thin air) an hour. Supporting constant UniqueIDs for longer periods would be better, with the expectation being that (for a production system) LocalIDs remain constant for many (many) days.
(Stating the same idea, but without choosing a random number: an object's UniqueID value should remain constant for a period far greater than the latency of the underlying information system under normal operating conditions.)
The requirement is much stronger for some of the UniqueID values. For example, a site's CE/SE UniqueID values may be used in CS2SS/SS2CS records published by other sites, so a site should not change its CE/SE UniqueID values on a whim...
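The cross-site case differs from the single-publisher one because the referencing site cannot update its records in the same publication cycle. A minimal consistency check over records gathered from several sites could look like this; the record shape (publishing site plus referenced CE and SE UniqueIDs) is a hypothetical simplification of the CS2SS/SS2CS records, and all site and host names are invented.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Binding:
    published_by: str  # site that publishes this record
    ce_unique_id: str
    se_unique_id: str  # may belong to a different site


def dangling(bindings: List[Binding], known_ce: Set[str], known_se: Set[str]) -> List[Binding]:
    """Return bindings whose CE or SE UniqueID is not currently published anywhere."""
    return [b for b in bindings
            if b.ce_unique_id not in known_ce or b.se_unique_id not in known_se]
```

If the SE-owning site renames its SE, every binding published by other sites that points at the old UniqueID shows up as dangling until those sites reconfigure, which is why such IDs need much longer stability than the information-system latency.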
participants (3): Burke, S (Stephen); Maarten.Litmaath@cern.ch; Paul Millar