
Hi Stephen,

I don't think "dCache" has a strong opinion on this. I'm not sure I do, either, but my initial thoughts are that annotations are not the right solution here, as they don't fit in with how the rest of GLUE is structured.

More comments interleaved below:

On Monday 07 April 2008 14:32:20 Burke, S (Stephen) wrote: [...]
The specific thing I was considering was the EGEE "freedom of choice" tool, which currently edits the published data by removing AccessControlBaseRule attributes for CEs which are failing defined tests. [...]
Yes, removing information definitely sounds like a hack. Personally, I'd say the correct approach is that, should ATLAS wish to publish information (I didn't realise FoC was published into GLUE like this), it should do so as well-defined objects rather than as annotations to existing objects.

The AccessControlBaseRule is set by a site to say that it believes it supports a UserDomain (ATLAS, for example). The site is authoritative in that statement; no-one else should create or alter that information. If the UserDomain (e.g., ATLAS) wants to record that a service is broken for them, that should be recorded elsewhere, somewhere the UserDomain is authoritative. Alternatively, the UserDomain could publish the services it has checked as working (e.g., within EGEE, via SAM).

For example, there could be a ServiceBroken (or similarly named) object that records that some UserDomain (e.g. ATLAS) found the service unusable. Alternatively, the UserDomain could publish a CertifiedSane object that links a UserDomain (ATLAS, ops (for SAM tests), etc.) to a Service. In effect, publishing this object says the service "works for me".

CertifiedSane objects assume a default-broken model (a service must prove itself), whereas ServiceBroken objects assume a default-working model. One would hope that publishing ServiceBroken objects would take less storage.

Clients can choose to ignore these ServiceBroken / CertifiedSane objects; e.g., the ATLAS testing framework could ignore them whilst normal ATLAS jobs look for them.
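To make the difference between the two models concrete, here is a minimal sketch of how a client might filter services under each one. Everything in it is an assumption for illustration: the `Verdict` class, the function names, and the service IDs are hypothetical and are not part of GLUE or of the FoC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """A statement published by a UserDomain about a Service (hypothetical)."""
    user_domain: str  # e.g. "ATLAS" or "ops"
    service_id: str   # identifier of the Service being described


def usable_default_working(supported, broken, user_domain):
    """ServiceBroken model: a supported service is usable unless flagged."""
    flagged = {v.service_id for v in broken if v.user_domain == user_domain}
    return [s for s in supported if s not in flagged]


def usable_default_broken(supported, certified, user_domain):
    """CertifiedSane model: a supported service is usable only if certified."""
    ok = {v.service_id for v in certified if v.user_domain == user_domain}
    return [s for s in supported if s in ok]


# Services whose site-published AccessControlBaseRule admits ATLAS (made-up IDs).
supported = ["ce1.example.org", "ce2.example.org", "ce3.example.org"]

# Objects published by the UserDomains, not by the sites.
service_broken = [Verdict("ATLAS", "ce2.example.org")]
certified_sane = [
    Verdict("ATLAS", "ce1.example.org"),
    Verdict("ops", "ce3.example.org"),
]

print(usable_default_working(supported, service_broken, "ATLAS"))
print(usable_default_broken(supported, certified_sane, "ATLAS"))
```

In both functions the site's list of supported services is only read, never edited, which is the point of keeping the UserDomain's statements in separate objects. A client such as a testing framework would simply skip the filtering step.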
Another thing that occurs to me on similar lines is the downtime information. I see that this is now embedded in the Endpoints, but for EGEE that wouldn't be usable because our downtime information is in a central database (GOC DB), so if we wanted to publish it, it would again need to be done by modifying/annotating published information from a different place.
I believe this is a long-standing issue that arose for two reasons: first, it was "difficult" for a site to publish down-time through GLUE; second, if a site is going into emergency down-time (a JCB has just cut the fibre-optic link), then publishing information through GLUE would be difficult.

The first one is (I think) purely an implementation issue. If the information were readily available, then GOC could switch from being definitive to acting as a cache for the information published by the site ... but this is really an EGEE decision.

For the second issue, I'm not sure what the correct solution is. Perhaps, with most technologies, the site should drop out of the info. system, which would be "good enough".
This could maybe also feed into our CESEBind discussion, e.g. rather than the CE and SE publishing disconnected pieces of information the CE could in effect add/modify things published by the SE, i.e. we could separate the schema structure from the question of who publishes it.
Hmmm, not completely sure about this. I believe we currently assume that an object is published by a single component. Maybe this needn't be an assumption, but I'm not sure how well the different underlying storage technologies would handle this. I suspect that, in most cases, we would need to split the information into the different subsets anyway, to allow partial results to validate correctly (objectClass in LDAP, XSD types with XML, etc). If this is so, then (in effect) we've split the object into parts, whether we label them as separate objects or not.
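A rough sketch of that splitting argument, with each publisher's subset validated on its own and then merged for the client. The fragment types (`StorageCore`, `ComputeView`) and attribute names are made up for this example and are not taken from the GLUE schema; a real implementation would rely on LDAP objectClasses or XSD types rather than a Python dict.

```python
# Hypothetical fragment types and required attributes, for illustration only.
REQUIRED = {
    "StorageCore": {"ID", "Implementation"},  # subset published by the SE
    "ComputeView": {"ID", "MountPoint"},      # subset published by the CE
}


def validate(fragment):
    """Check one fragment against the required attributes of its own type."""
    missing = REQUIRED[fragment["type"]] - set(fragment["attrs"])
    return not missing


def merge(fragments):
    """Combine valid fragments that share an ID into one client-side view."""
    merged = {}
    for frag in fragments:
        if validate(frag):
            merged.setdefault(frag["attrs"]["ID"], {}).update(frag["attrs"])
    return merged


se_part = {"type": "StorageCore",
           "attrs": {"ID": "se1", "Implementation": "dCache"}}
ce_part = {"type": "ComputeView",
           "attrs": {"ID": "se1", "MountPoint": "/pnfs"}}

# A partial result (only the SE has published) still validates on its own.
print(merge([se_part]))
# With both publishers present, the client sees what looks like one object.
print(merge([se_part, ce_part]))
```

Because each subset carries its own validation rule, the "single object" only exists in the merged view, which is the sense in which the object has already been split into parts.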
What do people think? Maybe it's too complicated to fit in at this stage, but it might be worth a bit of thought - at the abstract schema level it wouldn't necessarily mean huge changes. Just thinking about it quickly I suspect the problems would come at the implementation level, e.g. in LDAP the annotations would still be in a different part of the tree so it would take some extra effort in queries to find them.
I'm not sure annotations really gain us much, but I'm also willing to be convinced otherwise :-)

HTH,

Paul.