Re: [glue-wg] Publishing annotations

7 Apr 2008

Hi Stephen,

I don't think "dCache" has a strong opinion on this.

I'm not sure I do, either, but my initial thoughts are that annotations are 
not the right solution here as they don't fit in with how the rest of GLUE is 
structured.

More comments interleaved below:

On Monday 07 April 2008 14:32:20 Burke, S (Stephen) wrote:
[...]
...
The specific thing I was considering was the EGEE "freedom of choice" tool,
which currently edits the published data by removing
AccessControlBaseRule attributes for CEs which are failing defined tests.
[...]
Yes, removing information definitely sounds like a hack.

Personally, I'd say the correct approach is that, should ATLAS wish to publish 
information (I didn't realise FoC was published into GLUE like this), it 
should do so as well-defined objects rather than as annotations to existing 
objects.

The AccessControlBaseRule is set by a site, saying that they believe that they 
support a UserDomain (ATLAS, for example).  The site is authoritative in that 
statement; no-one else should create or alter that information.

If the UserDomain (e.g., ATLAS) wants to record that a service is broken for 
them, that should be recorded elsewhere---somewhere where the UserDomain is 
authoritative.  Alternatively, the UserDomain could publish the services it 
has checked as working (e.g., within EGEE, via SAM).

For example, there could be a ServiceBroken (or similarly named) object that 
records that some UserDomain (e.g. ATLAS) found the service unusable.

Alternatively, the UserDomain could publish a CertifiedSane object that links 
a UserDomain (ATLAS, ops (for SAM tests), etc.) to a Service.  In effect, 
publishing this object says the service "works for me".

CertifiedSane objects assume a default-broken model (a service must prove 
itself) whereas ServiceBroken objects assume a default-working model. One 
would hope that publishing ServiceBroken objects would take less storage.

Clients can choose to ignore these ServiceBroken / CertifiedSane objects; 
e.g., the ATLAS testing framework could ignore it whilst normal ATLAS jobs 
look for it.
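
To make the two models concrete, here is a minimal sketch in Python.  The 
class and function names are hypothetical (they mirror the ServiceBroken and 
CertifiedSane ideas above and are not part of any GLUE schema), and the 
service IDs are made up for illustration.

```python
from dataclasses import dataclass


# Hypothetical objects published by a UserDomain, not by the site.
@dataclass(frozen=True)
class ServiceBroken:
    user_domain: str
    service_id: str


@dataclass(frozen=True)
class CertifiedSane:
    user_domain: str
    service_id: str


def usable_default_working(services, broken, user_domain):
    """Default-working: keep every service unless this UserDomain flagged it."""
    flagged = {b.service_id for b in broken if b.user_domain == user_domain}
    return [s for s in services if s not in flagged]


def usable_default_broken(services, certified, user_domain):
    """Default-broken: keep only services this UserDomain has certified."""
    ok = {c.service_id for c in certified if c.user_domain == user_domain}
    return [s for s in services if s in ok]


# Illustrative data only.
services = ["ce-a", "ce-b", "ce-c"]
broken = [ServiceBroken("ATLAS", "ce-b")]
certified = [CertifiedSane("ATLAS", "ce-a"), CertifiedSane("ops", "ce-c")]

atlas_jobs_working = usable_default_working(services, broken, "ATLAS")
atlas_jobs_broken = usable_default_broken(services, certified, "ATLAS")
# A testing framework that ignores these objects simply uses `services`.
```

In both cases the site's own published list is left untouched; the 
UserDomain's opinion is a separate object that each client applies or not.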
...
Another thing that occurs to me on similar lines is the downtime
information. I see that this is now embedded in the Endpoints, but for
EGEE that wouldn't be usable because our downtime information is in a
central database (GOC DB), so if we wanted to publish it it would again
need to be done by modifying/annotating published information from a
different place.

I believe this is a long-standing issue that arose for two reasons: first, it 
was "difficult" for a site to publish down-time through GLUE; second, if a 
site is going into emergency down-time (a JCB has just cut the fibre-optic 
link), then publishing information through GLUE would be difficult.

The first one is (I think) purely an implementation issue.  If the information 
was readily available, then GOC could switch from being definitive to acting 
as a cache for the information published by the site ... but this is really 
an EGEE decision.

For the second issue, I'm not sure what the correct solution is here.  
Perhaps, with most technologies, the site should drop out of the info. 
system, which would be "good enough".
...
This could maybe also feed into our CESEBind discussion, e.g. rather
than the CE and SE publishing disconnected pieces of information the CE
could in effect add/modify things published by the SE, i.e. we could
separate the schema structure from the question of who publishes it.

Hmmm, not completely sure about this.

I believe we currently assume that an object is published by a single 
component.  Maybe this needn't be an assumption, but I'm not sure how well 
the different underlying storage technologies would handle this.

I suspect, in most cases, we would need to split the information anyway into 
the different subsets to allow partial results to validate correctly 
(objectClass in LDAP, XSD types with XML, etc).  If this is so, then (in 
effect) we've split the object into parts, whether we label them as separate 
objects or not.
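
A rough sketch of that point, in Python: if two components each publish part 
of one logical object, each part has to validate on its own, so each ends up 
with its own type.  The part types and attribute names below are invented for 
illustration and do not come from GLUE.

```python
# Hypothetical per-part schemas: the attributes each part type must carry.
REQUIRED = {
    "SEPart": {"id", "capacity"},
    "CEBindPart": {"id", "mount_point"},
}


def validate(part):
    """A part is valid if it carries every attribute its own type requires."""
    return REQUIRED[part["type"]] <= set(part)


def merge(parts):
    """Combine independently validated parts into one view per object id."""
    merged = {}
    for part in parts:
        if not validate(part):
            raise ValueError(f"invalid part: {part!r}")
        view = merged.setdefault(part["id"], {})
        view.update({k: v for k, v in part.items() if k != "type"})
    return merged


# One logical object, published in two pieces by two different components.
parts = [
    {"type": "SEPart", "id": "se1", "capacity": 100},
    {"type": "CEBindPart", "id": "se1", "mount_point": "/data"},
]
combined = merge(parts)
```

Either part can be checked without the other being present, which is what 
makes them separate objects in practice, whatever they are called.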
...
What do people think? Maybe it's too complicated to fit in at this
stage, but it might be worth a bit of thought - at the abstract schema
level it wouldn't necessarily mean huge changes. Just thinking about it
quickly I suspect the problems would come at the implementation level,
e.g. in LDAP the annotations would still be in a different part of the
tree so it would take some extra effort in queries to find them.

I'm not sure annotations really gain us much, but I'm also willing to be 
convinced differently :-)

HTH,

Paul.

Paul Millar