
Hi, I don't know if this has been discussed already, or is considered in scope, but I was just thinking about whether you could have a way for a third party to publish "annotations" to GLUE schema objects. The specific thing I was considering was the EGEE "freedom of choice" tool, which currently edits the published data by removing AccessControlBaseRule attributes for CEs which are failing defined tests. That works, but it isn't entirely satisfactory: it isn't transparent, and you can't tell that the published data has been modified, who did it, or indeed why. On the face of it it would be better to do this kind of thing explicitly, so you could see where the modifications were coming from and choose whether to use them or not.

Another thing that occurs to me on similar lines is the downtime information. I see that this is now embedded in the Endpoints, but for EGEE that wouldn't be usable because our downtime information is in a central database (the GOC DB), so if we wanted to publish it it would again need to be done by modifying/annotating published information from a different place.

This could maybe also feed into our CESEBind discussion: rather than the CE and SE publishing disconnected pieces of information, the CE could in effect add/modify things published by the SE, i.e. we could separate the schema structure from the question of who publishes it.

What do people think? Maybe it's too complicated to fit in at this stage, but it might be worth a bit of thought - at the abstract schema level it wouldn't necessarily mean huge changes. Just thinking about it quickly, I suspect the problems would come at the implementation level; e.g. in LDAP the annotations would still be in a different part of the tree, so it would take some extra effort in queries to find them.

Stephen

Hi Stephen, I don't think "dCache" has a strong opinion on this. I'm not sure I do, either, but my initial thoughts are that annotations are not the right solution here as they don't fit in with how the rest of GLUE is structured. More comments interleaved below: On Monday 07 April 2008 14:32:20 Burke, S (Stephen) wrote: [...]
The specific thing I was considering was the EGEE "freedom of choice" tool, which currently edits the published data by removing AccessControlBaseRule attributes for CEs which are failing defined tests. [...]
Yes, removing information definitely sounds like a hack. Personally, I'd say the correct approach is that, should ATLAS wish to publish information (I didn't realise FoC was published into GLUE like this), it should do so as well-defined objects rather than as annotations to existing objects. The AccessControlBaseRule is set by a site to say that it believes it supports a UserDomain (ATLAS, for example). The site is authoritative in that statement; no-one else should create or alter that information.

If the UserDomain (e.g., ATLAS) wants to record that a service is broken for them, that should be recorded elsewhere---somewhere where the UserDomain is authoritative. For example, there could be a ServiceBroken (or similarly named) object that records that some UserDomain (e.g. ATLAS) found the service unusable. Alternatively, the UserDomain could publish the services it has checked as working (e.g., within EGEE, via SAM): a CertifiedSane object that links a UserDomain (ATLAS, ops (for SAM tests), etc.) to a Service. In effect, publishing this object says the service "works for me".

CertifiedSane objects assume a default-broken model (a service must prove itself), whereas ServiceBroken objects assume a default-working model. One would hope that publishing ServiceBroken objects would take less storage. Clients can choose to ignore these ServiceBroken / CertifiedSane objects; e.g., the ATLAS testing framework could ignore them whilst normal ATLAS jobs look for them.
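As a rough illustration, a hypothetical ServiceBroken record published on the UserDomain's side of the tree might look like the LDIF sketch below. Every DN component, objectClass and attribute name here is invented for the sake of the example; none of this is part of any GLUE rendering:

```ldif
# Hypothetical entry: the ATLAS UserDomain records that it found a
# particular service unusable. All names are illustrative only.
dn: ServiceBrokenLocalID=atlas-broken-001,UserDomainID=atlas,o=glue
objectClass: ServiceBroken
ServiceBrokenLocalID: atlas-broken-001
ServiceBrokenUserDomainID: atlas
ServiceBrokenServiceID: ce01.example.org:2119/jobmanager-pbs
ServiceBrokenReason: failed SAM replica-management test
ServiceBrokenTimestamp: 20080407120000Z
```

A CertifiedSane record would be structurally identical but assert the opposite; clients query for whichever kind they trust, and the site's own entries remain untouched.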
Another thing that occurs to me on similar lines is the downtime information. I see that this is now embedded in the Endpoints, but for EGEE that wouldn't be usable because our downtime information is in a central database (GOC DB), so if we wanted to publish it it would again need to be done by modifying/annotating published information from a different place.
I believe this is a long-standing issue that arose for two reasons: first, it was "difficult" for a site to publish down-time through GLUE; second, if a site is going into emergency down-time (a JCB has just cut the fibre-optic link), then publishing information through GLUE would be difficult. The first one is (I think) purely an implementation issue. If the information were readily available, then the GOC could switch from being definitive to acting as a cache for the information published by the site ... but this is really an EGEE decision. For the second issue, I'm not sure what the correct solution is. Perhaps, with most technologies, the site should drop out of the info. system, which would be "good enough".
This could maybe also feed into our CESEBind discussion, e.g. rather than the CE and SE publishing disconnected pieces of information the CE could in effect add/modify things published by the SE, i.e. we could separate the schema structure from the question of who publishes it.
Hmmm, not completely sure about this. I believe we currently assume that an object is published by a single component. Maybe this needn't be an assumption, but I'm not sure how well the different underlying storage technologies would handle it. I suspect, in most cases, we would need to split the information anyway into different subsets to allow partial results to validate correctly (objectClass in LDAP, XSD types with XML, etc.). If this is so, then (in effect) we've split the object into parts, whether we label them as separate objects or not.
What do people think? Maybe it's too complicated to fit in at this stage, but it might be worth a bit of thought - at the abstract schema level it wouldn't necessarily mean huge changes. Just thinking about it quickly I suspect the problems would come at the implementation level, e.g. in LDAP the annotations would still be in a different part of the tree so it would take some extra effort in queries to find them.
I'm not sure annotations really gains us much, but I'm also willing to be convinced differently :-) HTH, Paul.

Paul Millar said: I'm not sure I do, either, but my initial thoughts are that annotations are not the right solution here as they don't fit in with how the rest of GLUE is structured.
That's certainly true, but we can in principle change the structure. Actually I think at the abstract schema level it's almost trivial; the only real wrinkle is that you couldn't simply publish the existing object types, you'd need modified versions with all attributes optional and probably with UniqueIDs changed to LocalIDs (at least semantically).
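To make the wrinkle concrete: an annotation type would essentially be a copy of the annotated class with every attribute optional, plus a LocalID of its own, a pointer to the target object's UniqueID, and the identity of the annotating party. Sketched in LDIF, with all objectClass and attribute names invented for illustration:

```ldif
# Hypothetical annotation: a third party (here the GOC DB) supplies
# downtime attributes for an Endpoint it does not itself publish.
# All names below are illustrative, not part of any GLUE rendering.
dn: AnnotationLocalID=gocdb-dt-4711,AnnotationSource=gocdb,o=glue
objectClass: EndpointAnnotation
AnnotationLocalID: gocdb-dt-4711
AnnotationSource: gocdb
AnnotationTargetUniqueID: httpg://se01.example.org:8443/srm/managerv2
EndpointDownTimeStart: 20080410080000Z
EndpointDownTimeEnd: 20080410180000Z
```

Consumers could then merge such records over the site-published Endpoint, or ignore them entirely; the cost is the extra query effort to find them in a different part of the tree.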
Personally, I'd say the correct approach is that, should ATLAS wish to publish information (I didn't realise FoC was published into GLUE like this),
FoC isn't publishing into Glue as such, what it publishes (on a web server) are LDAP edits which are applied to top-level BDIIs behind the scenes, with the effect that some attributes published by the sites vanish from queries. My suggestion is essentially to make those edits visible and optional (and independent of LDAP).
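For reference, the kind of behind-the-scenes edit described here corresponds roughly to an LDAP modify record like the following (the CE DN, site name and VO value are made up for illustration):

```ldif
# Illustrative LDIF changerecord in the FoC style: one VO's
# AccessControlBaseRule is stripped from a CE entry on a top-level
# BDII, so that VO's jobs no longer match against the CE.
dn: GlueCEUniqueID=ce01.example.org:2119/jobmanager-pbs-atlas,mds-vo-name=ExampleSite,o=grid
changetype: modify
delete: GlueCEAccessControlBaseRule
GlueCEAccessControlBaseRule: VO:atlas
```

Making those edits "visible and optional" would mean publishing this intent as data in its own right, rather than silently applying it to the tree.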
it should do so as well-defined objects rather than as annotations to existing objects.
As it stands that wouldn't work because it doesn't fit with how the broker does the matchmaking. In principle that could be changed in the kind of way you describe - but you'd then have a special-purpose solution for this specific case rather than something which would work in general. Actually, in practice I doubt it will happen; it would be too much effort for something which already works reasonably well.
I believe this is a long-standing issue that arose for two reasons: first, it was "difficult" for a site to publish down-time through GLUE;
It's more that glue never considered it as a use case in the past.
second, if a site is going into emergency down-time (a JCB has just cut the fibre-optic link), then publishing information through GLUE would be difficult.
Indeed - although the persistence of information when services are down is another question anyway; e.g. the FTS would like to continue to know about SEs even when they're down (currently done by writing a cache file). Similarly it might be nice to be able to select a fall-back BDII when your main one is down, so you query the info system to find one ... which doesn't work if you don't have a cache.
The first one is (I think) purely an implementation issue.
To some extent, but the GOC DB is now deeply embedded and is not going to change in practice. Given the general reduction in manpower for grid projects things that are "just implementation issues" may well be major obstacles!
For the second issue, I'm not sure what the correct solution is here. Perhaps, with most technologies, the site should drop out of the info. system, which would be "good enough".
On the face of it, it isn't really good enough for downtime information to be unavailable when the service is down! (Although of course the recent power cut at RAL, which hosts the GOC DB, made that rather problematic ...) Stephen

On Monday 07 April 2008 17:50:27 Burke, S (Stephen) wrote:
[moving towards using annotations] That's certainly true, but we can in principle change the structure. Actually I think at the abstract schema level it's almost trivial, the only real wrinkle is that you couldn't simply publish the existing object types, you'd need modified versions with all attributes optional and probably with UniqueIDs changed to LocalIDs (at least semantically).
OK. My (personal) preference would be to stay as-is for now: have info-providers publish information that is limited in scope to what they can (with some authority) say and (as far as possible) group this information into classes that do not overlap with ones that other info-providers publish.
[...] (I didn't realise FoC was published into GLUE like this), FoC isn't publishing into Glue as such, what it publishes (on [...]
Ah, OK. Thanks for the info.
it should do so as well-defined objects rather than as annotations to existing objects.
As it stands that wouldn't work because it doesn't fit with how the broker does the matchmaking.
Aye,
In principle that could be changed in the kind of way you describe - but you'd then have a special-purpose solution for this specific case rather than something which would work in general.
Well, I don't think this is so specialised. In general, a UserDomain might want to mark any Service as "Sane" (i.e., has been tested). This isn't limited to any specific type of service. In essence, it's just recording the output of the SAM tests (and FoC, if different) within GLUE.
Actually in practice I doubt it will happen, it would be too much effort for something which already works reasonably well.
True.
I believe this is a long-standing issue that arose for two reasons: first, it was "difficult" for a site to publish down-time through GLUE;
It's more that glue never considered it as a use case in the past.
Ah, OK. Sorry, I thought GLUE had these fields in the past.
second, if a site is going into emergency down-time (a JCB has just cut the fibre-optic link), then publishing information through GLUE would be difficult.
Indeed - although the persistency of information when services are down is another question anyway, e.g. the FTS would like to continue to know about SEs even when they're down (currently done by writing a cache file).
Yes, I meant to ask out of idle curiosity: why is this? If an SE is unavailable (drops out of the info-system), it's almost necessarily unavailable. Why does FTS care about it?
Similarly it might be nice to be able to select a fall-back BDII when your main one is down, so you query the info system to find one ... which doesn't work if you don't have a cache.
Well OK, but the problem is (to my mind) that that process is broken anyway: use the info system to discover how to ask the info system.
The first one is (I think) purely an implementation issue.
To some extent, but the GOC DB is now deeply embedded and is not going to change in practice. Given the general reduction in manpower for grid projects things that are "just implementation issues" may well be major obstacles!
Yup, fair point.
For the second issue, I'm not sure what the correct solution is here. Perhaps, with most technologies, the site should drop out of the info. system, which would be "good enough".
On the face of it it isn't really good enough for downtime information to be unavailable when the service is down! (Although of course the recent power cut at RAL which hosts the GOC DB made that rather problematic ...)
Yes, and that's where (in an ideal world) the GOC DB might act as a cache for the down-time information published through Glue. (... meanwhile, back in reality) Cheers, Paul.

Paul Millar [mailto:paul.millar@desy.de] said:
In principle that could be changed in the kind of way you describe - but you'd then have a special-purpose solution for this specific case rather than something which would work in general.
Well, I don't think this is so specialised. In general, a UserDomain might want to mark any Service as "Sane" (i.e., has been tested). This isn't limited to any specific type of service.
Indeed, but it doesn't help with other kinds of things - in particular with downtime, which is unusable by EGEE as it's currently represented. Perhaps we just say that we don't care, but it still seems odd to me to introduce something in a way which can't work for one of the major grids.
Yes, I meant to ask out of idle curiosity: why is this? If an SE unavailable, (drops out of info-system) it's almost necessarily unavailable. Why does FTS care about it?
Because it does the scheduling of transfers in advance of when it actually performs them. Actually you could have the same with the WMS: in theory it could schedule jobs and then hold them if the CE wasn't available. The difference is that jobs can often run on many CEs, so it's better to pick one which is working, whereas file transfers always involve specific SEs, so if they're down you have to wait.
Well OK, but the problem is (to my mind) that that process is broken anyway: use the info system to discover how to ask the info system.
I had a discussion on this with Maarten recently, and he also favoured direct configuration over discovery. My point would be that in a large grid you may have hundreds of top-level BDIIs, and hand-configuration becomes very hard - although you obviously have to configure at least one to get you started. You can compare with R-GMA, where the client is configured with the address of the local MON box but everything else is automatic, although the discovery is internal rather than via the Glue schema. Stephen

Hi,
I had a discussion on this with Maarten recently, and he also favoured direct configuration vs discovery. My point would be that in a large grid you may have hundreds of top-level BDIIs and hand-configuration becomes very hard - although you
I think by the time we have several hundred top-level BDIIs we will already be thinking about GLUE 3, maybe 4 :-) Currently we harvest 70 instances, and there are ambitions to decrease this number even further. Felix
participants (3)
- Burke, S (Stephen)
- Felix Nikolaus Ehm
- Paul Millar