Fwd: Re: New attribute for GLUE 2.1: StorageShare.ViewID

Hi all,
Sorry, my email client decided to send the reply directly to Maria, so I'm forwarding it here.
-------- Forwarded Message --------
Hi Maria,
I started off replying generally, rather than point-for-point, but have included some specific points where it makes sense. Yes, dCache instances would move to GLUE 2.1 info-providers, but I would hope to do that anyway.
Do you mean SRM shares could overlap with Physical Shares, or do you mean that SRM shares overlap among themselves, and likewise with Physical Shares?
I mean that, for dCache:
a. an SRM share does not overlap with any other SRM share;
b. a Physical share does not overlap with any other Physical share;
c. a Physical share may overlap with an SRM share.
Statement b. isn't completely true currently (as expressed by the SharingID), but I aim to fix that soon.
In any case, if only Physical Shares are intended to be used for installed capacity, these should be the ones used by sites that may want to rely on the BDII for installed capacities, right?
Yes, that's exactly right! The problem is GLUE 2.0 has no way for me to say "these shares are different from those ones", or "these shares are the ones to use for storage accounting". The ViewID is aimed exactly at fixing this: I give all the physical shares the same ViewID ("physical" say) and all the SRM shares the same ViewID ("srm-reservations", say) and then it's easy to select the Physical shares.
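To make the selection described above concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical in-memory representation of shares rather than the real LDAP attribute names; the view values "physical" and "srm-reservations" are only the example strings from the message, not a defined vocabulary, and all sizes are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StorageShare:
    """Hypothetical, flattened view of a GLUE2 StorageShare plus its capacity."""
    share_id: str
    total_size_gb: int
    view_id: Optional[str] = None  # the proposed GLUE 2.1 ViewID; absent in GLUE 2.0


def installed_capacity(shares: Iterable[StorageShare], view: str = "physical") -> int:
    """Sum the sizes of all shares published with the given ViewID.

    Relies on the rule stated in the thread: shares within one view do not
    overlap, so summing them counts each byte once. Other views are skipped.
    """
    return sum(s.total_size_gb for s in shares if s.view_id == view)


# Illustrative numbers only: two physical pool groups, and two SRM
# reservations carved out of the same disk (so they overlap the pools).
shares = [
    StorageShare("pool-group-A", 800, "physical"),
    StorageShare("pool-group-B", 200, "physical"),
    StorageShare("reservation-1", 500, "srm-reservations"),
    StorageShare("reservation-2", 100, "srm-reservations"),
]

naive_total = sum(s.total_size_gb for s in shares)  # double counts overlapping space
print(naive_total)                                     # 1600
print(installed_capacity(shares))                      # 1000
print(installed_capacity(shares, "srm-reservations"))  # 600
```

The point of the sketch is that a client filtering on one view gets a total without double counting, while summing every share blindly counts the reserved space twice.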
What is then published in GLUE2StorageServiceCapacity? The total of Physical Shares?
Yes.
But who has asked for this? My understanding is that nobody is interested in relying on the BDII for this.
I had to smile inwardly here ... Yes, there was an effort, starting around five years ago, in precisely this direction: https://twiki.cern.ch/twiki/pub/LCG/WLCGCommonComputingReadinessChallenges/W... "The goal of this document is to detail how to use the Glue Schema version 1.3 in order to publish information to provide the WLCG management with a view of the total installed capacity and resource usage by the VOs at sites. This information can also be used by VO operations and management in order to monitor the VO usage of the resource" I (and others) invested considerable time in trying to provide these numbers with GLUE 1.3. GLUE 2.0 was developed with this use-case in mind, albeit (from memory) not extensively reviewed.
[...] I'm not sure it is a use case we would like to address.
It seems at least one site wishes to make use of these numbers and to automate the process. I don't think we should give up, just because the problem isn't solved yet.
I made a comparison in February between WLCG Accounting reports and what the BDII and REBUS were publishing in terms of installed capacities, and I think there was not a single match for any T1:
http://wlcg-mon.cern.ch/dashboard/request.py/siteview#currentView=WLCG+Accou...
I believe there are many factors influencing such discrepancies; however, this would suggest a need for more automation, not less. Such discrepancies should be investigated and understood. Yes, more work is needed, investigations, etc.; however, we have to start somewhere, so I gave a concrete proposal so people have something they can understand: a straw-man they can hack away at. Cheers, Paul.

Dear Paul,
In any case, if only Physical Shares are intended to be used for installed capacity, these should be the ones used by sites that may want to rely on the BDII for installed capacities, right?
Yes, that's exactly right!
OK, so why do you need the ViewID? Just tell the site interested in using the BDII to calculate installed capacities to use the Physical Shares. As you say, Physical shares do not publish the Tag attribute, so it's easy to know which ones they are.
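The alternative Maria proposes, selecting physical shares because they publish no Tag, can be sketched in the same hedged way. The record layout is hypothetical: only "Tag" corresponds to a GLUE 2.0 StorageShare attribute, and the assumption that untagged shares are physical comes from this message, not from the schema.

```python
from typing import Any, Dict, List

# Hypothetical flattened records; only "Tag" corresponds to a real GLUE 2.0
# StorageShare attribute, the other keys and all the sizes are illustrative.
shares: List[Dict[str, Any]] = [
    {"ShareID": "pool-group-A", "Tag": None, "TotalSizeGB": 800},
    {"ShareID": "pool-group-B", "Tag": None, "TotalSizeGB": 200},
    {"ShareID": "reservation-1", "Tag": "ATLASDATADISK", "TotalSizeGB": 500},
]


def physical_by_tag_heuristic(shares: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Treat every share that publishes no Tag as a physical share.

    This is a convention about one info-provider's output, not a rule in the
    GLUE 2.0 specification, so a client using it has to trust that every
    storage system follows the same convention.
    """
    return [s for s in shares if not s.get("Tag")]


physical = physical_by_tag_heuristic(shares)
print([s["ShareID"] for s in physical])         # ['pool-group-A', 'pool-group-B']
print(sum(s["TotalSizeGB"] for s in physical))  # 1000
```

The filtering itself is as simple as with a ViewID; what differs is whether the meaning of "no Tag" is defined by the schema or only by convention.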
The problem is GLUE 2.0 has no way for me to say "these shares are different from those ones", or "these shares are the ones to use for storage accounting".
Well, as I said before, Physical Shares are the ones to be used for the installed capacity and there is a way to know whether a share is SRM or Physical. So where is the problem?
I had to smile inwardly here ... Yes, there was an effort, starting around five years ago, in precisely this direction:
https://twiki.cern.ch/twiki/pub/LCG/WLCGCommonComputingReadinessChallenges/WLCG_GlueSchemaUsage-1.8.pdf
That document is for GLUE 1.3. We don't have something similar for GLUE 2.0. And I can tell you WLCG is not interested in getting installed capacities correctly published in the BDII. They have given up and they have the REBUS interface instead.
It seems at least one site wishes to make use of these numbers and to automate the process. I don't think we should give up, just because the problem isn't solved yet.
I wouldn't introduce this change in the schema only because one site has asked for it. Moreover, after your answer to Stephen (which, by the way, is a bit confusing), there are ways for that site to calculate the installed capacity using the current attributes. Regards, Maria

Hi Maria,
On 26/09/14 14:27, Maria Alandes Pradillo wrote:
As you say, Physical shares do not publish the Tag attribute, so it's easy to know which ones they are.
It doesn't say that in the GLUE 2.0 specification. It says "An identifier defined by a User Domain which identifies a Share with a specific set of properties." There's nothing written *anywhere* that says only shares without a Tag should be used for accounting. Also, introducing such a rule could break the existing info-providers: what about DPM and StoRM? The point is: yes, I can publish something in dCache info-provider that allows a client to get the right answer, but either the client needs to know some dCache-special thing, or the solution is fragile and will break in the future. What I want (and what you said you wanted) was all storage systems to be handled in the same way.
That document is for GLUE 1.3. We don't have something similar for GLUE 2.0.
True, we hoped it wouldn't be necessary to have a special document for GLUE 2.0 as GLUE 2.0 should handle this use-case without profiling.
And I can tell you WLCG is not interested in getting installed capacities correctly published in the BDII. They have given up and they have the REBUS interface instead.
Yes, I guess they gave up forcing everyone to use it because the initiative failed, although nobody in WLCG management has made this explicit. What I'm saying is that not everyone has given up: some people still want to get the numbers this way. If we can fix the accounting then it becomes possible. As you say, the current REBUS accounting is broken: the numbers don't match up. Let's try to fix it.
I wouldn't introduce this change in the schema only because one site has asked for it. Moreover, after your answer to Stephen (which, by the way, is a bit confusing), there are ways for that site to calculate the installed capacity using the current attributes.
The point is that one can always publish "something" and "somehow get it to work". This is how we published SRM information in GLUE 1.3: naming conventions and duplicating information. With GLUE 2.0, we tried to fix that: to get rid of fragile work-arounds like that. The result is much better but there are still problems. We should try to fix those problems. Cheers, Paul.

Dear Paul,
The point is: yes, I can publish something in dCache info-provider that allows a client to get the right answer, but either the client needs to know some dCache-special thing, or the solution is fragile and will break in the future.
Which clients are you talking about here?
True, we hoped it wouldn't be necessary to have a special document for GLUE 2.0 as GLUE 2.0 should handle this use-case without profiling.
Then what you may want to start is a discussion on writing a similar document for GLUE 2.0 and involve more people from WLCG in this task. The GLUE WG is not the right place for this discussion until you get agreement from WLCG.
Yes, I guess they gave up forcing everyone to use it because the initiative failed, although nobody in WLCG management has made this explicit.
Well, it's very explicit. Sites need to manually send their installed capacities every month using the REBUS reporting feature to be able to generate the WLCG accounting reports. They don't use the BDII for this and they are not planning to use it.
As you say, the current REBUS accounting is broken: the numbers don't match up. Let's try to fix it.
No, I haven't said that. I have said that the BDII and the WLCG accounting reports do not show the same numbers in terms of installed capacities. But what WLCG relies on is the WLCG accounting reports. Regards, Maria

Maria Alandes Pradillo [mailto:Maria.Alandes.Pradillo@cern.ch] said:
Well, it's very explicit. Sites need to manually send their installed capacities every month using the REBUS reporting feature to be able to generate the WLCG accounting reports. They don't use the BDII for this and they are not planning to use it.
Providing the numbers manually every month sounds quite labour-intensive - if we could show that a simple query would reliably produce the information automatically it seems hard to believe that people wouldn't prefer to use it. But anyway, the question of whether we have current uses for information isn't really the right argument - we have many attributes which aren't used at the moment. The question is whether the information is potentially useful - changing the schema is so slow and infrequent that we need to look at possibilities a long way in the future. Adding one new attribute is a rather lightweight thing to do so I don't think it needs a very strong justification. What we do need to be sure of is that the attribute is well-defined, satisfies the identified use-case and would not disrupt the existing usage. Stephen -- Scanned by iCritical.

Dear Stephen,
Providing the numbers manually every month sounds quite labour-intensive - if we could show that a simple query would reliably produce the information automatically it seems hard to believe that people wouldn't prefer to use it. But anyway, the question of whether we have current uses for information isn't really the right argument - we have many attributes which aren't used at the moment. The question is whether the information is potentially useful - changing the schema is so slow and infrequent that we need to look at possibilities a long way in the future. Adding one new attribute is a rather lightweight thing to do so I don't think it needs a very strong justification. What we do need to be sure of is that the attribute is well-defined, satisfies the identified use-case and would not disrupt the existing usage.
I don't think there is a use case. Only one site has asked for it. If Paul would like to convince WLCG to rely on the BDII for installed capacities, he would need to present this at the GDB, after having agreed with all the other storage systems that it is feasible for them to publish reliable numbers too, something which we don't know. Otherwise, there is little chance that all this effort is useful at all. Making this work for dCache is not enough; it has to work for all available storage systems. That is why requesting this new attribute in the GLUE WG at this point doesn't make sense. And I can already tell you, from conversations with the WLCG management, that there is no interest in having this in the BDII, because there is a manual mechanism in place which has been used for many months now and which they rely on.
Adding one attribute may be a rather lightweight thing to do, but I think it needs a very strong justification. What happens if tomorrow StoRM asks for another one, DPM for a different one, and maybe CREAM CE too? They may be optional attributes that do not disrupt the other info-providers, but it's fundamentally wrong. The schema should be the same for all the services; we can't make exceptions. The schema shouldn't adapt to the services, but the services to the schema, and if something is missing, it should be something missing for all, not for one service.
I'm a bit scared of this change, because if we agree to do this for dCache, we may face similar requests in the future, and what are we going to say? Why is dCache publishing both space tokens and physical space? Is this needed? How is this done in DPM or StoRM? Do they have the same problem? I think we have another forum to discuss these things before coming to the GLUE WG with a concrete proposal. I would start there first. I am interested in getting storage capacity properly published in the BDII and I have attempted to get this right for many months.
I'm happy to see interest in this particular subject. In any case, my interest is not in getting this right for the WLCG accounting use case (as I said, it's not needed there) but in the fact that experiments have said they would prefer not to rely on SRM for getting storage capacities. This is why I have set up the BDII vs SRM comparators in the dashboard for ATLAS and LHCb, which, by the way, look pretty good, meaning that for these use cases we are doing well, dCache included: http://wlcg-mon.cern.ch/dashboard/request.py/siteview#currentView=BDII+vs+SR... http://wlcg-mon.cern.ch/dashboard/request.py/siteview#currentView=BDII+vs+SR... Regards, Maria

Maria Alandes Pradillo [mailto:Maria.Alandes.Pradillo@cern.ch] said:
I don't think there is a use case. Only one site has asked for it.
The use case is from dcache itself - Paul is saying that there is significant information about dcache that he can't publish now and would like to. The work to implement that would be on the dcache team, so as far as I can see it's mainly a matter for them whether they think it's worthwhile. I don't see why LCG would have a case to object to it, as long as the new information would not interfere with anything that LCG does care about. What exactly is your objection to Paul's proposal, i.e. what harm do you think it does? "Not needed by LCG" is not an objection - the Grid world is not only LCG!
Adding one attribute may be a rather lightweight thing to do, but I think it needs a very strong justification. What happens if tomorrow StoRM asks for another one, DPM for a different one, and maybe CREAM CE too?
If they did I would be pleased, because it would show they were taking an interest in GLUE. This is the first chance we've had to add things to GLUE since 2008, and it could easily be many years before there's another one. People should indeed take this opportunity to consider whether there are things which were missed in the original specification but which are potentially useful. If something gets added which turns out not to be useful it does no particular harm, it just doesn't get published, but if you leave something out and then realise you do want it there's nothing that can be done until the next revision.
They may be optional attributes that do not disrupt the other info-providers, but it's fundamentally wrong. The schema should be the same for all the services; we can't make exceptions. The schema shouldn't adapt to the services, but the services to the schema, and if something is missing, it should be something missing for all, not for one service.
I think that's the wrong way to look at it. One of the basic problems with GLUE 1 was that we tied it too specifically to the particular services we had at the time. For GLUE 2 we tried to have a much more general structure which could be adapted to the needs of many different services. The schema itself is of course the same for all services, but the details of how it's used may well not be. In this particular case it has always been true that dcache implements reservations [1] (SRM spaces) in a fundamentally different way to the other SEs. I don't think it's in any way realistic to say that services have to adapt their mode of operation to the schema - the point of the schema is to be able to describe services as they are. The reason something here is missing for dcache and not, say, DPM, is that dcache doesn't work like DPM, so assumptions which are true for DPM are not true for dcache.
I'm a bit scared of this change, because if we agree to do this for dCache, we may face similar requests in the future, and what are we going to say?
I would say the same things I'm saying now! Stephen
[1] My analogy is hotels vs restaurants. A restaurant reserves a specific table, so you can query any of its properties in advance. A hotel usually only picks which room to give you when you arrive at the desk, so before that you can't query its properties, only the properties of the reservation (e.g. non-smoking).

Dear Stephen,
The use case is from dcache itself - Paul is saying that there is significant information about dcache that he can't publish now and would like to. The work to implement that would be on the dcache team, so as far as I can see it's mainly a matter for them whether they think it's worthwhile. I don't see why LCG would have a case to object to it, as long as the new information would not interfere with anything that LCG does care about. What exactly is your objection to Paul's proposal, i.e. what harm do you think it does? "Not needed by LCG" is not an objection - the Grid world is not only LCG!
If I understood it correctly, Paul is saying he wants to change this because one site would like to calculate installed capacities for WLCG using the BDII. I know the grid world is not only WLCG. I'm saying WLCG has other means to collect this information, so I'm not sure the effort required to implement and deploy this is worth it: WLCG is not going to use it. If there are other use cases apart from WLCG, Paul hasn't mentioned them. I also said dCache is of course free to choose where they put their effort and what new features they want to implement. But when it comes to something that is used by other services and is meant to help in terms of interoperability, I think further discussion is needed, and not only within the GLUE WG.
If they did I would be pleased, because it would show they were taking an interest in GLUE. This is the first chance we've had to add things to GLUE since 2008, and it could easily be many years before there's another one. People should indeed take this opportunity to consider whether there are things which were missed in the original specification but which are potentially useful. If something gets added which turns out not to be useful it does no particular harm, it just doesn't get published, but if you leave something out and then realise you do want it there's nothing that can be done until the next revision.
As I already said, for me it would make more sense if Paul presented this to other storage developers to see whether the idea could be interesting for them too and whether it fits for them or needs to be adapted a little bit. Coming directly to the GLUE WG, getting this approved and then having it in GLUE 2.1 doesn't seem very fair to the others. When they want to react, maybe it will be too late. When will we have another GLUE 2.2 or something similar? Getting new versions of the schema approved and deployed is a very long process.
I think that's the wrong way to look at it. One of the basic problems with GLUE 1 was that we tied it too specifically to the particular services we had at the time. For GLUE 2 we tried to have a much more general structure which could be adapted to the needs of many different services. The schema itself is of course the same for all services, but the details of how it's used may well not be.
Well, this is not so clear to me. If dCache introduces the ViewID, what should ginfo do? Should it have an option to get this information? How will users know that actually only dCache provides ViewID? It will be misleading. That is why for me it's dangerous to say, "this is only for dCache". Well, it's mandatory for dCache and dCache is going to rely on it, whereas the others are not going to publish it at all. This doesn't sound very consistent to me. Regards, Maria

Paul Millar [mailto:paul.millar@desy.de] said:
The ViewID is aimed exactly at fixing this: I give all the physical shares the same ViewID ("physical" say) and all the SRM shares the same ViewID ("srm-reservations", say) and then it's easy to select the Physical shares.
Hmm ... calling this an ID suggests that it should just be an opaque identifier, but your values seem to have some semantics - are you sure this shouldn't be an enumerated type? More generally: having it as an internally generated identifier is useful if the number of possible values is large or undefined, or where you want the values to be defined by the implementation or even an instance. However, your example seems to have only two cases with rather well-defined semantics, basically physical and logical. On top of that it seems that clients would need to know the semantics to use the information, which means the values would need to be defined, or else the semantics would have to be deducible in some indirect way - which is what you were objecting to with the CapacityType ... Stephen
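Stephen's distinction between an opaque identifier and an enumerated type can be illustrated with a short sketch. The enumeration and its two values, "physical" and "logical", are hypothetical (taken from his "basically physical and logical" remark) and are not part of any GLUE version; the sketch only contrasts what a client can do with each kind of value.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List, Tuple

# (share_id, view value, size in GB); all values are illustrative.
Share = Tuple[str, str, int]


class ShareView(Enum):
    """Hypothetical closed vocabulary; no such enumeration exists in GLUE 2.0."""
    PHYSICAL = "physical"
    LOGICAL = "logical"


def totals_by_opaque_id(shares: List[Share]) -> Dict[str, int]:
    """Group sizes by an opaque ViewID.

    The client learns which shares belong together, but not which group
    describes physical storage; that still needs out-of-band knowledge.
    """
    totals: Dict[str, int] = defaultdict(int)
    for _share_id, view_id, size in shares:
        totals[view_id] += size
    return dict(totals)


def physical_capacity(shares: List[Share]) -> int:
    """With an enumerated type, the schema itself says which view is physical.

    ShareView(value) raises ValueError for any value outside the enumeration,
    so an unexpected value is rejected instead of being silently skipped.
    """
    return sum(size for _share_id, view, size in shares
               if ShareView(view) is ShareView.PHYSICAL)


shares = [("pool-A", "physical", 800), ("pool-B", "physical", 200),
          ("reservation-1", "logical", 500)]
print(totals_by_opaque_id(shares))  # {'physical': 1000, 'logical': 500}
print(physical_capacity(shares))    # 1000
```

With the opaque form, a client can partition shares but must already know which partition to sum; with the enumerated form, that knowledge lives in the schema, at the cost of a fixed set of values.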
participants (3)
- Maria Alandes Pradillo
- Paul Millar
- stephen.burke@stfc.ac.uk