Suggestion for splitting the StorageShare.

On further investigation of the Storage Schema, an unmet use case has been found.
Use Case: Consider two VOs who use two different paths to access the same StorageShare, or alternatively one VO who can access the same StorageShare via two different paths. In either case there could be an asymmetry between the ACLs for the SRM v2.2 space and those for the Path.
The latest schema would require a StorageShare to be instantiated for each combination of path and SRM v2.2 space ACL, leading to n*m storage shares that duplicate a great deal of information.
The suggested solution is to split the StorageShare into a StorageShare and a StorageEnvironment, with a one-to-many relationship between the Environment and the Share. Both entities inherit the ACLs from the AccessPolicy.
StorageShare LocalID Path Tag
StorageEnvironment LocalID ServingState AccessLatency RetentionPolicy ExpirationMode DefaultLifeTime MaximumLifeTime
At the same time, this allows the AggregationID to be removed.
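As an illustration only, the proposed split can be sketched as two Python dataclasses. GLUE is not defined in Python; the field names, the `environment_id` reference used to express the one-to-many link, the `acl` tuples and the example values are assumptions made for this sketch, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageEnvironment:
    # Space-level attributes that the proposal moves out of StorageShare.
    local_id: str
    serving_state: str
    access_latency: str
    retention_policy: str
    expiration_mode: str
    default_lifetime: Optional[int] = None
    maximum_lifetime: Optional[int] = None
    acl: tuple[str, ...] = ()  # inherited from AccessPolicy (assumed representation)


@dataclass(frozen=True)
class StorageShare:
    # Attributes that stay in StorageShare. environment_id is a hypothetical
    # reference: many shares may point at the same environment.
    local_id: str
    path: str
    tag: Optional[str]
    environment_id: str
    acl: tuple[str, ...] = ()  # inherited from AccessPolicy (assumed representation)


def shares_of(env: StorageEnvironment, shares: list[StorageShare]) -> list[StorageShare]:
    """Follow the one-to-many link from an environment to its shares."""
    return [s for s in shares if s.environment_id == env.local_id]


# Hypothetical example: one space reached through two paths with different ACLs.
ENV = StorageEnvironment("space-1", "production", "online", "replica",
                         "neverExpire", acl=("VO:vo1", "VO:vo2"))
SHARES = [
    StorageShare("share-a", "/grid/vo1", None, ENV.local_id, acl=("VO:vo1",)),
    StorageShare("share-b", "/data/vo2", None, ENV.local_id, acl=("VO:vo2",)),
]
```

In this sketch the seven environment attributes are published once and referenced by both shares, instead of being repeated in every share.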

Laurence Field said:
The latest schema would require a StorageShare to be instantiated for each combination of path and SRM v2.2 space ACL, leading to n*m storage shares duplicating a great deal of information.
Err, yes, that's the point I've been making repeatedly, including at the end of the meeting yesterday! So far the idea of keeping the schema simple has prevailed.
The suggested solution is to split the StorageShare into the StorageShare and StorageEnvironment where there is a one to many relationship between the Environment and the Share. Both entities inherit the ACLs from the AccessPolicy
Indeed - basically we would go back to what we have in the 1.3 schema (more or less), i.e. SA == Share == space token and VOInfo == <new thing> == space token description. Stephen

Hi Laurence, Laurence Field wrote:
On further investigation of the Storage Schema an unmet use case has been found.
Use Case: Consider two VOs who use two different paths to access the same StorageShare, or alternatively one VO who can access the same StorageShare via two different paths. In either case there could be an asymmetry between the ACLs for the SRM v2.2 space and those for the Path.
The latest schema would require a StorageShare to be instantiated for each combination of path and SRM v2.2 space ACL, leading to n*m storage shares duplicating a great deal of information.
It is better to state that the use cases are supported, but at the price of redundancy of information in certain situations. This was discussed several times and a final choice was made around a month ago. We opted to go for simplicity at the cost of redundancy. The criterion for choosing among the different approaches was to favour simplicity in querying the information, even though redundant data may need to be published in some situations. There is no real best solution. As Maarten described, you may have both big VOs with their own dedicated shares (so no redundancy) and small VOs sharing the same storage shares (redundancy occurs). For a while, we had a more "normalized" schema. The price to pay is more complex relationships and a higher cost at selection time (think about making all the combinations in order to find the best solution). Also keep in mind that representing an association has a fixed cost of two attributes. If you really want to discuss this again, you should first draw a complete picture with all the entities, relationships and multiplicities (e.g., what about Capacity-like entities?). Otherwise, it may look simpler, but it isn't when you go deeper into the details. Cheers, Sergio
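The trade-off described above can be made concrete with a rough count of published attribute values. This is a back-of-the-envelope sketch, not a measurement: it assumes the flat schema publishes one StorageShare per (path, space ACL) combination with 9 attributes each (the union of the two attribute lists in the proposal, with LocalID counted once), that the split schema publishes one 3-attribute StorageShare per path plus 7-attribute StorageEnvironment objects, and that each association costs the two attributes mentioned above. ACL entries themselves are not counted, and the function names are hypothetical.

```python
def flat_cost(n_paths: int, n_acl_variants: int, flat_attrs: int = 9) -> int:
    # One StorageShare per (path, space ACL) combination; every copy repeats
    # all attributes, including the space-level ones.
    return n_paths * n_acl_variants * flat_attrs


def split_cost(n_paths: int, n_envs: int, share_attrs: int = 3,
               env_attrs: int = 7, assoc_cost: int = 2) -> int:
    # One StorageShare per path, each paying the fixed association cost to
    # reference its environment; environment attributes are published once.
    return n_paths * (share_attrs + assoc_cost) + n_envs * env_attrs


# Small VOs sharing a space via two paths with two ACL variants:
print(flat_cost(2, 2), split_cost(2, 1))   # 36 17
# A big VO with its own dedicated share (one path, one ACL):
print(flat_cost(1, 1), split_cost(1, 1))   # 9 12
```

Under these assumptions the split layout is cheaper when shares are combined, while the flat layout is cheaper for a dedicated share, which is consistent with there being no single best solution.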
_______________________________________________ glue-wg mailing list glue-wg@ogf.org http://www.ogf.org/mailman/listinfo/glue-wg
-- Sergio Andreozzi INFN-CNAF, Tel: +39 051 609 2860 Viale Berti Pichat, 6/2 Fax: +39 051 609 2746 40126 Bologna (Italy) Web: http://www.cnaf.infn.it/~andreozzi

Ciao Sergio,
Use Case: Consider two VOs who use two different paths to access the same StorageShare, or alternatively one VO who can access the same StorageShare via two different paths. In either case there could be an asymmetry between the ACLs for the SRM v2.2 space and those for the Path.
Note the asymmetry! The ACLs for the name space need not be equal to those of the SRM v2.2 space. Putting both kinds of ACLs into a single StorageShare object implies they _must_ have different schemes. If they are in different objects, they can have the same scheme, e.g. "posix".
The latest schema would require a StorageShare to be instantiated for each combination of path and SRM v2.2 space ACL, leading to n*m storage shares duplicating a great deal of information.
It is better to state that the use cases are supported, but at the price of redundancy of information in certain situations.
This was discussed several times and a final choice was made around a month ago. We opted to go for simplicity at the cost of redundancy.
The criterion for choosing among the different approaches was to favour simplicity in querying the information, even though redundant data may need to be published in some situations.
Yes, this seemed the best compromise at that time. Now, however, we have another reason for splitting off the common attributes into a separate object that is _only_ linked to the StorageShare.
There is no real best solution. As Maarten described, you may have both big VOs with their own dedicated shares (so no redundancy) and small VOs sharing the same storage shares (redundancy occurs).
For a while, we had a more "normalized" schema. The price to pay is more complex relationships and a higher cost at selection time (think about making all the combinations in order to find the best solution). Also keep in mind that representing an association has a fixed cost of two attributes.
If you really want to discuss this again, you should first draw a complete picture with all the entities, relationships and multiplicities (e.g., what about Capacity-like entities?). Otherwise, it may look simpler, but it isn't when you go deeper into the details.
StorageShare * --> 1 StorageEnvironment
Capacity can just stay with StorageShare for simplicity. It will then keep reporting the numbers as experienced by the FQANs mentioned in the ACL entries. Thanks, Maarten

Maarten Litmaath said:
Note the asymmetry! The ACLs for the name space need not be equal to those of the SRM v2.2 space. Putting both kinds of ACLs into a single StorageShare object implies they _must_ have different schemes.
If I've understood what you're saying I don't think the namespace ACLs should be in GLUE at all, the granularity is too fine - for example the ACLs could be different for every directory in the tree. In theory we are always told that paths don't matter for SRM, so the client should just negotiate with the server once it finds a space it can use. For VOs (or sites) which insist on having a fixed path mapping I think they need to ensure that the ACLs are set appropriately without needing to have them published explicitly.
StorageShare * --> 1 StorageEnvironment
Capacity can just stay with StorageShare for simplicity. It will then keep reporting the numbers as experienced by the FQANs mentioned in the ACL entries.
Well, if we do it properly I think we'd need to split the Capacity into pieces - the Used space would be per VO (or FQAN), but the Total and Free space are by definition shared and hence should be in the Share (Reserved may need more thought). For me the main reason to go this way is to reduce the duplication of information, so keeping many-times-duplicated values for the sizes doesn't seem that good. Stephen

Hi Stephen,
If I've understood what you're saying I don't think the namespace ACLs should be in GLUE at all, the granularity is too fine - for example the ACLs could be different for every directory in the tree. In theory we are always told that paths don't matter for SRM, so the client should just negotiate with the server once it finds a space it can use. For VOs (or sites) which insist on having a fixed path mapping I think they need to ensure that the ACLs are set appropriately without needing to have them published explicitly.
This is what I was trying to express the other day when I said that we need something like the LFC. Whereas the LFC maps logical names to SURLs, is this not a mapping of namespaces (directories) to physical spaces (or is it logical space)? I agree that this might not belong in the information system; on the other hand, this doesn't necessarily mean that it should not be in the information model. Laurence

Laurence Field [mailto:Laurence.Field@cern.ch] said:
This is what I was trying to express the other day when I said that we need something like the LFC. Whereas the LFC maps logical names to SURLs, is this not a mapping of namespaces (directories) to physical spaces (or is it logical space)?
To some extent that's right - in fact the LFC is to some approximation a DPM without any storage, just a namespace mapping. However, things are getting more complicated since it seems that we're introducing a new concept of authorisation on spaces themselves, which is orthogonal to the namespace. In the unix filesystem world I think this would be like having a permission to write to a mounted partition which was independent of the permissions on the files and directories inside it. That isn't really analogous to the LFC or any other catalogue - in a sense it's much simpler because the namespace for spaces (sic) is flat: each space has a name (space token) but there is no structure. (Well, until we get hierarchical spaces!)
I agree that this might not belong in the information system; on the other hand, this doesn't necessarily mean that it should not be in the information model.
Maybe, but I think we need to be clear about what we're representing. To me the Path in the SA/Share is a prefix for use when writing a file, i.e. when you write a file you are supposed to construct a SURL by appending your choice of file path/name to the published Path. In the general case the Path would be absent/NULL or equal to / which would mean you can use any SURL you like. That concept seems to me to be very different to publishing general properties of a filesystem/namespace, which could e.g. extend to publishing every directory with its permissions - I think that is not something which is likely to be useful in GLUE. (Remember that the old GLUE does have FileSystem and File objects but we have never used them.) Stephen
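A minimal sketch of the Path semantics described above, treating the published Path as a prefix to which the client appends its own file path. The function name, the endpoint string and the example paths are hypothetical, and real SURLs may carry further components (for example a port or an SFN query part) that are ignored here.

```python
from typing import Optional


def build_surl(endpoint: str, share_path: Optional[str], relative_name: str) -> str:
    """Append a client-chosen file name to the published Path.

    A Path that is absent (None) or equal to "/" places no restriction, so the
    name is simply placed under the namespace root.
    """
    prefix = (share_path or "/").rstrip("/")      # "" for None or "/"
    name = relative_name.lstrip("/")              # avoid a double slash
    return f"{endpoint.rstrip('/')}{prefix}/{name}"


print(build_surl("srm://se.example.org", "/dpm/example.org/home/vo1", "run1/file.root"))
# srm://se.example.org/dpm/example.org/home/vo1/run1/file.root
print(build_surl("srm://se.example.org", None, "vo1/file.root"))
# srm://se.example.org/vo1/file.root
```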

You do not need to publish all possible paths but only the "relevant" ones. And I agree with Laurence. Flavia

Flavia Donno said:
You do not need to publish all possible paths but only the "relevant" ones.
Well, yes, but relevant for what? Do you want to go beyond the current use case, i.e. a root path in the namespace? Stephen

No, just the root. The point is that, as you have heard during the workshop, there might be 2 roots for the same share, like in the PIC case. Flavia

Flavia Donno said:
No, just the root. The point is that, as you have heard during the workshop, there might be 2 roots for the same share, like in the PIC case.
OK, we discussed this a bit ... we can clearly do it, but it's likely to make queries significantly more complicated. The suggestion was to come up with an explicit proposal to see what the impact would be. However, I think there's also a question of how common this use case would really be; we've been constantly told that spaces are orthogonal to paths (re-iterated in your MOU addendum proposal!) and yet you're now suggesting we need multiple defined paths within spaces. Does this just mean that something is wrong in the way the SRM is being used? After all, fundamentally SURLs have no real significance, it's the LFN/GUID in the catalogue which really identifies the file. Stephen

Precisely because they are orthogonal, there is no association between paths and spaces. Therefore, I was proposing 2 classes, one to describe the spaces with their ACLs and one to describe the namespace with its ACLs, if needed, of course. The space describes in which physical pool the file ends up. The namespace describes how the files are logically organized and who has access to them. Flavia

Flavia Donno said:
Precisely because they are orthogonal, there is no association between paths and spaces. Therefore, I was proposing 2 classes, one to describe the spaces with their ACLs and one to describe the namespace with its ACLs, if needed, of course. The space describes in which physical pool the file ends up. The namespace describes how the files are logically organized and who has access to them.
But that's going to make for a fairly complicated query; effectively you have to do a join between the authz on the space (token) and the authz on the path, which are perhaps not even expressed the same way, e.g. VO:atlas on one and VOMS:/atlas/Role=Production on the other. And you would have to code support for such queries in all clients even if in most cases they weren't needed. The question is whether we have any serious use cases for this kind of thing - and if we have, whether we can support them with something simpler than the fully generic structure. Stephen
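The join described here can be sketched as follows, assuming space ACLs and path ACLs are published as separate lists and that both rule styles ("VO:atlas" and "VOMS:/atlas/Role=Production") must be interpreted by the client. The matching rules, the data layout and the space token name are assumptions for illustration; the point is that every client would need logic of this kind to combine the two.

```python
def rule_allows(rule: str, fqan: str) -> bool:
    # Assumed semantics: "VO:<name>" matches any FQAN of that VO;
    # "VOMS:<fqan>" matches that FQAN or anything below it.
    scheme, _, value = rule.partition(":")
    if scheme == "VO":
        return fqan.lstrip("/").split("/")[0] == value
    if scheme == "VOMS":
        return fqan == value or fqan.startswith(value + "/")
    return False  # unknown scheme: treat as no match


def usable(fqan: str, space_acls: dict[str, list[str]],
           paths: list[tuple[str, str, list[str]]]) -> list[tuple[str, str]]:
    # Join: a (space token, path) pair is usable only if BOTH the space ACL
    # and the path ACL admit the caller's FQAN.
    return [(token, path) for token, path, path_acl in paths
            if any(rule_allows(r, fqan) for r in space_acls.get(token, []))
            and any(rule_allows(r, fqan) for r in path_acl)]


SPACES = {"TOKEN_A": ["VO:atlas"]}  # hypothetical space token
PATHS = [("TOKEN_A", "/atlas/prod", ["VOMS:/atlas/Role=Production"]),
         ("TOKEN_A", "/atlas/user", ["VO:atlas"])]
print(usable("/atlas", SPACES, PATHS))  # [('TOKEN_A', '/atlas/user')]
```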

Hi Stephen, others,
On Monday 28 April 2008 17:47:20 Burke, S (Stephen) wrote:
Maarten Litmaath said:
Note the asymmetry! The ACLs for the name space need not be equal to those of the SRM v2.2 space. Putting both kinds of ACLs into a single StorageShare object implies they _must_ have different schemes.
If I've understood what you're saying I don't think the namespace ACLs should be in GLUE at all, the granularity is too fine
Moreover, the ACL semantics will (in general) be filesystem-specific. To illustrate this, consider: what is the default policy for user X (or, users undertaking role Y) if they have no listed ACE for operation Z (default allow or default deny)? What are the allowed ACE operation types (how do I distinguish between an SE that doesn't understand "restage" operations vs a user that happens not to have an ACE for "restage", so the default policy should apply)? If conflicting ACEs are available (user X, under role Y is attempting operation Z; X is deny doing Z, Y is allowed to do Z), what is the policy? (default deny, default allow, use default policy for operation Z, ...) Should the ACL be interpreted as an ordered list (first match wins) or some other policy (try all "denies" then all "allows" then use default, first match wins; or, try all "allows" then "denies" then use default, first match wins; or, ...)? ... etc ... The point isn't what you or I might think of as "correct" behaviour, just that different people (and different filesystems) will interpret these questions differently. Perhaps GLUE can only publish ACLs if it also publishes some identifier that specifies how to interpret the ACLs. The identifier namespace could be a URI (to be general), or simply list the filesystem, although that's sub-optimal as multiple filesystems might implement the same ACL. Basically, I don't believe GLUE can publish meaningful ACLs without the schema becoming sufficiently complex that no one could publish the information correctly, or correctly interpret the information they receive.
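The conflicting-ACE example above can be made concrete. The sketch below evaluates the same hypothetical ACL (user X denied "write", role Y allowed "write") under two plausible policies, first-match-wins and deny-overrides, each with a configurable default for operations that have no entry. The principal names and both policies are illustrative assumptions, not the behaviour of any particular storage system.

```python
ACL = [("role:Y", "write", "allow"),
       ("user:X", "write", "deny")]
PRINCIPALS = {"user:X", "role:Y"}  # user X acting under role Y


def first_match(acl, principals, op, default_allow):
    # Ordered list: the first entry naming one of the caller's principals wins.
    for who, what, effect in acl:
        if who in principals and what == op:
            return effect == "allow"
    return default_allow


def deny_overrides(acl, principals, op, default_allow):
    # Any matching "deny" wins, then any matching "allow", then the default.
    effects = {effect for who, what, effect in acl
               if who in principals and what == op}
    if "deny" in effects:
        return False
    if "allow" in effects:
        return True
    return default_allow


print(first_match(ACL, PRINCIPALS, "write", default_allow=False))     # True
print(deny_overrides(ACL, PRINCIPALS, "write", default_allow=False))  # False
# No entry for "restage": the answer depends only on the default.
print(first_match(ACL, PRINCIPALS, "restage", default_allow=True))    # True
```

The same published ACL yields opposite answers for "write", which is why a consumer would need to know which interpretation applies.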
StorageShare * --> 1 StorageEnvironment
Capacity can just stay with StorageShare for simplicity. It will then keep reporting the numbers as experienced by the FQANs mentioned in the ACL entries.
Well, if we do it properly I think we'd need to split the Capacity into pieces - the Used space would be per VO (or FQAN), but the Total and Free space are by definition shared and hence should be in the Share (Reserved may need more thought).
Yup, sure. This comes from the subclassing of the parent StorageCapacity (iirc) class. If we revert StorageShare back to (something like) an SRM Space, then a StorageShareCapacity can have different attributes to a Storage<VOView>Capacity, whilst both being a similar kind of "thing" (both are subclasses of StorageCapacity). [Cue continued discussion on whether StorageCapacity is a "thing" ;-] Just my 2c worth. Paul.
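A sketch of the Capacity split discussed above, with a common parent and two subclasses: the shared Total and Free sizes are published once per share, while Used is published per FQAN. The class names follow the thread, but the fields, the use of GB as the unit and the example numbers are assumptions; Reserved is left out because it is still open.

```python
from dataclasses import dataclass


@dataclass
class StorageCapacity:
    # Hypothetical common parent: just identifies the share being described.
    share_id: str


@dataclass
class StorageShareCapacity(StorageCapacity):
    # Shared by everyone using the share, so published once.
    total_gb: int
    free_gb: int


@dataclass
class StorageVOViewCapacity(StorageCapacity):
    # Per-FQAN view: only the space used by that FQAN.
    fqan: str
    used_gb: int


def report(share: StorageShareCapacity, views: list[StorageVOViewCapacity]) -> dict:
    """Combine the shared numbers with the per-FQAN usage for one share."""
    return {
        "total_gb": share.total_gb,
        "free_gb": share.free_gb,
        "used_gb": {v.fqan: v.used_gb for v in views if v.share_id == share.share_id},
    }


SHARE = StorageShareCapacity("share-1", total_gb=100, free_gb=40)
VIEWS = [StorageVOViewCapacity("share-1", "/vo1", 35),
         StorageVOViewCapacity("share-1", "/vo2", 25)]
print(report(SHARE, VIEWS))
```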

Paul Millar [mailto:paul.millar@desy.de] said:
Moreover, the ACL semantics will (in general) be filesystem-specific.
Well, in the medium term I think the implementations will have to converge on a common authorisation model, or at least allow a grid like WLCG to have a common profile which gives the same behaviour at all sites. However, we seem to be quite a way from that at the moment, and the recent decision seems to be that we back off from that and approach it from a different angle.
The point isn't what you or I might think of as "correct" behaviour, just that different people (and different filesystems) will interpret these questions differently.
Err, yes - and if it is like that (as often seems to be the case in the grid world) it will be totally unusable in practice! (Cue another couple of years of Flavia's tests to get everything to have consistent behaviour ...) Stephen

On Mon, 28 Apr 2008 17:36:49 +0100 "Burke, S (Stephen)" <S.Burke@rl.ac.uk> wrote:
Paul Millar [mailto:paul.millar@desy.de] said:
Moreover, the ACL semantics will (in general) be filesystem-specific.
Hello Stephen,
Well, in the medium term I think the implementations will have to converge on a common authorisation model,
I hope so, but I cannot see why this, as a desire, should be driven by Glue. It is definitely not a requirement for Glue to be useful. The desire is noble, though, provided it is practical, as it may make grid end-user application development easier. Why will Glue be less useful if it does not include these details?
or at least allow a grid like WLCG to have a common profile which gives the same behaviour at all sites. However, we seem to be quite a way from that at the moment, and the recent decision seems to be that we back off from that and approach it from a different angle.
The point isn't what you or I might think of as "correct" behaviour, just that different people (and different filesystems) will interpret these questions differently.
Err, yes - and if it is like that (as often seems to be the case in the grid world) it will be totally unusable in practice!
Here I disagree. We have abstraction as our friend here: Grid catalogues and FTS allow data to be transferred. Can you explain why expressing ACLs is a requirement for Glue?
(Cue another couple of years of Flavia's tests to get everything to have consistent behaviour ...)
Stephen
Since 2/3 of the storage of the wLCG grid is in dCache, and (nearly) 100% of NorduGrid and OSG storage is dCache, which will support a standard ACL, I do not believe Glue is enhanced by ignoring use cases such as representing Castor and DPM, which make up 1/3 of Glue storage in wLCG; this is a significant minority. So we should not publish ACLs, particularly since the grid has moved on (FOC and pilot jobs), and further markup for using the legacy wLCG interpretation of ACLs can be provided for VOs outside the Glue schema. I see no problem with experiments working around the differences of minority storage services such as Castor (with so few deployments it cannot be the other way around), provided the implementation and version can be discovered. Expecting Castor and DPM to change their getting and setting ACL model at this stage of development just to be pure in our Glue implementation I think is a bad idea. Regards Owen Synge

Owen Synge [mailto:owen.synge@desy.de] said:
Well, in the medium term I think the implementations will have to converge on a common authorisation model,
I hope so, but I cannot see why this, as a desire, should be driven by Glue.
It shouldn't - that was a general observation about the usefulness of grids.
Why will Glue be less useful if it does not include these details?
The *grid* will be less useful if authz is not interoperable. The extent to which glue needs to reflect authz depends on the use cases - but if the underlying middleware can't support a particular use case then glue is pretty much powerless to help.
Err, yes - and if it is like that (as often seems to be the case in the grid world) it will be totally unusable in practice!
Here I disagree. We have abstraction as our friend here: Grid catalogues and FTS allow data to be transferred.
Only to the extent that they make assumptions that everything is configured suitably - we regularly have failures because permissions are set incorrectly (in a different mailing list I just saw a request for a sysadmin to reconfigure a DPM for exactly that reason).
Can you explain why expressing ACLs is a requirement for Glue?
Expressing *some* ACLs is a requirement to support use cases like "find me a CE where I'm authorised to run a job and where my job will run as quickly as possible", or "find me an SE where I'm authorised to write atlas DPDs and which has enough space for 10 TB of data".
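The second use case can be sketched as a simple selection over published shares. The record layout, the "VO:<name>" rule format, the host names and the free-space figures are hypothetical, and the ACL is reduced to a VO check (the finer distinction of writing atlas DPDs is not modelled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareInfo:
    se: str                 # storage element host (hypothetical)
    acl: tuple[str, ...]    # e.g. ("VO:atlas",); assumed rule format
    free_tb: float          # free space in TB (assumed unit)


def find_writable(shares, vo, required_tb):
    # Keep shares whose ACL admits the VO and whose free space is large
    # enough, then prefer the ones with the most free space.
    rule = f"VO:{vo}"
    hits = [s for s in shares if rule in s.acl and s.free_tb >= required_tb]
    return sorted(hits, key=lambda s: s.free_tb, reverse=True)


SHARES = [ShareInfo("se1.example.org", ("VO:atlas",), 25.0),
          ShareInfo("se2.example.org", ("VO:cms",), 50.0),
          ShareInfo("se3.example.org", ("VO:atlas", "VO:lhcb"), 8.0)]
print([s.se for s in find_writable(SHARES, "atlas", 10)])  # ['se1.example.org']
```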
Expecting Castor and DPM to change their getting and setting ACL model at this stage of development just to be pure in our Glue implementation I think is a bad idea.
I don't; I expect (or at least hope) that dCache and Castor (and StoRM) will change to support the DPM model, and the reason for that has nothing to do with Glue: it's because anything else makes life very difficult for a VO like atlas, which has to use all of them (and because DPM seems to have the best model for LCG purposes). The same goes for any necessary piece of functionality, whether represented in Glue or not; e.g. the fact that Castor treats pinning differently from all the others also makes life difficult (ultimately it's the reason for all the complications of D1T1 spaces, which does impact on Glue and many other things). Stephen
participants (7)
- Burke, S (Stephen)
- Flavia Donno
- Laurence Field
- Maarten Litmaath
- Owen Synge
- Paul Millar
- Sergio Andreozzi