
Hi all,

I've tried updating the storage objects "thoughts" document with feedback from people.

(Jen, my apologies, I now see what you meant about covering sets: we share the same understanding but what I had written was just wrong .. I've tried to get it more right this time).

Sorry if I've missed out someone's comment, please yell.

Also, I've tried to separate the more informative paragraphs by marking them differently. One way of using this information would be to fold the more normative parts into the GLUE spec proper. The informative paragraphs could then form an informative annex to the document; c.f.
http://standards.ieee.org/guides/companion/part1.html#annexes

Cheers,

Paul.

--- Some thoughts on GLUE Storage objects (v0.2) ---

These are some notes on the different objects in the forthcoming GLUE v2.0 information model, concentrating on those responsible for storage.

These notes provide a mixture of normative and informative information. The non-normative (informative) paragraphs are red-lined (marked with a "|" on the left-most edge).

This should be read together with the GLUE schema and, if there are any discrepancies, the GLUE schema document is authoritative. Please email corrections to <paul.millar@desy.de> or the GLUE working group <glue-wg@ogf.org>.

UserDomain:

A collection of one or more end-users. All end-users that interact with the physical storage are members of a UserDomain.

[much more can (and perhaps should) be said here]

| A Virtual Organisation (VO) is an instance of a UserDomain. A group
| within a VO may be represented as a UserDomain. In general, users
| derive their authorisation from their membership.

StorageCapacity:

A StorageCapacity object describes the ability to store data within a homogeneous storage technology. Each object provides a view of that physical storage medium with a common access latency.

All StorageCapacity objects are specified within a certain context. The context is determined by an association between the StorageCapacity object and precisely one other higher-level object.

| These associations are not listed here, but are described in later
| sections.
|
| In general, a StorageCapacity object will record some
| context-specific information. Examples of such information include
| the total storage capacity of the underlying technology and how
| much of that total has been used.
|
| The underlying storage technology may affect which of the
| context-specific attributes are available. For example, tape
| storage may be considered semi-infinite, so the total and free
| attributes have no meaning. If this is so, then it affects all
| StorageCapacity objects with the same underlying technology,
| independent of their context.
|
| Different contexts may also affect which context-specific
| attributes are recorded. This is a policy decision when
| implementing GLUE, as recording all possible information may be
| costly and provide no great benefit.
|
| [Aside: these two reasons are why many of the attributes within
| StorageCapacity are optional. Rather than explicitly subclassing
| the objects and making the values required, it is left deliberately
| vague which attributes are published.]
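| [Aside: as an informative illustration only, a StorageCapacity can
| be rendered in Python-like pseudo-code as a context-scoped view
| whose size attributes are deliberately optional. None of these
| class or attribute names are normative:
|
|     from dataclasses import dataclass
|     from typing import Optional
|
|     @dataclass
|     class StorageCapacity:
|         # A view of homogeneous storage within exactly one context.
|         type: str                         # e.g. "disk" or "tape"
|         context: object                   # the single higher-level object
|         total_size: Optional[int] = None  # bytes; omitted where meaningless
|         used_size: Optional[int] = None   # bytes; omitted where too costly
| ]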
| A StorageCapacity may represent a logical aggregation of multiple
| underlying storage technology instances; for example, a
| StorageCapacity might represent many disk storage nodes, or many
| tapes stored within a tape silo. GLUE makes no effort to record
| information at this deeper level; but by not doing so, it requires
| that the underlying storage technology be homogeneous. Homogeneous
| means that the underlying storage technology is either identical or
| sufficiently similar that the differences don't matter.
|
| In most cases, the homogeneity is fairly obvious (e.g., tape
| storage vs disk-based storage), but there may be times where this
| distinction becomes contentious and judgement may be required; for
| example, the quality of disk-based storage might indicate that one
| subset is useful for a higher-quality service. If this is so, then
| it may make sense to represent the different classes of disk by
| different StorageCapacities.

StorageEnvironment:

A StorageEnvironment is a collection of one or more StorageCapacities with a set of associated (enforced) storage management policies.

| Examples of these policies are Type (Volatile, Durable, Permanent)
| and RetentionPolicy (Custodial, Output, Replica).
|
| In general, a StorageEnvironment may have one or more
| RetentionPolicy values. If it has more than one, all data stored
| using this StorageEnvironment will adopt precisely one of the
| values present. The policy describing how data is assigned a
| RetentionPolicy value is not specified; for example, GLUE does not
| record a default RetentionPolicy. GLUE also does not specify
| whether it is possible to migrate data from one RetentionPolicy
| value to a different value whilst using the same
| StorageEnvironment.

StorageEnvironments act as a logical aggregation of StorageCapacities, so each StorageEnvironment must have at least one associated StorageCapacity.

| It is the associated StorageCapacities that allow a
| StorageEnvironment to store data with its advertised policies; for
| example, to act as (Permanent, Custodial) storage of data.
|
| Since a StorageEnvironment may contain multiple StorageCapacities,
| it may describe a heterogeneous environment. An example of this is
| "tape storage", which has both a tape back-end and a disk front-end
| into which users can pin files. Such a StorageEnvironment would
| have two associated StorageCapacities: one describing the disk
| storage and another describing the tape.
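| [Aside: continuing the illustrative Python rendering (names and
| values are examples, not normative), the "tape storage" example
| would be one StorageEnvironment aggregating two StorageCapacities:
|
|     from dataclasses import dataclass
|     from typing import List, Optional
|
|     @dataclass
|     class StorageCapacity:
|         type: str                          # "disk" or "tape"
|         latency: str                       # "online" or "nearline"
|         total_size: Optional[int] = None   # bytes; None if unpublished
|
|     @dataclass
|     class StorageEnvironment:
|         retention_policy: str              # e.g. "custodial"
|         expiration_mode: str               # e.g. "permanent"
|         capacities: List[StorageCapacity]  # at least one
|
|     tape_store = StorageEnvironment(
|         retention_policy="custodial",
|         expiration_mode="permanent",
|         capacities=[
|             StorageCapacity("disk", "online", 50 * 2**40),  # pin space
|             StorageCapacity("tape", "nearline", None),      # back-end
|         ],
|     )
| ]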
If a StorageCapacity is associated with a StorageEnvironment, it is associated with only one. A StorageCapacity may not be shared between different StorageEnvironments.

| StorageCapacities associated with a StorageEnvironment must be
| non-overlapping with any other such StorageCapacity, and the set of
| all such StorageCapacities must represent the complete storage
| available to end-users. Each physical storage device (e.g., an
| individual disk drive or tape) that an end-user can utilise must be
| represented by (some part of) precisely one StorageCapacity
| associated with a StorageEnvironment.
|
| Nevertheless, the StorageCapacities associated with
| StorageEnvironments may be incomplete, as a site may deploy
| physical storage devices that are not directly under end-user
| control; for example, disk storage used to cache incoming
| transfers. GLUE makes no effort to record information about such
| storage.

StorageResource:

A StorageResource is a logically separable component that provides the management functionality described in one or more StorageEnvironments. More than one StorageResource may cooperate to provide an advertised StorageEnvironment.

| Typically, a StorageResource will describe a running instance of
| some software, but it may describe some commodity hardware. A
| StorageResource should, under normal circumstances, have at least
| one StorageEnvironment, otherwise there wouldn't be much point in
| publishing information about it.
|
| All StorageEnvironments must be part of at least one
| StorageResource, as the StorageResource is how the management
| attributes described in the StorageEnvironment are provided.
|
| GLUE makes no attempt to record which physical storage (as
| represented by StorageCapacity objects) is under the control of
| which StorageResource.

StorageShare:

A StorageShare is a logical partitioning of one or more StorageEnvironments.

| Perhaps the simplest example of a StorageShare is one associated
| with a single StorageEnvironment with a single associated
| StorageCapacity, and that represents all the available storage of
| that StorageCapacity. An example of a storage system that could be
| represented by this trivial StorageShare is the classic-SE.

A StorageShare must have one or more associated StorageCapacities. These StorageCapacities provide a view of the different homogeneous underlying technologies that are available under the share.

| The StorageCapacities within the StorageShare context need not
| describe all storage: the number of StorageCapacities associated
| with a StorageShare may be less than the sum of the number of
| StorageCapacities associated with each of the StorageShare's
| associated StorageEnvironments.
|
| There is an *implicit* association between the StorageCapacity
| associated with a StorageShare and the corresponding
| StorageCapacity associated with a StorageEnvironment. Intuitively,
| this association arises from the fact that the two
| StorageCapacities are views of the same underlying physical
| storage. This implicit association is not recorded in GLUE.
|
| Any pair of StorageShares may be (pair-wise) overlapping; that is,
| they have at least one pair of implicitly associated
| StorageCapacities (each within a different StorageShare context)
| that provide a view onto a common portion of underlying storage. A
| pair of shared StorageCapacities indicates common access to shared
| underlying storage and, in general, storage operations within one
| shared StorageShare will affect the other.
|
| A pair of StorageShares may be partially shared; that is, they have
| at least one pair of StorageCapacities that are shared and at least
| one that is not. Partially shared StorageCapacities could represent
| two UserDomains' access to a tape store, where they share a common
| set of disk pools but the tape storage is distinct.
|
| A set of StorageShares may be covering or not. A covering set of
| StorageShares makes all of the underlying storage available to
| end-users. When a set of StorageShares is not covering, the
| site-admin has refrained from allocating some of the underlying
| storage.
|
| In general, given a StorageCapacity (SC_E) that is associated with
| some StorageEnvironment and which has totalSize TS_E, let TS_S be
| the sum of the totalSize attributes for all StorageCapacities that
| are:
|
| 1. associated with a StorageShare, and
| 2. implicitly associated with SC_E.
|
| If the StorageShares are covering and non-overlapping then
| TS_S = TS_E. If they are covering and overlapping then TS_S > TS_E.
| If they are not covering and non-overlapping then TS_S < TS_E. If
| they are not covering and overlapping, no general statement can be
| made about the totalSize.
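| [Aside: the above relations can be checked mechanically. A small,
| non-normative Python sketch, with invented sizes:
|
|     TS_E = 100  # totalSize of SC_E, an Environment-level Capacity
|
|     # totalSize of each Share-level StorageCapacity implicitly
|     # associated with SC_E (views onto the same underlying storage):
|     assert sum([60, 40]) == TS_E  # covering, non-overlapping: TS_S = TS_E
|     assert sum([60, 60]) > TS_E   # covering, overlapping:     TS_S > TS_E
|     assert sum([60, 20]) < TS_E   # not covering, non-overlap.: TS_S < TS_E
| ]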
| End-users within a UserDomain may wish to store or retrieve files.
| The StorageShares provide a complete, abstract description of the
| underlying storage at their disposal. No member of a UserDomain may
| interact with the physical hardware except through a StorageShare.
|
| In general, one should not draw strong conclusions about how these
| attributes will alter under storage operations. The behaviour of
| these attributes may be governed by site-level policies, software
| implementation, underlying storage technology, concurrent usage,
| etc.; for example, deleting files may or may not result in
| freeSpace increasing.
|
| A grid may impose stricter requirements on the attributes'
| behaviour. This may be due to good knowledge of the deployed
| software, management policies and underlying technologies in use.
| Care should be taken when sharing monitoring information that the
| storage models are compatible.
|
| A single StorageShare may allow multiple UserDomains to access
| storage; if so, the StorageShare is "shared" between the different
| UserDomains. Such a shared StorageShare is typical if a site
| provides storage described by the trivial StorageShare (one that
| covers a complete StorageEnvironment) whilst supporting multiple
| UserDomains.

StorageMappingPolicy:

The StorageMappingPolicy describes how a particular UserDomain is allowed to access a particular StorageShare.

| No member of a UserDomain may interact with a StorageShare except
| as described by a StorageMappingPolicy.
|
| The StorageMappingPolicies may contain information that is specific
| to that UserDomain, such as one or more associated
| StorageCapacities. If provided, these give a UserDomain-specific
| view of their usage of the underlying physical storage technology
| as a result of their activity within the StorageShare.
|
| If StorageCapacities are available within a StorageMappingPolicy
| context, there may be at most as many as are associated with the
| corresponding StorageShare.

StorageEndpoint:

A StorageEndpoint specifies that storage may be controlled through a particular interface.

| In general, such interfaces do not allow data transfer. The SRM
| protocol is an example of such an interface, and a StorageEndpoint
| would be advertised for each instance of SRM.
|
| The access policies describing which users of a UserDomain may use
| the StorageEndpoint are not published. On observing that a site
| publishes a StorageEndpoint, one may deduce only that it is valid
| for at least one user of one supported UserDomain.

StorageAccessProtocol:

A StorageAccessProtocol describes one method by which end-users may send data to be stored, retrieve stored data, or undertake both operations.

| Access to the interface may be localised; that is, only available
| from certain computers. It may also be restricted to specified
| UserDomains. However, neither policy restriction is published in
| GLUE. On observing a StorageAccessProtocol, one may deduce only
| that it is valid for at least one user of one supported UserDomain
| from at least one computer.

StorageService:

A StorageService is an aggregation of StorageEndpoints, StorageAccessProtocols and StorageResources.

| It is the top-level description of the ability to transfer files to
| and from a site, and to manipulate the files once stored.
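| [Aside: to summarise, the object classes and their main
| associations could be sketched as follows. This is an informative
| Python rendering; the names and attribute choices are illustrative,
| not normative:
|
|     from dataclasses import dataclass, field
|     from typing import List, Optional
|
|     @dataclass
|     class StorageCapacity:       # view of homogeneous storage; one context
|         type: str
|         total_size: Optional[int] = None
|
|     @dataclass
|     class StorageEnvironment:    # capacities + enforced policies
|         capacities: List[StorageCapacity]       # at least one
|
|     @dataclass
|     class StorageResource:       # software/hardware realising environments
|         environments: List[StorageEnvironment]  # normally at least one
|
|     @dataclass
|     class StorageShare:          # logical partitioning of environments
|         environments: List[StorageEnvironment]
|         capacities: List[StorageCapacity]       # share-context views
|
|     @dataclass
|     class StorageMappingPolicy:  # how one UserDomain may use one Share
|         user_domain: str
|         share: StorageShare
|         capacities: List[StorageCapacity] = field(default_factory=list)
|
|     @dataclass
|     class StorageService:        # top-level aggregation
|         endpoints: List[str]                    # StorageEndpoints
|         access_protocols: List[str]             # StorageAccessProtocols
|         resources: List[StorageResource]
| ]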

Hi Paul, thanks for this effort. We should address this issue soon. Separating into different annexes may be cleaner; nevertheless, it would make the document less readable (a person would have to jump back and forwards). What if we add the informative part for each class just before or after each table and mark it using different typographical settings? Cheers, Sergio

Paul Millar wrote:
I've tried updating the storage objects "thoughts" document with feedback from people.
(Jen, my apologies, I now see what you meant about covering sets: we share the same understanding but what I had written was just wrong .. I've tried to get it more right this time).
Sorry if I've missed out someone's comment, please yell.
Also, I've tried to separate the more informative paragraphs by marking them differently.
One way of using this information would be to fold the more normative parts into the GLUE spec proper. The informative paragraphs could then form an informative annex to the document; c.f.
http://standards.ieee.org/guides/companion/part1.html#annexes
Cheers,
Paul.

On Tuesday 08 April 2008 13:26:30 Sergio Andreozzi wrote:
Separating into different annexes may be cleaner; nevertheless, it would make the document less readable (a person would have to jump back and forwards).
What if we add the informative part for each class just before or after each table and mark it using different typographical settings?
Yes, that's fine, too. The annex was just a suggestion; I'm happy to go with whatever people feel works best. Cheers, Paul.

glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Paul Millar said: Sorry if I've missed out someone's comment, please yell.
I haven't made comments yet, but this seems like a good place to start ... sorry there are quite a lot but I think it's worth trying to nail things down as much as possible.
UserDomain:
A collection of one or more end-users. All end-users that interact with the physical storage are a member of a UserDomain.
Perhaps opening a can of worms, but it may also be possible for a UserDomain to include services, i.e. you might have services registered in VOMS as well as users (even with delegated credentials you may want to give privileges to services which the users don't have).
StorageCapacity:
A StorageCapacity object describes the ability to store data within a homogeneous storage technology. Each object provides a view of that physical storage medium with a common access latency.
It isn't necessarily just the latency that matters, for example it may be useful to publish the Capacity of the disk cache in front of a tape system (see further comments below) - the latency is Online but the functionality is different from Disk1 Online storage. (Similarly a Disk1 storage system might make extra cache copies to help with load balancing.) I think the phraseology should be something like "a common category of storage" (although maybe "category" still isn't the right word).

I'd also like to go back to the question I posed in one of the meetings ... say that a site implements Custodial/Online by ensuring three distinct disk copies, how would we represent that? What about mirrored RAID, how much space do we record?

Another thing is that I think there is some mission creep going on in the Capacity concept. When I suggested introducing it, it was really as a complex data type, i.e. as an alternative to putting maybe 20 separate attributes into each object that can have a size you would effectively have one multivalued "attribute" with type "Capacity" rather than int. However, your descriptions suggest that you're thinking more in terms of a Capacity representing a real thing (a bunch of storage units) which indeed have sizes but may have other attributes too. That isn't necessarily a bad thing, but we should probably be clear in our minds about what we intend.
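For what it's worth, the two readings could be sketched like this (purely illustrative Python; none of these names come from the draft):

    from dataclasses import dataclass
    from typing import List, Optional

    # Reading 1: Capacity as a complex data type, i.e. a typed bundle
    # of numbers used instead of inlining totalSize, usedSize, ... into
    # every object that can have a size.
    @dataclass
    class Capacity:
        type: str                    # "online", "nearline", "cache", ...
        total: Optional[int] = None
        used: Optional[int] = None
        free: Optional[int] = None

    @dataclass
    class Share:
        name: str
        capacity: List[Capacity]     # one multivalued "attribute" of type Capacity

    # Reading 2: Capacity as a first-class object representing a bunch
    # of storage units; it would then carry an identity and possibly
    # further attributes and associations of its own.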
The context is determined by an association between the StorageCapacity object and precisely one other higher-level object.
What was the decision about Shares for different VOs which share the same physical space? (I haven't really read all the mails yet so this may already be answered ... actually there is more on this further down.)
| The underlying storage technology may affect which of the | context-specific attributes are available. For example, tape storage | may be considered semi-infinite, so the total and free attributes have | no meaning. If this is so, then it affects all StorageCapacity objects with | the same underlying technology, independent of their context.
I'm not quite sure what you're saying here. It seems to me that the schema itself should not be defining this - I would still maintain that tape systems do in fact have a finite capacity at any given time so it isn't conceptually absurd (and "nearline" may not necessarily mean "tape" anyway). Individual Grids may wish to make their own decisions about what to publish, and equally it seems possible that, say, dCache may decide not to publish something but Castor may. All the schema should do is say that the attributes are optional, but *if* they are published the meaning should be well-defined and common across all Grids/implementations/... (and maybe we also want a special value to mean quasi-infinite?)
| that the underlying storage technology be homogeneous. Homogeneous | means that the underlying storage technology is either identical or | sufficiently similar that the differences don't matter.
I think the real point is more that it's treated uniformly by the SRM (or other storage manager) - even if the differences do matter there won't be anything you can do about it if the SRM gives you no control over it! (e.g. to put your file on RAID 6 rather than RAID 0.)
A StorageEnvironment is a collection of one or more StorageCapacities with a set of associated (enforced) storage management policies.
Hmm ... I could suggest that the Environment now also looks more like a data type than a real object (and is also rather SRM2-specific as it stands). And why are the attributes optional, i.e. what would it mean if one or both is missing? Should there be an OtherInfo attribute? What would we do for classic SEs, or SRB, or for that matter SRM 1?

[What actually seems to have happened here is that things have gradually turned inside out. We started with the SA as the main representation of a set of hardware, with size, policy and ACL information embedded in it and subsequently with the VOInfo added as a dependent object. Now the size (Capacity), ACL (MappingPolicy) and VOInfo (Share) are getting carved out as separate objects with an independent "life" and most of the policy attributes have been obsoleted, so we're left with something that carries almost no information and a role which, to me at least, is not totally clear. I'm not saying there's anything wrong with this, but it may lead to misconceptions derived from trying to relate the Glue 2 objects to their Glue 1 equivalents.]
| Examples of these policies are Type (Volatile, Durable, Permanent) | and RetentionPolicy (Custodial, Output, Replica).
Except that Type (or ExpirationMode) doesn't seem to be an attribute in the current draft ... what about other policies, e.g. the old schema had MinFileSize - if we ever wanted to implement such a thing would it go here? Conversely Latency isn't a policy, it's a feature of the hardware. If we really want a Policy object should we call it that rather than Environment?
| In general, a StorageEnvironment may have one or more | RetentionPolicy values.
Not what it says in the current draft (0..1). Does this correspond with SRM usage, i.e. can you have spaces with multiple RPs?
| GLUE does not | record a default RetentionPolicy.
Should it? What about defaults for other things, e.g. ExpirationMode?
| It is the associated StorageCapacities that allow a | StorageEnvironment to store data with its advertised policies; for | example, to act as (Permanent, Custodial) storage of data.
But can you tell how that works, i.e. which Capacity serves which policy? This is another case where our mind tends to think Custodial -> tape -> Nearline, but intrinsically it doesn't have to be like that.
| Since a StorageEnvironment may contain multiple StorageCapacities, | it may describe a heterogeneous environment. An example of this is | "tape storage", which has both tape back-end and disk front-end into | which users can pin files. Such a StorageEnvironment would have two | associated StorageCapacities: one describing the disk storage and | another describing the tape.
But can you have more than one Capacity of the same type? (see the comments earlier). Anyway I think we removed the storage type from the Capacity so at the moment you can't really tell what it is. Maybe we should look back at the proposal for Storage Components made by Flavia, Maarten et al in the 1.3 discussion, or has someone already done that?
| StorageCapacities associated with a StorageEnvironment must be | non-overlapping with any other such StorageCapacity and the set of | all such StorageCapacities must represent the complete storage | available to end-users.
Conceptually that may be true, but there's no guarantee that all of them are actually published. You could also wonder about space which is installed but not currently allocated to any VO ...
| Nevertheless, the StorageCapacities associated with | StorageEnvironments may be incomplete as a site may deploy physical | storage devices that are not directly under end-user control; for | example, disk storage used to cache incoming transfers. GLUE makes | no effort to record information about such storage.
Actually part of my reason to introduce Capacity objects is that they can do just that if people want them to (as they may since it can be useful to know about cache usage). For such cases the CapacityType would be Cache, or maybe something else if you wanted to distinguish more than one kind of cache. As always there's no compulsion to publish that if you don't want it, but the schema makes it possible.
| GLUE makes no attempt to record which physical storage (as | represented by StorageCapacity objects) is under control of which | StorageResource.
Should it? As it stands you might not care, but if you wanted to consider monitoring use cases (whether the software is running, at the most basic!) it would probably be useful to know how that relates to the actual storage.
StorageShare:
A StorageShare is a logical partitioning of one or more StorageEnvironments.
Maybe I'm missing something, but how could you have more than one Environment for a single Share? Certainly our current structure doesn't allow it (one SA per many VOInfos but not vice versa), although as I said above that might be misleading.
| The StorageCapacities within the StorageShare context need not | describe all storage: the number of StorageCapacities associated | with a StorageShare may be less than the sum of the number of | StorageCapacities associated with each of the StorageShare's | associated StorageEnvironments.
Err, why? As always you may choose not to publish everything, but conceptually the space is all there somewhere ...
| A pair of StorageShares may be partially shared, that is, they have | at least one pair of StorageCapacities that are shared and at least one | that is not. Partially shared StorageCapacities could represent two | UserDomains' access to a tape store, where they share a common set | of disk pools but the tape storage is distinct.
I'm not sure I like this bit. In general I would assume that storage (SAs in the current parlance) is either shared or not - allowing the disk part of a custodial/online space to be shared and the tape part not sounds rather weird to me, and I don't think that's how SRM works. Do we really have such cases? Bear in mind that the point is not about sharing the physical disks, but having a shared allocation (and for Disk1/Online permanent storage, not cache). If the system is guaranteeing to store, say, 100 TB on both disk and tape (custodial/online) there is no way it can do that if the disk part of the reservation is shared, and if it doesn't guarantee it overall then having a reserved tape pool is pointless; in general it would just mean that some tapes are unusable.

Another question: what do we do about hierarchical spaces? At the moment we at least have the case of the "base space" or whatever you call it from which the space tokens are reserved, and in future I believe we're considering being able to reserve spaces inside spaces. How could that be represented? (There are also questions we've discussed in the past about things like dynamic spaces and default spaces which tend to produce more heat than light :)
StorageMappingPolicy:
The StorageMappingPolicy describes how a particular UserDomain is allowed to access a particular StorageShare.
Should we say how this relates to the AccessPolicy? (which doesn't seem to appear explicitly in either the Computing or Storage diagrams but is presumably there anyway.)
| No member of a UserDomain may interact with a StorageShare except as | described by a StorageMappingPolicy.
As stated I don't think that can really be true, the SRM could potentially allow all kinds of things not explicitly published. The things which should be true are that there is an agreed set of things (maybe per grid?) which are published, and that the published values should be a superset of the "real" permissions - i.e. the SRM may in fact not authorise me even if the published value says that it will, but the reverse shouldn't be true.
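To put the rule in pseudo-code (an illustrative sketch with made-up VO names; the subset relation, not the API, is the point):

    # Published access rules should be a superset of the SRM's real
    # decisions: a user may be refused despite a published permission,
    # but must never be authorised for something that is unpublished.
    published = {("atlas", "read"), ("atlas", "write"), ("cms", "read")}
    real      = {("atlas", "read"), ("cms", "read")}

    assert real <= published   # subset test: the publishing is consistent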
| The StorageMappingPolicies may contain information that is specific | to that UserDomain, such as one or more associated | StorageCapacities. If provided, these provide a UserDomain-specific | view of their usage of the underlying physical storage technology as | a result of their usage within the StorageShare.
I don't think I understand how this can be different from the Share to Capacity relation ... if you are saying that the Share can be multi-VO then I think something has gone wrong somewhere given that the Path and Tag can be VO-specific. In the 1.3 schema the whole point of the VOInfo (which has become the Share) was to split out the information specific to each mapping policy (ACBR) from the generic information in the SA ...
| The access policies describing which users of a UserDomain may use | the StorageEndpoint are not published.
Are you sure? (see comment above)
A StorageAccessProtocol describes one method by which end-users may sent data to be stored, received stored data, or undertake both operations.
sent -> send, received -> retrieve
| Access to the interface may be localised; that is, only available | from certain computers. It may also be restricted to specified | UserDomains.
It might also only apply to certain storage components ... Phew .. I spent over two hours writing that, I hope someone reads it :) Stephen

Hi all, I am not sure if I can make it to today's telcon. Therefore, I've updated the agenda page. Thanks Stephen for the input! I've added some comments:
-----Original Message----- From: glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Burke, S (Stephen) Sent: Tuesday, 8 April 2008 22:46 To: Paul Millar; GLUE WG Subject: Re: [glue-wg] Updated thoughts...
glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Paul Millar said: Sorry if I've missed out someone's comment, please yell.
I haven't made comments yet, but this seems like a good place to start ... sorry there are quite a lot but I think it's worth trying to nail things down as much as possible.
UserDomain:
A collection of one or more end-users. All end-users that interact with the physical storage are a member of a UserDomain.
Perhaps opening a can of worms, but it may also be possible for a UserDomain to include services, i.e. you might have services registered in VOMS as well as users (even with delegated credentials you may want to give privileges to services which the users don't have).
I strongly assume that this implies changing the main entities (service -> UserDomain relation). I'm not sure if we can do this before the public comments deadline.
StorageCapacity:
A StorageCapacity object describes the ability to store data within a homogeneous storage technology. Each object provides a view of that physical storage medium with a common access latency.
It isn't necessarily just the latency that matters, for example it may be useful to publish the Capacity of the disk cache in front of a tape system (see further comments below) - the latency is Online but the functionality is different from Disk1 Online storage. (Similarly a Disk1 storage system might make extra cache copies to help with load balancing.) I think the phraseology should be something like "a common category of storage" (although maybe "category" still isn't the right word).
I'd also like to go back to the question I posed in one of the meetings ... say that a site implements Custodial/Online by ensuring three distinct disk copies, how would we represent that? What about mirrored RAID, how much space do we record?
If the storage system needs to keep more than one instance of a file to ensure a certain storage quality of service then I don't feel this should be published into the grid information system. Why should the user care if a system has theoretically 100GB but due to the QoS agreement (3 copies of a file) he can only use ~30GB? In this case the system MAY publish 100GB at the beginning but would then decrease the free space by the order of 3*file_size when a file has been pushed into the system (see two paragraphs below).
Another thing is that I think there is some mission creep going on in the Capacity concept. When I suggested introducing it it was really as a complex data type, i.e. as an alternative to putting maybe 20 separate attributes into each object that can have a size you would effectively have one multivalued "attribute" with type "Capacity" rather than int. However, your descriptions suggest that you're thinking more in terms of a Capacity representing a real thing (a bunch of storage units) which indeed have sizes but may have other attributes too. That isn't necessarily a bad thing, but we should probably be clear in our minds about what we intend.
The context is determined by an association between the StorageCapacity object and precisely one other higher-level object.
I agree with your idea of having this thing as a complex data type enriching higher-level entities with accounting/status data. I still see it as you intended it, and the description should rather go in a direction like this: "The StorageCapacity enriches a higher-level entity with status/accounting information and must not be instantiated without this relation."
What was the decision about Shares for different VOs which share the same physical space? (I haven't really read all the mails yet so this may already be answered ... actually there is more on this further down.)
I must admit that I may have pushed this idea. I have the feeling (although not necessary for now) that it should be possible to share a 'Share' among VOs.
| The underlying storage technology may affect which of the | context-specific attributes are available. For example, tape storage | may be considered semi-infinite, so the total and free attributes have | no meaning. If this is so, then it affects all StorageCapacity objects with | the same underlying technology, independent of their context.
I'm not quite sure what you're saying here. It seems to me that the schema itself should not be defining this - I would still maintain that tape systems do in fact have a finite capacity at any given time so it isn't conceptually absurd (and "nearline" may not necessarily mean "tape" anyway). Individual Grids may wish to make their own decisions about what to publish, and equally it seems possible that, say, dCache may decide not to publish something but Castor may. All the schema should do is say that the attributes are optional, but *if* they are published the meaning should be well-defined and common across all Grids/implementations/... (and maybe we also want a special value to mean quasi-infinite?)
I think Paul wants to point out with this _example_ that some attributes may not have a sensible meaning in a Grid infrastructure and that in this case the related Capacity objects should also not publish these values. I'm not sure if we can define in GLUE that a tape-based storage system should have finite space (although I would like to have it!) because I expect resistance from the community. Please correct me if I am wrong. If not, then we need to add a line to the schema about this definition.
A StorageEnvironment is a collection of one or more StorageCapacities with a set of associated (enforced) storage management policies.
A StorageEnvironment describes a non-overlapping physical storage quality offered by a StorageResource and must be referenced by StorageShares which are deployed fully or partly in a StorageEnvironment. It additionally may offer accounting information represented by one (or more) StorageCapacity objects, which should themselves reflect the (quality) type of the environment they give the capacity info for.
Hmm ... I could suggest that the Environment now also looks more like a data type than a real object (and is also rather SRM2-specific as it stands). And why are the attributes optional, i.e. what would it mean if one or both is missing?
Well, I assume that nobody would publish something which has no meaning except a localID. I could be wrong. But I recommend adding text which states that at least one attribute (AccessLatency or RetentionPolicy) must be given.
Should there be an OtherInfo attribute? What would we do for classic SEs, or SRB, or for that matter SRM 1?
Fortunately, here in Taipei there is a developer from iRODS (enhanced SRB). I was smiling when I saw that they use the same terminology. He described the SRB as a StorageResource. I haven't had the opportunity to talk to him but I'll do so soon.
[What actually seems to have happened here is that things have gradually turned inside out. We started with the SA as the main representation of a set of hardware, with size, policy and ACL information embedded in it and subsequently with the VOInfo added as a dependent object. Now the size (Capacity), ACL (MappingPolicy) and VOInfo (Share) are
getting carved out as separate objects with an independent "life" and most of the policy attributes have been obsoleted, so we're left with something that carries almost no information and a role which, to me at least, is not totally clear. I'm not saying there's anything wrong with this, but it may lead to misconceptions derived from trying to relate the Glue 2 objects to their Glue 1 equivalents.]
Yes, I agree with you (the VOInfo information went into the MappingPolicy). But from our discussions I have the feeling that the current solution can satisfy most of the use cases (e.g. see NorduGrid #1), and I somehow also ran out of ideas for new entities and - much more important - meaningful names for these.
| Examples of these policies are Type (Volatile, Durable, Permanent) | and RetentionPolicy (Custodial, Output, Replica).
Except that Type (or ExpirationMode) doesn't seem to be an attribute in the current draft ... what about other policies,
Type has been deprecated in Environment; ExpirationMode went into Share.
| In general, a StorageEnvironment may have one or more | RetentionPolicy values.
Not what it says in the current draft (0..1). Does this correspond with SRM usage, i.e. can you have spaces with multiple RPs?
Sorry, I may have missed this out. But to be sure I put it onto the agenda for the next telcon (9.4.2008).
| GLUE does not | record a default RetentionPolicy.
Should it? What about defaults for other things, e.g. ExpirationMode?
I'm also happy with having no retention policy as a default. (This, however, may not be the case for WLCG, but GLUE shouldn't define this.)
| It is the associated StorageCapacities that allow a | StorageEnvironment to store data with its advertised policies; for | example, to act as (Permanent, Custodial) storage of data.
But can you tell how that works, i.e. which Capacity serves which policy? This is another case where our mind tends to think Custodial -> tape -> Nearline, but intrinsically it doesn't have to be like that.
| Since a StorageEnvironment may contain multiple StorageCapacities, | it may describe a heterogeneous environment. An example of this is | "tape storage", which has both tape back-end and disk front-end into | which users can pin files. Such a StorageEnvironment would have two | associated StorageCapacities: one describing the disk storage and | another describing the tape.
But can you have more than one Capacity of the same type? (see the comments earlier). Anyway I think we removed the storage type from the Capacity so at the moment you can't really tell what it is. Maybe we should look back at the proposal for Storage Components made by Flavia, Maarten et al in the 1.3 discussion, or has someone already done that?
No, the Type is still in the Capacity and there were no plans in the last sessions to remove it. There is no restriction on publishing another Capacity object with the same type for, e.g., an environment. I wonder if it makes sense to have two online capacities for one environment, but I wouldn't consider this an error or major problem.
| Nevertheless, the StorageCapacities associated with | StorageEnvironments may be incomplete as a site may deploy physical | storage devices that are not directly under end-user control; for | example, disk storage used to cache incoming transfers. GLUE makes | no effort to record information about such storage. Fine with me.
Actually part of my reason to introduce Capacity objects is that they can do just that if people want them to (as they may since it can be useful to know about cache usage). For such cases the CapacityType would be Cache, or maybe something else if you wanted to distinguish more than one kind of cache. As always there's no compulsion to publish that if you don't want it, but the schema makes it possible.
| GLUE makes no attempt to record which physical storage (as | represented by StorageCapacity objects) is under control of which | StorageResource.
Should it? As it stands you might not care, but if you wanted to consider monitoring use cases (whether the software is running, at the most basic!) it would probably be useful to know how that relates to the actual storage.
StorageShare:
A StorageShare is a logical partitioning of one or more StorageEnvironments.
Maybe I'm missing something, but how could you have more than one Environment for a single Share? Certainly our current structure doesn't allow it (one SA per many VOInfos but not vice versa), although as I said above that might be misleading.
| The StorageCapacities within the StorageShare context need not | describe all storage: the number of StorageCapacities associated | with a StorageShare may be less than the sum of the number of | StorageCapacities associated with each of the StorageShare's | associated StorageEnvironments.
Err, why? As always you may choose not to publish everything, but conceptually the space is all there somewhere ...
Fine with me
| A pair of StorageShares may be partially shared, that is, they have | at least one pair of StorageCapacities that are shared and at least one | that is not. Partially shared StorageCapacities could represent two | UserDomains' access to a tape store, where they share a common set | of disk pools but the tape storage is distinct.
I'm not sure I like this bit. In general I would assume that storage (SAs in the current parlance) is either shared or not - allowing the disk part of a custodial/online space to be shared and the tape part not sounds rather weird to me, and I don't think that's how SRM works. Do we really have such cases? Bear in mind that the point is not about sharing the physical disks, but having a shared allocation (and for Disk1/Online permanent storage, not cache). If the system is guaranteeing to store, say, 100 TB on both disk and tape (custodial/online) there is no way it can do that if the disk part of the reservation is shared, and if it doesn't guarantee it overall then having a reserved tape pool is pointless; in general it would just mean that some tapes are unusable.
Another question, what do we do about hierarchical spaces? At the moment we at least have the case of the "base space" or whatever you call it from which the space tokens are reserved, and in future I believe we're considering being able to reserve spaces inside spaces. How could that be represented? (There are also questions we've discussed in the past about things like dynamic spaces and default spaces which tend to produce more heat than light :)
I fear that there are no plans to consider this concept for now. And I don't think that we are able to implement it before the end of April.
StorageMappingPolicy:
The StorageMappingPolicy describes how a particular UserDomain is allowed to access a particular StorageShare.
Should we say how this relates to the AccessPolicy? (which doesn't seem to appear explicitly in either the Computing or Storage diagrams but is presumably there anyway.)
The StorageMappingPolicy describes how a UserDomain may utilize a Share. Its main purpose is to publish access control, but it may also keep additional information for accounting, as well as UserDomain-specific namespace access information for the associated share.
| No member of a UserDomain may interact with a StorageShare except as | described by a StorageMappingPolicy.
As stated I don't think that can really be true, the SRM could potentially allow all kinds of things not explicitly published. The things which should be true are that there is an agreed set of things (maybe per grid?) which are published, and that the published values should be a superset of the "real" permissions - i.e. the SRM may in fact not authorise me even if the published value says that it will, but the reverse shouldn't be true.
Yes, I agree.
| The StorageMappingPolicies may contain information that is specific | to that UserDomain, such as one or more associated | StorageCapacities. If provided, these provide a UserDomain-specific | view of their usage of the underlying physical storage technology as | a result of their usage within the StorageShare.
I don't think I understand how this can be different from the Share to Capacity relation ... if you are saying that the Share can be multi-VO then I think something has gone wrong somewhere given that the Path and Tag can be VO-specific. In the 1.3 schema the whole point of the VOInfo (which has become the Share) was to split out the information specific to each mapping policy (ACBR) from the generic information in the SA ...
| The access policies describing which users of a UserDomain may use | the StorageEndpoint are not published.
Are you sure? (see comment above)
Please check my mail sent to the list today ("RE: [glue-wg] Some thoughts on storage objects").
A StorageAccessProtocol describes one method by which end-users may sent data to be stored, received stored data, or undertake both operations.
sent -> send, received -> retrieve
| Access to the interface may be localised; that is, only available | from certain computers. It may also be restricted to specified | UserDomains.
It might also only apply to certain storage components ...
Fine with me.
Phew .. I spent over two hours writing that, I hope someone reads it :)
Stephen
I did! But lucky you, it took me longer, interrupted by lecture breaks and interesting presentations. Cheerio, Felix

Still getting back to some things ... Felix Nikolaus Ehm [mailto:Felix.Ehm@cern.ch] said:
Perhaps opening a can of worms, but it may also be possible for a UserDomain to include services, i.e. you might have services registered in VOMS as well as users (even with delegated credentials you may want to give privileges to services which the users don't have). I strongly assume that this implies changing the main entities (service ->userdomain relation). I'm not sure if we can do this before public comments deadline.
I'm not sure if it needs any change in the schema, but at least the explanatory text could indicate the possibility.
I'd also like to go back to the question I posed in one of the meetings ... say that a site implements Custodial/Online by ensuring three distinct disk copies, how would we represent that? What about mirrored RAID, how much space do we record? If the storage system needs to keep more than one instance of a file to ensure a certain storage quality of service then I don't feel this should be published into the grid information system.
Well, maybe - but for the standard LCG custodial/online (Disk1Tape1) we *are* publishing both copies, at least assuming we publish the tape sizes at all. Secondly there's an accounting issue, the VO is probably going to be asked to pay for multiple permanent copies. Also there could be an impact on the meaning of the free space - you may think you have 100 TB free but it may only be 50 TB if all files are duplicated. At least I think we need a clear and consistent definition so people know what to publish in a given case, e.g. if file sizes are published only once as used space then the free space should be divided by any duplication factor.
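To make that concrete (an illustrative calculation only, with invented numbers):

    raw_free_bytes     = 100 * 2**40   # 100 TB of raw disk currently free
    duplication_factor = 2             # each file is stored as two copies

    # If used space counts each file once, the published free space
    # should be scaled down by the duplication factor:
    publishable_free = raw_free_bytes // duplication_factor   # i.e. 50 TB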
Why should the user care if a system has theoretically 100GB but due to the QoS agreement (3 copies of a file) he can only use ~30GB?
Potentially the user cares twice: a) because they get charged more, and b) because they have less free space than they thought (it's similar to the "shared Share" argument).
I'm not sure if we can define that a tape based storage system should have finite space in GLUE (although I would like to have it!)
What I think we can define is that if you publish a size at all it should be a finite number, and it should be the current capacity, i.e. how many tapes are in the robot right now. If people don't or can't publish it that's fine, but we shouldn't have people publishing meaningless large numbers as some sites have done in the past, so when you add up the storage in the whole Grid you get something silly - these things sometimes get into high-level management presentations, where someone just reads a number without realising that it can't be right! Stephen

Burke, S (Stephen) wrote:
If the storage system needs to keep more than one instance of a file to ensure a certain storage quality of service then I don't feel this should be published into the grid information system.
Well, maybe - but for the standard LCG custodial/online (Disk1Tape1) we *are* publishing both copies, at least assuming we publish the tape sizes at all. Secondly there's an accounting issue, the VO is probably going to be asked to pay for multiple permanent copies. Also there
The cost is mostly determined by the RP being Custodial; 3 copies at one site may be cheaper than 1 copy at another, so we would rather have to publish the actual pricing. Something for 2.1 maybe.
could be an impact on the meaning of the free space - you may think you have 100 TB free but it may only be 50 TB if all files are duplicated. At least I think we need a clear and consistent definition so people know what to publish in a given case, e.g. if file sizes are published only once as used space then the free space should be divided by any duplication factor.
One could argue that the info provider should do that division for you!

Hi Stephen, all, On Tuesday 08 April 2008 16:45:48 Burke, S (Stephen) wrote:
I haven't made comments yet, but this seems like a good place to start ... sorry there are quite a lot but I think it's worth trying to nail things down as much as possible.
Absolutely, this is a "trying to nail things down" document :) In case it isn't obvious, I'm trying to describe only the object classes, not the attributes (not always successfully). The hope is that, once the object classes are clearly defined, the attributes become more obvious.
UserDomain:
A collection of one or more end-users. All end-users that interact with the physical storage are a member of a UserDomain.
Perhaps opening a can of worms, but it may also be possible for a UserDomain to include services, i.e. you might have services registered in VOMS as well as users (even with delegated credentials you may want to give privileges to services which the users don't have).
As a proposal, if we decide to include non-carbon-based entities as members of UserDomain, we could use "agents" instead of end-users, an end-user being one example of an agent. For example, various production CAs are experimenting with issuing robot certificates. These allow programs to achieve a certain amount of autonomy, but in an accountable (and, in some sense, controlled) way. I believe these robot certificates are currently always tied to a specific person (or, at least, to that person's identity). So, as a suggestion, we could replace "end-user" with "agent" and have an informative description saying something like:
| The end-users are one possible agent. A grid may choose to allow
| only end-users as agents, or it may decide to allow other
| agents, such as semi-autonomous programs that interact with the grid.
Alternatively, we could postpone this until 2.1.
StorageCapacity:
A StorageCapacity object describes the ability to store data within a homogeneous storage technology. Each object provides a view of that physical storage medium with a common access latency.
It isn't necessarily just the latency that matters, for example it may be useful to publish the Capacity of the disk cache in front of a tape system (see further comments below) - the latency is Online but the functionality is different from Disk1 Online storage.
I think we discussed this during the phone call. The proposal was that type be an open enumeration with "cache" as one option. If that's OK with everyone, I'll try to update StorageCapacity accordingly.
(Similarly a Disk1 storage system might make extra cache copies to help with load balancing.)
True, they might well do this (dCache certainly does under various conditions); but I'd say that this is purely an internal issue and shouldn't be published in GLUE. I don't think we have any use-cases for publishing this information.
I think the phraseology should be something like "a common category of storage" (although maybe "category" still isn't the right word).
Well, maybe. I personally find "category" too vague and a somewhat circular definition. I feel we need to be more precise here: these object classes must represent some clearly identifiable concepts if people are going to implement info-providers that are interoperable. At the risk of sounding like a broken record: my current understanding of StorageCapacity is as a light-weight view of some homogeneous storage, providing only the minimal amount of information needed within a certain context (StorageShare, ...). I guess everyone else views Capacities as a "hack" to get around UML.
I'd also like to go back to the question I posed in one of the meetings ... say that a site implements Custodial/Online by ensuring three distinct disk copies, how would we represent that? What about mirrored RAID, how much space do we record?
Err, I don't see the problem here; they should report the numbers that make sense, no? I guess I'm missing something...

Using RAID storage as a specific example, any RAID system (0, 1, 5, 6, 1+0, etc.) is (ultimately) just a block-device, albeit one with some slightly odd properties. The RAID system stores data as one or more blocks, each of a fixed size, so the total is just nBlocks * sizeOf(Block). The filesystem will impose some overhead, so the totalSpace reported in the StorageCapacity will likely be a smidgen less than this, but the correct value is easily discoverable: just do "df" on the filesystem.

For the case where three disks provide a RAID system with online latency and which a grid considers sufficient for custodial storage, I feel GLUE should report a single StorageCapacity for the three-disk system. The person in Timbuktu should neither know nor care that the online-custodial system is built from three disks in a RAID configuration or from some other technology.
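For what it's worth, the "just do df" step is something an info-provider could do in the usual POSIX way; a Python sketch, assuming a hypothetical mount point:

    import os

    # The equivalent of "df" for the filesystem backing the RAID device:
    st = os.statvfs("/storage/pool1")          # hypothetical mount point
    total_bytes = st.f_blocks * st.f_frsize    # size after RAID + fs overhead
    free_bytes  = st.f_bavail * st.f_frsize    # space usable by ordinary users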
Another thing is that I think there is some mission creep going on in the Capacity concept. When I suggested introducing it it was really as a complex data type, i.e. as an alternative to putting maybe 20 separate attributes into each object that can have a size you would effectively have one multivalued "attribute" with type "Capacity" rather than int.
Yes, but there is a slightly deeper question: why do we find ourselves doing this? Why do we have a StorageCapacity object class? From the phone conversation, it seems, Stephen, you view this as simply a work-around because UML doesn't support complex data types (is that a fair summary?) Not wishing to be seen promoting or defending UML particularly, but I suspect this omission is deliberate: if one is modelling something that needs a complex data-type then that complex data-type *is* representing something.
However, your descriptions suggest that you're thinking more in terms of a Capacity representing a real thing (a bunch of storage units) which indeed have sizes but may have other attributes too.
Yes, I'm currently thinking about this as a (light-weight) view of some physical storage. It may be a view of all the storage those physical devices make available (e.g., under StorageEnvironment), or only a subset (e.g. under a StorageShare).
That isn't necessarily a bad thing, but we should probably be clear in our minds about what we intend.
Yes, absolutely... I agree this needs to be clear. FWIW, I don't think there's any mission creep here; rather, what we've got is a more precise definition of what a StorageCapacity *is*. The description is not extending the concept, but better defining it; so, rather than saying "it's a bunch of numbers we might want to record", the document offers an explanation of why the object class exists.
The context is determined by an association between the StorageCapacity object and precisely one other higher-level object.
What was the decision about Shares for different VOs which share the same physical space? (I haven't really read all the mails yet so this may already be answered ... actually there is more on this further down.)
Is this not supported by representing this as different StorageMappingPolicies pointing to the same StorageShare? There is even support for per-VO space-utilisation information through the StorageCapacity attached to the StorageMappingPolicy object.
| The underlying storage technology may affect which of the | context-specific attributes are available. For example, tape storage | may be considered semi-infinite, so the total and free attributes have | no meaning. If this is so, then it affects all StorageCapacity objects with | the same underlying technology, independent of their context.
I'm not quite sure what you're saying here. It seems to me that the schema itself should not be defining this.
It doesn't define this: the (semi-) infinite tape is meant as an informative example where not publishing totalSize might make sense.
I would still maintain that tape systems do in fact have a finite capacity at any given time, so it isn't conceptually absurd (and "nearline" may not necessarily mean "tape" anyway).
Both are, in general, true. However, from chatting with our tape people here:
a. not all tape systems provide an easy (or sometimes, any) mechanism for discovering the current totalSize,
b. some places have an operational practice that they "just add more tapes" when the space looks like it's filling up.
The argument for making totalSize optional is that a) sometimes it's impossible to discover, b) sometimes it's a meaningless concept.
Individual Grids may wish to make their own decisions about what to publish, and equally it seems possible that, say, dCache may decide not to publish something but Castor may. All the schema should do is say that the attributes are optional, but *if* they are published, the meaning should be well-defined and common across all Grids/implementations/...
Yes, absolutely!
(and maybe we also want a special value to mean quasi-infinite?)
Yes, we could. My preference would be simply not publishing totalSize. I think this is more in keeping with the model adopted elsewhere in GLUE: not publishing information where it doesn't make sense.
| that the underlying storage technology be homogeneous. Homogeneous | means that the underlying storage technology is either identical or | sufficiently similar that the differences don't matter.
I think the real point is more that it's treated uniformly by the SRM (or other storage manager) - even if the differences do matter there won't be anything you can do about it if the SRM gives you no control over it! (e.g. to put your file on RAID 6 rather than RAID 0.)
True; although, ideally, the definition should go beyond the current SRM protocol definition. The RAID-6 vs RAID-0 case is an interesting one. Suspending disbelief for a moment, suppose that the SE is configured so it labels RAID-6 as suitable for Replica-Online and RAID-0 as suitable for Custodial-Online (whether anyone would do this is a separate question!). Since the RAID-6 and RAID-0 storage have different management properties (Replica vs Custodial), they must be in different StorageEnvironments. Since a StorageCapacity is associated with only one StorageEnvironment, the two RAID systems would be represented as two StorageCapacities --- both with Online latency (a property of their underlying storage) but as parts of different StorageEnvironments. If you like: because SRM can distinguish between them, they must be separated into two different StorageCapacity objects. If both RAID devices were considered only good enough for Replica storage, both RAID systems could be represented within the same StorageEnvironment. They would be considered "sufficiently similar that the differences don't matter", so represented by a single StorageCapacity.
A StorageEnvironment is a collection of one or more StorageCapacities with a set of associated (enforced) storage management policies.
Hmm ... I could suggest that the Environment now also looks more like a data type than a real object (and is also rather SRM2-specific as it stands).
I guess all GLUE object classes are logical abstractions of something that (hopefully) makes sense. So, a member of a UserDomain (an end-user) stores data in a StorageShare. A StorageShare is some part of a StorageEnvironment (maybe all of it). A StorageEnvironment is built from one or more StorageCapacities (yes, I'm considering StorageCapacities as real objects here) and functions due to one or more StorageResources.
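(A sketch of that chain as plain classes -- one reading of the prose above, with illustrative names and attributes rather than the normative UML.)

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class StorageCapacity:                 # a view of some homogeneous physical storage
        access_latency: str                # e.g. "online" or "nearline"
        total_size: Optional[int] = None   # optional: may be undiscoverable (tape)

    @dataclass
    class StorageResource:                 # running software (or commodity hardware)
        name: str

    @dataclass
    class StorageEnvironment:              # enforced policies over one or more Capacities
        retention_policies: List[str]      # e.g. ["custodial"]
        capacities: List[StorageCapacity]  # each Capacity belongs to exactly one Environment
        resources: List[StorageResource]   # the Environment functions due to these

    @dataclass
    class StorageShare:                    # some part (maybe all) of an Environment
        environment: StorageEnvironment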
And why are the attributes optional,
I believe, in the actual draft spec., these are optional because they may not make sense within all grids.
i.e. what would it mean if one or both is missing?
I would imagine the answer would be grid specific. An alternative would be to make the Retention policy mandatory, but as an open enumeration.
Should there be an OtherInfo attribute?
Perhaps, yes. Providing for it (probably) wouldn't hurt.
What would we do for classic SEs, or SRB, or for that matter SRM 1?
I believe a classic SE has a single StorageEnvironment. I don't know SRB well enough (perhaps Jens could comment?). SRM1 doesn't understand spaces, so one could publish a "default space" the same size as the StorageEnvironment and link the SRMv1 interface to that space.
[What actually seems to have happened here is that things have gradually turned inside out. We started with the SA as the main representation of a set of hardware, with size, policy and ACL information embedded in it and subsequently with the VOInfo added as a dependent object. Now the size (Capacity), ACL (MappingPolicy) and VOInfo (Share) are getting carved out as separate objects with an independent "life" and most of the policy attributes have been obsoleted, so we're left with something that carries almost no information and a role which, to me at least, is not totally clear. I'm not saying there's anything wrong with this, but it may lead to misconceptions derived from trying to relate the Glue 2 objects to their Glue 1 equivalents.]
I think I've a fairly concrete idea of what a StorageEnvironment is. In plain(-ish) language, it represents the complete ability to store data with certain management policies. In general, it is built from combining hardware with different access latencies. It has some implicit and explicit management policies that result in it being described as "custodial" or "volatile". Users may be allocated some portion of this StorageEnvironment as a StorageShare. A StorageEnvironment has a lifetime equal to or greater than that of its StorageShares.
| Examples of these policies are Type (Volatile, Durable, Permanent) | and RetentionPolicy (Custodial, Output, Replica).
Except that Type (or ExpirationMode) doesn't seem to be an attribute in the current draft ... what about other policies, e.g. the old schema had MinFileSize - if we ever wanted to implement such a thing would it go here?
(BTW, is this a thought experiment, or an actual proposal?) Most likely MinFileSize would go in the StorageCapacity, but it depends on the actual use-case for recording this information. If MinFileSize comes from a limitation of the underlying storage, then it should go in the corresponding StorageCapacity object. If it is an "arbitrary" management policy, then it should go in the StorageEnvironment.
Conversely Latency isn't a policy, it's a feature of the hardware.
True. This is why I felt it belongs in the StorageCapacity and not the StorageEnvironment: the StorageEnvironment would necessarily be "nearline" if it has an attached StorageCapacity with nearline latency. But this wasn't accepted during the phone conference, so I can't give an answer to this.
If we really want a Policy object should we call it that rather than Environment?
Well, maybe. The names don't bother me too much, provided the concepts, attributes and relationships are very precisely defined. One could say that: "the data is stored in an environment defined by the management policies and the underlying hardware." So, perhaps StorageEnvironment isn't perfect, but it isn't too bad.
| In general, a StorageEnvironment may have one or more | RetentionPolicy values.
Not what it says in the current draft (0..1).
True ... I thought we had agreed that StorageEnvironment had 0..* multiplicity, but the docs don't seem to reflect this.
Does this correspond with SRM usage, i.e. can you have spaces with multiple RPs?
From memory, I believe this was a request from Maarten: that a StorageEnvironment could have multiple RPs. I'm not sure precisely why: perhaps indicating that a StorageShare may have different RPs, but this would be covered by the (potentially) one-to-many link between StorageShare and StorageEnvironment.
| GLUE does not record a default RetentionPolicy.
Should it?
No. Yes. Who can say? What use-cases result in us storing multiple RPs for a single StorageEnvironment? Under those circumstances, do we need to record a primary/default?
What about defaults for other things, e.g. ExpirationMode?
I'm not sure having multiple ExpirationMode makes sense. If, for some reason, two ExpirationModes need to be indicated, could this be done with two different StorageEnvironments?
| It is the associated StorageCapacities that allow a | StorageEnvironment to store data with its advertised policies; for | example, to act as (Permanent, Custodial) storage of data.
But can you tell how that works, i.e. which Capacity serves which policy? This is another case where our mind tends to think Custodial -> tape -> Nearline, but intrinsically it doesn't have to be like that.
Would dropping the "Permanent" from the above example (it shouldn't have been there) fix this problem? If a StorageEnvironment is advertised as having a RetentionPolicy of Custodial (only) and has two StorageCapacities (a nearline one and an online one), would that be OK?
| Since a StorageEnvironment may contain multiple StorageCapacities, | it may describe a heterogeneous environment. An example of this is | "tape storage", which has both tape back-end and disk front-end into | which users can pin files. Such a StorageEnvironment would have two | associated StorageCapacities: one describing the disk storage and | another describing the tape.
But can you have more than one Capacity of the same type? (see the comments earlier).
This is currently an open question. I believe most people feel the answer is "no" ("sufficiently similar").
Anyway, I think we removed the storage type from the Capacity, so at the moment you can't really tell what it is.
Sorry, I think I may have suggested putting it back: I felt it didn't really sit well in the StorageEnvironment. This comes back to the question of what a StorageCapacity is. My personal feeling is that "a hack to get around UML" is not a satisfactory answer ;-)
Maybe we should look back at the proposal for Storage Components made by Flavia, Maarten et al in the 1.3 discussion, or has someone already done that?
I'm not sure ... I don't think I've seen it. Do you have a copy somewhere?
| StorageCapacities associated with a StorageEnvironment must be | non-overlapping with any other such StorageCapacity and the set of | all such StorageCapacities must represent the complete storage | available to end-users.
Conceptually that may be true, but there's no guarantee that all of them are actually published.
True. But, no such guarantees are required in the doc. A site (or an information provider) may choose to publish everything (including unallocated space), or it may choose to publish only allocated space. This might be an operational decision made by a grid.
You could also wonder about space which is installed but not currently allocated to any VO ...
Yes, but do we have a use-case for this?
| Nevertheless, the StorageCapacities associated with | StorageEnvironments may be incomplete as a site may deploy physical | storage devices that are not directly under end-user control; for | example, disk storage used to cache incoming transfers. GLUE makes | no effort to record information about such storage.
Actually part of my reason to introduce Capacity objects is that they can do just that if people want them to (as they may since it can be useful to know about cache usage). For such cases the CapacityType would be Cache, or maybe something else if you wanted to distinguish more than one kind of cache. As always there's no compulsion to publish that if you don't want it, but the schema makes it possible.
OK, but I think there are two specific things here:
a) cache storage specifically for one StorageEnvironment;
b) general cache storage available to multiple StorageEnvironments.
An example of a) is the disks that "front" a D0T1-like storage; an example of b) is the general cache for storing all incoming WAN transfers as they are being written to tape. Whilst one could represent a) as a StorageCapacity (e.g., with type "cache"), one could not do so for b), as it is not exclusively part of any one StorageEnvironment. When I said GLUE makes no effort to record information... this was specifically about part b): disk cache that is common between multiple StorageEnvironments.
| GLUE makes no attempt to record which physical storage (as | represented by StorageCapacity objects) is under control of which | StorageResource.
Should it?
Dunno: what are the use-cases for it doing so?
As it stands you might not care, but if you wanted to consider monitoring use-cases (whether the software is running, at the most basic!)
True, and this is covered: software StorageResource(s) are available.
it would probably be useful to know how that relates to the actual storage.
Sure, I agree it might be interesting. However, I don't think this is covered by any use-case / requirements.
StorageShare:
A StorageShare is a logical partitioning of one or more StorageEnvironments.
Maybe I'm missing something, but how could you have more than one Environment for a single Share?
I think this isn't something useful to EGEE and it comes from one of the other grids.
Certainly our current structure doesn't allow it (one SA per many VOInfos but not vice versa), although as I said above that might be misleading.
Yes, I believe this was to allow a new, non-WLCG use-case. I have a vague memory of Maarten mentioning this, but I could be wrong.
| The StorageCapacities within the StorageShare context need not | describe all storage: the number of StorageCapacities associated | with a StorageShare may be less than the sum of the number of | StorageCapacities associated with each of the StorageShare's | associated StorageEnvironments.
Err, why? As always you may choose not to publish everything, but conceptually the space is all there somewhere ...
Well, as you say, you may choose not to publish everything; simply that. A concrete example: consider a StorageEnvironment (StorEnv1) that (using WLCG terminology) is D0T1. When published, it has two associated StorageCapacities: one with type="nearline" and one with type="cache" (which has online access latency). A grid (not WLCG) decides that it will not publish the cache information for StorageShares, because it doesn't want to record that level of detail for D0T1; it only wants to record the actual tape usage. In this example, all StorageShares associated with StorEnv1 would have only one StorageCapacity. Each one would have type="nearline" and describe the tape usage of that StorageShare.
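(The example as plain data, with hypothetical field names: the Share publishes fewer Capacities than its Environment.)

    # StorEnv1 is D0T1: tape back-end plus an online disk cache.
    storenv1_capacities = [
        {"Type": "nearline"},                          # the tape
        {"Type": "cache", "AccessLatency": "online"},  # the disk front-end
    ]

    # Grid policy: per-Share cache usage is not published.
    share_capacities = [
        {"Type": "nearline"},  # the tape usage of this Share only
    ]

    assert len(share_capacities) < len(storenv1_capacities)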
| A pair of StorageShares may be partially shared; that is, they have | at least one pair of StorageCapacities that are shared and at least one | that is not. Partially shared StorageCapacities could represent two | UserDomains' access to a tape store, where they share a common set | of disk pools but the tape storage is distinct.
I'm not sure I like this bit. In general I would assume that storage (SAs in the current parlance) is either shared or not - allowing the disk part of a custodial/online space to be shared and the tape part not sounds rather weird to me, and I don't think that's how SRM works.
Perhaps, but custodial/nearline storage (D0T1) might have a shared diskpool for staged files.
Do we really have such cases? Bear in mind that the point is not about sharing the physical disks, but having a shared allocation (and for Disk1/Online permanent storage, not cache).
No, I'd say it's more about whether we describe the cache space. If we do, then a site may choose to share its cache amongst all StorageShare objects. With D1T1-like storage, it doesn't make sense to have a pair of partially shared StorageShares.
If the system is guaranteeing to store, say, 100 TB on both disk and tape (custodial/online), there is no way it can do that if the disk part of the reservation is shared; and if it doesn't guarantee it overall, then having a reserved tape pool is pointless - in general it would just mean that some tapes are unusable.
Yes, this is true for D1T1. Partially shared StorageShares, if they have a place, are more for D0T1 with a shared staging area.
Another question, what do we do about hierarchical spaces? At the moment we at least have the case of the "base space" or whatever you call it from which the space tokens are reserved, and in future I believe we're considering being able to reserve spaces inside spaces. How could that be represented?
Currently I believe we can't store this information. If we want to do this, it might be possible to do so by allowing a StorageShare to link to another StorageShare in place of a StorageEnvironment. For example, one could create an abstract class StorageProvision (err, better name anyone?) that has two subclasses: StorageShare and StorageEnvironment. A StorageShare is linked to a StorageProvision, providing a hierarchy of objects that may (should?) end with a StorageEnvironment.
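(A sketch of that StorageProvision idea, exactly as floated above; all names are provisional and purely illustrative.)

    from abc import ABC

    class StorageProvision(ABC):
        """Abstract parent, so a Share can be carved either from an
        Environment or from another Share (hierarchical spaces)."""

    class StorageEnvironment(StorageProvision):
        pass

    class StorageShare(StorageProvision):
        def __init__(self, parent: StorageProvision):
            self.parent = parent  # the chain may (should?) end with an Environment

    # A "base space" with a token space reserved from it, and a space inside that:
    base = StorageEnvironment()
    token_space = StorageShare(parent=base)
    nested_space = StorageShare(parent=token_space)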
(There are also questions we've discussed in the past about things like dynamic spaces and default spaces which tend to produce more heat than light :)
From what I've heard, this is true! :-)
StorageMappingPolicy:
The StorageMappingPolicy describes how a particular UserDomain is allowed to access a particular StorageShare.
Should we say how this relates to the AccessPolicy? (which doesn't seem to appear explicitly in either the Computing or Storage diagrams but is presumably there anyway.)
Well, I thought the idea was that we could get by without the access policies being published. For example:
(0. An end-user is a member of a UserDomain.)
1.a A UserDomain has access to a StorageShare (discovered via a StorageMappingPolicy).
1.b A user already knows the ID of the StorageShare.
1.c The user asks the StorageEndpoint for the StorageShare ID.
2. The StorageShare may have an associated StorageEndpoint.
3.a If so, ask the StorageEndpoint what protocols are available.
3.b If not, try each advertised AccessProtocol in the user's order of preference.
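(The walk above as a sketch; the objects and their attributes are hypothetical stand-ins for the associations named in steps 2-3.)

    def usable_protocols(user, share, endpoints, advertised_protocols):
        """Follow steps 2-3: prefer the Share's own Endpoint, otherwise
        fall back to every advertised AccessProtocol."""
        endpoint = next((e for e in endpoints if share in e.shares), None)
        if endpoint is not None:
            return endpoint.protocols              # step 3.a
        return sorted(advertised_protocols,        # step 3.b
                      key=user.protocol_preference)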
| No member of a UserDomain may interact with a StorageShare except as | described by a StorageMappingPolicy.
As stated I don't think that can really be true, the SRM could potentially allow all kinds of things not explicitly published.
Sure, I've no problem changing this to a weaker statement, but...
The things which should be true are that there is an agreed set of things (maybe per grid?) which are published, and that the published values should be a superset of the "real" permissions - i.e. the SRM may in fact not authorise me even if the published value says that it will, but the reverse shouldn't be true.
I think your example here doesn't contradict the above statement "no member of a UserDomain may interact with a StorageShare except as [...]". If I've understood you correctly (with "published value should be a superset of the "real" permissions") then the SRM / TransferProtocols may choose not to honour the StorageMappingPolicy (e.g., ban certain users), but they won't allow some apparently random person in.
| The StorageMappingPolicies may contain information that is specific | to that UserDomain, such as one or more associated | StorageCapacities. If provided, these provide a UserDomain-specific | view of their usage of the underlying physical storage technology as | a result of their usage within the StorageShare.
I don't think I understand how this can be different from the Share to Capacity relation ...
We present VO-specific information here. The Share-Capacity relation provides a Share-centric view of storage. The MappingPolicy-Capacity relation (if present) provides a Share-VO-centric view of storage, as needed by NorduGrid (IIRC).
if you are saying that the Share can be multi-VO then I think something has gone wrong somewhere given that the Path and Tag can be VO-specific.
Yes, and the VO-specific Path and Tag *are* present in the StorageMappingPolicies for exactly this reason.
In the 1.3 schema the whole point of the VOInfo (which has become the Share) was to split out the information specific to each mapping policy (ACBR) from the generic information in the SA ...
Perhaps the assertion that VOInfo has become the Share is not correct in this instance?
| The access policies describing which users of a UserDomain may use | the StorageEndpoint are not published.
Are you sure? (see comment above)
Currently, yes, it's true: they're not published. One may deduce them in some (most) cases, but not always. If no StorageShares are published, then the mapping cannot be deduced.
A StorageAccessProtocol describes one method by which end-users may sent data to be stored, received stored data, or undertake both operations.
sent -> send, received -> retrieve
Ta!
| Access to the interface may be localised; that is, only available | from certain computers. It may also be restricted to specified | UserDomains.
It might also only apply to certain storage components ...
"storage components" == StorageShares, right?
Phew .. I spent over two hours writing that, I hope someone reads it :)
(I'm reminded of a story about a student going for his PhD viva / defence. The student walks into the room (so the story goes), places a bottle of champagne on the table and sits down. After successfully completing the viva he stands up, picks up the bottle and is about to leave the room when one of the examiners asks about the bottle. The student referred the examiner to approx. half way through the thesis, which included a short paragraph saying "if you mention this paragraph you get to keep the champagne" :-) Cheers, Paul.

On Wed, 9 Apr 2008, Paul Millar wrote:
b. some places have an operational practice that they "just add more tapes" when the space looks like it's filling up,
In that case it may still be interesting to publish the current total.
| In general, a StorageEnvironment may have one or more | RetentionPolicy values.
Not what it says in the current draft (0..1).
True ... I thought we had agreed that StorageEnvironment had 0..* multiplicity, but the docs don't seem to reflect this.
I argued that GLUE probably should allow for the RP being multi-valued, and probably the AccessLatency as well. In WLCG/EGEE we would have a single value for each normally, so that the Storage Class is clear.
Does this correspond with SRM usage, i.e. can you have spaces with multiple RPs?
From memory, I believe this was a request from Maarten: that a StorageEnvironment could have multiple RPs. I'm not sure precisely why: perhaps indicating that a StorageShare may have different RPs, but this would be covered by the (potentially) one-to-many link between StorageShare and StorageEnvironment.
I did not ask for that (see my comment about the StorageEnvironment). In fact, I think a Share must belong to a single Environment, as I see it as a chunk that is carved out of a particular Environment.
What about defaults for other things, e.g. ExpirationMode?
I'm not sure having multiple ExpirationMode makes sense. If, for some reason, two ExpirationModes need to be indicated, could this be done with two different StorageEnvironments?
No. I argued that in SRM v2.2 the ExpirationMode (a.k.a. Type) does not behave like RP and AL: a space that is defined by RP and AL may support multiple ExpirationModes. In a particular space a file may have a finite or an infinite lifetime; the SRM may support either or both. So, at least ExpirationMode we would want to be multi-valued within the Environment.
If a StorageEnvironment is advertised as having a RetentionPolicy of Custodial (only) and has two StorageCapacities (a nearline one and an online one), would that be OK?
In WLCG it would seem natural to identify an Environment with a Storage Class, in which case the RP and AL need to be treated together.
Maybe we should look back at the proposal for Storage Components made by Flavia, Maarten et al in the 1.3 discussion, or has someone already done that?
I'm not sure ... I don't think I've seen it. Do you have a copy somewhere?
https://forge.gridforum.org/sf/go/doc14619?nav=1
A StorageShare is a logical partitioning of one or more StorageEnvironments.
Maybe I'm missing something, but how could you have more than one Environment for a single Share?
I think this isn't something useful to EGEE and it comes from one of the other grids.
Certainly our current structure doesn't allow it (one SA per many VOInfos but not vice versa), although as I said above that might be misleading.
Yes, I believe this was to allow a new, non-WLCG use-case. I have a vague memory of Maarten mentioning this, but I could be wrong.
See earlier comments. Thanks, Maarten

On Friday 11 April 2008 03:34:55 Maarten.Litmaath@cern.ch wrote:
On Wed, 9 Apr 2008, Paul Millar wrote:
b. some places have an operational practice that they "just add more tapes" when the space looks like it's filling up,
In that case it may still be interesting to publish the current total.
Sure, that's an option that GLUE should support; but equally, a grid might choose not to ... it's a policy decision. This information should be optional. I'm more concerned with problem a. (you can't reliably get that information). [StorageEnvironment RP multiplicity]
I argued that GLUE probably should allow for the RP being multi-valued, and probably the AccessLatency as well. In WLCG/EGEE we would have a single value for each normally, so that the Storage Class is clear.
I'm not sure what AccessLatency as a multivalue means: I'd push this attribute down to the hardware layer (StorageDatastore/StorageMedia/StorageStorage). It's possible that we've been talking slightly at cross-purposes. It seems to me that what WLCG means by "access latency" is really the minimum (i.e., fastest) guaranteed access latency (MGAL):
D0T1: this is nearline
D1T0: this is online
D1T1: this is online
AccessLatency is a property of the hardware; MGAL is a property of the StorageEnvironment. For example, a file stored in D0T1 may (usually only for a short period) be available with online latency, when it's available from the online Datastore (the cache). There's a similar distinction between what custodial means at the StorageEnvironment (or StorageShare) level compared to at the hardware level (Datastore, or whatever). I guess this is: a StorageEnvironment is considered custodial if files are always stored within at least one StorageMedia/Datastore that is considered custodial. So, for WLCG, this is really a boolean attribute, canStoreAsCustodial (or similar). I don't know if this helps clarify the multi-RP / multi-AL issue. [...]
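(A sketch of MGAL under this reading, with illustrative names: take the latencies of the media holding *guaranteed* copies -- so D0T1's cache copy is excluded -- and pick the fastest, using the ordering Online < Nearline < Offline.)

    LATENCY_RANK = {"online": 0, "nearline": 1, "offline": 2}

    def mgal(guaranteed_copy_latencies):
        """Minimum guaranteed access latency: the fastest latency among
        media that are guaranteed to hold a copy of the file."""
        return min(guaranteed_copy_latencies, key=LATENCY_RANK.__getitem__)

    assert mgal(["nearline"]) == "nearline"          # D0T1: only the tape copy is guaranteed
    assert mgal(["online"]) == "online"              # D1T0
    assert mgal(["online", "nearline"]) == "online"  # D1T1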
I'm not sure having multiple ExpirationMode makes sense. If, for some reason, two ExpirationModes need to be indicated, could this be done with two different StorageEnvironments?
No. I argued that in SRM v2.2 the ExpirationMode (a.k.a. Type) does not behave like RP and AL: a space that is defined by RP and AL may support multiple ExpirationModes. In a particular space a file may have a finite or an infinite lifetime; the SRM may support either or both. So, at least ExpirationMode we would want to be multi-valued within the Environment.
OK, I think I understand.
If a StorageEnvironment is advertised as having a RetentionPolicy of Custodial (only) and has two StorageCapacities (a nearline one and an online one), would that be OK?
In WLCG it would seem natural to identify an Environment with a Storage Class, in which case the RP and AL need to be treated together.
So, yes. That would be OK.
[proposal for Storage Components made by Flavia] I'm not sure ... I don't think I've seen it. Do you have a copy somewhere? https://forge.gridforum.org/sf/go/doc14619?nav=1
Ta. Paul.

Paul Millar wrote:
I argued that GLUE probably should allow for the RP being multi-valued, and probably the AccessLatency as well. In WLCG/EGEE we would have a single value for each normally, so that the Storage Class is clear.
I'm not sure what AccessLatency as a multivalue means: I'd push this attribute down to the hardware layer (StorageDatastore/StorageMedia/StorageStorage)
It's possible that we've been talking slightly at cross-purposes. It seems to me that what WLCG means by "access latency" is really the minimum (i.e., fastest) guaranteed access latency (MGAL):
D0T1: this is nearline
D1T0: this is online
D1T1: this is online
Yes, in WLCG we want it like that, but I wondered whether it is necessary to insist there be exactly one access latency. It does seem to make sense with the current ordered set of values: Online < Nearline < Offline. But what about an SRM v1 or a Classic SE that has unspecified subsets of its name space going to tape (current practice)? One might want to list both Online and Nearline as potential guaranteed latencies: some files have Online, other files have Nearline as their _guaranteed_ latency. A single value of Nearline might suggest that everything goes to tape. Probably devil's advocate, but in GLUE we should be careful with insisting that a particular property shall _always_ hold... Thanks, Maarten

Paul Millar [mailto:paul.millar@desy.de] said:
Maarten.Litmaath@cern.ch said: [StorageEnvironment RP multiplicity]
I argued that GLUE probably should allow for the RP being multi-valued, and probably the AccessLatency as well. In WLCG/EGEE we would have a single value for each normally, so that the Storage Class is clear.
I'm not sure what AccessLatency as a multivalue means: I'd push this attribute down to the hardware layer (StorageDatastore/StorageMedia/StorageStorage)
Still going through old mails ... I believe the decision on this was to allow RP to be multivalued, even if not with the current technology, but not AL. Re the discussion over the relational implementation we should remember that that has a big overhead for multivalues, particularly if you may want to select on them, so we should probably try not to introduce them too freely. I can just about imagine that RP could be dynamic, e.g. for custodial the system could make more copies, so maybe it is worth allowing for that, but I still think that AL (and Tag) should be single-valued.
It's possible that we've been talking slightly at cross-purposes. It seems to me that what WLCG means by "access latency" is really the minimum (i.e., fastest) guaranteed access latency (MGAL):
Yes - we had something of a discussion about this and tried to improve the wording; maybe you can see if it helped. As you observe, part of the problem is that with latency smaller is better, so "minimum" sounds bad but is actually good. However, it isn't LCG that defines it; AIUI that's the general SRM definition. When you're actually reading or writing a file it's always online ... also, files can be pinned online, but that doesn't count as "guaranteed" for these purposes; pinned files are still just MGAL nearline.
AccessLatency is a property of the hardware,
Strictly speaking that isn't always true: a file on disk may still get copied to a different disk before becoming readable, and that can take a non-negligible time, but it probably isn't possible to take account of that.
I guess this is: a StorageEnvironment is considered custodial if files are always stored within at least one StorageMedia/Datastore that is considered custodial.
Not necessarily; you might just implement custodial by making multiple copies (which might even be more secure than tape - tapes do get corrupted after all). It's just LCG that insists on tying it to the hardware. Stephen

OK, longest one last ... just focussing on what I think are still live issues. Paul Millar [mailto:paul.millar@desy.de] said:
As a proposal, if we decide to include non-carbon-based entities as members of a UserDomain, we could use "agents" instead of end-users, an end-user being an example of an agent.
Yes - my main point was that the text should indicate the possibility.
(Similarly a Disk1 storage system might make extra cache copies to help with load balancing.)
True, they might well do this (dCache certainly does under various conditions); but I'd say that this is purely an internal issue and shouldn't be published in GLUE.
As always, the possibility is there if anyone wants it, that doesn't imply any compulsion.
I don't think we have any use-cases for publishing this information.
I could probably come up with one if I tried - but it doesn't matter because as long as the schema is defined in this way (i.e. the Type is an open enumeration) you have the option of extending things without changing the schema, hence you don't need to know if there are use-cases or not. Given the enormous difficulty of changing the schema I think it's good to have flexibility if you can provide it without significant cost (contrast with the last issue in this mail where flexibility would have a prohibitive cost).
I think the phraseology should be something like "a common category of storage" (although maybe "category" still isn't the right word).
Well, maybe. I personally find "category" too vague and somewhat circular as a definition.
This is possibly moot anyway if the generic Capacity object has gone away - if we have objects specific to the use then the description can also be more specific. The general point is that a Capacity (as we had it) was defined by its context, hence the general description would be bound to be vague, the details would be with the parent object.
I'd also like to go back to the question I posed in one of the meetings ... say that a site implements Custodial/Online by ensuring three distinct disk copies, how would we represent that? What about mirrored RAID, how much space do we record?
Err, I don't see the problem here; they should report the numbers that make sense, no? I guess I'm missing something...
Err, yes, but what does make sense? Consider a custodial/online space implemented three different ways:
1) LCG-style D1T1: you explicitly publish both an Online Size and a Nearline Size, at least conceptually (you may choose not to publish them in a particular case). Hence file sizes and free space are both counted twice, but in separate online and nearline attributes.
2) You use mirrored RAID to implement custodial. Probably the natural way to publish is that both the Used and Free space are published once; the fact that there are mirror disks is hidden behind the scenes, just like RAID parity disks. VOs get charged more per unit of space to reflect the higher cost, but they don't explicitly see the duplication.
3) You have a uniform set of disk servers, and store custodial files twice on physically different servers while replica files only get stored once. This is basically the same as the LCG scenario, except that the second copy is on disk instead of tape. Now at least the free space is likely to be the real space, i.e. a 1 GB custodial file will drop the free space by 2 GB, but what do you count for the used space? If it isn't also 2 GB then you lose the normalisation (used + free = total). However, either way it's hard to know how the free space relates to the file size without external information.
My main point here is that we should have a clear definition that lets people decide what to publish in a given case - the fact that we may get unusual results in an unusual case, e.g. 3) above, is secondary.
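(Worked numbers for the three scenarios, with a hypothetical 1 GB custodial file; purely illustrative.)

    file_size = 1  # GB

    # 1) D1T1: the file is counted once in each of two separate attributes.
    online_used, nearline_used = file_size, file_size  # 1 GB online + 1 GB nearline

    # 2) Mirrored RAID: the duplication is hidden and counted once;
    #    used + free = total still holds on the published numbers.
    used_2 = file_size                                 # free also drops by 1 GB

    # 3) Two disk copies on distinct servers: free drops by 2 GB.
    free_drop_3 = 2 * file_size
    used_3_normalised = 2 * file_size  # keeps used + free = total ...
    used_3_logical = file_size         # ... or matches the file size, breaking it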
From the phone conversation, it seems, Stephen, that you view this as simply a work-around because UML doesn't support complex data types (is that a fair summary?)
Yes, and it seems that Felix and Sergio see it the same way.
Not wishing to be seen promoting or defending UML particularly, but I suspect this omission is deliberate: if one is modelling something that needs a complex data type, then that complex data type *is* representing something.
I have no idea - maybe UML does support it, I'm not an expert, but anyway LDAP (which is where my main interest lies) doesn't. The basic point here, which occurs in quite a few places, is that you can't represent a table. For example, if you have something like this embedded in an object:
Key=forename Value=Stephen
Key=surname Value=Burke
it doesn't work, because you can't tell which key goes with which value. What you have to do (e.g. see GlueServiceData in the 1.3 schema) is to have a separate objectclass with an instance per key/value pair. In this case we have a key (online, nearline, offline, cache, ...) and four values (the four kinds of size). That's clumsy, but then everything about LDAP is clumsy! If LDAP attributes were ordered then, at least as far as LDAP is concerned, that could go away and the table could just be embedded in the parent object; but as it is, we can't do it.
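(The same point as a data sketch, not real LDAP API calls: unordered value sets lose the row pairing; one child entry per row restores it.)

    # An embedded "table" as two multi-valued attributes: LDAP hands back
    # each attribute's values as an unordered set, so the pairing is lost.
    flattened = {
        "Key": {"forename", "surname"},
        "Value": {"Stephen", "Burke"},  # which Value goes with which Key?
    }

    # The GlueServiceData-style workaround: one child objectclass
    # instance per Key/Value pair.
    children = [
        {"Key": "forename", "Value": "Stephen"},
        {"Key": "surname", "Value": "Burke"},
    ]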
The description is not extending the concept, but better defining it; so, rather than saying "it's a bunch of numbers we might want to record", the document offers an explanation of why the object class exists.
Well, since it was my idea in the first place I think I can reasonably claim to know what I meant! You could argue to change it, but I don't think you can say that you know what I was thinking better than I do ...
It doesn't define this: the (semi-) infinite tape is meant as an informative example where not publishing totalSize might make sense.
OK - but potentially almost anything could have examples like that. As I've said in other mails, my main concern is that if attributes are published they should follow a clear definition, in most cases it isn't a problem if they aren't there at all, and I don't think it needs any particular justification at the schema level. Anything not compulsory is optional :)
a. not all tape systems provide an easy (or sometimes, any) mechanism for discovering the current totalSize,
That's an argument for why you might not publish it in practice, not for why it can't be defined.
b. some places have an operational practice that they "just add more tapes" when the space looks like it's filling up,
Fine, so the free space increases. RAL is in the process of adding lots more disk, and the disk space allocated to various service classes changes fairly frequently, but no-one seems to have any problem with doing df or equivalent to report the current picture.
The argument for making totalSize optional is that a) sometimes it's impossible to discover, b) sometimes it's a meaningless concept.
We all agree that it's optional!!! We've been having this discussion for about three years now, could we just accept that optional really does mean optional and is not some sneaky plan to coerce people into publishing something against their will ... (I declare the concept of a phone number to be invalid because I might not want to tell people mine :) [Many Environment comments snipped as the Environment has gone away, at least for now - also some things are subsumed in the Datastore proposal]
StorageMappingPolicy:
The StorageMappingPolicy describes how a particular UserDomain is allowed to access a particular StorageShare.
Should we say how this relates to the AccessPolicy? (which doesn't seem to appear explicitly in either the Computing or Storage diagrams but is presumably there anyway.)
Well, I thought the idea was that we could get by without the access policies being published.
Perhaps you can, but it's there in the model at least. Also it may depend on how your tools work. For example, at the moment we (EGEE) have a service discovery tool where you can say "find all services of type x", where x may be SRM or anything else. If that checks authorisation it will be looking at the generic AccessPolicy, it won't know all the details specific to storage.
1.a A UserDomain has access to a StorageShare (discovered via a StorageMappingPolicy)
Well, formally in the model that would be wrong: you're supposed to discover access rights to a service through the AccessPolicy. It may happen to be the case that they are the same, but it may not - for example, a VO may be authorised (AccessPolicy on the Endpoint) but not happen to have any Shares defined, or maybe it has some but they aren't published for some reason. Or maybe the Shares are defined but access is turned off (indeed, this is how Freedom of Choice currently works: it edits the VO name out of the ACBR to stop the CE being discovered).
The things which should be true are that there is an agreed set of things (maybe per grid?) which are published, and that the published values should be a superset of the "real" permissions - i.e. the SRM may in fact not authorise me even if the published value says that it will, but the reverse shouldn't be true.
I think your example here doesn't contradict the above statement "no member of a UserDomain may interact with a StorageShare except as [...]".
It may not contradict it per se but I think the implications are different. Your statement reads like a statement about authorisation (anything not specifically allowed is forbidden). Mine is about expectations - if I expect a service to tell me if I'm allowed to do something then I may well ignore it if it doesn't, which is usually a bad thing (maybe not always). [If a country's web site says the country requires a visa when it actually doesn't it may deter would-be visitors - which would usually not be what the country would want.] [Shared Share comments snipped as this has moved on]
| Access to the interface may be localised; that is, only available | from certain computers. It may also be restricted to specified | UserDomains.
It might also only apply to certain storage components ...
"storage components" == StorageShares, right?
Well, possibly - what I was thinking was that you may have e.g. an OPN connection to some of the storage hardware but not all of it. Basically you can imagine almost arbitrarily complicated scenarios, which we have no chance of representing in a generic way, so we aren't even trying. If someone comes up with a real-world important use case then we can have a go at covering that specific case.
The student referred the examiner to approx. half way through the thesis, which included a short paragraph saying "if you mention this paragraph you get to keep the champagne" :-)
I did actually put a joke reference [1] in my thesis, fairly well hidden - and one of the examiners (Roger Barlow in fact) spotted it! Stephen [1] For the Standard Model - something like "S. Fox et al, Sun, p3" ...

Maarten Litmaath [mailto:Maarten.Litmaath@cern.ch] said:
Where 'S' stands for Samantha: I hope she is not into supergravity these days! :-)
Sorry, I hit send by accident there ... what I was going to say is that that was my other joke reference, albeit from a real journal: http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TVK-46G5CH1-83&_user=910841&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000047841&_version=1&_urlVersion=0&_userid=910841&md5=43f1fafb71fb29d0abbdcf54281c2c59 Stephen
participants (6)
- Burke, S (Stephen)
- Felix Nikolaus Ehm
- Maarten Litmaath
- Maarten.Litmaath@cern.ch
- Paul Millar
- Sergio Andreozzi