
OK, longest one last ... just focussing on what I think are still live issues.

Paul Millar [mailto:paul.millar@desy.de] said:
As a proposal, if we decide to include non-carbon-based entities as members of UserDomain, we could use "agents" instead of end-users. An end-user being an example of an agent.
Yes - my main point was that the text should indicate the possibility.
(Similarly a Disk1 storage system might make extra cache copies to help with load balancing.)
True, they might well do this (dCache certainly does under various conditions); but I'd say that this is purely an internal issue and shouldn't be published in GLUE.
As always, the possibility is there if anyone wants it, that doesn't imply any compulsion.
I don't think we have any use-cases for publishing this information.
I could probably come up with one if I tried - but it doesn't matter because as long as the schema is defined in this way (i.e. the Type is an open enumeration) you have the option of extending things without changing the schema, hence you don't need to know if there are use-cases or not. Given the enormous difficulty of changing the schema I think it's good to have flexibility if you can provide it without significant cost (contrast with the last issue in this mail where flexibility would have a prohibitive cost).
I think the phraseology should be something like "a common category of storage" (although maybe "category" still isn't the right word).
Well, maybe. I personally find "category" is too vague and a somewhat circular definition.
This is possibly moot anyway if the generic Capacity object has gone away - if we have objects specific to the use then the description can also be more specific. The general point is that a Capacity (as we had it) was defined by its context, hence the general description would be bound to be vague, the details would be with the parent object.
I'd also like to go back to the question I posed in one of the meetings ... say that a site implements Custodial/Online by ensuring three distinct disk copies, how would we represent that? What about mirrored RAID, how much space do we record?
Err, I don't see the problem here; they should report the numbers that make sense, no? I guess I'm missing something...
Err, yes, but what does make sense? Consider a custodial/online space implemented three different ways:

1) LCG-style D1T1, so you explicitly publish both an Online Size and a Nearline Size, at least conceptually (you may choose not to publish them in a particular case). Hence file sizes and free space are both counted twice, but in separate online and nearline attributes.

2) You use mirrored RAID to implement custodial. Probably the natural way to publish is that both the Used and Free space are published once, the fact that there are mirror disks is hidden behind the scenes just like RAID parity disks. VOs get charged more per unit of space to reflect the higher cost, but they don't explicitly see the duplication.

3) You have a uniform set of disk servers, and store custodial files twice on physically different servers while replica files only get stored once. This is basically the same as the LCG scenario except that the second copy is on disk instead of tape. Now at least the free space is likely to be the real space, i.e. a 1 GB custodial file will drop the free space by 2 GB, but what do you count for the used space? If it isn't also 2 GB then you lose the normalisation (used + free = total). However, either way it's hard to know how the free space relates to the file size without external information.

My main point here is that we should have a clear definition that lets people decide what to publish in a given case - the fact that we may get unusual results in an unusual case, e.g. 3) above, is secondary.
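To make the accounting in cases 2) and 3) concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Space` class, its field names and the two `store_*` functions are made up for this example and are not GLUE attributes, and sizes are whole GB.

```python
from dataclasses import dataclass


@dataclass
class Space:
    """Published sizes for one space, in GB (hypothetical names, not GLUE attributes)."""
    total: int
    used: int
    free: int

    def normalised(self) -> bool:
        # The normalisation discussed above: used + free = total.
        return self.used + self.free == self.total


def store_mirrored(space: Space, file_gb: int) -> None:
    """Case 2: the mirror is hidden, so the file is counted once everywhere."""
    space.used += file_gb
    space.free -= file_gb


def store_two_copies(space: Space, file_gb: int, count_used_twice: bool) -> None:
    """Case 3: two real copies on different servers, so free drops by twice the file size."""
    space.free -= 2 * file_gb
    space.used += 2 * file_gb if count_used_twice else file_gb


raid = Space(total=100, used=0, free=100)
store_mirrored(raid, 1)

once = Space(total=100, used=0, free=100)
store_two_copies(once, 1, count_used_twice=False)

twice = Space(total=100, used=0, free=100)
store_two_copies(twice, 1, count_used_twice=True)

for name, s in [("mirrored", raid), ("two copies, used once", once), ("two copies, used twice", twice)]:
    print(name, s, "normalised:", s.normalised())
```

In this sketch only the "used once" variant of case 3) breaks used + free = total, while in both variants of case 3) the free space falls by twice the file size, which a client cannot know without external information.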
From the phone conversation, it seems, Stephen, that you view this as simply a work-around because UML doesn't support complex data types (is that a fair summary?)
Yes, and it seems that Felix and Sergio see it the same way.
Not wishing to be seen promoting or defending UML particularly, but I suspect this omission is deliberate: if one is modelling something that needs a complex data-type then that complex data-type *is* representing something.
I have no idea - maybe UML does support it, I'm not an expert, but anyway LDAP (which is where my main interest lies) doesn't. The basic point here, which occurs in quite a few places, is the fact that you can't represent a table, for example if you have something like this embedded in an object:

Key=forename
Value=Stephen
Key=surname
Value=Burke

it doesn't work because you can't tell which key goes with which value. What you have to do (e.g. see GlueServiceData in the 1.3 schema) is to have a separate objectclass with an instance per key/value pair. In this case we have a key (online, nearline, offline, cache, ...) and four values (the four kinds of size). That's clumsy, but then everything about LDAP is clumsy! If LDAP attributes were ordered then at least as far as LDAP is concerned that could go away and the table could just be embedded in the parent object, but as it is we can't do it.
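The pairing problem can be shown with a small Python sketch. It does not use any LDAP library; it simply models a multi-valued attribute as an unordered set, which is the assumption that matters here. The function names (`embed`, `as_entries`, `from_entries`) are hypothetical and exist only for this illustration.

```python
def embed(table):
    """Flatten a table into two multi-valued attributes on one entry.

    Attribute values are treated as unordered, so each attribute is a frozenset.
    """
    return {"Key": frozenset(table.keys()), "Value": frozenset(table.values())}


def as_entries(table):
    """One separate entry per key/value pair, so each key stays with its value."""
    return [{"Key": k, "Value": v} for k, v in sorted(table.items())]


def from_entries(entries):
    """Rebuild the table from the per-pair entries."""
    return {e["Key"]: e["Value"] for e in entries}


right = {"forename": "Stephen", "surname": "Burke"}
wrong = {"forename": "Burke", "surname": "Stephen"}

# Embedded form: the two different tables become indistinguishable.
print(embed(right) == embed(wrong))

# Per-pair entries: the table survives a round trip and the two tables differ.
print(from_entries(as_entries(right)) == right)
print(as_entries(right) == as_entries(wrong))
```

The same reasoning applies to the sizes: with a key such as online or nearline and four size values per key, one object per key keeps each set of sizes attached to the right key.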
The description is not extending the concept, but better defining it; so, rather than saying "it's a bunch of numbers we might want to record", the document offers an explanation of why the object class exists.
Well, since it was my idea in the first place I think I can reasonably claim to know what I meant! You could argue to change it, but I don't think you can say that you know what I was thinking better than I do ...
It doesn't define this: the (semi-) infinite tape is meant as an informative example where not publishing totalSize might make sense.
OK - but potentially almost anything could have examples like that. As I've said in other mails, my main concern is that if attributes are published they should follow a clear definition, in most cases it isn't a problem if they aren't there at all, and I don't think it needs any particular justification at the schema level. Anything not compulsory is optional :)
a. not all tape systems provide an easy (or sometimes, any) mechanism for discovering the current totalSize,
That's an argument for why you might not publish it in practice, not for why it can't be defined.
b. some places have the operational practice that they "just add more tapes" when the space looks like it's filling up,
Fine, so the free space increases. RAL is in the process of adding lots more disk, and the disk space allocated to various service classes changes fairly frequently, but no-one seems to have any problem with doing df or equivalent to report the current picture.
The argument for making totalSize optional is that a) sometimes it's impossible to discover, b) sometimes it's a meaningless concept.
We all agree that it's optional!!! We've been having this discussion for about three years now, could we just accept that optional really does mean optional and is not some sneaky plan to coerce people into publishing something against their will ... (I declare the concept of a phone number to be invalid because I might not want to tell people mine :)

[Many Environment comments snipped as the Environment has gone away, at least for now - also some things are subsumed in the Datastore proposal]
StorageMappingPolicy:
The StorageMappingPolicy describes how a particular UserDomain is allowed to access a particular StorageShare.
Should we say how this relates to the AccessPolicy? (which doesn't seem to appear explicitly in either the Computing or Storage diagrams but is presumably there anyway.)
Well, I thought the idea was that we could get by without the access policies being published.
Perhaps you can, but it's there in the model at least. Also it may depend on how your tools work. For example, at the moment we (EGEE) have a service discovery tool where you can say "find all services of type x", where x may be SRM or anything else. If that checks authorisation it will be looking at the generic AccessPolicy, it won't know all the details specific to storage.
1.a A UserDomain has access to a StorageShare (discovered via a StorageMappingPolicy)
Well, formally in the model that would be wrong, you're supposed to discover access rights to a service through the AccessPolicy. It may happen to be the case that they are the same, but it may not - for example a VO may be authorised (AccessPolicy on the Endpoint) but not happen to have any Shares defined, or maybe it has some but they aren't published for some reason. Or maybe the Shares are defined but access is turned off (indeed this is how Freedom of Choice currently works, it edits the VO name out of the ACBR to stop the CE being discovered).
The things which should be true are that there is an agreed set of things (maybe per grid?) which are published, and that the published values should be a superset of the "real" permissions - i.e. the SRM may in fact not authorise me even if the published value says that it will, but the reverse shouldn't be true.
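The superset condition can be written down as a simple set check. This is a sketch only: the function names and the VO names (`vo_a`, `vo_b`, `vo_c`) are invented for the example, and it assumes permissions can be reduced to a set of VO names, which is a simplification.

```python
def over_advertised(published, real):
    """VOs the information system lists but the service would actually refuse."""
    return set(published) - set(real)


def under_advertised(published, real):
    """VOs the service would accept but which are not published."""
    return set(real) - set(published)


def is_acceptable(published, real):
    """The condition above: published values are a superset of the real permissions."""
    return not under_advertised(published, real)


published = {"vo_a", "vo_b", "vo_c"}
real = {"vo_a", "vo_b"}

print(is_acceptable(published, real))      # published covers everything real
print(over_advertised(published, real))    # tolerated: listed, but refused in practice
print(is_acceptable({"vo_a"}, real))       # vo_b is authorised but not published
```

Over-advertising is tolerated by this check and under-advertising is not, which matches the asymmetry described above.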
I think your example here doesn't contradict the above statement "no member of a UserDomain may interact with a StorageShare except as [...]".
It may not contradict it per se but I think the implications are different. Your statement reads like a statement about authorisation (anything not specifically allowed is forbidden). Mine is about expectations - if I expect a service to tell me if I'm allowed to do something then I may well ignore it if it doesn't, which is usually a bad thing (maybe not always).

[If a country's web site says the country requires a visa when it actually doesn't it may deter would-be visitors - which would usually not be what the country would want.]

[Shared Share comments snipped as this has moved on]

| Access to the interface may be localised; that is, only available
| from certain computers. It may also be restricted to specified
| UserDomains.
It might also only apply to certain storage components ...
"storage components" == StorageShares, right?
Well, possibly - what I was thinking was that you may have e.g. an OPN connection to some of the storage hardware but not all of it. Basically you can imagine almost arbitrarily complicated scenarios, which we have no chance of representing in a generic way, so we aren't even trying. If someone comes up with a real-world important use case then we can have a go at covering that specific case.
The student referred the examiner to approx. half way through the thesis, which included a short paragraph saying "if you mention this paragraph you get to keep the champagne" :-)
I did actually put a joke reference [1] in my thesis, fairly well hidden - and one of the examiners (Roger Barlow in fact) spotted it!

Stephen

[1] For the Standard Model - something like "S. Fox et al, Sun, p3" ...