Defining various Size metrics

Dear all,
I've started a wiki page that tries to give a more precise definition of the space capacity metrics for Store Element and Storage Area objects in Glue v1.3. The current content is very much work-in-progress (and missing lots of detail), but is available from: http://trac.dcache.org/trac.cgi/wiki/GLUE/capacity
Comments are appreciated.
This may end up being WLCG-specific (have other people adopted Glue v1.3?), but as many of the people on this mailing list have experience with Glue v1.3, I hope you don't mind me posting this here.
Cheers, Paul.

glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Paul Millar said:
Comments are appreciated.
OK ...
"Sometime after a successful upload of a file such that it uses online capacity, the UsedOnlineSize MUST increase."
That would only be true if you insist that all storage managed by the SE has to be reflected in these numbers, and probably we can't say that - and if we did it should be added as a constraint for all the attributes. Also I'm not sure this would always be guaranteed to be true anyway, e.g. if you had the equivalent of a fragmented disk you might count deleted fragments as still used, but a new file might happen to fit within one (this is possibly more relevant for tape systems).
"Q/ is the size of the increase in UsedOnlineSize for a successful file upload necessarily equal to the size of the file?"
Modulo the comment above I think the answer is mostly "yes". These are supposed to be high-level summary numbers, so it should reflect what the users think they've stored and not what the system does internally. If the system makes multiple copies that should be reflected in what it "charges" per Gb stored. The exception might be if the users explicitly ask for multiple copies to be stored in a storage class which wouldn't otherwise do that. Similarly if the users do their own compression you would count the compressed size.
"Q/ Is the value of TotalOnlineSize - UsedOnlineSize meaningful? Particular, can one conclude that, if TotalOnlineSize - UsedOnlineSize ≲ file_size, then attempting to upload a file of file_size will likely fail? This is related to the previous question."
It's clearly somewhat meaningful - if a new file is much bigger than total - used then it's fairly likely to fail, but I don't think you can make a definite statement. Anyway tests of that kind shouldn't usually be made at this level, you should normally be looking at the SA when considering the storage of specific new files. At this level, if Total - Used is getting small it should be a flag (e.g. to people in a ROC or WLCG management) that the SE may be heading towards a problem.
"Two access latencies are used: online and nearline"
In principle we also have offline storage, e.g. tapes which have been removed from a robot and put in a cupboard, and would need a human operator to re-insert them.
"In practice, online means the file is stored on a (spinning) magnetic hard-disks."
You imply that spun-down discs would count as nearline, but I'm not entirely sure if that's true. Similarly I'm not sure how we would categorise a system where tapes were permanently mounted in drives.
Stephen

Hi Stephen,
Thanks for the valuable feedback. Comments below:
On Thursday 19 June 2008 15:36:14 Burke, S (Stephen) wrote:
"Sometime after a successful upload of a file such that it uses online capacity, the UsedOnlineSize MUST increase."
That would only be true if you insist that all storage managed by the SE has to be reflected in these numbers, and probably we can't say that
OK. I've added an extra assertion:
* The UsedOnlineSize metric SHOULD consider all storage managed by the SE that has online access latency.
Although, perhaps this should be "MAY" if "SHOULD" is too strong. I've also added a qualifier phrase ("considered by UsedOnlineSize") to the other assertions.
and if we did it should be added as a constraint for all the attributes.
OK, done.
Also I'm not sure this would always be guaranteed to be true anyway, e.g. if you had the equivalent of a fragmented disk you might count deleted fragments as still used, but a new file might happen to fit within one (this is possibly more relevant for tape systems).
I've tried to adjust the words to allow for this; for example, "previously unused online capacity". If the deleted fragments are considered still "used" and a file is uploaded into one of these fragmented "holes" then no additional online capacity is consumed, so UsedOnlineSize should not increase. I've also added this as an open question, although I don't think there is likely to be any underlying online storage that has this property. AFAIK, all filesystems support file fragmentation, so this isn't an issue in practice.
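To make the "previously unused online capacity" wording concrete, here is a toy sketch of the accounting being discussed. It is purely illustrative: the class `ToyOnlineStore` and its methods are hypothetical names, not anything taken from Glue or from a real SE implementation, and it assumes the worst case where deleted fragments stay counted as "used".

```python
class ToyOnlineStore:
    """Toy model of an SE whose deleted fragments still count as used.

    Hypothetical illustration only; not a real Glue or SE interface.
    """

    def __init__(self, total):
        self.total = total   # what TotalOnlineSize would report
        self.used = 0        # what UsedOnlineSize would report
        self.holes = []      # sizes of deleted fragments, still counted in `used`

    def delete(self, size):
        # The space is not handed back: it becomes a reusable hole.
        self.holes.append(size)

    def upload(self, size):
        """Store a file and return the resulting increase in `used`."""
        for i, hole in enumerate(self.holes):
            if hole >= size:
                # The file fits in an existing hole, so no previously
                # unused capacity is consumed and `used` does not change.
                self.holes[i] = hole - size
                return 0
        if self.used + size > self.total:
            raise OSError("not enough unused online capacity")
        self.used += size
        return size
```

In this model an upload that lands in a hole succeeds without UsedOnlineSize increasing, which is the case the reworded assertion is meant to allow.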
"Q/ is the size of the increase in UsedOnlineSize for a successful file upload necessarily equal to the size of the file?"
Modulo the comment above I think the answer is mostly "yes". These are supposed to be high-level summary numbers, so it should reflect what the users think they've stored and not what the system does internally. If the system makes multiple copies that should be reflected in what it "charges" per Gb stored. The exception might be if the users explicitly ask for multiple copies to be stored in a storage class which wouldn't otherwise do that. Similarly if the users do their own compression you would count the compressed size.
OK, I've taken out the question and added it as an assertion.
"Q/ Is the value of TotalOnlineSize - UsedOnlineSize meaningful? Particular, can one conclude that, if TotalOnlineSize - UsedOnlineSize ≲ file_size, then attempting to upload a file of file_size will likely fail? This is related to the previous question."
It's clearly somewhat meaningful - if a new file is much bigger than total - used then it's fairly likely to fail,
OK, added this as an assertion ...
but I don't think you can make a definite statement.
... with a "likely" qualifier (which is pretty vague). I think that's fair enough. We could say, for uploads:
  file_size >> total - used  =>  transfer will likely fail
  file_size ~= total - used  =>  transfer at risk of failing
Would that be better?
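As a rough sketch of how a monitoring script might apply those two rules: the function name `upload_risk` and both threshold factors below are assumptions made for illustration, since neither ">>" nor "~=" has been given a number in this discussion.

```python
def upload_risk(file_size, total_online, used_online,
                much_bigger=10.0, close=0.5):
    """Classify an upload from SE-level summary numbers.

    `much_bigger` and `close` are hypothetical thresholds standing in
    for ">>" and "~=" respectively; they are not defined by Glue.
    All sizes must be in the same unit.
    """
    free = total_online - used_online
    if free <= 0 or file_size >= much_bigger * free:
        return "likely to fail"   # file_size >> total - used
    if file_size >= close * free:
        return "at risk"          # file_size ~= total - used
    return "no warning"


# Example: 5 units apparently free on a 1000-unit SE.
print(upload_risk(100, 1000, 995))
print(upload_risk(5, 1000, 995))
print(upload_risk(1, 1000, 0))
```

This only gives a hint at the SE level; a decision about storing a specific file would normally use the SA numbers instead.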
Anyway tests of that kind shouldn't usually be made at this level, you should normally be looking at the SA when considering the storage of specific new files.
Duly noted as a comment in the assertion.
At this level, if Total - Used is getting small it should be a flag (e.g. to people in a ROC or WLCG management) that the SE may be heading towards a problem.
Aye, although one has to define "small", which is probably a user- (or perhaps VO-) specific number, depending on the usage the VO makes of that SE. I think the strongest statement is that file uploads of a certain size will likely fail. As you say, if a ROC or WLCG know that their VO's workflow involves uploading files of a certain size, they can (using that assertion) deduce that "something's up" and flag (via SAM or GGUS, say) that there's a problem.
"Two access latencies are used: online and nearline"
In principle we also have offline storage, e.g. tapes which have been removed from a robot and put in a cupboard, and would need a human operator to re-insert them.
Yes. I left it out as Glue doesn't mention it (iirc), but the concept exists so leaving it out might cause more confusion. I've now included a brief description of offline too.
"In practice, online means the file is stored on a (spinning) magnetic hard-disks."
You imply that spun-down discs would count as nearline, but I'm not entirely sure if that's true.
I believe the spin-up time is typically a few seconds; so, yes, that should be OK. The "(spinning)" has been removed.
Similarly I'm not sure how we would categorise a system where tapes were permanently mounted in drives.
Me neither. On the whole, I'd say tape-drives with a single tape are nearline as seeking to read a file might take a minute or so, depending on the average size of the files being stored. But, I've added it as an open question. Cheers, Paul.