
Hi, most of the things I have commented on should apply not only to this [StorageStorage|StorageDatastore] discussion but also to the general context of the GLUE 2.0 storage schema.
-----Original Message-----
From: glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Burke, S (Stephen)
Sent: Friday, 11 April 2008 23:42
To: Paul Millar; glue-wg@ogf.org
Subject: Re: [glue-wg] Datastore proposal
glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Paul Millar said:
Name: Human-readable name (maybe indicating the technology, e.g. StorageTek)
Yes, in principle; although I'd be wary of suggesting that people put technology names into Name.
Yes, this is always difficult, and I think the schema document should make a general statement about IDs and Names that they should never contain any metadata to be parsed by a program. However, in practice they always do contain things that humans will recognise, so it may as well be something that helps a human understand what's going on - especially for Names which are explicitly supposed to be for human consumption, e.g. in monitoring displays. In practice sites probably have some internal name for many of these things anyway.
Type (disk, tape, ... - open enumeration) (or maybe call this attribute Medium?)
This is definitely nit-picking, but for many instances this would more properly be "media". Using "type" would avoid the singular/plural issue (I think).
I think this is just because computer people misuse language - it's correct to describe tape as a storage medium, singular, and the fact that people refer to "media" meaning "a bunch of tapes" is incorrect.
I don't know what computer people are, but I agree that 'media' is the appropriate word to use in such a context.
Latency: Enumeration {Online, Nearline, Offline} (probably no need to make this open?)
Do we define what online, nearline and offline mean somewhere?
SRM does - probably we should copy it. However, I think it isn't really an SRM-specific term, it should be general enough to use with other things, hence my suggestion that the enumeration can be closed. In theory I suppose you could have levels of nearline-ness according to how long the latency really is, but I doubt that we need to worry about it.
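Just to make the enumeration concrete, here is a minimal sketch of how a closed Latency attribute could be modelled; the class and value names are mine and the definitions are paraphrased from typical SRM usage, not agreed GLUE wording:

    from enum import Enum

    class AccessLatency(Enum):
        # Closed enumeration sketch for the DataStore Latency attribute.
        # Definitions paraphrased from common SRM usage; illustrative only.
        ONLINE = "online"      # data can be read without any preparatory action
        NEARLINE = "nearline"  # data must first be staged, e.g. recalled from tape
        OFFLINE = "offline"    # operator intervention needed, e.g. mounting a shelved tape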
Perhaps the "total amount of data that can stored without operator intervention and when operating correctly". Would that be sufficient?
Something like that. In general we should spell things out as much as possible, people can be very creative in misinterpreting definitions :)
Aye. We should record the storage actually available to end-users; the ten hot-spare disks shouldn't be included in that number.
Indeed.
Yupp.
Stephen, your description seems to map to precious + precious&sticky + cache + cache&sticky. However, for most systems this should be ~100% of totalSize most of the time, so I'm not sure how useful that number is.
I'm inclined to think that's correct here: if we're describing the real hardware and it really is full of files then that's fine. The hardware doesn't care what the SRM thinks the files are for (or who they belong to). This kind of distinction is more of a problem for the other places we use Capacities, and indeed is the reason we get so much argument about what to publish.
Perhaps we should publish two numbers?
No, I don't think so; if we want this information at all it should be attached to the SRM-level objects like Share and Environment, because the SRM is what knows that one file is in a cache and another is precious.
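To illustrate the split (the per-class byte counts below are invented, and the dictionary is just a stand-in for whatever the SRM database actually holds): at the DataStore level everything on the medium counts as used, while a per-class breakdown would belong on the SRM-level objects such as Share.

    # Hypothetical per-class byte counts taken from an SRM database (made up here).
    srm_bytes = {
        "precious": 40e12,
        "precious+sticky": 5e12,
        "cache": 30e12,
        "cache+sticky": 10e12,
    }

    # The hardware doesn't care what the SRM thinks the files are for:
    # the DataStore UsedSize is simply the sum over all classes (~85 TB here).
    datastore_used_size = sum(srm_bytes.values())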
FreeSize: TotalSize - UsedSize, i.e. the free space at the filesystem level.
Is this an axiomatic relationship? If so, it probably isn't worth recording it.
That's another perennial debate - traditionally we've gone with publishing the complete set even if you can derive one of them from the others, rather than forcing clients to do the sums. You can see the same kind of thing on the CE side, e.g. TotalJobs = RunningJobs + WaitingJobs. As always it's optional so a given grid or info provider may not in fact publish all of them.
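As a minimal sketch of what that looks like for an info provider (the attribute names are taken from the proposal, the function itself is purely hypothetical):

    # Publish FreeSize explicitly even though it is derivable from the other two,
    # mirroring TotalJobs = RunningJobs + WaitingJobs on the CE side. All of these
    # are optional, so a given provider may omit some of them.
    def datastore_capacity(total_size, used_size):
        return {
            "TotalSize": total_size,
            "UsedSize": used_size,
            "FreeSize": total_size - used_size,  # redundant, but saves clients doing the sum
        }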
[Hardware compression]
OK, I think this is a can-o-worms that we don't want to open.
Well, we have to open it at least part way, otherwise we leave the definition ambiguous, and you can bet that different people would make different choices :)
2. For some tape systems, it would be very difficult to obtain the actual storage usage (the "tape occupancy"?).
Maybe, but then you just wouldn't publish it. The reason I went for that definition is that the alternative would create numbers which are hard to interpret, e.g. UsedSize > TotalSize - the compression factor is variable so it doesn't make sense to scale the TotalSize.
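To make the arithmetic concrete (all the numbers here are invented purely for illustration):

    # With a variable hardware compression factor, publishing user-domain bytes
    # as UsedSize against a native-capacity TotalSize can give UsedSize > TotalSize,
    # which is hard to interpret; and since the factor varies with the data,
    # scaling TotalSize by it makes no sense either.
    native_capacity = 100.0        # TB of raw tape capacity (TotalSize)
    user_bytes_written = 130.0     # TB of user-domain data written so far
    average_compression = 1.6      # varies with the data being written

    occupancy = user_bytes_written / average_compression   # ~81 TB actually on the medium
    # 130 > 100: the "user-domain" number exceeds TotalSize, which is why the
    # definition sticks to the actual occupancy instead.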
Also it seems to me that something in the system must know what the occupancy is, otherwise how can it decide whether a new file can be written to a given tape?
For CASTOR this information is available and kept in the "VolumeManager" tables. I would strongly assume that this information can somehow be obtained from storage systems which have a tape backend (see also the next comment).
3. Sometimes a file store operation can fail. If so, the tape software may retry, but some (potentially unknown) fraction of the file has been written to tape. Does this count towards the actual occupancy?
In principle I'd argue that it should reduce the TotalSize, in practice you'd probably just ignore it - you can never expect to fill your storage 100%.
This tape space is used and is not available until the tape has been repacked. But I agree with Stephen, and I am sure it is fine that in GLUE we don't care about this number. The alternative would be to publish the 'lost' space (due to errors) of such a system in GLUE - ugly! Also, HSMs with a tape backend do have monitoring tools to see how much tape space is left (I don't think that e.g. CASTOR Tape Operations considers the lost space as 'theoretically free'). This number can then be published into GLUE.
4. I believe Castor had an issue when deleting files (leading to "repacking"?). If we're attempting to account for the actual yardage of tape used, how would this be accounted for?
Well, it's the nature of tapes and of how data is written. Other HSMs with tape should have the same problems (except they will seek over the tape to find suitable space for a given file).
I think that's the same kind of thing, it may be that some of your "free" space is not actually usable in practice. I think it would be much too complicated to try to represent that explicitly - bear in mind that this is just supposed to be a simple definition of an object we probably don't need at all!
I think the only thing we can publish is the (user-domain) file size that has been recorded to tape. I believe this is the actual number people are interested in.
It's what they're interested in when they look at e.g. the Share, and what they should find there. If they want to look at a hardware description (which they may well not) they should see the hardware numbers ...
As you say: we want GLUE to cover the large majority of use cases. This hardware-level view - especially from the users' side - doesn't appear to me to be a main one. Please correct me if I am wrong; I'll then incorporate it into the Use-Case document. Cheers, Felix