Hi all,

I asked Jean-Yves Nief, who is very active in the iRODS community and works
at IN2P3, where they run a large iRODS-based data store, to comment on the
StAR document.

Here's his feedback.

cheers
johnk



-------- Original Message --------
Subject: Re: Fwd: [UR-WG] EMI StAR – Definition of a Storage Accounting Record - ready for public comments
Date: Tue, 20 Mar 2012 18:03:16 +0100
From: Jean-Yves Nief <nief@cc.in2p3.fr>
To: john alan kennedy <jkennedy@rzg.mpg.de>


dear John,

I finally had time to look at your document. I do not have many
comments at this point, as I don't know the exact context of the
discussion.
In 5.18 (ResourceCapacityUsed), it is said that this information should
include the space used for redundancy, in RAID setups for example. At the
middleware level, I am not sure this information is really relevant.
What matters at the middleware level is the amount of disk space
available to grid users. The actual amount of disk installed, for
example, is an internal implementation detail of the site. On the other
hand, it might be interesting to have some metadata about the level of
data security offered by a given storage resource (RAID level, etc.). A
storage resource could offer less capacity than another one but provide
more data security: this could be one element in assessing the level of
service provided by a given storage resource (not only the amount of
space provided). The latency to retrieve the data should also be
documented (is it online, "nearline", offline?). That is important when
you are dealing with hybrid systems combining disk storage and tape, for
example.

In section 6 (Intentionally Left Out Properties): if the site ID can be
found somewhere else, that's fine. However, having information spread
over several information systems might be a bit dangerous. One way or
another, you need to link what is available in a storage system (or part
of it) to a given site.

The transfer information is said to be related to network resources,
and hence out of the scope of this document. However, it is an important
feature. For example, serving 30 TB of data to thousands of users is not
the same as keeping 30 TB for archival purposes that will only be
accessed once in a while. That is, the amount of transfer has
consequences for your storage system design (both hardware and
software), not only for the network resources; again, the investment and
the cost are not going to be the same. Take the case of Amazon S3: they
also charge for network usage, which is not separated from storage usage
the way it is in this document. Also, if you have a hybrid system, the
amount of data transfer is directly tied to the number of cache disks
you have, the number of tape drives, etc.
cheers,
JY