
Hi all,

I asked Jean-Yves Nief, who is very active in the iRODS scene and works at IN2P3, where they have a large iRODS-based data store, to comment on the StAR doc. Here's his feedback.

cheers
johnk

-------- Original Message --------
Subject: Re: Fwd: [UR-WG] EMI StAR – Definition of a Storage Accounting Record - ready for public comments
Date: Tue, 20 Mar 2012 18:03:16 +0100
From: Jean-Yves Nief <nief@cc.in2p3.fr>
To: john alan kennedy <jkennedy@rzg.mpg.de>

dear John,

I finally had time to take a look at your document. I do not have many comments at this point as I don't know the exact context of the discussion.

In 5.18 (ResourceCapacityUsed), it is said that this information should include the space used for redundancy, in RAID setups for example. At the middleware level, I am not sure this information is really relevant. The important thing at the middleware level is the amount of disk space available to the grid users. The real amount of disk installed, for example, is some kind of internal cooking.

On the other hand, it might be interesting to have some metadata information wrt the level of data security of a given storage resource (RAID level etc...). A storage resource could have less capacity than another one but provide more data security: this could be an element used in assessing the level of service provided by a given storage resource (and not only the amount of space provided). Also, the latency to retrieve the data should be documented (is it online, "nearline", offline?); that's important when you are dealing with a hybrid system with both disk storage resources and tapes, for example.

In section 6 (Intentionally Left Out Properties): if the site id can be found somewhere else, that's fine. However, having information spread over several information systems might be a bit dangerous. And one way or another, you need to link what is available in a storage system (or part of it) with a given site.

The transfer information is said to be related to network resources, hence not in the scope of this document. However, it is an important feature. For example, serving 30 TB of data to thousands of users is not going to be the same as having 30 TB for archival purposes which is only accessed once in a while. I.e., the amount of transfer is going to have consequences on the design of your storage system (both hardware and software) and not only on the network resources; again, the investment and the cost are not going to be the same. If you take the case of Amazon S3, they also charge for network usage: this is not separated from storage usage as it is in this document. Also, if you have a hybrid system, the amount of data transferred is directly connected to the number of cache disks that you have, the number of tape drives etc...

cheers,
JY
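
As a rough illustration of the points raised above, the extra information could be sketched as a small Python structure. This is only a sketch under stated assumptions: every class, field and function name below is hypothetical and none of them is defined by the StAR document. It keeps the capacity available to users separate from the installed (raw) capacity, and records the redundancy level, the access latency, the site and the amount of data transferred next to the storage figures. The RAID 6 helper relies only on the general fact that a RAID 6 set uses two disks' worth of space for parity.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLatency(Enum):
    """How quickly data can be retrieved (hypothetical vocabulary)."""
    ONLINE = "online"
    NEARLINE = "nearline"
    OFFLINE = "offline"


@dataclass
class StorageResourceInfo:
    # All field names are hypothetical; none come from the StAR document.
    site: str                      # link between the storage resource and a site
    usable_capacity_bytes: int     # space actually available to grid users
    raw_capacity_bytes: int        # installed disk, including redundancy
    redundancy: str                # e.g. "RAID6"
    access_latency: AccessLatency
    bytes_transferred: int = 0     # data served over the accounting period

    def redundancy_overhead_bytes(self) -> int:
        """Space consumed by redundancy rather than by user data."""
        return self.raw_capacity_bytes - self.usable_capacity_bytes


def raid6_usable_bytes(disk_count: int, disk_size_bytes: int) -> int:
    """Usable capacity of one RAID 6 set (two disks' worth holds parity)."""
    if disk_count < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disk_count - 2) * disk_size_bytes


# Example with made-up numbers: one RAID 6 set of 12 disks of 2 TB each.
TB = 10**12
info = StorageResourceInfo(
    site="EXAMPLE-SITE",           # placeholder site name
    usable_capacity_bytes=raid6_usable_bytes(12, 2 * TB),
    raw_capacity_bytes=12 * 2 * TB,
    redundancy="RAID6",
    access_latency=AccessLatency.ONLINE,
)
print(info.redundancy_overhead_bytes() // TB, "TB used for redundancy")
```

With a structure like this, two resources with the same usable capacity can still be told apart by their redundancy level, access latency and transfer volume, which is the distinction the comments above ask for.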