Hi Jon,
I'm sorry to say that I won't be able to attend the OGF
meeting. But I obviously look forward to seeing the results of
the discussions there.
I have read through the Doc and do have some comments. (I'm
obviously missing a lot since I have not attended an OGF for a
while but here goes)
General Comments:
1) w.r.t. usage of the OGF UR.
I don't think this is a new comment.
If possible it would be ideal to ensure that the UR for
storage and the UR for compute can somehow be coupled
together.
This could either mean extending the UR so it could include
both storage and compute or ensuring that both records could
be identified as having the same logical origin (site and our
group) during the same time period.
This may be achievable in the process of transforming
accounting records into billing records. So you may be able to
argue it isn't needed in the UR but is part of the machinery
for processing the URs into billing records for funding.
I just feel that if possible we should ensure that both
compute and storage URs can be correctly associated and
processed together.
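
A minimal sketch of the second option, i.e. associating compute and
storage records that share a site and group and overlap in time. The
Record type and its field names (kind, site, group, start, end) are
hypothetical simplifications for illustration and are not taken from
the UR or StAR schemas.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import product


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified usage record (not the real UR/StAR schema)."""
    kind: str          # "compute" or "storage"
    site: str
    group: str
    start: datetime
    end: datetime


def overlaps(a: Record, b: Record) -> bool:
    """True if the two records cover a common time span."""
    return a.start < b.end and b.start < a.end


def couple(compute, storage):
    """Pair compute and storage records that have the same site and group
    and whose time periods overlap."""
    return [
        (c, s)
        for c, s in product(compute, storage)
        if (c.site, c.group) == (s.site, s.group) and overlaps(c, s)
    ]
```

Whether such a join belongs in the record format or in the billing
machinery is exactly the open question above; the sketch only shows
that it needs comparable site, group and time fields in both records.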
Anyway the StAR may be seen as stand alone and as a practical
forerunner which could help show the way to defining a more
global UR.
2) Section 2.1.1
Although I agree with you that storage accounting is more
problematic than CPU accounting, I still have the feeling that
this is partially due to the storage systems themselves.
I feel you outline why things are difficult and then with StAR
you go about defining the best way to do storage accounting
with what we have at the moment.
On our batch system (sge) I have qstat which gives me a view
of current resource usage and I have qacct which gives me the
ability to get a summarised usage over a time period based on
user/group.
At some point I'd like to see if the storage engine providers
feel that this type of functionality could be added to their
systems.
That would make the job of providing storage accounting much
simpler for us.
It may be that the providers say it's simply too much.
Specific Comments:
1) Section 2.1.2
"Identity: Describes the person or group..." Should this be
"and/or"? Can you have person, person+group, or group?
2) Section 2.1.3
Allowing additional records: I know some people are against
such practices since allowing this can break standards (people
put whatever they want in and things start to get incompatible
... I guess it's your design choice and it's up to you)
"This makes it possible to automatically remove user and group
information"
I'd suggest:
"This makes it possible to automatically remove user and group
information, a practice which may be needed for anonymization
purposes"
so I know why people may need to do this.
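
A minimal sketch of what such automatic removal could look like,
assuming the user and group information sits under a single
SubjectIdentity element. The record layout below (element names other
than SubjectIdentity and Group, the values, and the absence of XML
namespaces) is a hypothetical simplification, not the StAR schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; names and values are for illustration only.
RECORD = """
<StorageUsageRecord>
  <StorageSystem>example-storage.example.org</StorageSystem>
  <SubjectIdentity>
    <User>jdoe</User>
    <Group>atlas</Group>
  </SubjectIdentity>
  <ResourceCapacityUsed>1024</ResourceCapacityUsed>
</StorageUsageRecord>
"""


def anonymize(xml_text: str) -> str:
    """Return the record with every SubjectIdentity block removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree is not modified mid-iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "SubjectIdentity":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Calling anonymize(RECORD) leaves the storage system and usage values
in place and drops only the identity block.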
3) Section 2.2.2
"The specifications that are made in the following are based
on a context that the reader needs to comprehend."
I assume you mean that the following two
specifications/definitions, "A Storage Resource" and "Storage
Accounting", are used in this document and are important for
understanding it.
I think this needs some re-wording for clarity.
4) Section 2.4.1
For the opening sentence: from the original UR doc I like the
sentence "The UsageRecord element encapsulates a single Usage
Record" And I don't mind stealing.
The term "property" is used here and at numerous other places
in the document.
I understand that this is since it's a property of your
record, however, I would suggest using the term "element"
since you have already defined the XML nature of the record
and I think this would make things a little easier to
understand.
"top container property" -> "top level element of the
storage record format"
(maybe keeping the "container" is ok? "container element" ?)
5) Section 2.4.3
If you wish to follow the "property -> element" suggestion:
"The field has two attributes" -> "The RecordIdentity
element has two attributes"
"The field is similar to the field with the same name in the
Usage Record standard" - do you need to say this?
Saying "similar" makes me start to think "can't be used with
UR" or "why not the same?"
You explicitly stated earlier that you've taken steps to make
things similar but had problems so I am not sure if you need
to say this here.
6) Section 2.4.4
"The storage system value SHOULD be constructed in such a way
that it globally identifies the storage system"
I was originally not sure if MUST would be better but I assume
you have worries about the ability of people to enforce this?
Also, I think "globally" should be accompanied by
"uniquely".
I believe here you wish to make the recommendation that people
use a unique global name?
"globally" is not "uniquely" but I think that is what you want
and I think you use the term "global" to mean this in a few
places.
I'd suggest skimming the doc and adding "unique" where you mean
this to make things clearer.
7) Section 2.4.5
"StorageShare": why "Share"? This makes me think of my share of
the pie or fair share, and it's a touch misleading.
I'd suggest some thought about an alternative name, but I
don't have a good suggestion ("StorageSubSystem"?).
8) Section 2.4.9
"DirectoryPath": We all tend to think in Unix terms, and at
least the storage systems that I have met have a tendency to
expose their content in these terms as well, but is it really a
directory path and not a namespace path? I fear there's some
storage system out there that I haven't met yet that doesn't
have a /abc/def/file format for displaying the data
collections.
It's an optional element so this may be a moot point.
Either way I think it would be good to include a term such as
"logical namespace" within your description to clarify that
it's not the physical layout but the storage system's logical
namespace that you are referring to.
w.r.t. "the record should account for all usage in the
directory and only that directory":
Do you really mean "only that dir"?
a) Would this not limit its usefulness?
If I have /atlas/data/2011 and I want a record that contains
atlas 2011 data usage, I would need to sum through all subdirs.
Is this what you really mean? (dir+subdirs)
b) Would you allow a container that has a list of all subdirs?
e.g.
<SubDirs>
<SubDir>/atlas/data/2011/January</SubDir>
<SubDir>/atlas/data/2011/February</SubDir>
<SubDir>/atlas/data/2011/March</SubDir>
</SubDirs>
Would you consider allowing regexps in these definitions (is
this possible)?
c) If you do mean "dir+subdirs" then any links which are made
within this tree could break you out and cause problems
(wrong/double accounting etc) so it would be good to be
explicit that these should be ignored.
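
A minimal sketch of the "dir+subdirs, links ignored" reading, assuming
the namespace happens to be visible as a POSIX file system (which, as
noted above, need not be true for every storage system). The function
name tree_usage is hypothetical.

```python
import os


def tree_usage(top: str) -> int:
    """Sum file sizes under `top`, including subdirectories, skipping symlinks."""
    total = 0
    # followlinks=False keeps os.walk from descending into symlinked dirs,
    # so a link cannot lead the walk out of the tree.
    for dirpath, dirnames, filenames in os.walk(top, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # the link target is accounted for where it really lives
            total += os.lstat(path).st_size
    return total
```

Hard links are not handled here and would still be counted once per
name, so a real implementation would need a rule for those as well.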
9) Section 2.4.11
"MUST be under the SubjectIdentity": if you take my XML
elements comment from before (my 4th comment) then this could
be "MUST be a child element of the SubjectIdentity element".
This and similar things happen a few times throughout the
document, in 2.4.12-2.4.13 etc. Skim and change if you like.
10) Section 2.4.15
"GroupAttribute"
If you do use the XML context (my 4th comment) then using the
term "Attribute" here may cause a little confusion.
So renaming may help here (GroupProperty?).
Additionally since you say "MUST be under the SubjectIdentity"
I would ask:
Should this be a child element of "Group" or a real XML
attribute? As you say Group needs to exist for GroupAttribute
to exist and so this would make things
easier if you ask me (more strongly defined).
These may be seen as XML style questions and looking up some
XML best practices (or asking someone who is an XML guru) may
help clear this point.
w.r.t. "The GroupAttribute property can be repeated", are
there any possible restrictions here? Could people have
several roles/subgroups and would this then lead to possible
confusion in the interpretation of the record?
I think you may be able to argue that you just present the
information in the record and it's up to others to decide how
to interpret it (w.r.t billing etc).
11) Section 2.4.17
"ValidDuration"
To me this feels a bit artificial. I am not sure if it's
needed, how it's justified (is everyone free to make a guess
at the period of validity?).
If it's there to enable people to change accounting into
billing then I'd also suggest this is a policy issue to be
discussed between the sites providing the storage and those
using the storage.
It may be seen as a necessary measure by you but I feel a
little unsure.
If the records are indeed provided on a more frequent basis
than the duration time then it's invalidated and not needed.
The records themselves are snapshots by nature and the
interpretation of what happens in between them is open IMHO.
Even if you add a validity duration, it does not mean that the
situation wasn't completely different on the storage itself.
I think that the records are only really invalidated by a true
measure of the system state at a later time.
Any policy decisions regarding this can be made and applied
externally to the StAR itself.
I think this partially goes back to my 2nd General comment. We
can currently only get snapshots of the system state and we
may need to live with defining storage accounting based on
this.
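
A minimal sketch of how such a policy could be applied outside the
record, assuming a consumer simply holds each snapshot's value until
the next snapshot arrives. The function name byte_seconds, the
(timestamp, bytes_used) tuple layout and the step-function policy are
all assumptions for illustration; none of them is part of StAR.

```python
from datetime import datetime


def byte_seconds(snapshots, period_end):
    """Integrate usage over time, holding each snapshot's value until the next.

    `snapshots` is a list of (timestamp, bytes_used) tuples and
    `period_end` is assumed to lie after the last snapshot.
    """
    ordered = sorted(snapshots)
    total = 0.0
    # Pair each snapshot with its successor; the last one runs to period_end.
    for (t0, used), (t1, _) in zip(ordered, ordered[1:] + [(period_end, 0)]):
        total += used * (t1 - t0).total_seconds()
    return total
```

A different site could equally choose interpolation or a maximum
validity window; the point is only that the choice can live in the
billing step rather than in the record.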
12) Section 4.1.1
I guess you already saw this but there is an "Error: Reference
source not found" when you refer to Figure1.
I would like to re-read the appendix just to make sure I
understand what you're saying there and that it's clear enough
to me.
Maybe I read it too fast the first time but I was a little
confused with some points.
There are a few places where I would have liked to suggest
some slight modification to improve the English a little.
However I didn't think that was the real aim of your request
for comments.
My English isn't perfect but I'd be willing to help out a
little here if you like.
I also want to say again that I'm sorry I won't be attending
the OGF. I know comments to docs like this are welcome but I
also know I'm missing a lot of the discussion and so some
comments may be unneeded/outdated.
I hope you all enjoy Lyon.
Cheers
Johnk
On 09/08/2011 01:44 PM, Jon Kerr Nilsen wrote:
Hi all,
OGF 33 is approaching and there will be a working session for UR-WG. As you might be aware, EMI has created a description for a storage accounting record (StAR) to be proposed as an OGF standard (or as input to a new usage record). I would therefore like to ask for some last comments on the StAR document, to be found here:
http://cdsweb.cern.ch/record/1352472?ln=en
I'd need input on it by September 16 to be able to discuss it at OGF.
thanks,
Jon
UR-WG co-chair
--
ur-wg mailing list
ur-wg@ogf.org
http://www.ogf.org/mailman/listinfo/ur-wg
--
+------------------------------------------------------------+
|Dr. John Alan Kennedy Rechenzentrum Garching (RZG) |
|Mail: jkennedy@rzg.mpg.de Boltzmannstrasse 2 |
|Phone: +49 89 3299 2694 85748 Garching |
|Fax: +49 89 3299 1301 |
+------------------------------------------------------------+