UR-WG meeting Summary from OGF 28
Hi,
Here is a summary of the discussion at OGF28. It was a good session with lots of open discussion, hence the summary rather than minutes, which were a little more chaotic.
In Brief:
Andrea presented some ideas about storage usage records, reflecting on the existing UR format. We discussed individual parameters and also had a much wider discussion about storage accounting in general.
Jules also gave input about aggregate records, raising questions and making some points.
I assume (and it seems to already be happening) that the discussion will continue via the mailing list.
Many thanks to all who contributed, and thanks to the UR-WG for allowing us to hold a session in their stead.
cheers johnk
Summary from ur-wg meeting OGF 28
Andrea Cristofori: Acting as chair
John Kennedy: Taking notes
Contributing to Discussion: Andrea Cristofori, John Gordon, John Kennedy, Johannes Reetz, Jules Wolfrat
1) Storage Accounting: following presentation from Andrea
Andrea gave a presentation during which there was much open discussion. The initial focus was on storage accounting in general, followed by a step through the existing Usage Record format to see if any of the properties can be re-used and also if we would need to add new properties.
From Andrea's list
-- Available for use --
*RecordIdentity
*LocalUserId
*Charge
*ProjectName
*Disk
*TimeDuration
*TimeInstant
*GlobalUsername
*Extension
-- New --
*GlobalField
*LocalField
-- Available but to be Changed --
*Network
*Status
*Host
*SubmitHost
*ServiceLevel
Comments to the properties:
*RecordIdentity*
A hash of the record?
*LocalUserId*
Should be fine
*Charge*
Seems quite difficult to tie down.
Does a site determine this?
Is this related to accounting or billing?
How do you define a charge, if not from the site?
*ProjectName*
Should be fine
*Disk*
May be able to use
*TimeDuration*
This would possibly be in accounting for jobs, and also aggregates.
Split accounting into jobs and traditional storage.
*TimeInstant*
Unclear
*GlobalUsername*
Should be OK
*Extension*
Needs some discussion
*GlobalField* - global file identifier; only relevant if accounting at a file level.
(back to the question of file or user/group accounting)
*LocalFile* - similar to the global file identifier (local storage object)
*Network* - should we separate the network usage from the storage usage?
(amount of data transferred, protocol, network characteristics...)
*Status*
Only makes sense at a file level, but what does it mean?
Whether the file exists on the system, or whether it's available?
In comparison with jobs: job records are not needed while a job is executing.
You could have two entries: created/available and deleted.
Doesn't make sense in the aggregate (user+group) model.
*Host*
Look at how this is handled in the GLUE schema.
(the whole host/resource/site info may be in GLUE)
*SubmitHost*
Is this where the file came from?
Also only makes sense for single files.
*ServiceLevel*
permanent/temporary/volatile
Also possibly in GLUE.
Do we need to account for temporary/volatile storage?
Is the quality of storage already defined by the storage type, i.e. does the type imply the quality?
*Units*
We need to think about what units we use.
The Disk element in the existing UR may give hints.
Or do we look at what's in GLUE?
-- General Discussion --
Basic thoughts about what is really needed
(File + type, owner, type of storage, storage class etc.)
Do we simply account for how long the space is used (FileSize * Time)?
Do we try to include network-related issues such as the number of times a file is accessed?
The more a file is accessed, the higher its actual "cost" to the system.
But if we do try to fold in access (and as such network), we have to consider that WAN access and storage network usage are different.
It should also be noted that this "cost" can also be recorded as part of the service used, i.e. the I/O load of a job or the I/O load of a data transfer.
It may be best/easiest to separate the network usage into a separate record.
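As a rough illustration of the "FileSize * Time" idea, the sketch below sums size multiplied by residence time for each file, counting only the part that falls inside an accounting period. The `FileInterval` class and the `byte_seconds` function are hypothetical names invented for this example; nothing here was proposed at the meeting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileInterval:
    """Hypothetical record: one file held on storage for a period of time."""
    size_bytes: int
    created: datetime
    deleted: datetime  # use the period end if the file still exists


def byte_seconds(intervals, period_start, period_end):
    """Sum size * time for the part of each interval inside the period."""
    total = 0.0
    for iv in intervals:
        # Clip the file's lifetime to the accounting period.
        start = max(iv.created, period_start)
        end = min(iv.deleted, period_end)
        if end > start:
            total += iv.size_bytes * (end - start).total_seconds()
    return total


start = datetime(2010, 3, 1)
end = start + timedelta(days=1)
files = [
    # Entirely inside the period: 1000 bytes for 1 hour.
    FileInterval(1000, start + timedelta(hours=1), start + timedelta(hours=2)),
    # Created before the period: only 1 of its 2 hours is counted.
    FileInterval(2000, start - timedelta(hours=1), start + timedelta(hours=1)),
]
usage = byte_seconds(files, start, end)
```

Access counts and network transfer are deliberately left out, in line with the suggestion to keep network usage in a separate record.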
Q) How do you determine the usage?
Do you sample, and if so at what rate? (Low rates may miss usage, high rates load the system.)
Do you have triggers which update records on file creation/deletion?
Sampling may be easier, since triggers would require that all tools for adding/deleting data cause these triggers to fire.
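The trade-off in sampling rate can be sketched with a toy example. The code below polls a hypothetical `usage_at(t)` function at a fixed interval and integrates the samples, holding each value until the next sample. All names are assumptions made for illustration, and time is in plain seconds.

```python
def sample(usage_at, start, end, step):
    """Poll the hypothetical usage_at(t) callable every `step` seconds."""
    return [(t, usage_at(t)) for t in range(start, end + 1, step)]


def estimate_byte_seconds(samples):
    """Integrate sampled usage, holding each sample until the next one.

    `samples` is a list of (time_in_seconds, used_bytes) pairs sorted by time.
    """
    total = 0
    for (t0, used), (t1, _) in zip(samples, samples[1:]):
        total += used * (t1 - t0)
    return total


def usage_at(t):
    """Toy storage system: a 500-byte file exists from t=100 to t=200."""
    return 500 if 100 <= t < 200 else 0


fine = estimate_byte_seconds(sample(usage_at, 0, 600, 50))     # sees the file
coarse = estimate_byte_seconds(sample(usage_at, 0, 600, 300))  # misses it
```

Here the true usage is 500 bytes * 100 s. The 50-second sampling recovers it, while the 300-second sampling never observes the file at all.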
Note) Accounting and billing are not the same thing (or at least need not be).
Accounting data is static and should be recorded.
The billing data, on the other hand, is (or can be) derived from the accounting data.
There can always be negotiation about the "cost/charge" when the billing data is being generated.
This may be feasible in smaller collaborations, but as the number of sites increases it may become problematic.
So the question is: do we see the usage record as being complete and including all the information required for billing purposes, or do we just account and derive billing data later?
The whole concept of a charge field is linked to this and, due to the many different possible storage types/classes, seems like it could become quite complicated.
Q) How do we account for reserved space?
In systems like SRM, when a space token is created the reserved space should be accounted and billed, and not just the used space.
We'd need to be careful to ensure that no double counting takes place.
If a reservation of 10TB is made and 7TB is used, we should ensure we don't account/bill for 17TB.
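One possible way to avoid the double counting, sketched under the assumption that usage inside a reservation is already covered by that reservation: account for the larger of the reserved and used amounts, plus any usage outside the reservation. The function name and its parameters are hypothetical.

```python
TB = 10**12  # decimal terabyte, an assumption; the unit question is still open


def accountable_bytes(reserved, used_in_reservation, used_outside=0):
    """Bytes to account for one space reservation, without double counting.

    Usage inside the reservation is covered by the reservation itself, so
    only the larger of the two figures is counted. Usage outside any
    reservation is added on top.
    """
    return max(reserved, used_in_reservation) + used_outside


# The example from the discussion: 10TB reserved, 7TB used.
total = accountable_bytes(10 * TB, 7 * TB)
```

With these figures the result is 10TB rather than 17TB.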
Q) How do we distinguish between different storage types?
Saying Dell/RAID-5 doesn't really define the difference in what you should charge.
Need to think deeper about storage types/classes and also what is really used for charging/billing.
Would a site be able to advertise their charge per TB etc.?
Does the type/class define the quality of storage?
Q) Storage accounting for running jobs
Do we account for temporary storage usage for running jobs?
It was considered that this shouldn't be the case.
Q) File based or User-Group based accounting?
There are pros/cons for both and this remains an open topic.
File based
+ Allows for fine-grained accounting
- Leads to a very large number of records (many millions at each site)
User-Group based
+ Smaller number of records
+ Probably what funding bodies need to see/know.
- Loss of information
Could both be done?
File based URs and User-Group based aggregate records?
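The "both" option could look roughly like this: keep file-level records and derive user-group summaries from them. The `FileRecord` fields and the `aggregate_by_user_group` function are hypothetical and only illustrate the reduction in record count.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical file-level usage record."""
    user: str
    group: str
    size_bytes: int


def aggregate_by_user_group(records):
    """Collapse file-level records into one summary per (user, group)."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for r in records:
        entry = totals[(r.user, r.group)]
        entry["files"] += 1
        entry["bytes"] += r.size_bytes
    return dict(totals)


records = [
    FileRecord("alice", "atlas", 100),
    FileRecord("alice", "atlas", 250),
    FileRecord("bob", "cms", 40),
]
summary = aggregate_by_user_group(records)
```

Three file records become two summaries. The per-file detail is lost in the summary, which is the "loss of information" noted above.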
Q) Do we account for how much space is available?
If so, how? And would we reject storage requests when not enough space is available?
Possibly beyond the scope of what we're aiming to do.
Q) Do we need a type field in the record so we can distinguish compute/storage/network URs?
Proposal: use UsageRecordType and define a suitable name for storage.
Q) Cloud storage
It may be that the proposed storage record can be used for cloud storage.
It may, however, need some extensions.
This should be kept in mind and considered at a later date.
2) Aggregate records - Jules wanted to discuss this, following work in DEISA.
Aggregation: is it the summary of all the records, or is it a collection of all the records?
So do you get things like
User A ran 505 jobs and here's the summed info
or do you get a collection of all 505 URs?
So would the term "summary record" be better?
Input from Jules (via mail) about the aggregate records
Aggregated records.
1) A StartTime and EndTime (RecordStart and RecordEnd) is used to specify the range of aggregation. But it is not specified to which timestamp in the original records this range should be applied, or whether the start and end timestamps themselves are included or excluded.
Currently we compare the EndTime of jobs in UR-WG records with the start and end time of the range, where the StartTime of the range is inclusive and the EndTime of the range is exclusive.
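The convention described here (select on the job's EndTime, range start inclusive, range end exclusive) can be sketched as below. The dictionary layout and function names are assumptions for illustration, not the DEISA implementation.

```python
from datetime import datetime


def in_range(job_end, range_start, range_end):
    """Half-open test: range_start <= job_end < range_end."""
    return range_start <= job_end < range_end


def select_jobs(jobs, range_start, range_end):
    """Pick the jobs whose EndTime falls inside the aggregation range."""
    return [j for j in jobs if in_range(j["EndTime"], range_start, range_end)]


march = (datetime(2010, 3, 1), datetime(2010, 4, 1))
april = (datetime(2010, 4, 1), datetime(2010, 5, 1))

jobs = [
    {"id": "a", "EndTime": datetime(2010, 3, 1)},          # exactly on the start
    {"id": "b", "EndTime": datetime(2010, 3, 31, 23, 59)},
    {"id": "c", "EndTime": datetime(2010, 4, 1)},          # exactly on the boundary
]

in_march = [j["id"] for j in select_jobs(jobs, *march)]
in_april = [j["id"] for j in select_jobs(jobs, *april)]
```

With a half-open range, a job ending exactly on a boundary lands in exactly one of two adjacent ranges, so consecutive aggregates neither overlap nor leave gaps.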
2) WallDuration. The document states:
"
4.4 SumWallDuration
This element describes total CPU Wall Clock Time consumed by jobs aggregated.
"
- It makes no sense to add the wall-duration of jobs with different numbers of cores, processors, cpus. The reason is that billing is mostly based on the number of cores times the wall duration.
Therefore we have introduced the JobTime, which is the WallDuration times the number of cores, processors, cpus.
This we put in the records as:
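The arithmetic behind JobTime can be illustrated with a small sketch. This shows only the calculation, not the record layout used in DEISA; the `job_time` function and the `Cores` key are hypothetical names.

```python
def job_time(wall_duration_s, cores):
    """JobTime as described above: wall duration times the number of cores."""
    return wall_duration_s * cores


jobs = [
    {"WallDuration": 3600, "Cores": 1},   # one hour on a single core
    {"WallDuration": 3600, "Cores": 64},  # one hour on 64 cores
]

# Plain sum of wall durations treats both jobs as equal.
sum_wall_duration = sum(j["WallDuration"] for j in jobs)

# Sum of JobTime weights each job by the cores it occupied.
sum_job_time = sum(job_time(j["WallDuration"], j["Cores"]) for j in jobs)
```

The two jobs contribute equally to the summed wall duration (7200 s in total), although the second one occupied 64 times the resources. The summed JobTime (234000 core-seconds) reflects that difference.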
Dear all,
Following our last meeting in Munich, I'd like to ask if any of you will be present at next OGF in Chicago. I think there has been some time now to think about what was discussed and we could go further in the planning. Maybe someone else would like to present his idea. What do you think?
Cheers Andrea
-- Andrea Cristofori INFN-CNAF Viale Berti Pichat 6/2 40127 Bologna Italy Tel. : +39-051-6092920 Skype: andrea-cnaf
I can't make this next OGF.
-----Original Message----- From: ur-wg-bounces@ogf.org [mailto:ur-wg-bounces@ogf.org] On Behalf Of Andrea Cristofori Sent: 25 May 2010 11:12 To: ur-wg@ogf.org Subject: [UR-WG] OGF 29
Dear all,
Following our last meeting in Munich, I'd like to ask if any of you will be present at next OGF in Chicago. I think there has been some time now to think about what was discussed and we could go further in the planning. Maybe someone else would like to present his idea. What do you think?
Cheers Andrea
-- Andrea Cristofori INFN-CNAF Viale Berti Pichat 6/2 40127 Bologna Italy Tel. : +39-051-6092920 Skype: andrea-cnaf
On 25/05/2010 11:11, Andrea Cristofori wrote:
Following our last meeting in Munich, I'd like to ask if any of you will be present at next OGF in Chicago. I think there has been some time now to think about what was discussed and we could go further in the planning. Maybe someone else would like to present his idea. What do you think?
I encourage you to meet (and I'm very happy to help as I can set up a meeting) but again, I will not be attending. The problem is I've got a project review that week! :-(
Donal.
Hi Andrea,
I agree that a followup discussion would be good.
We (Johannes Reetz and I) had some discussion about this internally at RZG. And could possibly put something together to clarify our thoughts.
I won't be able to make the meeting though. I am currently on paternity leave and can't make it.
How about we try to document the ideas we have and pass them on?
cheers johnk
On 05/25/2010 12:11 PM, Andrea Cristofori wrote:
Dear all,
Following our last meeting in Munich, I'd like to ask if any of you will be present at next OGF in Chicago. I think there has been some time now to think about what was discussed and we could go further in the planning. Maybe someone else would like to present his idea. What do you think?
Cheers Andrea
Hi John,
In fact, so far it seems that none of the people who answered will be able to get there. So I think we can start discussing on the mailing list. Maybe you could try to sum up your ideas so that everyone will be able to add some comments.
Or, if there are enough people and things to discuss, we could also think about starting a phone conference. Once a month? What do you think?
Cheers Andrea
On 05/25/2010 10:13 PM, john alan kennedy wrote:
Hi Andrea,
I agree that a followup discussion would be good.
We (Johannes Reetz and I) had some discussion about this internally at RZG. And could possibly put something together to clarify our thoughts.
I won't be able to make the meeting though. I am currently on paternity leave and can't make it.
How about we try to document the ideas we have and pass them on?
cheers johnk
Hi Andrea,
I agree that some discussion would be good.
I'm sorry but currently I have very little time to do much. Busy with the baby while trying to juggle a bit of work too.
I will talk to my colleagues in RZG to summarise our ideas but this may take a few weeks.
We should aim to discuss more as you say. I am officially back at work on the 28th June and hope to be more useful then.
cheers johnk
On 05/31/2010 05:16 PM, Andrea Cristofori wrote:
Hi John,
In fact, so far it seems that none of the people who answered will be able to get there. So I think we can start discussing on the mailing list. Maybe you could try to sum up your ideas so that everyone will be able to add some comments.
Or, if there are enough people and things to discuss, we could also think about starting a phone conference. Once a month? What do you think?
Cheers Andrea
participants (5)
- Andrea Cristofori
- Donal K. Fellows
- john alan kennedy
- John Alan Kennedy
- john.gordon@stfc.ac.uk