Horizontal & vertical scalability in the cloud

Morning all,

So I'm looking at how best to handle horizontal and vertical scalability, following an hour or two on the phone with Andy earlier which got me thinking.

Currently, scaling in services like Amazon EC2 involves having a service like RightScale monitor your application (e.g. response time, load average, etc.), start new instances when (or ideally before) the figures reach unacceptable levels, track them while they're running, and kill them off again when the load returns to the baseline. This is pretty inefficient because each instance is its own first-class citizen and there is no connection between them (aside from the fact that they were started from the same image) - you have to handle them individually, which in itself generates flurries of unnecessary API calls.

A better approach to scalability is to have a single object for which you can both adjust the resources (vertical scalability) and adjust the number of instances (horizontal scalability). That is, you start a single instance with 1 core and 1GB, then while it's running you crank it up to 2 cores and 2GB. Eventually you max out at, say, 8 cores and 16GB, so you need to go horizontal at some point. Rather than create new unlinked instances, the idea is that you would simply adjust the number of requested instances and let the infrastructure take care of making reality match the request (if the available resources allow for it). When things calm down you can back off the number of requested instances and watch the (read-only) number of actual instances fall back as machines are gracefully shut down. Similarly, one can tweak the allocated resources (RAM, CPU, etc.) and, provided the infrastructure supports it, the changes will be applied to all the "shadow" instances.

Ultimately the "governor" (e.g. RightScale) could constantly tune the application based on the amount and type of load (a fair amount of artificial intelligence could go into these decisions, but that's a subject for a different forum) and wouldn't have to worry about the mechanics of managing the resources (which arguably should be done locally anyway). Of course the infrastructure would also have the option of creating many separate instances if it wanted to (e.g. so providers like Amazon can still use OCCI if they want to).

If anyone has any thoughts about this then I'd be interested to hear them...

Sam
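To make the idea concrete, here is a minimal sketch with entirely made-up names (this is not proposed OCCI syntax): a single compute object carries both dials - a mutable resource spec and requested instance count - plus a read-only actual count that the infrastructure reconciles towards the request.

```python
from dataclasses import dataclass, field

@dataclass
class ResourceSpec:
    cores: int = 1
    memory_gb: int = 1

@dataclass
class ScalableCompute:
    """One object exposing both scaling dials (hypothetical model, not OCCI)."""
    spec: ResourceSpec = field(default_factory=ResourceSpec)  # vertical dial
    requested_instances: int = 1                              # horizontal dial
    actual_instances: int = 0                                 # read-only, owned by the provider

    def reconcile(self, capacity_available: int) -> None:
        """Provider-side step: move reality towards the request, within capacity."""
        target = min(self.requested_instances, capacity_available)
        if self.actual_instances < target:
            self.actual_instances += 1    # boot one more "shadow" instance
        elif self.actual_instances > self.requested_instances:
            self.actual_instances -= 1    # gracefully shut one down

# A governor (e.g. RightScale) only ever touches the two dials:
app = ScalableCompute()
app.spec = ResourceSpec(cores=2, memory_gb=2)   # scale up
app.requested_instances = 4                     # scale out
for _ in range(10):
    app.reconcile(capacity_available=8)
print(app.actual_instances)                     # 4
```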

On Oct 25, 2009, at 5:38 PM, Sam Johnston wrote:
A better approach to scalability is to have a single object which you can both adjust the resources of (vertical scalability) and adjust the number of instances of (horizontal scalability). That is, you start a single instance with 1 core and 1Gb, then while it's running you crank it up to 2 cores and 2Gb. Eventually you max out at say 8 cores and 16Gb so you need to go horizontal at some point. Rather than create new unlinked instances the idea is that you would simply adjust the
I agree. This is the future. Dials for 'horizontal' and for 'vertical', probably attached to a given tier of an application.

Just as an FYI, I think 'scale-up' VMs are going to be more and more common. We'll see VMs with a *lot* more RAM and cores very soon now. Most of the modern OSes handle hotplug of CPU/RAM pretty well.

Best,
--Randy

Randy Bias, Founder & Cloud Strategist, Cloudscaling +1 (415) 939-8507 [m], randyb@cloudscaling.com BLOG: http://cloudscaling.com/blog

On Mon, Oct 26, 2009 at 3:26 AM, Randy Bias <randyb@cloudscaling.com> wrote:
On Oct 25, 2009, at 5:38 PM, Sam Johnston wrote:
A better approach to scalability is to have a single object which you can both adjust the resources of (vertical scalability) and adjust the number of instances of (horizontal scalability). That is, you start a single instance with 1 core and 1Gb, then while it's running you crank it up to 2 cores and 2Gb. Eventually you max out at say 8 cores and 16Gb so you need to go horizontal at some point. Rather than create new unlinked instances the idea is that you would simply adjust the
I agree. This is the future. Dials for 'horizontal' and for 'vertical', probably attached to a given tier of an application.
Exactly. I'm hoping then that it is also fair to say that all the "clones" should be identical - and that adjusting resource utilisation for one adjusts it for all? If not, we may have to expose links to [a collection of] the children.
Just as an FYI, I think 'scale-up' VMs are going to be more and more common. We'll see VMs with a *lot* more RAM and cores very soon now. Most of the modern OSes handle hotplug of CPU/RAM pretty well.
OK, good to know. Were it not for the usual limitations (e.g. cores per socket and sockets per board, as well as chip density, chips per DIMM and DIMMs per board) then many workloads would scale far higher (vertically) than they do today. I'm anticipating improvements in both chip and memory density as well as interconnect bandwidth and latency (that is, I expect to see boxes with 10s if not 100s or 1000s of cores and many gigabytes of RAM), so it's conceivable that we could be seeing some very powerful machines indeed in the cloud before long - users will have the choice of scaling up and/or out.

Similarly, I think we'll be seeing more rather than fewer chroots/zones/containers (OS-level virtualisation, where a single kernel is shared between all instances) as well as "bare metal" deployments (where you'll upload something like an Altiris image rather than a virtual machine). There's also the possibility of HPC equipment such as OpenCL interfaces, FPGAs, etc. (all of which I believe we already handle reasonably well with "arch" attributes and the like).

Interesting times ahead - I wouldn't believe anyone who told me they knew what a datacenter would look like in 2020.

Sam
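Assuming the clones are indeed identical, a single vertical adjustment on the parent object would fan out to every running clone, e.g. via CPU/RAM hotplug where the hypervisor and guest support it. A rough sketch, again with invented names and the hypervisor call reduced to a stand-in:

```python
class Clone:
    """One running instance ("shadow") of the parent compute resource."""
    def __init__(self, clone_id: str):
        self.clone_id = clone_id
        self.cores = 1
        self.memory_gb = 1

    def hotplug(self, cores: int, memory_gb: int) -> None:
        # Stand-in for a hypervisor call; assumes the guest OS handles hotplug.
        self.cores, self.memory_gb = cores, memory_gb

class ComputeGroup:
    """Parent object: one spec shared by all (identical) clones."""
    def __init__(self):
        self.cores = 1
        self.memory_gb = 1
        self.clones: list[Clone] = []

    def resize(self, cores: int, memory_gb: int) -> None:
        self.cores, self.memory_gb = cores, memory_gb
        for clone in self.clones:              # one change applied to every shadow
            clone.hotplug(cores, memory_gb)

group = ComputeGroup()
group.clones = [Clone("a"), Clone("b"), Clone("c")]
group.resize(cores=2, memory_gb=2)             # all three clones now have 2 cores / 2 GB
```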

'Horizontal' and 'vertical' dials are a good idea to define.

@Andy, I'm a little confused by the definition of horizontal scalability. Aren't the CPUs in a single operating image a vertical workload capacity, much like the amount of RAM? If the number of images scaled, that would be horizontal, because there is no necessity for the images to share the same workload set.

I would prefer to see the dials tied to a standard "meter of work" - an efficiency metric instead of an "equivalence" of CPU count, GHz and RAM amount. Juggling these dials may not be as effectual as the consumer perceives when a provider decides to throttle back performance and starts dropping workload requests. Without a referenced "effective workload" metric, it may be tough to ascertain whether the dials affect anything other than the charge to the customer.

gary

Gary,

I think you've touched on an interesting point there, which ties in to the "need" for a universal compute unit. More specifically, "cores" aren't a standard unit of measurement (at least without arch and speed), and in any cloud that's not brand new you're going to end up with a mix of core speeds depending on what presented the best value at build/replacement/expansion/failure time.

If you have a mix of core speeds at a given tier without sufficiently intelligent load balancing (e.g. response-time based) then you'll end up with some cores being underutilised and/or finishing jobs faster, and others being unable to keep up. If you're applying the buffalo theory (e.g. round robin) then you're only as fast as your slowest machines.

The simple fix is to ensure that "clones" or "shadows" of a given compute resource are all identical, but it's worth keeping in mind nonetheless.

Sam
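As a toy illustration of the round-robin point (made-up figures, not a benchmark): spreading requests evenly across a mixed tier caps sustainable throughput at n times the slowest node, whereas weighting by capacity - or balancing on response time - can use the whole pool.

```python
# Per-node capacity in requests/second for a mixed tier (illustrative numbers).
capacities = [100, 100, 60, 60]

# Round robin gives every node an equal share, so the tier saturates when the
# slowest node does: sustainable rate = n * min(capacity).
round_robin_rate = len(capacities) * min(capacities)   # 4 * 60 = 240 req/s

# Capacity-weighted (or response-time based) balancing can use the full pool.
weighted_rate = sum(capacities)                        # 320 req/s

print(round_robin_rate, weighted_rate)                 # 240 320
```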

I believe we discussed this issue some months ago (on CCIF?) and reached agreement that none of us wanted to be in the business of formulating cloud benchmarks. :-) Whether something like the TPC benchmarks or any of the Apache server workload suites is applicable, I really couldn't say. I like the approach UNSW took; it looks like a good starting point. I don't have time to contact UNSW this week, but it may be worthwhile to approach them.

Agreed on keeping the clones to identical characteristics; I'm not sure how feasible that is today, but it's a good, practical way to initially define it.

gary

Gary,

On Mon, Oct 26, 2009 at 5:27 AM, Gary Mazz <garymazzaferro@gmail.com> wrote:
I believe we discussed this issue some months ago (on CCIF?) and reached agreement that none of us wanted to be in the business of formulating cloud benchmarks. :-)
I think cloud benchmarks are relatively safe provided they are not considered "universal" - there are myriad tests for PC performance today, each representing a different workload. Microsoft have got about as close as you will ever get with their Windows 7 performance indexes, but even those are specific to the task of running an interactive windowing interface.
Whether something like the TPC benchmarks or any of the Apache server workload suites is applicable, I really couldn't say. I like the approach UNSW took; it looks like a good starting point. I don't have time to contact UNSW this week, but it may be worthwhile to approach them.
I think the best approach is to have trusted third parties (like Anna's team) conducting batteries of tests with the approval (but probably not direct knowledge of which accounts, or when) of the cloud providers. It's certainly not realistic to have every tire-kicker running their own suite of tests, and indeed doing so without prior notice could reasonably be prohibited by the terms of service. Each service would then have a set of figures, and users could use those most appropriate for their workload.

Agreed on keeping the clones to identical characteristics; I'm not sure how feasible that is today, but it's a good, practical way to initially define it.
I'd be satisfied with a "should" requirement level for clones being identical - it's almost always going to be better that a request be satisfied with a mix of hardware than not at all. Sam

Agreed, especially with the third-party validation. Maybe socialize this on the "Doug Tidwell" mailing list (cloud-computing-use-cases@googlegroups.com)? It may be good to get others' opinions; I think there was a discussion on there at one time. Have to find my glasses, can't see a thing... :-)

gary

This is hard. CPU 'clock cycles' are not equivalent. This is why Amazon uses a very specific processor and year to create their ECU. The 2007 1.2GHz processors all rode on 800MHz FSBs, which limited the available memory bandwidth (among other things), whereas modern CPUs and today's much better/faster buses mean you can feed the CPU much faster.

My point isn't that you shouldn't do it, it's simply that it's tricky. If I had to make a recommendation, it would be to baseline off of the Amazon ECU.

--Randy

On Oct 25, 2009, at 7:56 PM, Sam Johnston wrote:
I think you've touched on an interesting point there which ties in to the "need" for a universal compute unit
Randy Bias, Founder & Cloud Strategist, Cloudscaling +1 (415) 939-8507 [m], randyb@cloudscaling.com BLOG: http://cloudscaling.com/blog
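Mechanically, baselining against a reference unit is simple even if choosing the reference is the hard part: run one agreed benchmark on the reference hardware and on each node, and express capacity as a ratio. A sketch with invented scores (the Amazon ECU is only the inspiration here):

```python
# Benchmark score of the chosen reference machine (e.g. a 2007-era core),
# measured once with whatever workload is standardised on.
REFERENCE_SCORE = 1000.0

def compute_units(node_score: float, node_cores: int) -> float:
    """Express a node's capacity in reference-equivalent units."""
    return node_cores * (node_score / REFERENCE_SCORE)

# Illustrative fleet: newer cores score higher per core on the same benchmark,
# so "8 cores" means different things on different nodes.
print(compute_units(node_score=1000.0, node_cores=8))   # 8.0 units
print(compute_units(node_score=1800.0, node_cores=8))   # 14.4 units
```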

On Mon, Oct 26, 2009 at 6:44 AM, Randy Bias <randyb@cloudscaling.com> wrote:
This is hard. CPU 'clock cycles' are not equivalent. This is why Amazon uses a very specific processor and year to create their ECU. The 2007 1.2GHz processors all rode on 800MHz FSBs, which limited the available memory bandwidth (among other things), whereas modern CPUs and today's much better/faster buses mean you can feed the CPU much faster.
Now this is relevant because there was some contention (for reasons unknown) over the inclusion of quantitative measurements of performance characteristics such as memory bandwidth (http://en.wikipedia.org/wiki/List_of_device_bandwidths#Memory_Interconnect.2FRAM_buses). Surely if some providers (or individual nodes) are using slow RAM, buses, storage devices, etc. then as a consumer I should be able to find out about it and/or set parameters on it? Conversely, if I have an application that requires ridiculously fast storage (say, SSD) then I should be able to request this based on raw performance figures (the "what" rather than the "how").
My point isn't that you shouldn't do it, it's simply that it's tricky.
If I had to make a recommendation it would be to baseline off of the Amazon ECU.
Interesting idea, but surely that too is a moving target? Would it not also favour Intel over AMD (or vice versa)? Having a standard unit to measure against is an interesting idea, like the standard kilogram (http://en.wikipedia.org/wiki/File:CGKilogram.jpg), and perhaps it's something that could be built from commodity components.

Sam
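In request terms this just means letting the consumer state minimum figures for the characteristics they care about and leaving the 'how' to the provider. A hypothetical sketch - the perf.* attribute names are invented for illustration and are not taken from any OCCI draft:

```python
# Hypothetical performance constraints on a compute request: the consumer
# states the "what" (minimum figures), the provider decides the "how".
request = {
    "compute.cores": 4,
    "compute.memory_gb": 16,
    "perf.memory.bandwidth.min": 10_000,    # MB/s (made-up attribute name)
    "perf.storage.throughput.min": 200,     # MB/s, e.g. implies SSD-class storage
}

def satisfies(offer: dict, req: dict) -> bool:
    """True if an offer meets every '.min' constraint and the exact sizing."""
    for key, value in req.items():
        if key.endswith(".min"):
            if offer.get(key.removesuffix(".min"), 0) < value:
                return False
        elif offer.get(key) != value:
            return False
    return True

offer = {
    "compute.cores": 4,
    "compute.memory_gb": 16,
    "perf.memory.bandwidth": 12_000,
    "perf.storage.throughput": 350,
}
print(satisfies(offer, request))    # True
```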

Just as a side note: finding such a unit has been an open research problem for the last twenty or so years, so I wouldn't bet on finding such a thing in the near future -- and arguably not within OCCI.

-Alexander
-- Alexander Papaspyrou alexander.papaspyrou@tu-dortmund.de

On Mon, Oct 26, 2009 at 5:31 PM, Alexander Papaspyrou <alexander.papaspyrou@tu-dortmund.de> wrote:
Just as a side note: finding such a unit is an open research problem for the last twenty or so years. So I wouldn't bet on finding such a thing in the near future -- and arguably not within OCCI.
Thanks Alexander - I tend to agree with you and propose instead that we simply cater for these by way of categories (e.g. performance "bands") and attributes (e.g. specific benchmark figures) that are TBD and out of scope for OCCI. Sam
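A minimal sketch of what 'bands plus attributes' might look like on a resource, with invented category and attribute names (defining the actual bands and benchmarks would, as above, be out of scope for OCCI):

```python
# A compute resource tagged with a provider-defined performance "band"
# (a category) plus optional benchmark figures (attributes). All names are
# invented for illustration.
resource = {
    "categories": [
        "compute",              # the kind of resource
        "perf.band.silver",     # provider-defined band, e.g. bronze/silver/gold
    ],
    "attributes": {
        "compute.cores": 2,
        "perf.benchmark.specint": 28.5,        # figure published by a trusted third party
        "perf.benchmark.membw_mbps": 9500,     # memory bandwidth benchmark
    },
}

# A client can filter on the band without caring how it was measured:
acceptable_bands = {"perf.band.silver", "perf.band.gold"}
is_acceptable = bool(acceptable_bands & set(resource["categories"]))
print(is_acceptable)    # True
```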

On Mon, Oct 26, 2009 at 5:35 PM, Sam Johnston <samj@samj.net> wrote:
On Mon, Oct 26, 2009 at 5:31 PM, Alexander Papaspyrou <alexander.papaspyrou@tu-dortmund.de> wrote:
Just as a side note: finding such a unit is an open research problem for the last twenty or so years. So I wouldn't bet on finding such a thing in the near future -- and arguably not within OCCI.
Thanks Alexander - I tend to agree with you and propose instead that we simply cater for these by way of categories (e.g. performance "bands") and attributes (e.g. specific benchmark figures) that are TBD and out of scope for OCCI.
While researching this further today I discovered that the Open Cloud Consortium (http://opencloudconsortium.org/) has a working group (http://opencloudconsortium.org/working-groups.html) dedicated to this topic:
*Standard Cloud Performance Measurement & Rating System Working Group*
What if there was a simple way to compare the performance, security and quality of various cloud computing providers? When comparing traditional hardware vendors, there are standardized specifications (GHz, GB, etc.) and a variety of basic and application-specific benchmarks, but in the cloud world there was no easy way to compare "apples to apples". For many organizations looking to get into the cloud, predicting the performance of an application across two or more cloud vendors is not practical. The purpose of this working group is to work with the community to refine use cases, gather requirements, and develop benchmarks for comparing the performance of two different clouds.
This working group is a collaborative activity with the Cloud Computing Interoperability Forum (CCIF).
That said, like many here I'm a "member" of the CCIF and this is the first I've heard of this "collaborative activity". Tempted to sign up with OCC as an "Expert" member (http://opencloudconsortium.org/membership.html) to find out more, but it doesn't look like one gets much access at this level. Sam
participants (4)
- Alexander Papaspyrou
- Gary Mazz
- Randy Bias
- Sam Johnston