
On Thu, Apr 16, 2009 at 11:55 AM, Richard Davies <richard.davies@elastichosts.com> wrote:
> The variants which we see today are: <snip>
>
> ElasticHosts
> - any number of drives per server, exposed separately in API
> - persistent across reboots and server deletion
> - block device which can be partitioned into multiple filesystems
> - includes OS kernel and boot loader
>
> We suggest taking the most general approach in the standard:
> - any number of drives per server, exposed as first-class API objects
Agreed - we should support all possible configurations, including zero or more storage resources (e.g. catering for netboot) and zero or more network resources (e.g. catering for offline batch jobs). I'm not sure about the need for the resources to appear as "first-class API objects" (that is, being burnt into URLs etc.), though.
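To make the "zero or more" point concrete, here is a minimal sketch of a machine description that permits empty device lists; the field names and helper are invented for illustration and are not from any actual specification:

```python
# Hypothetical sketch: a machine description allowing zero or more
# storage and network devices, so netboot (no local drives) and
# offline batch jobs (no NICs) are both representable.
def make_machine(name, storage=None, network=None):
    return {
        "name": name,
        "storage": list(storage or []),   # zero or more drives
        "network": list(network or []),   # zero or more interfaces
    }

netboot = make_machine("pxe-worker")                      # no local drives
batch = make_machine("cruncher", storage=["drive/123"])   # no network at all
```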
> - persistent across reboots and server deletion
If a machine is stopped, it would be up to the implementation to decide whether it remains visible in a "stopped" state (i.e. persistent) or whether it vanishes from view (i.e. ephemeral). I'm not sure what to make of machines that can't be stopped - that seems just like normal hosting, because you can't stop using the service without dropping your bundle.
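The persistent-vs-ephemeral choice above can be sketched as a single implementation flag; state names and the `stop` helper are assumptions for illustration only:

```python
# Illustrative sketch: the implementation decides whether a stopped
# machine stays visible in a "stopped" state (persistent) or is removed
# from the collection entirely (ephemeral).
machines = {"vm1": "running", "vm2": "running"}

def stop(machines, machine_id, persistent):
    if persistent:
        machines[machine_id] = "stopped"   # remains visible, can be restarted
    else:
        del machines[machine_id]           # vanishes from view when stopped

stop(machines, "vm1", persistent=True)
stop(machines, "vm2", persistent=False)
```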
> - AMI-like templates implemented as imaging/duplicating one drive from another
Sure - I had planned to expose templates as machines that cannot be started, only cloned. The same could apply to public storage devices (e.g. appliances) and private storage devices (e.g. SOE images).
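A minimal sketch of that "clone-only template" idea, assuming a `template` flag and `clone`/`start` actuators that are invented here rather than taken from any published API:

```python
# Hypothetical sketch: templates are machines that cannot be started,
# only cloned; a clone becomes a normal, startable machine.
import copy
import itertools

_ids = itertools.count(1)

def clone(resource):
    if not resource.get("template"):
        raise ValueError("only templates may be cloned")
    new = copy.deepcopy(resource)
    new["template"] = False      # the clone is an ordinary machine
    new["id"] = next(_ids)
    return new

def start(resource):
    if resource.get("template"):
        raise ValueError("templates cannot be started")
    resource["state"] = "running"

appliance = {"id": 0, "template": True, "os": "debian"}
vm = clone(appliance)
start(vm)
```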
> - block device which can be partitioned into multiple filesystems (just like a hard disk)
Sure, but it sounds like we're getting down to the nitty-gritty - there's an argument for the block devices themselves being opaque, and I'd almost rather steer clear of details like MBR vs GPT.
> - includes OS kernel and boot block (just like a hard disk)
See above re: opaque block devices, though Amazon's separation of AKIs (kernel images), ARIs (ramdisk images) and AMIs (machine images) is something we'll probably want to look at supporting if we expect them to implement the standard and/or if we ever want to see an adapter.
> guests - virtual servers booted from and accessing drives. Our guests exist as objects only when they are running, similar to Amazon's instances, but it may be more general to allow guests in stopped and suspended states in addition, as GoGrid currently do.
>
> The variants which we see today are:
>
> EC2 instances and ElasticHosts servers (as seen from the API) are ephemeral and no longer exist once they are stopped. GoGrid servers persist in a stopped state, and can be restarted from that state.
>
> We suggest taking the more general approach in the standard:
>
> Servers would include non-running servers in the same way as GoGrid. Perhaps whether a server persists or not when it is shut down is an option when creating a server?
I think it's more often than not a capability dictated by the service provider. Where workloads are only ephemeral, there's no choice but for them to vanish when stopped. Where workloads are persistent, I don't see a problem with having a two-phase "stop" then "destroy" sequence, or with exposing a "destroy" actuator for a running instance and having the "stop" be implicit. I think that makes more sense - these decisions are more useful at end-of-life than at start-of-life (what if I change my mind?). Perhaps the default could be an optional parameter or some such.
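The two end-of-life paths described above can be sketched as follows; the state names and actuators are assumptions, not part of any agreed standard:

```python
# Sketch: two-phase stop -> destroy for persistent workloads, and a
# destroy actuator on a running instance with the stop being implicit.
def stop(machine):
    machine["state"] = "stopped"

def destroy(machine):
    if machine["state"] == "running":
        stop(machine)                # "stop" is implicit when destroying
    machine["state"] = "destroyed"

a = {"state": "running"}
stop(a)
destroy(a)                           # explicit two-phase end-of-life

b = {"state": "running"}
destroy(b)                           # one-shot destroy, stop implied
```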
> guests create (takes simple description, e.g. including attached drives and network interfaces)
>
> It's worth commenting here on the granularity of VM specification. Both GoGrid servers and EC2 instances are available in a small number of fixed sizes, whereas ElasticHosts servers are continuously variable in CPU and RAM, and our drives are continuously variable in size.
Right, this is an important point. Basically I was thinking that you could grab a template, set your values to suit, and POST it to the API. This could be simplified for dumb clients by exposing a "clone", "deploy" or "instantiate" actuator that would return the handle of the running machine. Optionally specifying min and max instances à la Amazon should probably be supported (nobody wants to do 1,000 "clone" calls now, do they - performance is important for this API too).

> Again, taking the more general approach, we suggest that servers should be specified in terms of continuous quantities of CPU, RAM and disk, with a provider 'rounding up' to the nearest available specification if their granularity is coarser than the standard API.
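The "grab a template, set your values, POST it" flow with an Amazon-style instance count might look like the sketch below; the function and field names are invented for illustration:

```python
# Illustrative sketch: edit a template's continuous quantities and
# "POST" it once with a count, so 1,000 machines need not mean 1,000
# "clone" calls.
def instantiate(template, overrides=None, count=1):
    """Simulate POSTing an edited template; returns the created machines."""
    spec = {**template, **(overrides or {})}
    return [{**spec, "id": i, "state": "running"} for i in range(count)]

template = {"cpu_mhz": 2000, "ram_mb": 1024}
machines = instantiate(template, {"ram_mb": 4096}, count=3)
```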
I'm not sure about rounding up as such - it feels a bit messy (exceed a limit by 1 byte and your cost could double, for example). If the provider has templates these should be advertised, but ultimately it's up to the implementor to decide whether to reject a request for an unsupported configuration with an error, or whether to satisfy it anyway with a larger device. I think the key thing for most of these points is giving implementors flexibility without sacrificing interoperability, and fortunately I think we can have our cake and eat it too here.

Sam
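The reject-vs-round-up choice discussed above can be sketched in a few lines; the advertised size list and `allocate` helper are assumptions made purely for illustration:

```python
# Sketch: the implementor either rejects an unsupported configuration
# with an error, or satisfies it with the next larger advertised size.
ADVERTISED_RAM_MB = [1024, 2048, 4096]

def allocate(ram_mb, round_up=False):
    if ram_mb in ADVERTISED_RAM_MB:
        return ram_mb
    if not round_up:
        raise ValueError(f"unsupported configuration: {ram_mb} MB")
    for size in ADVERTISED_RAM_MB:
        if size >= ram_mb:
            return size              # exceed a tier by 1 byte, pay for the next
    raise ValueError("request exceeds largest advertised size")

allocate(2048)                       # exact match is accepted
allocate(1500, round_up=True)        # rounded up to 2048
```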