On Fri, Apr 17, 2009 at 2:39 PM, Chris Webb <chris.webb@elastichosts.com> wrote:
Sam Johnston <samj@samj.net> writes:

> For example I should be
> able to rsync the raw block device of a physical server to the cloud

Off-topic: rsyncing giant files/block devices does reduce network traffic,
but it involves an end-to-end read at both ends as well as any writes. The
thing which is most painfully constrained for cloud infrastructure providers
is disk IO, especially with traditional hard drives with long seek times, so
we're unlikely to provide such an interface for fear of encouraging its use!

rsyncing the contents of filesystems within block devices is far, far more
friendly to shared storage, because by default files are skipped completely
on the basis of size and mtime.
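
To make the "skipped on the basis of size and mtime" point concrete, here is a rough sketch of that kind of quick check. It only approximates rsync's default behaviour and is not its actual code; the function name is made up for illustration, and comparing mtimes at whole-second granularity is an assumption.

```python
import os

def quick_check_unchanged(src_path, dst_path):
    """Return True if dst looks unchanged relative to src, judged only by
    size and mtime (no file contents are read)."""
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return False  # nothing at the destination, so it must be transferred
    src = os.stat(src_path)
    # Assumption: compare mtimes at whole-second granularity.
    return src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime)
```

A file that passes this check costs two stat() calls and no data reads, which is why it is so much kinder to shared disks than checksumming an entire block device.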

Interesting (albeit off topic)... a very long time ago I went for a [suite of] interviews at Google for a Senior SRE position. One of the questions was about distributing updates for a large site to a bunch of servers. I was chuffed because I knew all about rsync (actually I've met Andrew Tridgell - a fellow Aussie - a handful of times, read the paper, used Samba since it was born, etc.), so I said I'd do a recursive copy of the site using hardlinks, rsync to that copy and update a symlink to the root. This guy had obviously planned to talk about it for the full 40 minutes because he proceeded to tell me I was wrong, that I should in fact be rsyncing the block device, and then wanted to talk about specifics of how the protocol worked. Serves me right for not reeling the answer out slowly :P
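
For anyone curious, the hardlink-copy-then-flip-a-symlink approach looks roughly like the sketch below. The function names and directory layout are invented for illustration, it only handles regular files, and the rsync step itself (run against the new copy between the two calls) is left out.

```python
import os

def hardlink_tree(src, dst):
    """Recreate src's directory tree under dst, hard-linking every file
    so the copy costs almost no extra disk space."""
    for root, dirs, files in os.walk(src):
        target = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target, name))

def switch_symlink(link, new_target):
    """Point link at new_target by creating a temporary symlink and
    renaming it over the old one, so readers never see a missing link."""
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_target, tmp)
    os.replace(tmp, link)
```

This relies on rsync's default behaviour of writing each changed file to a temporary name and renaming it into place, which breaks the hard link and leaves the live copy untouched; with --inplace that no longer holds.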

Moral of the story: this is the kind of place providers will (safely) innovate and differentiate... we need to let them.

> Sure, snapshot's easy... you just have a "snapshot" actuator which returns a
> new resource (complete with a pointer to the "live" version).

Snapshots are easy if we're starting from scratch and are allowed to define
their semantics, for instance a snapshot operation on a drive gives a new
drive with copy-on-write behaviour between them.
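
A toy model of those semantics, purely for illustration: the class and method names are hypothetical, a "drive" is just a map of block numbers to bytes, and no real provider's implementation is implied.

```python
class Drive:
    """Toy block device with copy-on-write snapshots."""

    BLOCK_SIZE = 512

    def __init__(self, base=None):
        self._base = base   # shared, never-written parent layer (or None)
        self._blocks = {}   # blocks written privately to this drive

    def read(self, n):
        if n in self._blocks:
            return self._blocks[n]
        if self._base is not None:
            return self._base.read(n)
        return bytes(self.BLOCK_SIZE)  # unwritten blocks read as zeros

    def write(self, n, data):
        self._blocks[n] = data  # only ever touches the private layer

    def snapshot(self):
        """Freeze the current contents into a shared layer and return a
        new drive on top of it; both drives diverge on later writes."""
        frozen = Drive(self._base)
        frozen._blocks = self._blocks
        self._base = frozen
        self._blocks = {}
        return Drive(frozen)
```

The result of snapshot() is itself a full drive, which is the point being made here: nothing distinguishes it from an on-demand clone.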

The problem comes when you want to retrofit this to (say) Amazon EBS, where
snapshots are second class objects, which can be generated from and imaged
to a block device, but are not a block device in themselves.

If, on the other hand, you treat snapshots as a distinct type of object,
functionality is lost from the interfaces of people who currently implement
them better, so they do appear exactly like on-demand clones of drives.
 
True, but easily fixed by tracking a state and having e.g. a "mount" actuator.
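
Something along these lines, as a very rough sketch. The class name, the state names ("stored", "mounted") and the "unmount" actuator are all assumptions made up for the example, not part of any agreed interface.

```python
class SnapshotResource:
    """Hypothetical snapshot resource that tracks a state and only
    behaves like a drive once it has been mounted."""

    def __init__(self, source_id):
        self.source_id = source_id  # pointer back to the "live" drive
        self.state = "stored"       # assumed states: "stored" or "mounted"

    def actuators(self):
        """Return the operations that are valid in the current state."""
        return ["mount"] if self.state == "stored" else ["unmount"]

    def mount(self):
        if self.state != "stored":
            raise RuntimeError("snapshot is already mounted")
        self.state = "mounted"

    def unmount(self):
        if self.state != "mounted":
            raise RuntimeError("snapshot is not mounted")
        self.state = "stored"
```

A provider with first-class snapshots could simply report them as permanently "mounted", while an EBS-style provider would do the image-to-block-device work inside mount().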

Sam