
Richard,

Just when you thought you had enough mail from me already it seems I missed one...

On Thu, Apr 16, 2009 at 6:05 PM, Richard Davies <richard.davies@elastichosts.com> wrote:
Sam Johnston wrote:
Here's a first pass at flattening the Atom into INI file format (basically what you had but with "=" for human & computer readability):
Great stuff - I think this is a big step forward to be able to express everything as a simple list of objects, each specified by simple key-value pairs. Hopefully we can also similarly add a JSON version using the same simple data structures, e.g.:
{"category":"server", "title":"Debian...", "mc.state":"running", ... }
JSON/YAML's on my todo list for this morning.
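As a rough sketch of what the JSON rendering could look like, the snippet below serialises a list of flat key-value objects and reads it back. The `id` key and the helper names `to_json` and `from_json` are assumptions made for illustration, not anything agreed in this thread.

```python
import json

# Hypothetical flat attributes for one server, borrowed from the INI example
# in this thread; the "id" key is an assumption, not an agreed schema.
server = {
    "id": "decca5a5-8952-4004-9793-cdbbf05c3c63",
    "category": "server",
    "title": "Debian GNU/Linux 5.0 Virtual Appliance",
    "mc.state": "RUNNING",
}

def to_json(objects):
    """Serialise a list of flat key-value objects as a JSON array."""
    return json.dumps(objects, indent=2, sort_keys=True)

def from_json(text):
    """Parse the JSON array back into a list of dicts."""
    return json.loads(text)

encoded = to_json([server])
decoded = from_json(encoded)
```

Because every object is just a flat dict of strings, the same structure should map onto INI, JSON or YAML without any format-specific nesting.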
I've got two specific comments on the example you give:
1) I'm not sure INI format is actually the best text format for key-value. I'd prefer something easier to parse from Unix shell, which is where I imagine most simple scripts will be written. ElasticHosts went with
"key" (without spaces), <space>, "value" (any characters including spaces)
since this can be parsed with
cat file | while read key value ; do ... ; done
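For comparison, the same "key, space, value" convention is also easy to handle outside the shell. This is a minimal Python sketch that mirrors the `read key value` loop; the function name `parse_key_value` and the sample data are made up for illustration.

```python
def parse_key_value(text):
    """Parse 'key<space>value' lines, where the value may contain spaces."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, like `read key value`.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        result[key] = value
    return result

# Hypothetical sample input, not taken from a real ElasticHosts response.
sample = "name Debian GNU/Linux 5.0\ncpu 4000\nmem 4096\n"
parsed = parse_key_value(sample)
```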
I've found the tinydns-data <http://cr.yp.to/djbdns/tinydns-data.html> format a pleasure to work with as well, but in any case INI files are simple, standard across platforms, well defined, etc. You can parse them in shell like this <http://www.debian-administration.org/articles/55#comment_24>:

    #!/bin/sh
    [ -z "$1" ] || [ -z "$2" ] && exit 1
    sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
        -e 's/;.*$//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*//' \
        -e "s/^\(.*\)=\([^\"']*\)$/\1=\"\2\"/" \
        < $1 \
      | sed -n -e "/^\[$2\]/,/^\s*\[/{/^[^;].*\=.*/p;}"

For Python you have ConfigParser <http://docs.python.org/library/configparser.html>, PHP has parse_ini_file <http://fr3.php.net/parse_ini_file>, Perl (as per usual) has a dozen or so options <http://win32.perl.org/wiki/index.php?title=INI-file_Modules>, then there's libini <http://sourceforge.net/projects/libini/> and its ilk.

We need a way to group lines together:

- INI style headers (e.g. [decca5a5-8952-4004-9793-cdbbf05c3c63])
- ID prefixes (e.g. decca5a5-8952-4004-9793-cdbbf05c3c63.content.cpu.cores = 2)
- Blank line separators, with ID specified as an attribute (e.g. id = decca5a5-8952-4004-9793-cdbbf05c3c63)

Except in the case where you retrieve a single object this is always going to add parsing complexity... but perhaps it's worth it just for the (common) case of dealing with a single object.

2) Going through the keys and values in detail:
[decca5a5-8952-4004-9793-cdbbf05c3c63]
I like UUIDs and ElasticHosts also uses them, but I might loosen the requirement to any unique string of hex and dashes (since other vendors may prefer to number sequentially, etc.)
There's that "enough rope" problem again, and the alias option discussed elsewhere. Another (significant) bonus is that they allow you to migrate resources, collections or even merge entire clouds without re-mapping, breaking any object references, etc. There really is huge value here.
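To make the INI-header grouping and the UUID requirement concrete, here is a small sketch using Python's standard `configparser` and `uuid` modules. The helper name `load_objects` is hypothetical, and treating a non-UUID section name as an error is just one possible policy, not something decided here.

```python
import configparser
import uuid

SAMPLE = """\
[decca5a5-8952-4004-9793-cdbbf05c3c63]
category = server
title = Debian GNU/Linux 5.0 Virtual Appliance

[4696b561-a253-42b4-bd27-7aa4950e0a60]
category = storage
content.size = 148251374
"""

def load_objects(text):
    """Return {section id: {key: value}}, rejecting IDs that are not UUIDs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    parser.read_string(text)
    objects = {}
    for section in parser.sections():
        uuid.UUID(section)  # raises ValueError if the ID is not a UUID
        objects[section] = dict(parser[section])
    return objects

objects = load_objects(SAMPLE)
```

Loosening the rule to "any unique string of hex and dashes" would only mean swapping the `uuid.UUID` check for a simpler pattern match, at the cost of the merge and migration guarantees mentioned above.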
category = server
title = Debian GNU/Linux 5.0 Virtual Appliance
summary = Base installation of Debian GNU/Linux 5.0
Do we need both a title ('name' with ElasticHosts at present) and a summary or can we just have one of these?
Most collections tend to have an official title and an additional (optional) explanation. If you don't use it then that's fine too (actually the title/summary terminology comes from Atom).
content.cpu = 2
content.memory = 4Gb
We need to agree units here! Presumably memory would be specified in 'GB' or alternatively 'MB', 'kB' or nothing. Is CPU the speed quota or the number of virtual cores? I recommend cores=<integer> and an additional key for speed quota (ElasticHosts uses cpu=<total MHz to divide across all cores>)
Sure, or we just say everything's in bytes/megahertz/etc. and worry about how to render it in the UI (where it arguably belongs). Internally I'd say we should deal with raw numbers (that's how it will be represented in databases anyway) and do the mapping as close as possible to the surface. Defining units is (probably) acceptable though... assuming there's not a standard for this we can refer to (surely there is somewhere).
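A sketch of the "raw numbers internally, units at the surface" idea: the function below normalises a size string to bytes at the edge of the system. The binary multipliers and the accepted spellings are assumptions, since nothing in this thread settles binary versus decimal units, and `to_bytes` is a made-up name.

```python
import re

# Assumption: binary multiples (1 kB = 1024 bytes); adjust this table if
# decimal multiples are agreed instead.
_MULTIPLIERS = {"": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3}

def to_bytes(value):
    """Convert strings such as '4GB', '512MB' or '1024' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB)?\s*", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _MULTIPLIERS[unit or ""]
```

With this strict matching, `to_bytes("4GB")` is accepted but the `4Gb` spelling from the example is rejected, which is exactly the sort of ambiguity that agreeing units up front would avoid.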
Can we cut the namespace and just write:
cores = 2
cpu = 4000MHz
mem = 4GB
Dispensing with ambiguous terminology is a good idea, but the namespaces are actually quite important for e.g. extensibility.
link.disk[0].id = 4696b561-a253-42b4-bd27-7aa4950e0a60
link.disk[0].dev = sda
link.network[0].id = 45a73b80-c957-4ae1-97c6-b70652eba1d1
link.network[0].dev = eth0
This is good - a mapping between hardware devices and uuids of the storage or network objects.
We don't need the [0] indices, since the 'dev' specifiers are already fully unique. Taking those out and cutting the namespace gives something like:
disk.sda = 4696b561-a253-42b4-bd27-7aa4950e0a60
network.eth0 = 45a73b80-c957-4ae1-97c6-b70652eba1d1
Good point and nice optimisation, but what if we want to capture other information like "starting state = disconnected" etc?
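To illustrate why the indexed form leaves room for extra per-link information, here is a sketch that groups `link.<kind>[<n>].<attr>` keys into one record per link. The `state` attribute and the function name `group_links` are hypothetical, chosen only to show an additional field riding along with `id` and `dev`.

```python
import re
from collections import defaultdict

# Matches keys such as "link.disk[0].dev": kind, index, attribute.
_LINK_KEY = re.compile(r"link\.(\w+)\[(\d+)\]\.(\w+)")

def group_links(flat):
    """Group 'link.<kind>[<n>].<attr>' keys into {kind: [{attr: value}, ...]}."""
    grouped = defaultdict(dict)
    for key, value in flat.items():
        match = _LINK_KEY.fullmatch(key)
        if match is None:
            continue  # not a link key
        kind, index, attr = match.groups()
        grouped[kind].setdefault(int(index), {})[attr] = value
    # Order each kind's entries by index and drop the index itself.
    return {kind: [entries[i] for i in sorted(entries)]
            for kind, entries in grouped.items()}

flat = {
    "link.disk[0].id": "4696b561-a253-42b4-bd27-7aa4950e0a60",
    "link.disk[0].dev": "sda",
    "link.disk[0].state": "disconnected",  # hypothetical extra attribute
    "link.network[0].id": "45a73b80-c957-4ae1-97c6-b70652eba1d1",
    "link.network[0].dev": "eth0",
}
links = group_links(flat)
```

The shorter `disk.sda = <uuid>` form has no obvious slot for something like `state` without inventing a further key convention.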
mc.state = RUNNING
br.meter.rate = 0.10
br.meter.currency = USD
br.meter.unit = hours
br.meter.total = 35.27
pm.monitor.cpu = 75.2
pm.monitor.mem = 1059374258
All look reasonable, but again I would cut the namespaces:
state = RUNNING
br.rate = 0.10
br.currency = USD
br.unit = hours
br.total = 35.27
pm.cpu = 75.2
pm.mem = 1059374258
Sure, namespaces within extensions can be safely dropped. Top level namespaces less so.
mc.ops.start = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/start
mc.ops.stop = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/stop
mc.ops.restart = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/restart
mc.ops.suspend = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/suspend
Do we need these at all? Surely these will always be the operations which are possible on a RUNNING server, and so can always be constructed based on the UUID.
HATEOAS <http://www.stucharlton.com/blog/archives/000141.html> is a carry-over from the Sun Cloud API (as explained by Sun here <http://blogs.sun.com/craigmcc/entry/why_hateoas>). I like it because from the single entry point you can obtain every URL you should ever need to use, and the ones you can't use you don't even see (e.g. because you can't "start" an abstract template, or simply because as a disaster recovery operator you're only allowed to start but not stop machines). If you don't like these you can always ignore them, but your users will probably get bored of receiving errors when they try to conduct invalid operations.
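A rough sketch of how a client might follow the advertised links rather than construct URLs itself. The `mc.ops.` prefix comes from the example above; the helper names and the choice of `LookupError` are assumptions for illustration only.

```python
OPS_PREFIX = "mc.ops."

def available_ops(resource):
    """Return {operation name: URL} for the operations the server advertised."""
    return {key[len(OPS_PREFIX):]: url
            for key, url in resource.items()
            if key.startswith(OPS_PREFIX)}

def op_url(resource, name):
    """Look up an advertised operation; refuse ones the server did not offer."""
    ops = available_ops(resource)
    if name not in ops:
        raise LookupError(f"operation {name!r} is not offered for this resource")
    return ops[name]

# Hypothetical running server as seen by an operator who may stop or
# restart it; "start" is not advertised, so the client never attempts it.
base = "http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/"
resource = {
    "mc.state": "RUNNING",
    "mc.ops.stop": base + "stop",
    "mc.ops.restart": base + "restart",
}
```

A UI built this way can simply hide or grey out whatever is missing from `available_ops(resource)` instead of discovering invalid operations through errors.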
Also, why have 'ops' in the URLs? Why not just
http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/start
Interesting question. This was another carry-over from Sun, but a better approach is to leave it to the extension:

http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/mc/start

The question is, are you starting the machine? Its firewall? Billing? Backup? Failover? Disaster recovery?
[4696b561-a253-42b4-bd27-7aa4950e0a60]
I guess storage needs a 'title' (or 'name') too?
You're probably right... these are common for all resources.
category = storage
content.size = 148251374
Why not just 'size'?
The "content" namespace is from Atom... it serves to bundle the "payload" of the resource together without interfering with other elements of it. OVF could well have a "title" for example, and what if your attribute clashes with the name of an extension? Let's try to keep the core nice and clean.
link.self = virtual-disk.vmdk
Not sure what this is?
It's a link to itself (e.g. a storage resource pointing at its VMDK). I'd suggest a pass over Atom (RFC 4287 <http://tools.ietf.org/html/rfc4287>) to see how links work (and how flexible they are).
[45a73b80-c957-4ae1-97c6-b70652eba1d1]
Again, maybe a 'name'?
No problem.
category = network
content.vlan = 4095
content.dhcp = true
content.subnet = 192.168.0.0
content.netmask = 255.255.0.0
content.gateway = 192.168.0.1
Once again, I'd take the 'content' prefix off all of these.
See above... we need to work out how/if this can be done safely (and whether it's worth doing).
The keys you list here work when the network interface is on a private VLAN, but are the wrong set when it is on the public internet.
It's just an example, but I do wonder how much detail we're going to want to get into here. We should probably support arbitrary attributes for whatever cruft the network guys want to carry (e.g. frame sizes, etc.) but treat it as opaque for now.
On the public internet, the cloud vendor, not the user, defines most of these parameters and needs to be able to prevent the customer VM from "stealing" IPs from other customers.
The customer has access to a defined set of static IPs which they have purchased, or alternatively a free dynamic IP assigned at boot, and all they should be able to specify is which of these they want on this particular interface, and whether they want to receive it via DHCP.
For instance, ElasticHosts currently specifies as:
ip = <specified static IP address or 'auto' to assign dynamically at boot>
dhcp = <ip address to send by dhcp or 'auto'; no dhcp if not present>
Given that the customer will have a set of static IPs which they have purchased (common concept across Amazon, ElasticHosts, GoGrid, etc.), the API also needs an ability for them to list what these are!
I would suggest that these be advertised in the "network" resources so the customer can choose one that's already allocated (assuming they don't just rely on DHCP for this).

Another interesting use case incidentally is that of machines doing introspection - a machine (authenticated by IP?) should be able to hit OCCI for information about itself (such as its name? IP address? SSH keys? application configuration?). Even basic attribute-value pairs being settable via management interfaces would be incredibly powerful (and we get this for free already).

Sam