Richard,

Just when you thought you'd had enough mail from me already, it seems I missed one...

On Thu, Apr 16, 2009 at 6:05 PM, Richard Davies <richard.davies@elastichosts.com> wrote:
Sam Johnston wrote:
> Here's a first pass at flattening the Atom into INI file format
> (basically what you had but with "=" for human & computer readability):

Great stuff - I think this is a big step forward to be able to express
everything as a simple list of objects, each specified by simple key-value
pairs. Hopefully we can also similarly add a JSON version using the same
simple data structures, e.g.:

{"category":"server", "title":"Debian...", "mc.state":"running", ... }

JSON/YAML's on my todo list for this morning.
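
A minimal sketch of how that JSON form could look in Python, assuming each object is a flat mapping of string keys to values. The `id` key used to carry the section name is an assumption for illustration, not something agreed in this thread:

```python
import json

# Flat key-value objects. The "id" key is a hypothetical way to carry the
# INI section name; the other keys and values come from the example here.
resources = [
    {
        "id": "decca5a5-8952-4004-9793-cdbbf05c3c63",
        "category": "server",
        "title": "Debian GNU/Linux 5.0 Virtual Appliance",
        "mc.state": "RUNNING",
    },
    {
        "id": "4696b561-a253-42b4-bd27-7aa4950e0a60",
        "category": "storage",
        "content.size": 148251374,
    },
]


def to_json(objects):
    """Serialise a list of flat objects to a JSON array."""
    return json.dumps(objects, indent=2, sort_keys=True)


def from_json(text):
    """Parse the JSON array back into a list of flat dicts."""
    return json.loads(text)


encoded = to_json(resources)
decoded = from_json(encoded)
```

Because every object is just one level of key-value pairs, the same data should round-trip between INI and JSON without any structural mapping.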
 
I've got two specific comments on the example you give:

1) I'm not sure INI format is actually the best text format for key-value.
I'd prefer something easier to parse from Unix shell, which is where I
imagine most simple scripts will be written. ElasticHosts went with

 "key" (without spaces), <space>, "value" (any characters including spaces)

since this can be parsed with

 cat file | while read key value ; do ... ; done

I've found the tinydns-data format a pleasure to work with as well, but in any case INI files are simple, standard across platforms, well defined, etc.

You can parse them in shell like this:
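#!/bin/sh
# Usage: <script> FILE SECTION
# Prints the key="value" lines found in [SECTION] of FILE.
[ -z "$1" ] || [ -z "$2" ] && exit 1
# First pass: trim whitespace around "=", drop ";" comments, trim each
# line, and wrap unquoted values in double quotes.
sed -e 's/[[:space:]]*=[[:space:]]*/=/g' \
-e 's/;.*$//' \
-e 's/[[:space:]]*$//' \
-e 's/^[[:space:]]*//' \
-e "s/^\(.*\)=\([^\"']*\)$/\1=\"\2\"/" \
< "$1" \
| sed -n -e "/^\[$2\]/,/^[[:space:]]*\[/{/^[^;].*=.*/p;}"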
For Python you have ConfigParser, PHP has parse_ini_file, and Perl (as per usual) has a dozen or so options. Then there's libini and its ilk.
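
As a rough illustration of the Python route, here is a sketch using the standard library's `configparser` module (the Python 3 name for ConfigParser). The sample text reuses values from the example in this thread; the helper name `parse_resources` is made up for illustration:

```python
import configparser

SAMPLE = """\
[decca5a5-8952-4004-9793-cdbbf05c3c63]
category = server
title = Debian GNU/Linux 5.0 Virtual Appliance
content.cpu = 2
link.disk[0].dev = sda

[4696b561-a253-42b4-bd27-7aa4950e0a60]
category = storage
content.size = 148251374
"""


def parse_resources(text):
    """Return {section name: {key: value}} for every section in the text."""
    # Interpolation is disabled so "%" in values is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep keys exactly as written (the default lowercases them).
    parser.optionxform = str
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


resources = parse_resources(SAMPLE)
server = resources["decca5a5-8952-4004-9793-cdbbf05c3c63"]
```

All values come back as strings, so any numeric interpretation (e.g. of `content.size`) is left to the caller.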

We need a way to group lines together:
Except in the case where you retrieve a single object, this is always going to add parsing complexity... but perhaps it's worth it just for the (common) case of dealing with a single object.

2) Going through the keys and values in detail:

>  [decca5a5-8952-4004-9793-cdbbf05c3c63]

I like UUIDs and ElasticHosts also uses them, but I might loosen the
requirement to any unique string of hex and dashes (since other vendors may
prefer to number sequentially, etc.)

There's that "enough rope" problem again, and the alias option discussed elsewhere. Another (significant) bonus of UUIDs is that they allow you to migrate resources, collections or even merge entire clouds without re-mapping, breaking any object references, etc. There really is huge value here.
 
>  category = server
>  title = Debian GNU/Linux 5.0 Virtual Appliance
>  summary = Base installation of Debian GNU/Linux 5.0

Do we need both a title ('name' with ElasticHosts at present) and a summary
or can we just have one of these?

Most collections tend to have an official title and an additional (optional) explanation. If you don't use it then that's fine too (actually the title/summary terminology comes from Atom).
 
>  content.cpu = 2
>  content.memory = 4Gb

We need to agree units here! Presumably memory would be specified in 'GB' or
alternatively 'MB', 'kB' or nothing. Is CPU the speed quota or the number of
virtual cores? I recommend cores=<integer> and an additional key for speed
quota (ElasticHosts uses cpu=<total MHz to divide across all cores>)

Sure, or we just say everything's in bytes/megahertz/etc. and worry about how to render it in the UI (where it arguably belongs). Internally I'd say we should deal with raw numbers (that's how it will be represented in databases anyway) and do the mapping as close as possible to the surface. Defining units is (probably) acceptable though... assuming there's not a standard for this we can refer to (surely there is somewhere).
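
To make the "raw numbers inside, units at the surface" idea concrete, here is a hedged Python sketch. It assumes decimal (SI) multipliers and accepts only `B` and `Hz` as unit suffixes; both choices are assumptions for illustration, and a real spec would have to settle them (including decimal versus binary prefixes):

```python
import re

# Assumption: decimal (SI) multipliers. Binary (1024-based) ones are an
# equally plausible choice that the spec would need to decide on.
MULTIPLIERS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
_QUANTITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kMGT]?)(B|Hz)?\s*$")


def to_raw(text):
    """Convert '4GB' or '4000MHz' to a plain integer (bytes or hertz)."""
    match = _QUANTITY.match(text)
    if match is None:
        raise ValueError("unrecognised quantity: %r" % text)
    number, prefix, _unit = match.groups()
    return int(float(number) * MULTIPLIERS[prefix])


def render(value, unit):
    """Format a raw number for display using the largest fitting prefix."""
    for prefix in ("T", "G", "M", "k"):
        if value >= MULTIPLIERS[prefix]:
            return "%g%s%s" % (value / MULTIPLIERS[prefix], prefix, unit)
    return "%d%s" % (value, unit)
```

With these rules `to_raw("4GB")` gives 4000000000 and `render(4000000000, "B")` gives "4GB", while an ambiguous spelling such as "4Gb" is rejected rather than guessed at.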
 
Can we cut the namespace and just write:

cores = 2
cpu = 4000MHz
mem = 4GB

Dispensing with ambiguous terminology is a good idea, but the namespaces are actually quite important for e.g. extensibility.
 
>  link.disk[0].id = 4696b561-a253-42b4-bd27-7aa4950e0a60
>  link.disk[0].dev = sda
>  link.network[0].id = 45a73b80-c957-4ae1-97c6-b70652eba1d1
>  link.network[0].dev = eth0

This is good - a mapping between hardware devices and uuids of the storage
or network objects.

We don't need the [0] indices, since the 'dev' specifiers are already fully
unique. Taking those out and cutting the namespace gives something like:

disk.sda = 4696b561-a253-42b4-bd27-7aa4950e0a60
network.eth0 = 45a73b80-c957-4ae1-97c6-b70652eba1d1

Good point and nice optimisation, but what if we want to capture other information like "starting state = disconnected" etc?
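
To illustrate why the indexed form leaves room for extra per-link attributes, here is a small Python sketch that regroups `link.<kind>[<n>].<attr>` keys into one record per link. The function name and the `state` attribute are hypothetical, standing in for something like a starting state:

```python
import re
from collections import defaultdict

# Matches keys such as "link.disk[0].dev".
_LINK_KEY = re.compile(r"^link\.(\w+)\[(\d+)\]\.(\w+)$")


def group_links(flat):
    """Group indexed link keys into {kind: [record, ...]} ordered by index."""
    grouped = defaultdict(dict)
    for key, value in flat.items():
        match = _LINK_KEY.match(key)
        if match is None:
            continue  # not an indexed link key; ignore it here
        kind, index, attr = match.group(1), int(match.group(2)), match.group(3)
        grouped[kind].setdefault(index, {})[attr] = value
    return {
        kind: [entries[i] for i in sorted(entries)]
        for kind, entries in grouped.items()
    }


links = group_links({
    "category": "server",
    "link.disk[0].id": "4696b561-a253-42b4-bd27-7aa4950e0a60",
    "link.disk[0].dev": "sda",
    "link.disk[0].state": "disconnected",  # hypothetical extra attribute
    "link.network[0].id": "45a73b80-c957-4ae1-97c6-b70652eba1d1",
    "link.network[0].dev": "eth0",
})
```

The shorter `disk.sda = <uuid>` form has no obvious slot for a third attribute like this, which is the trade-off in question.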
 
>  mc.state = RUNNING
>  br.meter.rate = 0.10
>  br.meter.currency = USD
>  br.meter.unit = hours
>  br.meter.total = 35.27
>  pm.monitor.cpu = 75.2
>  pm.monitor.mem = 1059374258

All look reasonable, but again I would cut the namespaces:

state = RUNNING
br.rate = 0.10
br.currency = USD
br.unit = hours
br.total = 35.27
pm.cpu = 75.2
pm.mem = 1059374258

Sure, namespaces within extensions can be safely dropped. Top level namespaces less so.
 
Do we need these at all? Surely these will always be the operations which
are possible on a RUNNING server, and so can always be constructed based on
the UUID.

HATEOAS is a carry-over from the Sun Cloud API (as explained by Sun here). I like it because from the single entry point you can obtain every URL you should ever need to use, and those you can't use you don't even see (e.g. because you can't "start" an abstract template, or simply because as a disaster recovery operator you're only allowed to start but not stop machines).

If you don't like these you can always ignore them, but your users will probably get bored of receiving errors when they try to conduct invalid operations.
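
A rough Python sketch of what this buys a client: it only ever follows operation links the server advertised, rather than building URLs itself. The `link.mc.` key prefix and the function names are assumptions for illustration; only the URL shape comes from this thread:

```python
def advertised_operations(resource, prefix="link.mc."):
    """Return {operation name: URL} for links the server chose to expose."""
    return {
        key[len(prefix):]: url
        for key, url in resource.items()
        if key.startswith(prefix)
    }


def operation_url(resource, name):
    """Look up an operation URL, refusing operations that were not offered."""
    operations = advertised_operations(resource)
    if name not in operations:
        raise LookupError("operation %r is not available" % name)
    return operations[name]


# A disaster recovery operator who may start but not stop this machine
# would simply never be shown a "stop" link.
server = {
    "category": "server",
    "link.mc.start": (
        "http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/mc/start"
    ),
}
```

Here `operation_url(server, "start")` returns the advertised URL, while asking for "stop" fails locally before any invalid request is sent.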
 
Also, why have 'ops' in the URLs? Why not just

http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/start

Interesting question. This was another carry-over from Sun, but a better approach is to leave it to the extension:

http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/mc/start

The question is, are you starting the machine? Its firewall? Billing? Backup? Failover? Disaster recovery?

>  [4696b561-a253-42b4-bd27-7aa4950e0a60]

I guess storage needs a 'title' (or 'name') too?

You're probably right... these are common for all resources.
 
>  category = storage
>  content.size = 148251374

Why not just 'size'?

The "content" namespace is from Atom... it serves to bundle the "payload" of the resource together without interfering with other elements of it. OVF could well have a "title" for example, and what if your attribute clashes with the name of an extension? Let's try to keep the core nice and clean.
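
A small Python sketch of the clash this avoids, assuming keys are split on their first dot into a namespace and a local name. The function name and the `content.title` value are hypothetical:

```python
def split_namespaces(flat):
    """Separate core keys from namespaced ones, keyed by namespace."""
    core, namespaced = {}, {}
    for key, value in flat.items():
        namespace, dot, local = key.partition(".")
        if dot:
            namespaced.setdefault(namespace, {})[local] = value
        else:
            core[key] = value
    return core, namespaced


core, namespaced = split_namespaces({
    "category": "storage",
    "title": "Boot disk",                 # the resource's own title
    "content.title": "Title from OVF",    # hypothetical payload attribute
    "content.size": 148251374,
})
```

Both titles survive side by side: one in `core`, one under `namespaced["content"]`. With the prefix dropped, the second would overwrite the first.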
 
>  link.self = virtual-disk.vmdk

Not sure what this is?

It's a link to itself (e.g. a storage resource pointing at its VMDK). I'd suggest a pass over Atom (RFC 4287) to see how links work (and how flexible they are).
 
>  [45a73b80-c957-4ae1-97c6-b70652eba1d1]

Again, maybe a 'name'?

No problem.
 
>  category = network
>  content.vlan = 4095
>  content.dhcp = true
>  content.subnet = 192.168.0.0
>  content.netmask = 255.255.0.0
>  content.gateway = 192.168.0.1

Once again, I'd take the 'content' prefix off all of these.

See above... we need to work out how/if this can be done safely (and whether it's worth doing).
 
The keys you list here work when the network interface is on a private VLAN,
but are the wrong set when it is on the public internet.

It's just an example, but I do wonder how much detail we're going to want to get into here. We should probably support arbitrary attributes for whatever cruft the network guys want to carry (e.g. frame sizes, etc.) but treat it as opaque for now.
 
On the public internet, the cloud vendor, not the user, defines most of
these parameters and needs to be able to prevent the customer VM from
"stealing" IPs from other customers.

The customer has access to a defined set of static IPs which they have
purchased or alternatively a free dynamic IP assigned at boot, and all they
should be able to specify is which of these they want on this particular
interface, and whether they want to receive it via DHCP.

For instance, ElasticHosts currently specifies as:

ip = <specified static IP address or 'auto' to assign dynamically at boot>
dhcp = <ip address to send by dhcp or 'auto'; no dhcp if not present>

Given that the customer will have a set of static IPs which they have
purchased (common concept across Amazon, ElasticHosts, GoGrid, etc.), the
API also needs an ability for them to list what these are!

I would suggest that these be advertised in the "network" resources so the customer can choose one that's already allocated (assuming they don't just rely on DHCP for this).
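
As a sketch of how a provider might enforce that, here is a Python check of an interface's `ip` and `dhcp` settings against the addresses the customer owns. The function name and the example addresses (from the documentation range 192.0.2.0/24) are assumptions; the `ip`/`dhcp`/`auto` semantics follow the ElasticHosts description quoted above:

```python
import ipaddress


def validate_interface(config, owned_ips):
    """Raise ValueError if config names an address the customer doesn't own.

    'auto' is always allowed, and a missing 'dhcp' key means no DHCP.
    """
    owned = {ipaddress.ip_address(ip) for ip in owned_ips}
    for key in ("ip", "dhcp"):
        value = config.get(key)
        if value is None or value == "auto":
            continue
        if ipaddress.ip_address(value) not in owned:
            raise ValueError("%s=%s is not one of your addresses" % (key, value))
    return True


# Addresses the customer has purchased (illustrative values only).
owned = ["192.0.2.10", "192.0.2.11"]
```

A request for `{"ip": "192.0.2.10", "dhcp": "auto"}` passes, while `{"ip": "192.0.2.99"}` is rejected, which is the "no stealing" guarantee the vendor needs.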

Another interesting use case incidentally is that of machines doing introspection - a machine (authenticated by IP?) should be able to hit OCCI for information about itself (such as its name? IP address? SSH keys? application configuration?). Even basic attribute-value pairs being settable via management interfaces would be incredibly powerful (and we get this for free already).

Sam