Re: [occi-wg] Syntax of OCCI API

17 Apr 2009

Richard,

Just when you thought you had enough mail from me already it seems I missed
one...

On Thu, Apr 16, 2009 at 6:05 PM, Richard Davies <richard.davies@elastichosts.com> wrote:
...
Sam Johnston wrote:
...
Here's a first pass at flattening the Atom into INI file format
(basically what you had but with "=" for human & computer readability):
Great stuff - I think this is a big step forward to be able to express
everything as a simple list of objects, each specified by simple key-value
pairs. Hopefully we can also similarly add a JSON version using the same
simple data structures, e.g.:
{"category":"server", "title":"Debian...", "mc.state":"running", ... }
JSON/YAML's on my todo list for this morning.
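
A minimal sketch of the JSON rendering discussed above, assuming each resource is held as a flat dictionary of string keys and values. The key names come from the example in this thread; the values and the helper names to_json/from_json are illustrative only.

```python
import json

# Hypothetical flat resource; keys follow the thread's example, values are illustrative.
server = {
    "category": "server",
    "title": "Debian GNU/Linux 5.0 Virtual Appliance",
    "mc.state": "running",
}


def to_json(resource):
    """Serialise one flat key-value resource as a JSON object."""
    return json.dumps(resource, sort_keys=True)


def from_json(text):
    """Parse a JSON object back into a flat dict of key-value pairs."""
    return json.loads(text)


print(to_json(server))
```

Because the structure is a single flat object, the same dictionary could be written out as INI or as "key value" lines without any change to the data model.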
...
I've got two specific comments on the example you give:
1) I'm not sure INI format is actually the best text format for key-value.
I'd prefer something easier to parse from Unix shell, which is where I
imagine most simple scripts will be written. ElasticHosts went with
"key" (without spaces), <space>, "value" (any characters including spaces)
since this can be parsed with
cat file | while read key value ; do ... ; done
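
For comparison outside the shell, a rough Python equivalent of that read loop might look like the following. It assumes one pair per line, a key without whitespace, and a value that runs to the end of the line; the function name and the sample keys are hypothetical.

```python
def parse_key_value(text):
    """Parse 'key<whitespace>value' lines into a dict.

    The key is the first whitespace-delimited word; the value is the
    rest of the line and may itself contain spaces. Blank lines are skipped.
    """
    result = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        result[key] = value
    return result


# Illustrative input only; these keys are not an agreed format.
sample = "name Debian GNU/Linux 5.0\ncpu 4000\nmem 4096\n"
print(parse_key_value(sample))
```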
I've found the tinydns-data <http://cr.yp.to/djbdns/tinydns-data.html>
format a pleasure to work with as well, but in any case INI files are
simple, standard across platforms, well defined, etc.

You can parse them in shell like this
<http://www.debian-administration.org/articles/55#comment_24>:

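#!/bin/sh
# Usage: <script> <ini-file> <section>
# Prints the key="value" lines of one section of an INI file.
[ -z "$1" ] || [ -z "$2" ] && exit 1
# First pass: trim spaces around "=", drop ";" comments, trim each line,
# and double-quote any value that contains no quote characters.
# Second pass: print only the assignments between [section] and the next header.
sed -e 's/[[:space:]]*=[[:space:]]*/=/g' \
    -e 's/;.*$//' \
    -e 's/[[:space:]]*$//' \
    -e 's/^[[:space:]]*//' \
    -e "s/^\(.*\)=\([^\"']*\)$/\1=\"\2\"/" \
   < "$1" \
   | sed -n -e "/^\[$2\]/,/^[[:space:]]*\[/{/^[^;].*=.*/p;}"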

For Python you have ConfigParser
<http://docs.python.org/library/configparser.html>, PHP has parse_ini_file
<http://fr3.php.net/parse_ini_file>, Perl (as per usual) has a dozen or so
options <http://win32.perl.org/wiki/index.php?title=INI-file_Modules>, and
then there's libini <http://sourceforge.net/projects/libini/> and its ilk.
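
As a sketch of the Python route, the standard-library module (named configparser in Python 3) can read the INI rendering into one dictionary per section. The sample text reuses IDs and keys from this thread; the function name parse_resources is hypothetical.

```python
import configparser

# Sample INI rendering; IDs and keys are taken from the example in this thread.
INI_TEXT = """
[decca5a5-8952-4004-9793-cdbbf05c3c63]
category = server
title = Debian GNU/Linux 5.0 Virtual Appliance
content.cpu = 2

[4696b561-a253-42b4-bd27-7aa4950e0a60]
category = storage
content.size = 148251374
"""


def parse_resources(text):
    """Return {section_id: {key: value}} for every section in the INI text."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


resources = parse_resources(INI_TEXT)
print(sorted(resources))
```

Note that configparser lower-cases keys by default and returns every value as a string, so any typing (integers, booleans) would still be up to the client.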

We need a way to group lines together:

   - INI style headers (e.g. [decca5a5-8952-4004-9793-cdbbf05c3c63])
   - ID prefixes (e.g.
   decca5a5-8952-4004-9793-cdbbf05c3c63.content.cpu.cores = 2)
   - Blank line separators, with ID specified as an attribute (e.g. id =
   decca5a5-8952-4004-9793-cdbbf05c3c63)

Except in the case where you retrieve a single object this is always going
to add parsing complexity... but perhaps it's worth it just for the (common)
case of dealing with a single object.
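
To give a feel for the parsing cost, here is a minimal sketch of the second option (ID prefixes), assuming the ID never contains a dot so that the first dot separates the ID from the rest of the key. The function name is hypothetical.

```python
from collections import defaultdict


def group_by_id_prefix(lines):
    """Group '<id>.<key> = <value>' lines into {id: {key: value}}."""
    grouped = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue
        full_key, _, value = line.partition("=")
        # Assumption: the ID has no dots, so the first dot ends the ID.
        object_id, _, key = full_key.strip().partition(".")
        grouped[object_id][key] = value.strip()
    return dict(grouped)


lines = [
    "decca5a5-8952-4004-9793-cdbbf05c3c63.content.cpu.cores = 2",
    "decca5a5-8952-4004-9793-cdbbf05c3c63.category = server",
    "4696b561-a253-42b4-bd27-7aa4950e0a60.category = storage",
]
print(group_by_id_prefix(lines))
```

The single-object case needs none of this: without a prefix or header, every line is simply one key-value pair.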

2) Going through the keys and values in detail:
...
...
[decca5a5-8952-4004-9793-cdbbf05c3c63]
I like UUIDs and ElasticHosts also uses them, but I might loosen the
requirement to any unique string of hex and dashes (since other vendors may
prefer to number sequentially, etc.)
There's that "enough rope" problem again, and the alias option discussed
elsewhere. Another (significant) bonus is that they allow you to migrate
resources, collections or even merge entire clouds without re-mapping,
breaking any object references, etc. There really is huge value here.
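
The difference between the two positions can be sketched as two validators: one that accepts only canonical UUIDs, and one that accepts the looser "hex and dashes" form. Both function names and the exact pattern are assumptions for illustration, not a proposed rule.

```python
import re
import uuid

# Assumed reading of "any unique string of hex and dashes".
HEX_AND_DASHES = re.compile(r"[0-9a-fA-F]+(-[0-9a-fA-F]+)*")


def is_uuid(value):
    """True only for a UUID written in canonical 8-4-4-4-12 form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def is_loose_id(value):
    """True for any run of hex digits, optionally separated by dashes."""
    return HEX_AND_DASHES.fullmatch(value) is not None


print(is_uuid("decca5a5-8952-4004-9793-cdbbf05c3c63"), is_loose_id("42"))
```

A sequential ID such as 42 passes the loose check but is only unique within one provider, which is where the migration and merging argument comes in.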
...
...
category = server
title = Debian GNU/Linux 5.0 Virtual Appliance
summary = Base installation of Debian GNU/Linux 5.0
Do we need both a title ('name' with ElasticHosts at present) and a summary
or can we just have one of these?
Most collections tend to have an official title and an additional (optional)
explanation. If you don't use it then that's fine too (actually the
title/summary terminology comes from Atom).
...
...
content.cpu = 2
content.memory = 4Gb
We need to agree units here! Presumably memory would be specified in 'GB'
or alternatively 'MB', 'kB' or nothing. Is CPU the speed quota or the number
of virtual cores? I recommend cores=<integer> and an additional key for
speed quota (ElasticHosts uses cpu=<total MHz to divide across all cores>)
Sure, or we just say everything's in bytes/megahertz/etc. and worry about
how to render it in the UI (where it arguably belongs). Internally I'd say
we should deal with raw numbers (that's how it will be represented in
databases anyway) and do the mapping as close as possible to the surface.
Defining units is (probably) acceptable though... assuming there's not a
standard for this we can refer to (surely there is somewhere).
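
A sketch of the "raw numbers internally, units at the surface" idea: a small converter that accepts a suffixed value and returns plain bytes. The unit table is an assumption (decimal multiples are used here purely for illustration; nothing in this thread settles decimal versus binary), and to_bytes is a hypothetical name.

```python
import re

# Assumed multipliers, decimal for illustration only.
UNITS = {"": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}
_PATTERN = re.compile(r"(\d+)\s*([A-Za-z]*)")


def to_bytes(text):
    """Convert a value such as '4GB' or '1059374258' to an integer byte count."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a size: %r" % text)
    number, unit = match.groups()
    if unit not in UNITS:
        # Rejecting unknown spellings (e.g. 'Gb') avoids guessing bits vs bytes.
        raise ValueError("unknown unit: %r" % unit)
    return int(number) * UNITS[unit]


print(to_bytes("4GB"))
```

With this approach the stored and transmitted value is always the bare integer, and only the UI decides whether to show it as GB, MB and so on.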
...
Can we cut the namespace and just write:
cores = 2
cpu = 4000MHz
mem = 4GB
Dispensing with ambiguous terminology is a good idea, but the namespaces are
actually quite important for e.g. extensibility.
...
...
link.disk[0].id = 4696b561-a253-42b4-bd27-7aa4950e0a60
link.disk[0].dev = sda
link.network[0].id = 45a73b80-c957-4ae1-97c6-b70652eba1d1
link.network[0].dev = eth0
This is good - a mapping between hardware devices and uuids of the storage
or network objects.
We don't need the [0] indices, since the 'dev' specifiers are already fully
unique. Taking those out and cutting the namespace gives something like:
disk.sda = 4696b561-a253-42b4-bd27-7aa4950e0a60
network.eth0 = 45a73b80-c957-4ae1-97c6-b70652eba1d1
Good point and nice optimisation, but what if we want to capture other
information like "starting state = disconnected" etc?
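
To illustrate why the indexed form leaves room for extra per-link attributes, here is a sketch that folds the link.<kind>[<n>].<attr> keys into nested dictionaries. The "state" attribute and the function name are hypothetical; only the id and dev keys appear in the example above.

```python
import re

# Matches keys such as 'link.disk[0].dev'.
LINK_KEY = re.compile(r"link\.(\w+)\[(\d+)\]\.(\w+)")


def collect_links(pairs):
    """Group indexed link keys into {kind: {index: {attr: value}}}."""
    links = {}
    for key, value in pairs.items():
        match = LINK_KEY.fullmatch(key)
        if match is None:
            continue  # not a link key, e.g. 'category'
        kind, index, attr = match.group(1), int(match.group(2)), match.group(3)
        links.setdefault(kind, {}).setdefault(index, {})[attr] = value
    return links


pairs = {
    "category": "server",
    "link.disk[0].id": "4696b561-a253-42b4-bd27-7aa4950e0a60",
    "link.disk[0].dev": "sda",
    "link.disk[0].state": "disconnected",  # hypothetical extra attribute
    "link.network[0].id": "45a73b80-c957-4ae1-97c6-b70652eba1d1",
    "link.network[0].dev": "eth0",
}
print(collect_links(pairs))
```

The compact disk.sda = <uuid> form has a single value slot per device, so an attribute like this would need a second key convention.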
...
...
mc.state = RUNNING
br.meter.rate = 0.10
br.meter.currency = USD
br.meter.unit = hours
br.meter.total = 35.27
pm.monitor.cpu = 75.2
pm.monitor.mem = 1059374258
All look reasonable, but again I would cut the namespaces:
state = RUNNING
br.rate = 0.10
br.currency = USD
br.unit = hours
br.total = 35.27
pm.cpu = 75.2
pm.mem = 1059374258
Sure, namespaces within extensions can be safely dropped. Top level
namespaces less so.
...
...
mc.ops.start = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/start
mc.ops.stop = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/stop
mc.ops.restart = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/restart
mc.ops.suspend = http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/suspend
Do we need these at all? Surely these will always be the operations which
are possible on a RUNNING server, and so can always be constructed based on
the UUID.
HATEOAS <http://www.stucharlton.com/blog/archives/000141.html> is a
carry-over from the Sun Cloud API (as explained by Sun here
<http://blogs.sun.com/craigmcc/entry/why_hateoas>).
I like it because from the single entry point you can obtain every URL you
should ever need to use, and those that you can't use you don't even see
(e.g. because you can't "start" an abstract template, or simply because as a
disaster recovery operator you're only allowed to start but not stop
machines).

If you don't like these you can always ignore them, but your users will
probably get bored of receiving errors when they try to conduct invalid
operations.
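
On the client side this could be as simple as the following sketch, which only ever uses an operation URL that the server advertised in the resource. The mc.ops. prefix comes from the example above; the function names and the choice of LookupError are assumptions.

```python
def available_ops(resource, prefix="mc.ops."):
    """Return {operation_name: url} for every advertised operation link."""
    return {
        key[len(prefix):]: url
        for key, url in resource.items()
        if key.startswith(prefix)
    }


def op_url(resource, name):
    """Look up an operation URL, refusing operations the server did not offer."""
    ops = available_ops(resource)
    if name not in ops:
        raise LookupError("operation %r is not advertised for this resource" % name)
    return ops[name]


# A resource as seen by an operator who may start but not stop machines.
resource = {
    "mc.state": "STOPPED",
    "mc.ops.start": "http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/ops/start",
}
print(op_url(resource, "start"))
```

A client that builds URLs from the UUID instead would only discover a forbidden or invalid operation from the error response.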
...
Also, why have 'ops' in the URLs? Why not just
http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/start
Interesting question. This was another carry over from Sun but a better
approach is to leave it to the extension:

http://example.com/decca5a5-8952-4004-9793-cdbbf05c3c63/mc/start

The question is, are you starting the machine? Its firewall? Billing?
Backup? Failover? Disaster recovery?
...
[4696b561-a253-42b4-bd27-7aa4950e0a60]
I guess storage needs a 'title' (or 'name') too?
You're probably right... these are common for all resources.
...
...
category = storage
content.size = 148251374
Why not just 'size'?
The "content" namespace is from Atom... it serves to bundle the "payload" of
the resource together without interfering with other elements of it. OVF
could well have a "title" for example, and what if your attribute clashes
with the name of an extension? Let's try to keep the core nice and clean.
...
...
link.self = virtual-disk.vmdk
Not sure what this is?
It's a link to itself (e.g. a storage resource pointing at its VMDK). I'd
suggest a pass over Atom (RFC 4287 <http://tools.ietf.org/html/rfc4287>) to
see how links work (and how flexible they are).
...
...
[45a73b80-c957-4ae1-97c6-b70652eba1d1]
Again, maybe a 'name'?
No problem.
...
...
category = network
content.vlan = 4095
content.dhcp = true
content.subnet = 192.168.0.0
content.netmask = 255.255.0.0
content.gateway = 192.168.0.1
Once again, I'd take the 'content' prefix off all of these.
See above... we need to work out how/if this can be done safely (and whether
it's worth doing).
...
The keys you list here work when the network interface is on a private
VLAN, but are the wrong set when it is on the public internet.
It's just an example, but I do wonder how much detail we're going to want to
get into here. We should probably support arbitrary attributes for whatever
cruft the network guys want to carry (e.g. frame sizes, etc.) but treat it
as opaque for now.
...
On the public internet, the cloud vendor, not the user, defines most of
these parameters and needs to be able to prevent the customer VM from
"stealing" IPs from other customers.
The customer has access to a defined set of static IPs which they have
purchased or alternatively a free dynamic IP assigned at boot, and all they
should be able to specify is which of these they want on this particular
interface, and whether they want to receive it via DHCP.
For instance, ElasticHosts currently specifies as:
ip = <specified static IP address or 'auto' to assign dynamically at boot>
dhcp = <ip address to send by dhcp or 'auto'; no dhcp if not present>
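
One possible reading of those two keys, written out as a sketch. The result field names and the function name are hypothetical, and the handling of dhcp = auto (DHCP enabled, with the address left for the provider to choose) is an interpretation rather than documented ElasticHosts behaviour.

```python
def interpret_interface(settings):
    """Interpret the 'ip' and 'dhcp' keys of one network interface."""
    ip = settings.get("ip")
    dhcp = settings.get("dhcp")
    return {
        # A concrete address means the customer picked one of their static IPs.
        "static_ip": None if ip in (None, "auto") else ip,
        # 'auto' means a dynamic IP is assigned at boot.
        "dynamic": ip == "auto",
        # DHCP is only offered when the key is present at all.
        "dhcp_enabled": dhcp is not None,
        # Assumption: 'auto' leaves the DHCP address to the provider.
        "dhcp_ip": None if dhcp in (None, "auto") else dhcp,
    }


# 192.0.2.10 is a documentation address, used here for illustration only.
print(interpret_interface({"ip": "192.0.2.10", "dhcp": "auto"}))
```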
Given that the customer will have a set of static IPs which they have
purchased (common concept across Amazon, ElasticHosts, GoGrid, etc.), the
API also needs an ability for them to list what these are!
I would suggest that these be advertised in the "network" resources so the
customer can choose one that's already allocated (assuming they don't just
rely on DHCP for this).

Another interesting use case incidentally is that of machines doing
introspection - a machine (authenticated by IP?) should be able to hit OCCI
for information about itself (such as its name? IP address? SSH keys?
application configuration?). Even basic attribute-value pairs being settable
via management interfaces would be incredibly powerful (and we get this for
free already).

Sam