On Tue, May 5, 2009 at 3:44 PM, Richard Davies <richard.davies@elastichosts.com> wrote:

> Amazon use a simple but sprawling XML based API
...

> there are a raft of intellectual property issues as well:

We definitely agree with you that OCCI can produce a better API than Amazon,
both in terms of IP issues and also a cleaner design. If I were Amazon and
wanted to play hardball, I would (a) allow Eucalyptus, etc to copy my API
for now (since it only helps me gain traction as a de facto standard), (b)
remain vague about IP for now, (c) not support any other API standards while
I remain the de facto standard and (d) later halt or charge fees to any
competition which becomes serious.

(a), (b) and (c) seem well executed to date ;-)

Indeed, and until such time as Amazon work out what they want to do with their APIs they're essentially a no-go-zone (for us at least).

> I am now 100% convinced that the best results are to be had with a variant
> of XML over HTTP

...

> support alternative formats including HTML, JSON and TXT via XML
> Stylesheets

As you're well aware, we're less in favour of XML. However support for
alternative formats with automatic cross-conversion makes all the versions
equivalently first-class citizens, which is good enough for us.

The XSLT convertors start with XML and convert to the other formats.
In practice, ElasticHosts will likely start with TXT, and convert from there
to the JSON, XML, etc - it would be great to see automatic convertors in
this opposite direction too, to validate that it can be done. Writing these
will also impose discipline and prevent creation of unnecessarily complex
datastructures, which is always a risk in XML.

Agreed, but one of the primary drivers for XML is the ease with which one can transform from it into $PREFERRED_FORMAT; the same cannot necessarily be said in reverse. That's not to say I won't have a crack at it, nor that it would necessarily be all that difficult, but there's no generic equivalent of xsltjson (or of conventions like BadgerFish) to do that mechanically, for example.
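
To give a feel for the reverse direction, here is a minimal sketch in Python (not XSLT) that turns the TXT excerpt quoted later in this message back into an Atom-style entry. The function name txt_to_entry is hypothetical, and the sketch assumes the bracketed header carries a bare UUID that should regain its "urn:uuid:" prefix and that link lines are always rel|title|href; neither assumption is settled in this thread.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the TXT sample from this thread.
SAMPLE_TXT = """\
[47bb7df8-587e-47fa-bd89-6f2f81c14b19]
title|Virtual Machine #1
summary|Sample Compute Resource
updated|2009-05-04T09:52:37Z
link|||/47bb7df8-587e-47fa-bd89-6f2f81c14b19
link|http://purl.org/occi/storage#device|Hard Drive|urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df
etag|
"""

def txt_to_entry(text):
    """Build an <entry> element from one TXT record (hypothetical helper)."""
    entry = ET.Element("entry")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Assumption: the INI-style header holds the UUID without its URN prefix.
            ET.SubElement(entry, "id").text = "urn:uuid:" + line[1:-1]
            continue
        key, _, rest = line.partition("|")
        if key == "link":
            # Assumption: link fields are rel|title|href, empty when absent.
            rel, title, href = rest.split("|", 2)
            attrs = {"href": href}
            if rel:
                attrs["rel"] = rel
            if title:
                attrs["title"] = title
            ET.SubElement(entry, "link", attrs)
        elif rest:
            # Plain key|value pair; empty values such as "etag|" are skipped.
            ET.SubElement(entry, key).text = rest
    return entry

if __name__ == "__main__":
    print(ET.tostring(txt_to_entry(SAMPLE_TXT), encoding="unicode"))
```

Even this small case needs per-key knowledge (which keys are links, how the ID is written), which is the kind of logic a purely mechanical transform would not have.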

> You can see the basics in action thanks to my Google App Engine reference
> implementation at http://occitest.appspot.com/ (as well as HTML, JSON and
> TXT versions of same)
>

> http://occitest.appspot.com/
> http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http%3A%2F%2Focci.googlecode.com%2Fsvn%2Ftrunk%2Fxml%2Focci-to-html.xsl&xmlfile=http%3A%2F%2Foccitest.appspot.com
> http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http%3A%2F%2Focci.googlecode.com%2Fsvn%2Ftrunk%2Fxml%2Focci-to-json.xsl&xmlfile=http%3A%2F%2Foccitest.appspot.com
> http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http%3A%2F%2Focci.googlecode.com%2Fsvn%2Ftrunk%2Fxml%2Focci-to-text.xsl&xmlfile=http%3A%2F%2Foccitest.appspot.com

I'm not going to comment on the XML, since Enterprise XML design is not our
forte. I will go through the TEXT and JSON versions in some detail.

No problem - it's not that difficult to understand but I'm guessing a fairly introductory explanation would be a useful addition to the spec.

Here's a sample excerpt in XML:

> <entry>
> <id>urn:uuid:47bb7df8-587e-47fa-bd89-6f2f81c14b19</id>
> <title>Virtual Machine #1</title>
> <summary>Sample Compute Resource</summary>
> <updated>2009-05-04T09:52:37Z</updated>
> <link href="/47bb7df8-587e-47fa-bd89-6f2f81c14b19" />
> <link rel="http://purl.org/occi/storage#device" href="urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df" title="Hard Drive"/>
> <link rel="http://purl.org/occi/network#interface" href="urn:uuid:dc88b244-145f-49e4-be7c-0880dcad42e9" title="Internet Connection"/>
> <link rel="http://purl.org/occi/network#interface" href="urn:uuid:253d83dd-e417-4e1f-9958-8c0a63120475" title="Private Network"/>
> </entry>

in JSON:
[I assume the '\n' and '\/' in the strings are mistakes. It'd also be good
if the XSLT-converted JSON had nice indentation, like I've produced below]

Yes, I was assuming that would go unnoticed but apparently people are paying attention... I was using TextMate to clean it up but the same could easily enough be done in the XSLT.
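
For what it's worth, the two escapes behave differently. In JSON, '\/' is a legal escape that decodes to a plain '/', so those strings are ugly but correct, whereas '\n' decodes to a real newline and does change the value. A small Python check, assuming the newlines were meant to be spaces (the cleanup step is illustrative only):

```python
import json

# One link object as emitted by the current XSLT, escapes included.
raw = r'{"rel": "http:\/\/purl.org\/occi\/storage#device", "title": "Hard\nDrive"}'

doc = json.loads(raw)

# "\/" decodes to "/", so the URL survives intact.
# "\n" decodes to a newline, so the title is no longer "Hard Drive".

# Assumption: the newlines stand in for single spaces.
cleaned = {key: " ".join(value.split()) for key, value in doc.items()}

# Indented output, as requested for the XSLT-generated JSON.
pretty = json.dumps(cleaned, indent=2)

if __name__ == "__main__":
    print(pretty)
```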

> {
> "id":"urn:uuid:47bb7df8-587e-47fa-bd89-6f2f81c14b19",
> "title":"Virtual\nMachine\n#1",
> "summary":"Sample\nCompute\nResource",
> "updated":"2009-05-04T09:52:37Z",
> "link":[
> {
> "href":"\/47bb7df8-587e-47fa-bd89-6f2f81c14b19"
> },
> {
> "rel":"http:\/\/purl.org\/occi\/storage#device",
> "href":"urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df",
> "title":"Hard\nDrive"
> },
> {
> "rel":"http:\/\/purl.org\/occi\/network#interface",
> "href":"urn:uuid:dc88b244-145f-49e4-be7c-0880dcad42e9",
> "title":"Internet\nConnection"
> },
> {
> "rel":"http:\/\/purl.org\/occi\/network#interface",
> "href":"urn:uuid:253d83dd-e417-4e1f-9958-8c0a63120475",
> "title":"Private\nNetwork"
> }
> ]
> }

and in TXT:

> [47bb7df8-587e-47fa-bd89-6f2f81c14b19]
> title|Virtual Machine #1
> summary|Sample Compute Resource
> updated|2009-05-04T09:52:37Z
> link|||/47bb7df8-587e-47fa-bd89-6f2f81c14b19
> link|http://purl.org/occi/storage#device|Hard Drive|urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df
> link|http://purl.org/occi/network#interface|Internet Connection|urn:uuid:dc88b244-145f-49e4-be7c-0880dcad42e9
> link|http://purl.org/occi/network#interface|Private Network|urn:uuid:253d83dd-e417-4e1f-9958-8c0a63120475
> etag|

Going through the text version...

> [47bb7df8-587e-47fa-bd89-6f2f81c14b19]

To simplify parsing even further, I'd write this as:

id|47bb7df8-587e-47fa-bd89-6f2f81c14b19

Right, Chris and I discussed this before and agreed that an INI format wasn't ideal - changing it is easy enough to do (even for an XSLT newbie like myself).

We also spoke about putting the ID on every line à la tinydns-data format (which is nice for multithreaded/asynchronous implementations which may need to pull data from billing systems, performance monitor counters, etc)... this is trivial to parse into a hash [of hashes] and a pleasure to work with.
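
As a rough illustration of the ID-on-every-line idea, here is a Python sketch that folds such lines into a hash of hashes. The exact line layout (id|key|value) is an assumption for the example, not something agreed here; it only borrows the self-contained-line idea from tinydns-data, not its syntax.

```python
from collections import defaultdict

# Assumed layout: <id>|<key>|<value>, one self-contained fact per line.
# Lines for different objects may arrive interleaved and in any order.
LINES = """\
47bb7df8-587e-47fa-bd89-6f2f81c14b19|title|Virtual Machine #1
47bb7df8-587e-47fa-bd89-6f2f81c14b19|updated|2009-05-04T09:52:37Z
4cc8cf62-69a4-4650-9e8c-7d4c516884df|title|Hard Drive
47bb7df8-587e-47fa-bd89-6f2f81c14b19|summary|Sample Compute Resource
"""

def parse_flat(text):
    """Group id|key|value lines into {id: {key: value}}."""
    objects = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two pipes only, so values may contain pipes.
        obj_id, key, value = line.split("|", 2)
        objects[obj_id][key] = value
    return dict(objects)

if __name__ == "__main__":
    for obj_id, fields in parse_flat(LINES).items():
        print(obj_id, fields)
```

Because every line stands alone, separate workers can append lines as their data arrives without coordinating on record boundaries.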

Another thing to note is the use of the pipe (|) character as a separator since colons (:) are present in URLs... we may also be able to move to CSV which is something I believe GoGrid are already supporting. This is a detail that makes little difference though, and probably something we can agree on later.
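
On the separator question, an off-the-shelf CSV library can usually handle either choice. The sketch below uses Python's csv module with a pipe delimiter; the helper names read_rows and write_row are made up for the example, and the "Disk | backup" title is an invented value to show quoting.

```python
import csv
import io

def read_rows(text):
    """Parse pipe-delimited lines; colons inside URLs need no special handling."""
    return list(csv.reader(io.StringIO(text), delimiter="|"))

def write_row(fields):
    """Emit one pipe-delimited line; fields containing a pipe get quoted."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="|", lineterminator="\n").writerow(fields)
    return buf.getvalue()

SAMPLE = (
    "link|http://purl.org/occi/storage#device|Hard Drive|"
    "urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df\n"
)

if __name__ == "__main__":
    print(read_rows(SAMPLE))
    # A value that itself contains the separator round-trips via quoting.
    print(write_row(["title", "Disk | backup"]))
```

Switching to commas would only mean changing the delimiter argument, which supports the view that this detail can be settled later.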

and have the blank line also as the separator between objects

> title|Virtual Machine #1
> summary|Sample Compute Resource
> updated|2009-05-04T09:52:37Z

These look good - simple key-value pairs.

Presumably we'll also add some actual parameters of the virtual machine,
e.g.

smp|2
cpu|2000
mem|1024

Right, and then there's just the question of whether these go into the top-level namespace or are prefixed, e.g. "compute:cores", "compute:arch", "compute:speed", "memory:size", etc. (I'm thinking burning a few more characters is worth it, and the mechanical transformations are easier then too, avoiding confusion over e.g. whether "cpu" means speed or cores).
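
To show why prefixed keys make mechanical transformation easier, here is a small Python sketch that nests "namespace:name" keys into a structure that maps directly onto JSON or XML. The mapping of smp/cpu/mem onto compute:cores, compute:speed and memory:size is an assumption for the example, and nest is a hypothetical helper.

```python
def nest(flat, sep=":"):
    """Turn {"compute:cores": "2"} into {"compute": {"cores": "2"}}."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            # Create the namespace level on first use.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

# Assumed renaming of the smp|2, cpu|2000 and mem|1024 examples.
FLAT = {
    "title": "Virtual Machine #1",
    "compute:cores": "2",
    "compute:speed": "2000",
    "memory:size": "1024",
}

if __name__ == "__main__":
    print(nest(FLAT))
```

The sketch does not handle a key that is used both as a namespace and as a plain value; a real format would need a rule for that.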

It would be good to work these into the examples, TXT, JSON and XML.

> link|||/47bb7df8-587e-47fa-bd89-6f2f81c14b19

It's good that we have an end-point for editing the object directly. I'd
make this a simple key-value rather than having blank fields:

link|/47bb7df8-587e-47fa-bd89-6f2f81c14b19

The blank fields are not [really] intentional, but there's a good reason for their existence. "link" is a generic Atom construct (see here for details) that can point back to the object, to an alternate (e.g. PDF) representation of it, to another object (e.g. a parent, child, sibling, etc.) or pretty much whatever we want (e.g. think screenshots, snapshots, build documentation, etc. - again, where a lot of the innovation will take place).

Perhaps we can move the URL to the front and only show the empty fields when there are gaps... need to get me a good XSLT book (if anyone has a PDF or even a recommendation I'd be happy to hear it).

> link|http://purl.org/occi/storage#device|Hard Drive|urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df
> link|http://purl.org/occi/network#interface|Internet Connection|urn:uuid:dc88b244-145f-49e4-be7c-0880dcad42e9
> link|http://purl.org/occi/network#interface|Private Network|urn:uuid:253d83dd-e417-4e1f-9958-8c0a63120475

There are four fields here: 'link', a schema, a name and an object's UUID.
However there's an important one missing - the specification of _how_ the
other object is bound to the virtual machine (e.g. is the drive bound as IDE
or SCSI, and to which bus? Which network is which virtual NIC?).

Again "link" is a generic Atom construct that can apply to anything... usually it is used to point at the content of the object where none is embedded (remembering you can embed XML e.g. OVF or any other data including binary if need be).

This extra field will be needed in the XML too, e.g. as 'type':

<link type="ide:0:0" rel="http://purl.org/occi/storage#device" href="urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df" title="Hard Drive"/>

Once this is specified, the first three fields aren't actually necessary in
the TXT version, which can just be:

ide:0:0|4cc8cf62-69a4-4650-9e8c-7d4c516884df
nic:0|dc88b244-145f-49e4-be7c-0880dcad42e9
nic:1|253d83dd-e417-4e1f-9958-8c0a63120475

There's no need to specify the schema, since this is uniquely determined by
the link type (and in any case only relevant to the XML), no need to specify
'link', since it always is for this key, and no need to specify the name,
since that's the name of the object with the given UUID.

Yes, this makes a lot of sense, but it also means you've got logic in your transforms which is something I would very much like to avoid. We also start to impose on the format of device identifiers which doesn't necessarily work in large, complex and/or heterogeneous environments (you say eth0, I say en1 and others say "Local Area Connection"... and that's before you start talking about paravirtualised block devices and so on) and short of revising the spec we're at a dead end if implementors need to carry other information such as block or frame size. Human friendly "title"s are incredibly useful too by the way, and while one could use the title of the storage or network resource that doesn't always work (think shared networks and cluster storage).

I admit there's probably a better way to do it and to be honest I've not had a chance to spend a lot of time on the problem (nor am I an XSLT wizard, yet).
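
To make the "logic in your transforms" concern concrete, here is a Python sketch of what expanding the compact form (e.g. ide:0:0|uuid) back into a generic link would need. The lookup table and the expand helper are hypothetical; the point is that something has to know every device prefix in advance.

```python
# Hypothetical table: the transform must know each device prefix up front.
REL_BY_PREFIX = {
    "ide": "http://purl.org/occi/storage#device",
    "nic": "http://purl.org/occi/network#interface",
}

def expand(line):
    """Rebuild generic link attributes from a compact 'type|uuid' line."""
    key, uuid = line.split("|", 1)
    prefix = key.split(":", 1)[0]
    # Raises KeyError for any identifier the table does not cover.
    rel = REL_BY_PREFIX[prefix]
    # Note that the human-friendly title cannot be recovered from this form.
    return {"type": key, "rel": rel, "href": "urn:uuid:" + uuid}

if __name__ == "__main__":
    print(expand("ide:0:0|4cc8cf62-69a4-4650-9e8c-7d4c516884df"))
    try:
        expand("eth0|dc88b244-145f-49e4-be7c-0880dcad42e9")
    except KeyError as exc:
        print("unknown device identifier:", exc)
```

An identifier such as "eth0" or "Local Area Connection" simply falls outside the table, which is the heterogeneity problem described above.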

> etag|

What's this doing? Can it be deleted?

Nothing at the moment because I haven't worked out a sensible way to generate ETags, however this is part of the magic fairy dust that Google added to Atom to make GData and it's very important for performance/caching/proxying/gatewaying/concurrency/conflict resolution/etc. (but also usually safe to ignore for those who don't care about such things).
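
One common way to generate an ETag, offered only as a sketch and not as what GData or the reference implementation does, is to hash the serialized representation. The helper names below are hypothetical, and the If-None-Match parsing is deliberately simplified (it splits on commas and ignores the weak-validator prefix).

```python
import hashlib

def make_etag(representation):
    """Derive a quoted ETag from the bytes of a representation."""
    return '"' + hashlib.sha1(representation).hexdigest() + '"'

def not_modified(if_none_match, current_etag):
    """Return True when a conditional GET could be answered with 304."""
    if if_none_match is None:
        return False
    if if_none_match.strip() == "*":
        return True
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        # If-None-Match uses weak comparison, so drop any "W/" prefix.
        if candidate.startswith("W/"):
            candidate = candidate[2:]
        if candidate == current_etag:
            return True
    return False

if __name__ == "__main__":
    body = b"title|Virtual Machine #1\nupdated|2009-05-04T09:52:37Z\n"
    tag = make_etag(body)
    print(tag, not_modified(tag, tag))
```

Hashing ties the tag to one particular format, so the XML, JSON and TXT renderings of the same object would each get a different ETag under this approach.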

Putting these together, the original:

> [47bb7df8-587e-47fa-bd89-6f2f81c14b19]
> title|Virtual Machine #1
> summary|Sample Compute Resource
> updated|2009-05-04T09:52:37Z
> link|||/47bb7df8-587e-47fa-bd89-6f2f81c14b19
> link|http://purl.org/occi/storage#device|Hard Drive|urn:uuid:4cc8cf62-69a4-4650-9e8c-7d4c516884df
> link|http://purl.org/occi/network#interface|Internet Connection|urn:uuid:dc88b244-145f-49e4-be7c-0880dcad42e9
> link|http://purl.org/occi/network#interface|Private Network|urn:uuid:253d83dd-e417-4e1f-9958-8c0a63120475
> etag|

Turns into:

id|47bb7df8-587e-47fa-bd89-6f2f81c14b19
title|Virtual Machine #1
summary|Sample Compute Resource
updated|2009-05-04T09:52:37Z
smp|2
cpu|2000
mem|1024
link|/47bb7df8-587e-47fa-bd89-6f2f81c14b19
ide:0:0|4cc8cf62-69a4-4650-9e8c-7d4c516884df
nic:0|dc88b244-145f-49e4-be7c-0880dcad42e9
nic:1|253d83dd-e417-4e1f-9958-8c0a63120475

Which is shorter, simpler and describes the virtual server more fully (3
extra properties + the link attachment points).

Sure this looks nicer/neater for us humans but it's less flexible and makes more work for the machines (and more importantly, programmers thereof). We need to find a balance and I'll certainly be working with you and Chris to do that.

We can also render this same thing in JSON:

{
"id": "47bb7df8-587e-47fa-bd89-6f2f81c14b19",
"title": "Virtual Machine #1",
"summary": "Sample Compute Resource",
"updated": "2009-05-04T09:52:37Z",
"smp": 2,
"cpu": 2000,
"mem": 1024,
"link": "/47bb7df8-587e-47fa-bd89-6f2f81c14b19",
"ide:0:0": "4cc8cf62-69a4-4650-9e8c-7d4c516884df",
"nic:0": "dc88b244-145f-49e4-be7c-0880dcad42e9",
"nic:1": "253d83dd-e417-4e1f-9958-8c0a63120475"
}

Which again is shorter, simpler and more descriptive than the original.

Agreed.
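
As a sanity check that the proposed TXT and JSON forms really are equivalent, here is a Python sketch that parses blank-line-separated key|value records and emits JSON. The set of numeric keys and the second record (a separate entry for the hard drive) are assumptions made for the example.

```python
import json

# Two records in the proposed form, separated by a blank line.
TXT = """\
id|47bb7df8-587e-47fa-bd89-6f2f81c14b19
title|Virtual Machine #1
smp|2
cpu|2000
mem|1024
ide:0:0|4cc8cf62-69a4-4650-9e8c-7d4c516884df

id|4cc8cf62-69a4-4650-9e8c-7d4c516884df
title|Hard Drive
"""

# Assumption: these keys hold integers; everything else stays a string.
NUMERIC_KEYS = {"smp", "cpu", "mem"}

def parse_records(text):
    """Split on blank lines and read each line as key|value."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current object, if any.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition("|")
        current[key] = int(value) if key in NUMERIC_KEYS else value
    if current:
        records.append(current)
    return records

if __name__ == "__main__":
    print(json.dumps(parse_records(TXT), indent=2))
```

The parser needs a list of numeric keys to produce unquoted numbers, which is a small instance of schema knowledge leaking into the conversion.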

Sam: please can you take these changes on board:
- update the XML example with the 3 extra properties, the link attachment
points and probably some actuators too (like start and stop)
- update the XSLT conversions to produce the improved TXT and JSON formats

No problem. I wanted to have the actuators in place already but also wanted to make them responsive (that is, so you can actually test your clients against this and so the general public can kick the tires).

Sam