Sam Johnston <samj@samj.net> writes:

> I'm leaning towards the time/space tradeoff of including the ID in each
> row somehow (in which case parsing into a hash of hashes is trivial
> again).

That works.

<snip>

> I've found the tinydns-data
> <http://cr.yp.to/djbdns/tinydns-data.html> format a pleasure to work
> with as well

Yes, DJB knows how to design a decent data format, and doesn't succumb to
the over-engineering fetish predominant as one moves up the software stack.
I wouldn't be upset by KEY:VALUE in place of KEY VALUE. That's also easily
parseable by read or strsep().

OK, so to tie off the formats discussion: combining the two ideas (tinydns-data with the ID on every line) gives us:

decca5a5-8952-4004-9793-cdbbf05c3c63:category:server

decca5a5-8952-4004-9793-cdbbf05c3c63:title:Debian GNU/Linux 5.0 Virtual Appliance
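
To show how little work the "hash of hashes" parse is, here is a minimal Python sketch. The function name parse_records is made up for illustration, and it assumes every non-blank line has the ID:KEY:VALUE shape (a malformed line raises ValueError rather than being skipped):

```python
from collections import defaultdict


def parse_records(lines):
    """Parse ID:KEY:VALUE lines into {id: {key: value}}."""
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines between entries
        # Split on the first two colons only; the rest of the line is the value.
        record_id, key, value = line.split(":", 2)
        records[record_id][key] = value
    return dict(records)


sample = [
    "decca5a5-8952-4004-9793-cdbbf05c3c63:category:server",
    "",
    "decca5a5-8952-4004-9793-cdbbf05c3c63:title:Debian GNU/Linux 5.0 Virtual Appliance",
]
records = parse_records(sample)
```

There is no regexp anywhere: one split per line, bounded to three fields.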

Having worked with this format for about a decade now, I can tell you that it is an absolute dream. Even things that had not been conceived of at the time (e.g. SRV records) are easily supported, all while avoiding annoying or dangerous parsing problems caused by greedy regexps (which are surprisingly common) and the like. It also allows us to cater for simple structures like arrays later if need be:

decca5a5-8952-4004-9793-cdbbf05c3c63:interfaces:eth0:eth1:eth2
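
A rough sketch of reading such an array line in Python follows. The name parse_list_line is hypothetical, and it assumes the reader already knows the key is list-valued and that no element contains a literal colon (escaping is not handled here):

```python
def parse_list_line(line):
    """Split ID:KEY:ITEM[:ITEM...] into (id, key, [items])."""
    # Every colon is a separator here, so the trailing fields become the list.
    record_id, key, *items = line.rstrip("\n").split(":")
    return record_id, key, items


record_id, key, items = parse_list_line(
    "decca5a5-8952-4004-9793-cdbbf05c3c63:interfaces:eth0:eth1:eth2"
)
```

Whether a key is scalar or list-valued would have to be agreed per key, since the line itself does not say.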

Perhaps more importantly, though, it trivialises both generation and parsing of content by allowing you to do either in any order. This is particularly important for scalability (allowing multiple threads to query multiple servers and feed back into a shared writer).
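
As an illustration of the any-order point, here is a small Python sketch in which several threads each emit lines into a shared queue and a single writer drains it. The "servers" are hypothetical in-memory lists standing in for real queries, and fetch, collect and parse are made-up names:

```python
import queue
import threading

RID = "decca5a5-8952-4004-9793-cdbbf05c3c63"

# Hypothetical stand-ins for two servers, each knowing part of the record.
servers = [
    [(RID, "category", "server")],
    [(RID, "title", "Debian GNU/Linux 5.0 Virtual Appliance")],
]


def fetch(server, out):
    # Each worker emits complete, self-describing lines; no shared state needed.
    for record_id, key, value in server:
        out.put(f"{record_id}:{key}:{value}")


def collect(servers):
    out = queue.Queue()
    workers = [threading.Thread(target=fetch, args=(s, out)) for s in servers]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Single writer: drain the queue in whatever order the lines arrived.
    lines = []
    while not out.empty():
        lines.append(out.get())
    return lines


def parse(lines):
    records = {}
    for line in lines:
        record_id, key, value = line.split(":", 2)
        records.setdefault(record_id, {})[key] = value
    return records


lines = collect(servers)
same = parse(lines) == parse(list(reversed(lines)))
```

Because every line carries its own ID, the writer never needs to group or sort output, and the reader gets the same structure whatever order the lines arrive in.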

I think then that the formats discussion is pretty much done, at least for the time being. On with the verbs and nouns...

Sam