Update of relational rendering of Glue 2.0 draft 33

Dear all,

please find the latest rendering for the relational model (ref 33) here: https://twiki.cern.ch/twiki/bin/view/EGEE/ExampleGlue20SQL

Unfortunately, I didn't find time to update the examples on the page - but it's on my list.

Comments/Questions are appreciated.

Cheers,
Felix

---
Felix Ehm
IT-GD tel : +41 22 7674580
CERN, Switzerland

glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Felix Nikolaus Ehm said:
please find the latest rendering for the relational model (ref 33) here: https://twiki.cern.ch/twiki/bin/view/EGEE/ExampleGlue20SQL
Unfortunately, I didn't find time to update the examples on the page - but it's on my list.
Comments/Questions are appreciated.

Have you had comments from Steve Fisher or Antony Wilson, who did the R-GMA implementation [1] in the past? I think they're on the glue list but they may not be paying attention ...

One comment is that your proposal to have all multivalued attributes in one huge table doesn't seem very good to me. Why do you rule out having separate tables for each attribute? That is basically how it's done in R-GMA at the moment.

Stephen

[1] See e.g. https://eg.nikhef.nl:8443/R-GMA/table.html

Hi Stephen,

I put some comments/arguments for this (current) decision onto the relational model twiki page (https://twiki.cern.ch/twiki/bin/view/EGEE/ExampleGlue20SQL).

Generally, there are two extremes for modelling a db schema: one is that you push everything into one huge table with key-type-value columns (which is the most flexible solution), and the other one is to model all entities with every multivalued attribute/entity explicit in its own child table. The second implies a lot of joins over several tables, which affects the usability quite heavily. The Endpoint in the current schema, for example, would have another set of 4 child tables. For sure, if you want to model everything in detail and nicely normalized - this is the solution. I am currently not really convinced this is necessary. Someone may convince me otherwise.

Also, inserting/updating data for several tables is a more heavy-weight operation than for, say, two tables. Consider that the DB system needs to update more index files. I've spoken with some DB admins from CERN and they confirmed that look-ups within one table are quite (very) fast, which supports the argument of putting some attributes into this ValueTable.

I've applied this concept to the current schema (1.3) as well, and there the MultiValued table has around 120K entries (I would claim this is easy work for modern DB systems - e.g. Google uses MySQL on a !much! larger scale). It has the advantage of keeping queries quite easy (fewer joins) while providing some schema flexibility at the same time.

At the time when I created this relational model (3 weeks ago :-) ) we had more multivalued attributes (each resulting in a child table) in the schema. This, however, might have changed and I may reevaluate this decision. Thinking about this, I would propose to try a mixed solution: all multivalued attributes of one entity (say Endpoint) go into ONE child table (not 4).

I would also like Timo's experience to feed into this decision, since they are currently trying to write an info provider for the proposed schema.

Cheers,
Felix
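
For illustration, the single generic table and the proposed mixed solution could be sketched as follows. This is only a minimal sketch using Python's built-in sqlite3 module; the table names MultiValued and EndpointMultiValued, all column names, the URLs and the attribute values are assumptions made up for this example and are not taken from the rendering on the twiki page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Endpoint (Id TEXT PRIMARY KEY, Url TEXT NOT NULL);

-- Layout A (assumed): one generic key-type-value table shared by all entities.
CREATE TABLE MultiValued (
    EntityId TEXT NOT NULL,   -- key: id of the owning entity
    AttrName TEXT NOT NULL,   -- type: which multivalued attribute
    Value    TEXT NOT NULL    -- value
);
CREATE INDEX MultiValuedLookup ON MultiValued (AttrName, Value);

-- Layout B (assumed): the mixed solution, ONE child table per entity
-- holding all multivalued attributes of that entity.
CREATE TABLE EndpointMultiValued (
    EndpointId TEXT NOT NULL REFERENCES Endpoint (Id),
    AttrName   TEXT NOT NULL,
    Value      TEXT NOT NULL
);
CREATE INDEX EndpointMultiValuedLookup ON EndpointMultiValued (AttrName, Value);
""")

# Hypothetical sample data, identical in both layouts.
endpoints = [("ep1", "https://host1.example.org/a"),
             ("ep2", "https://host2.example.org/b")]
values = [("ep1", "Capability", "data.transfer"),
          ("ep1", "Capability", "data.management"),
          ("ep1", "SupportedProfile", "profile-a"),
          ("ep2", "Capability", "execution.management")]
conn.executemany("INSERT INTO Endpoint VALUES (?, ?)", endpoints)
conn.executemany("INSERT INTO MultiValued VALUES (?, ?, ?)", values)
conn.executemany("INSERT INTO EndpointMultiValued VALUES (?, ?, ?)", values)


def endpoints_with(conn, table, key_column, attr, value):
    """Return ids of endpoints that have the given attribute value.

    table and key_column come from the fixed names above, never from user input.
    """
    sql = (f"SELECT e.Id FROM Endpoint e JOIN {table} m ON m.{key_column} = e.Id "
           "WHERE m.AttrName = ? AND m.Value = ? ORDER BY e.Id")
    return [row[0] for row in conn.execute(sql, (attr, value))]


generic = endpoints_with(conn, "MultiValued", "EntityId",
                         "Capability", "data.transfer")
per_entity = endpoints_with(conn, "EndpointMultiValued", "EndpointId",
                            "Capability", "data.transfer")
print(generic, per_entity)
```

In this sketch both layouts need exactly one join for the same question. The difference is that the generic table mixes rows of every entity type, while the per-entity table only ever contains Endpoint rows.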
-----Original Message-----
From: glue-wg-bounces@ogf.org [mailto:glue-wg-bounces@ogf.org] On Behalf Of Burke, S (Stephen)
Sent: Monday, 14 April 2008 18:09
To: Felix Nikolaus Ehm; glue-wg@ogf.org
Cc: Steve Fisher; a.j.wilson@rl.ac.uk; timo.baur@lrz-muenchen.de
Subject: Re: [glue-wg] Update of relational rendering of Glue 2.0 draft 33

Felix Nikolaus Ehm [mailto:Felix.Ehm@cern.ch] said:
the other one is to model all entities with every multivalued attribute/entity explicit in its own child table. This implies a lot of joins over several tables, which affects the usability quite heavily. The Endpoint in the current schema, for example, would have another set of 4 child tables.
That depends on what queries you make. Three of those (WSDL, SupportedProfile and Semantics) aren't things you would query on, so the only real issue is Capability. In that case I'd think it would be less efficient to query via a huge table with everything in it than to join with a specific EndpointCapability table. At a quick look through the current draft I don't see anything else likely to cause problems in practice. Actually the most difficult queries may be on the AccessPolicy, depending on how complex we make it, but that isn't directly relevant to this argument as it's a separate object anyway.
Also, inserting/updating data for several tables is a more heavy-weight operation than for, say, two tables. Consider that the DB system needs to update more index files.
I'm not terribly convinced by that - GLUE schema data isn't all that big by database standards, so I don't see why the technology shouldn't be able to cope. Compare with e.g. the LFC which may have to deal with hundreds of millions of files. At worst we may have thousands of endpoints with O(10) multivalued attributes. Stephen
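
To put rough numbers on the scale mentioned above, the following sketch builds a dedicated EndpointCapability table and queries it with a single join. It is an illustration under stated assumptions, not a measurement: the table layout, the figure of 5000 endpoints, the 10 values per endpoint and the synthetic capability names are all invented for this example.

```python
import sqlite3

N_ENDPOINTS = 5000       # assumption standing in for "thousands of endpoints"
CAPS_PER_ENDPOINT = 10   # assumption standing in for "O(10)" values each

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Endpoint (Id INTEGER PRIMARY KEY, Url TEXT NOT NULL);
CREATE TABLE EndpointCapability (
    EndpointId INTEGER NOT NULL REFERENCES Endpoint (Id),
    Capability TEXT NOT NULL,
    PRIMARY KEY (EndpointId, Capability)
);
CREATE INDEX EndpointCapabilityByValue ON EndpointCapability (Capability);
""")

conn.executemany(
    "INSERT INTO Endpoint VALUES (?, ?)",
    [(i, f"https://host{i}.example.org/service") for i in range(N_ENDPOINTS)],
)
# Synthetic capability names cap0..cap49; each endpoint gets 10 distinct ones.
conn.executemany(
    "INSERT INTO EndpointCapability VALUES (?, ?)",
    [(i, f"cap{(i + k) % 50}")
     for i in range(N_ENDPOINTS) for k in range(CAPS_PER_ENDPOINT)],
)


def find_endpoints(conn, capability):
    """Return the URLs of all endpoints that advertise the given capability."""
    sql = ("SELECT e.Url FROM Endpoint e "
           "JOIN EndpointCapability c ON c.EndpointId = e.Id "
           "WHERE c.Capability = ?")
    return [row[0] for row in conn.execute(sql, (capability,))]


total_rows = conn.execute("SELECT COUNT(*) FROM EndpointCapability").fetchone()[0]
matches = find_endpoints(conn, "cap0")
print(total_rows, len(matches))
```

With these assumed numbers the child table holds 50,000 rows, which is small for a relational database, and the query touches only Capability rows rather than every multivalued attribute in the schema.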

glue-wg-bounces@ogf.org
[mailto:glue-wg-bounces@ogf.org] On Behalf Of Burke, S (Stephen) said: That depends on what queries you make. Three of those (WSDL, SupportedProfile and Semantics) aren't things you would query on, so the only real issue is Capability.
Another point is the question of dynamic vs static attributes. All of those are static, i.e. they would change only when the site is reconfigured O(1/month?), and probably that's true of most multivalued attributes, so speed of insertion isn't relevant if you have a system that only updates things when necessary. Stephen
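
The idea of only updating things when necessary can be sketched as a publisher that compares the stored values with the new ones and skips the write if nothing changed. This is a minimal sketch under assumptions: the function name publish_capabilities, the EndpointCapability table layout and the capability strings are hypothetical and not part of any existing info provider.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EndpointCapability (
    EndpointId TEXT NOT NULL,
    Capability TEXT NOT NULL,
    PRIMARY KEY (EndpointId, Capability)
)
""")


def publish_capabilities(conn, endpoint_id, capabilities):
    """Store the capability set only if it differs from what is already stored.

    Returns True if rows were rewritten, False if the write was skipped.
    """
    new = set(capabilities)
    rows = conn.execute(
        "SELECT Capability FROM EndpointCapability WHERE EndpointId = ?",
        (endpoint_id,),
    )
    current = {row[0] for row in rows}
    if current == new:
        return False  # static data unchanged: no table or index update needed
    with conn:  # replace the old set in one transaction
        conn.execute(
            "DELETE FROM EndpointCapability WHERE EndpointId = ?", (endpoint_id,)
        )
        conn.executemany(
            "INSERT INTO EndpointCapability VALUES (?, ?)",
            [(endpoint_id, cap) for cap in sorted(new)],
        )
    return True


first = publish_capabilities(conn, "ep1", ["data.transfer", "data.management"])
repeat = publish_capabilities(conn, "ep1", ["data.management", "data.transfer"])
reconfigured = publish_capabilities(conn, "ep1", ["data.transfer"])
print(first, repeat, reconfigured)
```

In this sketch the repeated publication of identical values costs one read and no write, so insertion speed only matters on the rare occasions when the site is reconfigured.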
participants (2)
- Burke, S (Stephen)
- Felix Nikolaus Ehm