
Weijian Fang wrote:
We are looking at the GLUE2 XML realizations, i.e., the official but somewhat outdated one (http://schemas.ogf.org/glue/2008/05/spec_2.0_d42_r01) and the NorduGrid one (http://svn.nordugrid.org/trac/nordugrid/browser/arc1/trunk/doc/tech_doc/info...). (We are also aware of the TeraGrid GLUE2 XML schema.)
The official schema and the NorduGrid schema are similar. Both define only a single global XML element, <Domains>; all other entities are defined as XML types rather than elements. They can therefore appear inside <Domains> but can never stand on their own as a document root. Under this design, updating a single piece of information, for instance Domains/AdminDomain/Services/ComputingService/RunningJobs, requires re-publishing the whole AdminDomain.
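To make the consequence concrete, here is a minimal Python sketch of such an update. It is only an illustration under stated assumptions: the document below is hypothetical and heavily simplified, the element names are taken from the path quoted above, and namespaces and most of the content of the real schemas are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified document. Element names follow the path
# quoted above; namespaces and most content of the real schemas are omitted.
DOC = """<Domains>
  <AdminDomain>
    <ID>example-site</ID>
    <Services>
      <ComputingService>
        <ID>example-ce</ID>
        <RunningJobs>10</RunningJobs>
      </ComputingService>
    </Services>
  </AdminDomain>
</Domains>"""


def republish_with_running_jobs(doc: str, running_jobs: int) -> str:
    """Change one value, then serialise the whole document again."""
    root = ET.fromstring(doc)
    node = root.find("AdminDomain/Services/ComputingService/RunningJobs")
    node.text = str(running_jobs)
    # <Domains> is the only element allowed as a document root, so the
    # smallest valid message still carries the complete AdminDomain.
    return ET.tostring(root, encoding="unicode")


updated = republish_with_running_jobs(DOC, 12)
print(len(updated), "characters republished to change one integer")
```

If ComputingService (or RunningJobs) were declared as a global element, a message containing only that fragment could be validated and published on its own.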
Our observation is that the current design of the GLUE2 XML schema is not optimised for partial updates. Is this because partial updates were never an intended usage pattern of GLUE? A validity attribute is defined for each entity, so we assume the intended usage pattern of the GLUE information model (including the XML realization) is to PERIODICALLY publish ALL the information once the validity period expires. Are we correct? Many thanks!
I only have experience with the LDAP schema, but it may also be relevant in this discussion: yes, we periodically re-calculate and publish all the information. For the EGEE/WLCG information system the amount of information that the resources of one site need to provide is not very large. That information is collected and served by the site's information system endpoint ("site BDII"). A grid-wide information service instance ("top BDII") collects and serves the information from all such endpoints. For EGEE/WLCG the combined information currently amounts to 62 MB from 354 sites and is updated every few minutes. Middleware clients execute queries that are optimized to return as little superfluous information as possible. The information services have indices on popular attributes, to speed up the processing of the client queries. Does the XML schema hamper such a strategy?
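The collect-then-index strategy described above can be sketched roughly as follows. This is not BDII code: the record layout, the site names, and the helper functions `collect` and `build_index` are all hypothetical, and a real deployment would use LDAP queries and LDAP indices rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical records as two site-level endpoints might serve them.
SITE_ENDPOINTS = {
    "site-a": [
        {"id": "ce1.site-a", "type": "ComputingService", "running_jobs": 10},
        {"id": "se1.site-a", "type": "StorageService"},
    ],
    "site-b": [
        {"id": "ce1.site-b", "type": "ComputingService", "running_jobs": 3},
    ],
}


def collect(endpoints):
    """Top-level step: pull the full record set from every site endpoint."""
    return [dict(rec, site=site) for site, recs in endpoints.items() for rec in recs]


def build_index(records, attribute):
    """Map each value of a popular attribute to the records that carry it."""
    index = defaultdict(list)
    for rec in records:
        if attribute in rec:
            index[rec[attribute]].append(rec)
    return index


records = collect(SITE_ENDPOINTS)
by_type = build_index(records, "type")
# A client query that returns only the attribute it actually needs.
computing_ids = [rec["id"] for rec in by_type["ComputingService"]]
print(computing_ids)
```

The point of the sketch is that full periodic republication and narrow client queries are independent: the producers always publish everything, while the index and the attribute selection keep the answers small.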